Definition and Goal of Data Engineering, Transaction Concept and Main Issues

Expertise and role of Data Scientist and Data Engineer
Definition and goal of Data Engineering
Transaction concept and main issues
Four schedules for a transaction that transfers EUR 50 from A to B. You should be
able to explain the process, choose the best schedule, and explain your
reasoning.
OLTP and OLAP
ACID Requirements
Distributed Database. Definition and Advantages
Cube diagram for multidimensional data (sales volume as a function of product,
month, and region). You should be able to explain multidimensional data and the
possible dimensions for data retrieval.
Expertise and role of Data Scientist and Data Engineer
Data scientists spend their time and brainpower applying data science and
analytic results to critical business issues, helping an organization turn data
into information, information into knowledge and insights, and valuable,
actionable insights into better decision making and game-changing
strategies.
Data engineers are the designers, builders, and managers of the information
or "big data" infrastructure. They develop the architecture that helps analyse
and process data in the way the organization needs it, and they make sure
those systems perform smoothly.
Data Engineering Goal: The goal is to use the available data, or to generate
more data, and thereby to understand the process being investigated.
Transaction Concept: A transaction is a unit of program execution that
accesses and possibly updates various data items.
Two main issues to deal with:
1. Failures of various kinds, such as hardware failures and system crashes
2. Concurrent execution of multiple transactions
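The second issue, concurrent execution, can be illustrated with a small simulation of schedules. The sketch below is not the four schedules referred to in the outline; it assumes a hypothetical second transaction T2 (moving EUR 10 from A to B) running alongside T1 (moving EUR 50 from A to B), with made-up starting balances. It compares a serial schedule with one badly interleaved schedule to show how an update can be lost.

```python
def transfer(txn, amount):
    """Steps of one transaction that moves `amount` from A to B."""
    return [
        (txn, "read", "A", 0),
        (txn, "update", "A", -amount),   # change the private copy only
        (txn, "write", "A", 0),
        (txn, "read", "B", 0),
        (txn, "update", "B", +amount),
        (txn, "write", "B", 0),
    ]

def run(schedule, db):
    """Execute the steps in schedule order and return the final database."""
    db = dict(db)                        # work on a copy of the database
    local = {}                           # private buffer per transaction
    for txn, op, item, change in schedule:
        buf = local.setdefault(txn, {})
        if op == "read":
            buf[item] = db[item]
        elif op == "update":
            buf[item] += change
        elif op == "write":
            db[item] = buf[item]
    return db

start = {"A": 1000, "B": 2000}           # assumed balances
t1 = transfer("T1", 50)                  # the EUR 50 transfer
t2 = transfer("T2", 10)                  # hypothetical second transaction

serial = t1 + t2                         # T1 runs completely, then T2
# T2 reads A and B before T1 has written them back, so updates are lost.
interleaved = [t1[0], t1[1], t2[0], t2[1], t2[2], t2[3],
               t1[2], t1[3], t1[4], t1[5], t2[4], t2[5]]

serial_result = run(serial, start)
interleaved_result = run(interleaved, start)
print(serial_result, sum(serial_result.values()))
print(interleaved_result, sum(interleaved_result.values()))
```

In the serial schedule the total A + B stays at 3000. In the interleaved schedule the total drops, because each transaction overwrites a value the other one computed from stale data. A schedule is a good choice when its result matches that of some serial schedule.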
OLTP vs OLAP
OLTP (on-line transaction processing)
Major task of traditional relational DBMS
Day-to-day operations: purchasing, inventory, banking, manufacturing,
payroll, registration, accounting, etc.
OLAP (on-line analytical processing)
Major task of data warehouse system
Data analysis and decision making
Visual on OLAP
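As a rough illustration of the analytical side, the sketch below treats sales volume as a function of product, month, and region (the cube mentioned in the topic list) and aggregates it along chosen dimensions. The fact rows and the `roll_up` helper are invented for this example; a real data warehouse would do this with its own query engine.

```python
from collections import defaultdict

DIMENSIONS = ("product", "month", "region")

# Hypothetical fact rows: (product, month, region, sales volume)
sales = [
    ("laptop", "Jan", "Europe", 10),
    ("laptop", "Jan", "Asia", 5),
    ("laptop", "Feb", "Europe", 7),
    ("phone", "Jan", "Europe", 20),
    ("phone", "Feb", "Asia", 12),
]

def roll_up(facts, keep):
    """Sum sales volume over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]           # last field is the volume
    return dict(totals)

by_product = roll_up(sales, ["product"])
by_month_region = roll_up(sales, ["month", "region"])
grand_total = roll_up(sales, [])

print(by_product)
print(by_month_region)
print(grand_total)
```

Each choice of `keep` corresponds to one view of the cube: keeping one dimension gives totals along an edge, keeping two gives a face, and keeping none gives the grand total.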
ACID Requirements
1. Atomicity. Either all operations of the transaction are properly
reflected in the database or none are.
2. Consistency. Execution of a (single) transaction preserves the
consistency of the database.
3. Isolation. Although multiple transactions may execute concurrently,
each transaction must be unaware of other concurrently executing
transactions. Intermediate transaction results must be hidden from
other concurrently executed transactions.
4. Durability. After a transaction completes successfully, the changes it
has made to the database persist, even if there are system failures.
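Atomicity and consistency can be seen in a minimal sketch using Python's standard `sqlite3` module. The `account` table, its balances, and the `transfer` helper are assumptions made for this example. The database is in memory, so durability is not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the consistency rule: no negative balances.
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok_small = transfer(conn, "A", "B", 50)    # fits within A's balance
ok_large = transfer(conn, "A", "B", 500)   # would make A negative
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok_small, ok_large, balances)
```

The second transfer credits B first and only then fails on the debit of A. Because both updates belong to one transaction, the credit is rolled back as well, so the database never shows a half-finished transfer.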
Distributed Database. Definition and Advantages

A distributed database (DDB) is a collection of multiple logically related
databases distributed over a computer network, and a distributed
database management system (DDBMS) is a software system that manages a
distributed database while making the distribution transparent to the
user.
Advantages
Management of distributed data with different levels of
transparency:
For example, the EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented
horizontally and stored with possible replication.
Users do not have to worry about operational details of the network.
Replication transparency:
Copies of a data item can be stored at multiple sites, as shown
in the diagram above.
This is done to minimize access time to the required data.
Fragmentation transparency:
A relation can be fragmented horizontally (creating a subset
of tuples of the relation) or vertically (creating a subset of
columns of the relation).
Increased reliability and availability:
Improved performance:
A distributed DBMS fragments the database to keep data
closer to where it is needed most.
This reduces data management (access and modification)
time significantly.
Easier expansion (scalability):
New nodes (computers) can be added at any time
without changing the entire configuration.
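The horizontal and vertical fragmentation described above can be sketched in a few lines of Python. The EMPLOYEE rows, the column names (`ssn`, `name`, `dept`, `salary`), and the helper functions are invented for illustration; a real distributed DBMS performs fragmentation and reconstruction internally and hides it from the user.

```python
# Hypothetical EMPLOYEE relation; `ssn` is assumed to be the key.
employee = [
    {"ssn": 1, "name": "Ann", "dept": 4, "salary": 4000},
    {"ssn": 2, "name": "Bob", "dept": 5, "salary": 3500},
    {"ssn": 3, "name": "Cy", "dept": 5, "salary": 3800},
]

def horizontal(relation, predicate):
    """Horizontal fragment: the subset of tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def vertical(relation, columns, key="ssn"):
    """Vertical fragment: a subset of columns, always including the key."""
    cols = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in cols} for row in relation]

# Horizontal fragments, e.g. one per site that hosts a department.
site_dept5 = horizontal(employee, lambda r: r["dept"] == 5)
site_other = horizontal(employee, lambda r: r["dept"] != 5)

# Vertical fragments, e.g. general data and payroll data on separate sites.
general = vertical(employee, ["name", "dept"])
payroll = vertical(employee, ["salary"])

# Reconstruction: union for horizontal fragments, key join for vertical ones.
from_horizontal = sorted(site_dept5 + site_other, key=lambda r: r["ssn"])
payroll_by_key = {row["ssn"]: row for row in payroll}
from_vertical = [{**row, **payroll_by_key[row["ssn"]]} for row in general]

print(from_horizontal == employee, from_vertical == employee)
```

Keeping the key in every vertical fragment is what makes the join back to the full relation possible. Fragmentation transparency means the user queries EMPLOYEE as a whole and never sees these pieces.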
1) In a distributed database, data can be stored in different systems such as
personal computers, servers, mainframes, etc.
2) A user does not know where the data is physically located. The database
presents the data to the user as if it were located locally.
3) The database can be accessed over different networks.
4) Data can be joined and updated from different tables which are located on
different machines.
5) Even if a system fails, the integrity of the distributed database is
maintained.
6) A distributed database is secure.
