
Overview of Distributed Databases

Topics
- Concepts
- Benefits & Problems
- Architectures

Readings
- Lecture Notes
- Ozsu & Valduriez, selected sections of chapters 1 & 4

Why distributed?
- Corresponds to the organizational structure of distributed enterprises
  - Electronic commerce
  - Manufacturing control systems
- Divide and conquer: solve complex problems by dividing them into smaller pieces and assigning the pieces to different software groups, which work on different computers
- Produce a system that, although it runs on multiple processing elements, works effectively towards a common goal

Economic Benefits
- More computing power
- More discipline in smaller but autonomous groups
Distributed Database Systems

A union of two opposed technologies:
- Database technology, which centralizes
- Network technology, which distributes

Concepts

Distributed Computing System
A number of autonomous processing elements, not necessarily homogeneous, that are interconnected by a computer network and that cooperate in performing their assigned tasks

What is being distributed?
- Processing logic (e.g. Inventory, Personnel, Sales)
- Function (e.g. Printing, Email)
- Data

Distributed Database
A collection of multiple, logically interrelated databases, distributed over a computer network

Distributed Database Management System (DDBMS)
Software that manages a distributed database while making the distribution transparent to the user


Misconceptions
Distributed computing is not:
- A multi-processor system
- A backend processor
- A computer network
- Parallel computing

[Figure: A centralized database — Programs 1-3 at Site 1 access a single database]

Networked Architecture with Centralized Database

[Figure: Programs 1-3 run at Sites 1-4, all accessing a single centralized database over a communications network]

Distributed Database Architecture

[Figure: Programs 1-3 and Databases 1-3 are spread over Sites 1-4, connected by a communications network]

Benefits of DDBs
- Transparency
- Reliability
- Performance
- Expansion

Transparency
- Separation of high-level semantics from low-level implementation issues
- An extension of the data independence concept of centralized databases
- Basic concepts: fragmentation and replication

[Figure: A distributed application — Sydney, Perth, Darwin and Brisbane sites each hold their own data (with some data replicated at other sites), connected by a communications network]


Transparency Types in DDBMSs

Layers of transparency (innermost to outermost):
- Data independence
- Network transparency
- Replication transparency
- Fragmentation transparency
- Language transparency

Network Transparency
Protects the user from the operational details of the network; users do not have to specify where data is located
- Location transparency
- Naming transparency

Replication Transparency
- Replicas (copies of data) are created for performance and reliability reasons
- Users should be unaware of the existence of these copies
- Replication causes update problems

Fragmentation Transparency
Fragments are parts of a relation:
- Vertical fragment: a subset of columns
- Horizontal fragment: a subset of tuples
Fragments are also created for performance and reliability reasons, and users should be unaware of their existence

Reliability
- Replicated components (data & software) eliminate single points of failure
- Failure of a single site or communication link will not bring down the entire system
- Managing the reachable data requires distributed transaction support, which provides correctness in the presence of failures

Performance
Data localization:
- Reduces contention for CPU and I/O services
- Reduces communication overhead
Parallelism:
- Inter-query and intra-query parallelism
- The multiplex approach


Expansion
- Expand by adding processing and storage power to the network
- Replacing a mainframe vs. adding more micro-computers to the network

Problem Areas
- Database design
- Query processing
- Directory management
- Concurrency control
- Deadlock management
- Reliability issues
- Operating system support
- Heterogeneous databases

Database Design
How to place the database and applications: fragmentation or replication
- Disjoint fragments
- Full replication
- Partial replication
Design issues:
- Fragmentation
- Allocation

Query Processing
Determining factors:
- Distribution of data
- Communication costs
- Lack of locally available data
Objective: maximum utilisation of inherent parallelism

Directory Management
The directory contains data descriptions plus locations
Problems similar to database design:
- Global or local
- Centralised or distributed
- Single copy or multiple copies

Concurrency Control
Integrity of the database as in the centralised case, plus consistency of multiple copies of the database (mutual consistency)
Approaches:
- Pessimistic: synchronising before execution
- Optimistic: checking after execution
Strategies:
- Locking
- Timestamping


Deadlock Management
Similar solutions as for centralised systems:
- Prevention
- Avoidance
- Detection and recovery

Reliability Issues
Reliability is an advantage, but it does not come automatically
On failure of a site:
- Databases at operational sites must remain consistent and up to date
- The database at the failed site must recover after the failure

Operating System Support
Single processor systems:
- Memory management
- File system and access methods
- Crash recovery
- Process management
Distributed environments (in addition to the above):
- Dealing with multiple layers of network software

Heterogeneous Databases
Databases differ in:
- Data model
- Data language
Require translation mechanisms for data and programs
Multi-database systems: building a distributed DBMS from a number of autonomous, centralised databases

Architectural Models
Architecture defines structure:
- Components
- Functionality of components
- Interactions and interrelationships

Classification along three dimensions:
1. Autonomy
2. Distribution
3. Heterogeneity

[Figure: The ANSI/SPARC architecture for a centralised DBMS — external views map onto a conceptual schema, which maps onto an internal schema]

[Figure: Classification space — a cube with autonomy, distribution, and heterogeneity axes]


Autonomy
Distribution of control (not data) over:
- Design
- Communication
- Execution
Alternatives:
- Tight integration
- Semi-autonomous
- Total isolation / fully autonomous

Distribution
Distribution of data: physical distribution over multiple sites
Alternatives:
- Non-distributed / centralised
- Client-server
- Peer-to-peer / fully distributed

Heterogeneity
Different database systems; major differences in:
- Data models (relational, hierarchical)
- Data languages (SQL, QBE)
- Transaction management protocols

Architectural Alternatives
Autonomy (A): A0 tight integration, A1 semi-autonomous, A2 total isolation
Distribution (D): D0 non-distributed, D1 client-server, D2 peer-to-peer
Heterogeneity (H): H0 homogeneous, H1 heterogeneous
3 * 3 * 2 = 18 alternatives, spanning the space from (A0, D0, H0) to (A2, D2, H1)
Some alternatives are meaningless or impractical!

Architectural Alternatives
- (A0, D0, H0): A collection of logically integrated DBMSs on the same site; also called composite systems
- (A0, D0, H1): Integrated access to heterogeneous systems on a single machine
- (A0, D1, H0): Client-server distribution
- (A0, D2, H0): Fully distributed
- (A1, D0, H0): Semi-autonomous systems, also called federated systems; each DBMS knows how to participate in the federation


Architectural Alternatives (continued)
- (A1, D0, H1): Heterogeneous federated DBMSs
- (A1, D1, H1): Distributed heterogeneous federated DBMSs
- (A2, D0, H0): Multi-database systems; complete homogeneity in component systems is unlikely
- (A2, D0, H1): Heterogeneous multi-databases; similar to (A1, D0, H1), but with full autonomy
- (A2, D1, H1), (A2, D2, H1): Distributed multi-database systems

Major DBMS Architectures
- (Ax, D1, Hy): Client-server architecture
- (A0, D2, H0): Peer-to-peer architecture
- (A2, Dx, Hy): Multi-database architecture

Client-Server Architectures
Distribute the functionality between client and server to better manage the complexity of the DBMS (a two-level architecture)

Typical scenario (see the sketch below):
1. The client parses a query, decomposes it into independent site queries, and sends each to the appropriate server
2. Each server processes its local query and sends the result relation to the client
3. The client combines the results of the sub-queries

[Figure: User sends requests to a client, which exchanges request/response messages with a server]
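The three-step scenario above is essentially a scatter/gather pattern. The following is a minimal illustrative Python sketch, not the architecture of any particular DDBMS; the site table, the `run_local_query` helper, and the union-based combine step are invented stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-site fragments; in a real system each entry would be a
# connection to a remote server rather than an in-memory table.
SITE_FRAGMENTS = {
    "site1": [("John Smith", "CSEE")],
    "site2": [("Alex Brown", "CHEMISTRY")],
}

def run_local_query(site, predicate):
    """Step 2: each server evaluates the sub-query on its local fragment."""
    return [row for row in SITE_FRAGMENTS[site] if predicate(row)]

def client_query(predicate):
    # Step 1: decompose into independent site queries and send them out.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_local_query, s, predicate) for s in SITE_FRAGMENTS]
        # Step 3: combine the sub-query results (here, a simple union).
        result = []
        for f in futures:
            result.extend(f.result())
    return result

print(client_query(lambda row: row[1] == "CSEE"))
```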

Peer-to-peer Architecture
An extension of the ANSI/SPARC architecture to include a Global Conceptual Schema (GCS), where the GCS is the union of all Local Conceptual Schemas (LCSs)

[Figure: external schemas ES1 ... ESn defined over a single GCS, which maps onto LCS1 ... LCSn and local internal schemas LIS1 ... LISn]

Multi-database Architecture
The GCS exists as a union of some LCSs only, or does not exist at all

[Figure: with a GCS — global external schemas GES1 ... GESn over a GCS covering parts of LCS1 ... LCSn; without a GCS — local external schemas LES11 ... LESnm defined directly over LCS1 ... LCSn and LIS1 ... LISn]


Distributed Database Design, Query and Transactions

Topics
- Distributed database design
- Distributed DBMS
- Distributed query processing
- Distributed transaction management

Readings
- Lecture Notes
- Ozsu & Valduriez, selected sections of chapters 5, 7 & 12

The lecture notes for these topics contain many more slides than would actually be used for this lecture; the extra slides have been included for the benefit of the diverse student group.

Distributed Database Design
- Designing the network
- Distribution of the DBMS software
- Distribution of application programs
- Distribution of data

Framework of distribution:
- Level of sharing: no sharing, data, data + programs
- Access pattern: static, dynamic
- Level of knowledge: complete knowledge, partial knowledge

In all cases (except no sharing), the distributed environment has new problems that are not relevant in the centralized setup

Design Strategies
- Top-down strategy
- Bottom-up strategy
The choice of strategy depends on the architectural model

Taxonomy of Global Information Sharing
- Distributed databases: peer-to-peer architecture (A0, D2, H0) — local systems are homogeneous, and the global system has control over local data and processing
- Multi-databases: multi-database architecture (A2, Dx, Hy) — fully autonomous, heterogeneous local systems with no global schema

[Figure: distributed databases — ES1 ... ESn over a GCS over LCS1 ... LCSn and LIS1 ... LISn; multi-databases — ES1 ... ESn directly over LCS1 ... LCSn and LIS1 ... LISn]


Top Down Design
Requirements analysis yields objectives; conceptual design (view design and view integration) yields the GCS, external schemas, and access information; distribution design (with user input) yields the LCSs; physical design yields the physical schemas. Feedback flows back through observation and monitoring

Bottom Up Design
Integration of pre-existing local schemas into a global conceptual schema (LCS → GCS)
- Local systems will most likely be heterogeneous
- Problems of integration and interoperability

Distributed Design Issues
Fragmentation:
- Reasons
- Types
- Degree
- Correctness
Allocation:
- Partitioned
- Partially replicated
- Fully replicated
Information requirements:
- Database information
- Communication information
- Application information
- Computer system information

Why Fragment?
Application views are usually subsets of relations, and a fragment is a subset of a relation
If there is no fragmentation, either:
- The entire table is at one site — high volume of remote data accesses, or
- The entire table is replicated — problems in executing updates
Fragmentation allows parallel execution of a query, since sub-queries operate on fragments — higher concurrency

Drawbacks of Fragmentation
- Application views do not always map to mutually exclusive fragments — queries spanning multiple fragments suffer performance degradation (costly joins and unions)
- Semantic integrity control — checking dependencies among attributes that belong to several fragments may require access to several sites

Types of Fragmentation
- Horizontal
- Vertical
- Hybrid


Horizontal Fragmentation

PERSON
NAME            ADDRESS        PHONE      SAL
John Smith      44, Here St    3456 7890  34000
Alex Brown      1, High St     3678 1234  48000
Harry Potter    99, Magic St   9976 4321  98000
Jane Morgan     87, Riverview  8765 1237  65800
Peter Jennings  65, Flag Rd    9851 1238  23980

PERSON-FRAGMENT1
NAME            ADDRESS        PHONE      SAL
John Smith      44, Here St    3456 7890  34000
Alex Brown      1, High St     3678 1234  48000

PERSON-FRAGMENT2
NAME            ADDRESS        PHONE      SAL
Harry Potter    99, Magic St   9976 4321  98000
Jane Morgan     87, Riverview  8765 1237  65800
Peter Jennings  65, Flag Rd    9851 1238  23980

Vertical Fragmentation

PERSON-FRAGMENT1
NAME            ADDRESS        PHONE
John Smith      44, Here St    3456 7890
Alex Brown      1, High St     3678 1234
Harry Potter    99, Magic St   9976 4321
Jane Morgan     87, Riverview  8765 1237
Peter Jennings  65, Flag Rd    9851 1238

PERSON-FRAGMENT2
NAME            SAL
John Smith      34000
Alex Brown      48000
Harry Potter    98000
Jane Morgan     65800
Peter Jennings  23980

The primary key [NAME] is always included in every vertical fragment

Hybrid Fragmentation

PERSON-FRAGMENT1
NAME        ADDRESS      PHONE
John Smith  44, Here St  3456 7890
Alex Brown  1, High St   3678 1234

PERSON-FRAGMENT2
NAME            SAL
Harry Potter    98000
Jane Morgan     65800
Peter Jennings  23980

Degree of Fragmentation
A spectrum from no fragmentation, through hybrid fragmentation, down to individual tuples (horizontal extreme) or individual columns (vertical extreme)
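As a concrete illustration of the fragment types above, here is a minimal Python sketch that derives fragments from a PERSON relation via selection (horizontal) and projection (vertical). The tuple encoding and the salary-based split predicate are illustrative choices, not part of the lecture material.

```python
# PERSON relation as a list of dicts, one dict per tuple (truncated for brevity).
PERSON = [
    {"NAME": "John Smith",   "ADDRESS": "44, Here St",  "PHONE": "3456 7890", "SAL": 34000},
    {"NAME": "Alex Brown",   "ADDRESS": "1, High St",   "PHONE": "3678 1234", "SAL": 48000},
    {"NAME": "Harry Potter", "ADDRESS": "99, Magic St", "PHONE": "9976 4321", "SAL": 98000},
]

def horizontal(relation, predicate):
    """Horizontal fragment: a subset of tuples (a selection)."""
    return [t for t in relation if predicate(t)]

def vertical(relation, attrs):
    """Vertical fragment: a subset of columns (a projection).
    The primary key NAME is always retained so fragments can be re-joined."""
    keep = ["NAME"] + [a for a in attrs if a != "NAME"]
    return [{a: t[a] for a in keep} for t in relation]

frag1 = horizontal(PERSON, lambda t: t["SAL"] < 50000)   # illustrative predicate
frag2 = horizontal(PERSON, lambda t: t["SAL"] >= 50000)
name_sal = vertical(PERSON, ["SAL"])                     # vertical fragment (NAME, SAL)
hybrid = vertical(frag2, ["SAL"])                        # hybrid: vertical fragment of a horizontal one
```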

Correctness of Fragmentation
Rules that ensure the database does not undergo semantic change during fragmentation:

Completeness
Each data item found in a relation R must be found in one or more of R's fragments R1, R2, ..., Rn

Reconstruction
The dependency constraints on the data can be preserved by ensuring that every relation R can be reconstructed from its fragments using some operator ∇, such that R = ∇ Ri for all Ri ∈ F_R (union for horizontal fragments, join for vertical fragments)

Disjointness
A relation R is decomposed into disjoint horizontal fragments, such that if a data item di ∈ Rj, then di ∉ Rk for j ≠ k. For vertical fragments, disjointness does not apply to the primary key attributes

Integrity Constraints Issues
Consider an example with the following global schema over attributes A B C D E F G H I J:
R1 = ABCD, R2 = DEFG, R3 = FGHIJ

We propose the following design:
Site 1: R11 = ABC, R21 = DEF, R31 = FGH
Site 2: R12 = ABD, R22 = DG, R32 = FGIJ

Obviously (joining the vertical fragments on their shared key attributes):
R11 ⋈ R12 = R1, R21 ⋈ R22 = R2, R31 ⋈ R32 = R3


Integrity Constraints Issues (continued)

Let r2(R2) and r3(R3) be the following instances, with their vertical fragments:

r2 (D E F G)      r21 (D E F)    r22 (D G)
    1 2 3 1            1 2 3          1 1
    2 4 3 1            2 4 3          2 1

r3 (F G H I J)    r31 (F G H)    r32 (F G I J)
    3 1 0 2 0          3 1 0          3 1 2 0
    3 2 1 1 1          3 2 1          3 2 1 1

Query on the global schema: find all E F H

r2 ⋈ r3 (D E F G H I J)
         1 2 3 1 0 2 0
         2 4 3 1 0 2 0

Projecting on E F H gives:
E F H
2 3 0
4 3 0

Integrity Constraints Issues (continued)

We now execute the same query in our DDB, where Site 1 holds R11 = ABC, R21 = DEF, R31 = FGH and Site 2 holds R12 = ABD, R22 = DG, R32 = FGIJ. At Site 1 only r21 and r31 are available, and they share only attribute F:

r21 (D E F)    r31 (F G H)
     1 2 3          3 1 0
     2 4 3          3 2 1

r21 ⋈ r31, projected on E F H:
E F H
2 3 0
4 3 0
2 3 1
4 3 1

The fragmented execution joins on F alone (G was split off into r22 and r32), so it produces the spurious tuples (2, 3, 1) and (4, 3, 1) that the global query does not. To prevent this, often more constraints must be maintained across fragments
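A quick way to see the anomaly is to compute both joins directly. The tiny natural-join helper below is an illustrative sketch, not part of the lecture material; it reproduces exactly the tables above.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of attribute->value dicts."""
    out = []
    for t in r:
        for u in s:
            common = set(t) & set(u)
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out

r2 = [{"D": 1, "E": 2, "F": 3, "G": 1}, {"D": 2, "E": 4, "F": 3, "G": 1}]
r3 = [{"F": 3, "G": 1, "H": 0, "I": 2, "J": 0},
      {"F": 3, "G": 2, "H": 1, "I": 1, "J": 1}]
r21 = [{k: t[k] for k in "DEF"} for t in r2]   # site 1 fragment of r2
r31 = [{k: t[k] for k in "FGH"} for t in r3]   # site 1 fragment of r3

efh = lambda rel: sorted({(t["E"], t["F"], t["H"]) for t in rel})
print(efh(natural_join(r2, r3)))    # [(2, 3, 0), (4, 3, 0)] -- the correct answer
print(efh(natural_join(r21, r31)))  # adds spurious (2, 3, 1) and (4, 3, 1)
```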

Versions of Horizontal Fragmentation

Primary Horizontal Fragmentation
A selection operation on the owner relation of a database schema, using a minterm predicate m_i:
SELECT * FROM owner WHERE m_i

Derived Horizontal Fragmentation
Defined on the member relation of a link according to the selection specified for its owner, using a minterm predicate m_j:
SELECT * FROM member
WHERE member.join-attr IN (SELECT owner.join-attr FROM owner WHERE m_j)

Example of Derived HFs

Relations:

DEPT
DEPT        BUILDING
CSEE        GP-SOUTH
CHEMISTRY   HAWKEN
ARCHIT      HAWKEN

EMP
NAME            DEPT        SAL
John Smith      CSEE        34000
Alex Brown      CHEMISTRY   48000
Harry Potter    CHEMISTRY   98000
Jane Morgan     CSEE        65800
Peter Jennings  CSEE        23980
Melissa Dean    ARCHIT      60000

Fragments (co-located employees, derived from DEPT.BUILDING):

EMP (HAWKEN)
NAME          DEPT        SAL
Alex Brown    CHEMISTRY   48000
Harry Potter  CHEMISTRY   98000
Melissa Dean  ARCHIT      60000

EMP (GP-SOUTH)
NAME            DEPT  SAL
John Smith      CSEE  34000
Jane Morgan     CSEE  65800
Peter Jennings  CSEE  23980


Approaches for Vertical Fragmentation
Vertical fragmentation is more complex than horizontal fragmentation due to the larger number of alternatives available:
- HF: driven by the number of minterm predicates
- VF: driven by the number of non-primary-key attributes
Heuristic approaches:
- Grouping: bottom-up
- Splitting: top-down

Allocation
[Figure: a spectrum from (1) partitioned, through (2) partially replicated, to (3) fully replicated — moving towards full replication improves the performance of read-only queries but worsens update problems]

Allocation Problem
Find the optimal distribution of fragments F = {F1, F2, ..., Fn} to sites S = {S1, S2, ..., Sm}, on which a set of applications Q = {q1, q2, ..., qq} is running
Optimality can be defined through:
- Minimal cost
  - Cost of storing and querying a fragment at a site
  - Cost of updating a fragment at all sites
  - Cost of data communication
- Performance
  - Minimize response time
  - Maximize system throughput

Guidelines for Distributed Design
Given a database schema, decompose it into fragments and allocate them to distributed sites, while minimizing remote data access and update problems
- The most active 20% of queries account for 80% of the total data accesses
- Qualitative aspects of the data guide the fragmentation activity
- Quantitative aspects of the data guide the allocation activity

Information Requirements
Fragmentation:
- Database information
- Application information
Allocation:
- Communication network information
- Computer system information

Information Requirements for HF

Database information
Links between relations in the schema (e.g. Department as owner/source, Employees as member/target)

Application information
Predicates in user queries:
- Simple: p1: DEPT = "CSEE", p2: SAL > 30000
- Conjunctive (minterm) predicates: m1 = p1 ∧ p2, i.e. DEPT = "CSEE" ∧ SAL > 30000
- Minterm selectivity: the number of tuples returned for a given minterm
- Access frequency: the access frequency of user queries and/or minterms


Information Requirements for VF

Application information
Place attributes that are usually accessed together in one fragment
- Affinity: the bond between two attributes, based on how they are accessed by applications; measured by a usage/affinity matrix
- Usage of an attribute A in query q: use(q, A) = 1 if used, 0 if not used
- Affinity of two attributes Ai and Aj over a set of queries: aff(Ai, Aj)
The usage/affinity matrix is used in vertical partitioning algorithms:
- Clustering algorithm
- Partitioning algorithm

Information Requirements for Allocation

Database information
- Fragment selectivity: the number of tuples of a fragment Fi that must be accessed to process a query qj
- The size of a fragment: size(Fi) = cardinality(Fi) * length-in-bytes(Fi)

Network information
- Channel capacities, distances, protocol overhead
- Communication cost per frame between sites Si and Sj
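As a sketch of how a usage matrix turns into affinities, the snippet below computes aff(Ai, Aj) as the summed access frequency of the queries that use both attributes — a common textbook definition; the example queries and frequencies are made up for illustration.

```python
# use[q][A] = 1 if query q uses attribute A, else 0; freq[q] = access frequency.
ATTRS = ["NAME", "ADDRESS", "PHONE", "SAL"]
use = {
    "q1": {"NAME": 1, "ADDRESS": 1, "PHONE": 1, "SAL": 0},
    "q2": {"NAME": 1, "ADDRESS": 0, "PHONE": 0, "SAL": 1},
}
freq = {"q1": 30, "q2": 20}

def aff(ai, aj):
    """Affinity: total frequency of queries that access both attributes."""
    return sum(f for q, f in freq.items() if use[q][ai] and use[q][aj])

affinity = {(a, b): aff(a, b) for a in ATTRS for b in ATTRS}
print(affinity[("NAME", "SAL")])  # 20: only q2 uses both NAME and SAL
```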

Information Requirements for Allocation (continued)

Application information
- Read access: the number of read accesses a query qj makes to a fragment Fi during execution
- Update access: the number of update accesses a query qj makes to a fragment Fi during execution

Site information
- Knowledge of storage and processing capacity
- The unit cost of storing data at site Sk
- The cost of processing one unit of work at site Sk

Distributed DBMS
- Components of a distributed DBMS
- Query processing and optimization
- Transaction management
- Concurrency control
- Reliability issues

DDBMS Components

[Figure: the user processor comprises the user interface handler, semantic data controller, global query optimizer, and distributed execution monitor, working against the external schema, global conceptual schema, and global directory (GD/D); the data processor comprises the local query processor, local recovery manager (with the system log), and runtime support processor, working against the local conceptual and local internal schemas]

Distributed Query Processing
- The distributed query problem
- Layers of query processing
- Query decomposition
- Data localization
- Query optimization


Distributed Query Processing

Distributed query processor
Maps a high-level query on a distributed database into a sequence of database operations on fragments: database operations plus communication operations

Distributed query optimizer
Minimizes a cost function based on:
- Computing resources (CPU, I/O)
- Communication networks

A Simple Query Problem
EMP (ENO, ENAME, TITLE)
ASG (ENO, PNO, RESP, DUR)
Find the names of employees who are managing a project:

SELECT ENAME
FROM   EMP, ASG
WHERE  EMP.ENO = ASG.ENO
AND    RESP = "Manager";

Algebraic Transformations
Two equivalent formulations of the query:
1. Π_ENAME (σ_{RESP="Manager" ∧ EMP.ENO=ASG.ENO} (EMP × ASG))
2. Π_ENAME (EMP ⋈_ENO (σ_{RESP="Manager"} (ASG)))

Centralized: query 2 is better, since it avoids the costly Cartesian product

Distributed: relational algebra alone is not sufficient to express execution strategies; we also need
- Operations for exchanging data
- Selection of the sites that process each operation

A Distributed Query Problem
EMP and ASG are fragmented as:
EMP1 = σ_{ENO ≤ "E3"} (EMP)    EMP2 = σ_{ENO > "E3"} (EMP)
ASG1 = σ_{ENO ≤ "E3"} (ASG)    ASG2 = σ_{ENO > "E3"} (ASG)
ASG1, ASG2, EMP1, EMP2 are stored at sites 1, 2, 3 and 4, and the result is expected at site 5!

Equivalent Strategies

Strategy A (total cost = 460):
- Site 1: ASG'1 = σ_{RESP="Manager"} (ASG1), sent to site 3
- Site 2: ASG'2 = σ_{RESP="Manager"} (ASG2), sent to site 4
- Site 3: EMP'1 = EMP1 ⋈_ENO ASG'1, sent to site 5
- Site 4: EMP'2 = EMP2 ⋈_ENO ASG'2, sent to site 5
- Site 5: Result = EMP'1 ∪ EMP'2

Strategy B (total cost = 23000):
- Ship ASG1, ASG2, EMP1, EMP2 from sites 1-4 to site 5
- Site 5: Result = (EMP1 ∪ EMP2) ⋈_ENO σ_{RESP="Manager"} (ASG1 ∪ ASG2)

Total cost is determined through tuple accesses and transfers

Layers of Query Processing
1. Query decomposition (at the control site, using the global schema): calculus query on distributed relations → algebraic query on distributed relations
2. Data localization (using the fragment schema): algebraic query → fragment query
3. Global optimization (using statistics on fragments): fragment query → optimized fragment query with communication operations
4. Local optimization (at the local sites, using the local schemas): → optimized local queries


Query Decomposition
Uses the same techniques as in the centralized case, since information on data distribution is not used
- Step 1: Normalization
- Step 2: Analysis
- Step 3: Elimination of redundancy
- Step 4: Rewriting

Step 1: Normalization
Transform arbitrarily complex queries into a normal form. Two normal forms:
- Conjunctive: (p11 ∨ p12 ∨ ... ∨ p1n) ∧ ... ∧ (pm1 ∨ pm2 ∨ ... ∨ pmn)
- Disjunctive: (p11 ∧ p12 ∧ ... ∧ p1n) ∨ ... ∨ (pm1 ∧ pm2 ∧ ... ∧ pmn)

Example equivalence rules:
- p1 ∧ (p2 ∨ p3) ⇔ (p1 ∧ p2) ∨ (p1 ∧ p3)
- ¬(p1 ∧ p2) ⇔ ¬p1 ∨ ¬p2
- ...

Example transformation into disjunctive normal form:
TITLE = "Manager" AND SAL > 30000 AND (PROJ = 12 OR PROJ = 34)
⇔ (TITLE = "Manager" ∧ SAL > 30000 ∧ PROJ = 12) ∨ (TITLE = "Manager" ∧ SAL > 30000 ∧ PROJ = 34)

Step 2: Analysis
Determine whether further processing of the normalized query is possible

Type incorrectness:
- Attributes or relation names not defined in the global schema
- Operations applied to attributes of the wrong type

Semantic incorrectness:
- Components that do not contribute to the generation of the result
- Detected via query graphs: a disconnected graph indicates an incorrect query

Example query and its query graph (nodes: the result and the operands; edges: joins, selections, and projections):

SELECT ENAME, DUR
FROM   EMP, ASG
WHERE  EMP.ENO = ASG.ENO
AND    RESP = "Manager"

[Figure: query graph — EMP and ASG connected by the join edge EMP.ENO = ASG.ENO, ASG carrying the selection RESP = "Manager", with projection edges ENAME and DUR into the RESULT node]

Step 3: Elimination of Redundancy
Simplify the query qualification (WHERE clause) using well-known idempotency rules:
- p ∧ false ⇔ false
- p ∨ false ⇔ p
- p ∧ p ⇔ p
- p1 ∧ (p1 ∨ p2) ⇔ p1
- etc.

Example redundancy:
SAL > 30000 AND SAL ≥ 3000 ⇔ SAL > 30000

Step 4: Rewriting

Step 4.1: Transformation
Transform the relational calculus query into relational algebra, represented as an operator tree. By applying transformation rules, many different but equivalent trees may be found

Step 4.2: Restructuring
Restructure the tree to improve performance

Example operator tree for:
SELECT ENAME
FROM   EMP, ASG
WHERE  EMP.ENO = ASG.ENO
AND    RESP = "Manager"

[Figure: operator tree — Π_ENAME on top of ⋈_ENO, whose inputs are EMP and σ_{RESP="Manager"}(ASG)]

Data Localization
Localize the query using data distribution information: an algebraic query on global relations becomes an algebraic query on physical fragments
Two-step process:
- Localization
- Reduction


Data Localization
Step 1: Map the distributed query to a fragment query using the localization program
Example: if EMP1 = σ_{SAL > 30000} (EMP) and EMP2 = σ_{SAL ≤ 30000} (EMP), then the localization program for EMP is EMP = EMP1 ∪ EMP2

Step 2: Simplify and restructure the fragment query using reduction techniques
The basic idea is to determine the intermediate results in the query that produce empty (horizontal) or useless (vertical) relations, and remove the subtrees that produce them

Query Optimization
Query processing so far has been independent of fragment characteristics such as cardinality and replication
Main objectives of the optimizer (the problem is NP-hard!):
- Find a strategy close to optimal
- Avoid bad strategies
The choice of an optimal strategy requires prediction of execution costs: local processing (CPU, I/O) plus communication

Distributed Query Optimization
- Global optimization (fragment query → optimized fragment query with communication operations, using statistics on fragments)
  - Join ordering
  - Semijoin-based algorithms
- Local optimization (using the local schema, yielding optimized local queries)
  - INGRES algorithm
  - System R algorithm

Components of the Optimizer
- Search space: the set of alternative but equivalent query execution plans, abstracted by means of operator trees; most optimizers concentrate on the ordering of joins
- Search strategy: explores the search space and selects the best plan using the cost model; defines how the plans are examined
- Distributed cost model: predicts the cost of a given execution plan

Distributed Cost Model

Cost functions:
Total Time = T_CPU * #insts + T_I/O * #I/Os + T_MSG * #msgs + T_TR * #bytes
Response Time = T_CPU * seq_#insts + T_I/O * seq_#I/Os + T_MSG * seq_#msgs + T_TR * seq_#bytes
(total time sums over all sites; response time counts only the sequential components on the longest path)

Database statistics:
- Estimate the size of the intermediate relations
- Based on statistical information: length of attributes, number of distinct values, number of tuples in the fragment, ...

Cardinalities of intermediate relations:
Formulas estimate the cardinalities of the results of the basic relational algebra operations: selection, projection, Cartesian product, join, semijoin, union, difference
Examples:
card(σ_F(R)) = SF_σ(F) * card(R), where SF_σ(F) is the selectivity factor of predicate F
card(R × S) = card(R) * card(S)
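A minimal sketch of the total-time cost function above; the coefficient values are arbitrary placeholders, not measurements from any real system, and the two plans are invented to show why shipping pre-filtered tuples wins.

```python
# Unit costs (arbitrary placeholder values): seconds per instruction/IO/msg/byte.
T_CPU, T_IO, T_MSG, T_TR = 1e-6, 1e-3, 0.1, 1e-6

def total_time(insts, ios, msgs, nbytes):
    """Total Time = T_CPU*#insts + T_I/O*#I/Os + T_MSG*#msgs + T_TR*#bytes."""
    return T_CPU * insts + T_IO * ios + T_MSG * msgs + T_TR * nbytes

# A plan that ships few, pre-filtered tuples beats one that ships whole fragments:
print(total_time(insts=10_000, ios=100, msgs=4, nbytes=5_000))      # selective plan
print(total_time(insts=10_000, ios=100, msgs=4, nbytes=5_000_000))  # ship-everything plan
```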


Semijoins
The dominant factor in the time required to process a query is the time spent on data transfer between sites, not the computation or retrieval from secondary storage

Example: suppose relation r is stored at site 1 and s at site 2, and we want to evaluate r ⋈ σ_{A=a}(s) at site 1. Two options:
1. Site 1 asks site 2 to transmit s, and then computes the selection and join itself
2. Site 1 transmits the selection condition to site 2, and site 2 sends only the selected records to site 1 (presumably much smaller than s)

Semijoin
Definition: let r(R) and s(S) be two relations. The semijoin of r with s, denoted r ⋉ s, is the relation Π_R (r ⋈ s); that is, r ⋉ s is the portion of r that joins with s
An important transformation:
r ⋉ s = Π_R (r ⋈ s) = r ⋈ Π_{R∩S} (s)

Semijoin Example
r(A, B) is stored at site 1 and s(B, C, D) at site 2:

r (A B) @S1      s (B  C  D) @S2
   1 4               4 13 16
   1 5               4 14 16
   1 6               7 13 17
   2 4              10 14 16
   2 6              10 15 17
   3 7              11 14 16
   3 8              11 15 18
   3 9              12 15 16

(1) We can transfer s to site 1 and compute the join at site 1 (24 values transferred)
(2) We can compute r' = Π_B(r) at site 1, send r' to site 2 (6 values), compute s' = r' ⋈ s at site 2, and send s' to site 1 (9 values; 15 values in total):

r' (B)      s' (B  C  D)
    4           4 13 16
    5           4 14 16
    6           7 13 17
    7
    8
    9

(3) We then compute r ⋉ s as r ⋈ s'. Thus, r ⋉ s can be computed knowing only s'

Distributed Transaction Management
- Transaction management, revisited
- Distributed transactions
- Distributed execution monitor
- Advanced transaction models
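A compact sketch of the semijoin reduction in step (2); the tuple encoding is illustrative, but the data matches the example above.

```python
r = [(1, 4), (1, 5), (1, 6), (2, 4), (2, 6), (3, 7), (3, 8), (3, 9)]   # r(A, B) at site 1
s = [(4, 13, 16), (4, 14, 16), (7, 13, 17), (10, 14, 16), (10, 15, 17),
     (11, 14, 16), (11, 15, 18), (12, 15, 16)]                         # s(B, C, D) at site 2

r_prime = {b for (_, b) in r}                 # site 1: project r on the join attribute B
s_prime = [t for t in s if t[0] in r_prime]   # site 2: keep only tuples of s that join with r
# site 1: the final join needs only s_prime, not all of s
result = [(a, b, c, d) for (a, b) in r for (b2, c, d) in s_prime if b == b2]
print(len(s_prime), result[:2])  # 3 reduced tuples shipped instead of 8
```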

Centralized Transaction Processing

Data access: read-item(X) and write-item(X) operations between begin-transaction and end-transaction, operating on buffers over the disk blocks of the database

Typical transactions are well-defined, repetitive, high-volume, update-oriented, and controlled

[Figure: transaction state transitions — Active → (end) → Partially Committed → (commit) → Committed, or → (abort) → Failed → Terminated]

Interleaving example:

T1                   T2
read-item(X);
X := X - N;          read-item(X);
write-item(X);       X := X + M;
read-item(Y);        write-item(X);
Y := Y + N;
write-item(Y);


Desirable Properties
ACID properties, enforced by the concurrency control and recovery methods of the DBMS:
- Atomicity
- Consistency
- Isolation
- Durability

Distributed Transactions
- Database consistency: no semantic integrity constraints are violated
- Transaction consistency: no incorrect updates during concurrent database accesses
- Mutual consistency: no difference in the values of a data item across all the replicas to which it belongs

Example
Consider the following example (source: [MUW2002]):
A chain of department stores has many individual stores. Each store has a database of sales and inventory at that store. There may also be a central office database with data about employees, chain-wide inventory, credit card customers, and information about suppliers such as outstanding orders. There is a copy of all the stores' sales data in a data warehouse, which is used to analyse and predict sales.

Example
A manager wants to query all the stores, find the inventory of toothbrushes, and have toothbrushes moved from store to store in order to balance the inventory.
This distributed transaction consists of communicating transaction components, each at a different site: a local component Ti at the i-th store, and a global component T0 at the central office:
1. Component T0 is created at the central office
2. T0 sends messages to all stores instructing them to create their Ti
3. Each Ti executes a query to determine its number of toothbrushes and returns the result to T0
4. T0 takes these numbers, determines (by some algorithm) the shipments required, and sends instructions to all affected stores, e.g. "store 10 is to ship 500 toothbrushes to store 7"
5. Stores receiving these instructions update their inventory and perform the shipments

Example: What Can Go Wrong?
- A bug in the algorithm causes store 10 to ship more toothbrushes than are available in its inventory, causing T10 to abort. However, T7 detects no problem, updates its inventory in anticipation of the incoming shipment, and commits.
  - The atomicity of T is compromised (T10 never completed)
  - The database is left in an inconsistent state
- After returning the result of its inventory count to T0 (step 3), the machine at store 10 crashes, and the instructions to ship are never received from T0.
  - What should T10 do when its site recovers?

Problems
- Managing the commit/abort decision in distributed transactions (one component wants to abort while others want to commit) → two-phase commit
- Assuring serializability of transactions that involve components at several sites → distributed locking


Commit in Centralized Databases
All database access operations of the transaction have been executed successfully, and their effect on the database has been recorded in the log
System log entries:
[start-transaction, T]; [read-item, T, X]; [write-item, T, X, old-value, new-value]; [commit, T]
After the commit point, transaction T is said to be committed

Commit in Distributed Databases?
Distributed DBMSs may encounter problems not found in a centralized environment:
- Dealing with multiple copies of the data
- Failure of an individual site
- Failure of communication links
Recording the effect of database access operations in the (local) system log is not a sufficient condition for distributed commit
(Source: [MUW2002])

A Distributed Execution Monitor
Accepts Begin-transaction, Read, Write, Commit, and Abort from applications and returns results. It consists of a Transaction Manager (TM), which communicates with the TMs at other sites, and a Scheduler (SC), which communicates with the SCs and data processors at other sites and passes operations to the local data processors
- Begin-transaction: basic housekeeping by the TM
- Read: if the data item is local, the TM retrieves it; otherwise it selects one copy
- Write: the TM coordinates the updating of the data item's value at all sites where it resides
- Commit: the TM coordinates the physical updating of all databases that contain a copy of each data item for which a write was issued
- Abort: the TM ensures that no effect of the transaction is reflected in the database

Classification of Transactions
- Distribution: transactions in centralized DBMSs vs. transactions in distributed DBMSs
- Duration: short-life transactions vs. long-life / long-duration transactions
- Processing: on-line / interactive transactions vs. batch transactions
- Grouping of operations: flat transactions vs. nested transactions

Selected Advanced Transaction Models
- Nested transactions: grouping of operations into hierarchical structures
- Workflow (long-duration) transactions: relaxation of the ACID properties

Nested Transactions
A set of sub-transactions that may recursively contain other sub-transactions:

Begin-transaction Reservation
    Begin-transaction Airline
        ...
    End. {Airline}
    Begin-transaction Hotel
        ...
    End. {Hotel}
    Begin-transaction Car
        ...
    End. {Car}
End


Types of Nested Transactions
Closed nested transactions:
- A sub-transaction begins after its root and finishes before it
- Commit of a sub-transaction is conditional upon the commit of the root
- Top-level atomicity
Open nested transactions:
- Relaxation of top-level atomicity: partial results of sub-transactions are visible
- Examples: sagas; split transactions (creating new transactions from existing ones)

Advantages of Nested Transaction Models
- Higher level of concurrency: objects can be released after a sub-transaction completes
- Independent recovery of sub-transactions: damage is limited to a smaller part, making recovery less costly

Workflow (Long-Duration) Transactions
Generic definition: a workflow is an automation of a business process
In the context of transactions: a workflow is a (high-level) activity that consists of a set of tasks with a well-defined precedence relationship between them
- Workflow tasks (sub-transactions) are allowed to commit individually (open nesting semantics), permitting partial results to be visible outside the workflow
- Various concepts have been introduced to overcome the problems of sub-transaction commit: compensating tasks, critical tasks, contingency tasks

Sagas: an Open and Long-Duration Transactional Model (optional for this course)
A saga is a collection of actions that form a long-duration transaction:
- A collection of actions
- A graph whose nodes are either actions or one of the terminal nodes {Abort, Complete}
- An indication of the start, called the start node

[Figure: start → A0 → A1 → A2 → A3 → complete, with abort edges A1 → A4 and A2 → A5]
Successful path: A0, A1, A2, A3
Unsuccessful paths: A0, A1, A4 and A0, A1, A2, A5

Concurrency Control in Sagas
Concurrency control is managed by two facilities:
1. Each action A is itself a (short) transaction, using conventional concurrency control such as locking
2. The overall transaction, which can be any path from the start node to one of the terminal nodes, uses the concept of compensating transactions
- A compensating transaction rolls back the effect of a committed action in a way that does not depend on what happened to the database between the time of the action and the time of the compensation
- If A is any action, A⁻¹ is its compensating transaction, and α is any sequence of legal actions and compensating transactions, then A α A⁻¹ ≡ α

Distributed Concurrency Control
- Distributed concurrency control through timestamps
- Distributed concurrency control through locking
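A minimal sketch of the compensation idea: each committed action registers its inverse, and on failure the saga runs the inverses in reverse order. The action/compensation pair below (a toothbrush shipment and its reversal) is invented for illustration.

```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs of zero-arg callables.
    On failure, already-committed actions are compensated in reverse order."""
    done = []
    try:
        for action, compensation in steps:
            action()
            done.append(compensation)
    except Exception:
        for compensation in reversed(done):  # roll back: A alpha A^-1 == alpha
            compensation()
        raise

inventory = {"store10": 400, "store7": 100}
def ship(src, dst, n):
    if inventory[src] < n:
        raise RuntimeError("not enough stock")
    inventory[src] -= n; inventory[dst] += n

run_saga([(lambda: ship("store10", "store7", 200),    # action
           lambda: ship("store7", "store10", 200))])  # its compensation
print(inventory)
```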


Centralized Timestamp Ordering
For each item accessed by conflicting operations in a schedule, the order of item access must not violate the serializability order. Each item X carries:
- read-TS(X) = TS(T), where T is the transaction that most recently read X
- write-TS(X) = TS(T), where T is the transaction that most recently wrote X

Distributed Timestamp Ordering
Within the distributed execution monitor (TM + SC):
- The TM is responsible for assigning a timestamp to each new transaction and attaching it to each database operation passed to the scheduler
- The SC is responsible for keeping track of read and write timestamps and performing the serializability checks
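A minimal sketch of the basic timestamp-ordering check the scheduler performs; the abort-on-late-operation policy follows the standard textbook rule, and the data structures are illustrative.

```python
read_ts, write_ts = {}, {}  # per-item timestamps, both defaulting to 0

def sched_read(item, ts):
    """Reject a read that arrives too late (a younger write already happened)."""
    if ts < write_ts.get(item, 0):
        raise RuntimeError("abort: read rejected by timestamp ordering")
    read_ts[item] = max(read_ts.get(item, 0), ts)

def sched_write(item, ts):
    """Reject a write that a younger read or write has already overtaken."""
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        raise RuntimeError("abort: write rejected by timestamp ordering")
    write_ts[item] = ts

sched_write("X", ts=1)
sched_read("X", ts=2)
try:
    sched_write("X", ts=1)  # an older write after a younger read -> abort
except RuntimeError as e:
    print(e)
```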

Centralized Two-Phase Locking (2PL)
A transaction follows the two-phase protocol if all of its locking operations precede its first unlocking operation
Example:
read-lock(X); write-lock(X); write-lock(Y); read-lock(Y)   ← Phase 1: growing
unlock(X); unlock(Y)                                       ← Phase 2: shrinking

Distributed 2PL
The distributed execution monitor again consists of a Transaction Manager (TM) and a Lock Manager (LM), with the LM communicating with the LMs and data processors at other sites
Differences from the centralized case:
- The TM sends lock requests to the lock managers of ALL participating sites
- Operations are passed to the data processors by the local as well as the REMOTE participating lock managers
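A minimal sketch of the two-phase discipline itself (not of a full distributed lock manager): the transaction object refuses new lock acquisitions once it has started releasing. The class and names are illustrative.

```python
class TwoPhaseTxn:
    """Enforces 2PL: all lock acquisitions must precede the first unlock."""
    def __init__(self):
        self.locks = set()
        self.shrinking = False  # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True   # the growing phase is over
        self.locks.discard(item)

t = TwoPhaseTxn()
t.lock("X"); t.lock("Y")   # growing phase
t.unlock("X")              # shrinking phase begins
try:
    t.lock("Z")            # violates the protocol
except RuntimeError as e:
    print(e)
```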

Distributed Locking
Main difference from centralized locking: how do we manage locks for objects replicated across many sites?
Replication control: logical elements require global locks
Approaches:
- Centralized
- Primary copy
- Fully distributed

Centralized 2PL
- One site is designated as the lock site or primary site
- The primary site is delegated the responsibility for lock management
- Transaction managers at participating sites deal with the primary-site lock manager rather than their own
Drawbacks:
- Performance bottleneck at the primary site
- Vulnerable to single-site failure

[Figure: message flow among the coordinating TM, the primary-site LM, and the data processors at the participating sites]


Primary Copy 2PL
- The function of the primary site is distributed, avoiding the bottleneck at a single primary site
- Lock managers exist at a number of sites; each lock manager is responsible for managing the locks for a given set of data elements
- The copy of a data element at such a site is called the primary copy; each logical element still has a single site responsible for maintaining its global lock
- Access to a data element requires access to the transaction's site as well as the primary copy site

Distributed 2PL
- Lock managers at each site
- With no replication, distributed 2PL is the same as primary copy 2PL
- With replication, a transaction must acquire a global lock (no copy of a data element is primary)
- When can a transaction assume it has a (shared or exclusive) global lock on a data element?

Distributed CC Based on Voting

Majority locking
If a data item Q is replicated at n different sites, a lock-request message must be sent to more than half of the n sites at which Q is stored. The requesting transaction does not operate on Q until it has obtained a lock on a majority of the replicas of Q

Read-one, write-all locking
- To read / shared-lock a data item Q, a transaction requests a lock from the lock manager at one site that contains a replica of Q
- To write / exclusive-lock a data item Q, it requests a lock on Q from the lock managers of all sites that contain a replica of Q

Deadlocks, Revisited
Locking-based concurrency control may result in deadlocks!
Deadlock: each transaction in a set (of ≥ 2 transactions) is waiting for an item that has been locked by another transaction in the set

Partial schedule S:

T1                  T2
read-lock(Y);
read-item(Y);       read-lock(X);
                    read-item(X);
write-lock(X);      write-lock(Y);

[Figure: wait-for graph for S — T1 → T2 and T2 → T1, a cycle]

Distributed Deadlock Detection
Each site maintains a local wait-for graph. A global deadlock might exist even if the local graphs contain no cycles:

[Figure: Site A has T1 → T2, Site B has T2 → T1; neither local graph has a cycle, but the global graph T1 → T2 → T1 does]

Three solutions: centralized, hierarchical, timeout

Centralized
- One site is designated as the deadlock detector
- Lock managers at other sites periodically send their local wait-for graphs to the deadlock detector, which forms a global wait-for graph
- The length of the interval is a system design decision
- The approach is vulnerable to failure and has high communication overhead
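A sketch of what the central deadlock detector does: merge the local wait-for graphs and look for a cycle. The graph encoding and the DFS are illustrative; the two local graphs reproduce the Site A / Site B example above.

```python
def has_cycle(edges):
    """DFS-based cycle detection on a wait-for graph {txn: {txns it waits for}}."""
    visiting, done = set(), set()
    def dfs(t):
        visiting.add(t)
        for u in edges.get(t, ()):
            if u in visiting or (u not in done and dfs(u)):
                return True
        visiting.discard(t); done.add(t)
        return False
    return any(t not in done and dfs(t) for t in list(edges))

site_a = {"T1": {"T2"}}          # local graph at Site A: no cycle
site_b = {"T2": {"T1"}}          # local graph at Site B: no cycle
global_graph = {}
for g in (site_a, site_b):       # the deadlock detector merges the local graphs
    for t, waits in g.items():
        global_graph.setdefault(t, set()).update(waits)

print(has_cycle(site_a), has_cycle(site_b), has_cycle(global_graph))  # False False True
```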


Distributed Deadlock Detection (continued)

Hierarchical
- Sites are organized into a hierarchy of deadlock detectors
- Local graphs are sent to the parent in the hierarchy
- Reduces communication overhead and dependence on a central deadlock detection site

[Figure: a hierarchy of deadlock detectors — DD0x at the root, DD11 and DD14 below it, and DD21-DD24 at Sites 1-4]

Timeout
Any transaction that has not completed after an appropriate amount of time is assumed to be deadlocked, and is aborted

Distributed Reliability Issues
- Failures in distributed systems
- Distributed reliability protocols

Failures in Distributed Systems
- Transaction failures
- Site (system) failures
- Media failures
- Communication failures

Centralized Recovery
Main techniques:
- Deferred update
- Immediate update
Concepts:
- Disk caching
- System log
- Checkpointing

[Figure: the local recovery manager reads and writes through the log buffers and the database buffer manager in main memory; the buffer manager fetches and flushes the database buffers (the volatile database) to and from the stable log and stable database in secondary storage]


Distributed Reliability Protocols

Commit protocols
Maintain the atomicity of distributed transactions
- Atomic commitment: ensures that the effect of the transaction on the distributed database is all-or-nothing, even though the transaction may involve multiple sites

Recovery protocols
The procedure a failed site goes through to recover its state after restart
- Independent recovery protocols: determine how to terminate a transaction that was executing at a failed site without having to consult any other site

Termination protocols
The procedure the operational sites go through when a participating site fails
- Non-blocking termination protocols: determine how to let transactions at the operational sites terminate without waiting for the failed site to recover

Commit Protocols

One-phase commit (1PC)
Ready: all operations of the transaction have completed except the commit
State transitions: Initial → (execute) → Ready → (commit) → Committed, or → (abort) → Failed

Two-phase commit (2PC)
Prepare, then apply the global commit rule:
- Global abort: if even one participant votes to abort
- Global commit: only if all participants vote to commit
State transitions: Initial → (execute) → Ready → (prepare) → Prepared → (commit) → Committed, or → (abort) → Failed

Centralized 2PC Protocol

[Figure: Phase 1 — the coordinator sends "prepare" to all participants and collects their votes; Phase 2 — the coordinator sends the global commit/abort decision, and the participants commit or abort accordingly]

Two-Phase Commit (2PC)
The site at which the Xact originates is the coordinator; the other sites at which it executes are subordinates. When an Xact wants to commit:
1. The coordinator sends a prepare msg to each subordinate
2. Each subordinate force-writes an abort or prepare log record and then sends a no or yes msg to the coordinator
3. If the coordinator gets unanimous yes votes, it force-writes a commit log record and sends a commit msg to all subordinates; else it force-writes an abort log record and sends an abort msg
4. Subordinates force-write an abort/commit log record based on the msg they get, then send an ack msg to the coordinator
5. The coordinator writes an end log record after getting all acks
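A minimal single-process sketch of the coordinator's decision logic in steps 1-5; real 2PC also needs force-written logs and network messaging, which are reduced here to a list and function calls. All names are illustrative.

```python
def two_phase_commit(coordinator_log, participants):
    """participants: dict site -> vote() callable returning 'yes' or 'no'."""
    # Phase 1: send prepare, collect votes.
    votes = {site: vote() for site, vote in participants.items()}
    decision = "commit" if all(v == "yes" for v in votes.values()) else "abort"
    coordinator_log.append(decision)          # force-write the decision record
    # Phase 2: broadcast the decision; subordinates would log and ack here.
    for site in participants:
        print(f"{site}: global {decision}")
    coordinator_log.append("end")             # written after all acks
    return decision

log = []
print(two_phase_commit(log, {"site1": lambda: "yes", "site2": lambda: "no"}))  # abort
```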

Distributed 2PC Protocol

[Figure: Phase 1 — prepare messages and votes are exchanged among the coordinator and all participants, so each participant can make the global commit decision independently; there is no Phase 2]

Comments on 2PC
- Two rounds of communication: first voting, then termination; both initiated by the coordinator
- Any site can decide to abort an Xact
- Every msg reflects a decision by the sender; to ensure that this decision survives failures, it is first recorded in the local log
- All commit protocol log records for an Xact contain the Xact id and the coordinator id; the coordinator's abort/commit record also includes the ids of all subordinates


Restart After a Failure at a Site
- If we have a commit or abort log record for Xact T, but no end record, we must redo/undo T. If this site is the coordinator for T, keep sending commit/abort msgs to the subordinates until acks are received
- If we have a prepare log record for Xact T, but no commit/abort record, this site is a subordinate for T. Repeatedly contact the coordinator to find the status of T, then write a commit/abort log record, redo/undo T, and write an end log record
- If we don't even have a prepare log record for T, unilaterally abort and undo T. This site may be the coordinator! If so, subordinates may send msgs

Blocking
- If the coordinator for Xact T fails, subordinates who have voted yes cannot decide whether to commit or abort T until the coordinator recovers: T is blocked
- Even if all subordinates know each other (extra overhead in the prepare msg), they remain blocked unless one of them voted no

Link and Remote Site Failures
If a remote site does not respond during the commit protocol for Xact T, either because the site failed or because the link failed:
- If the current site is the coordinator for T, it should abort T
- If the current site is a subordinate and has not yet voted yes, it should abort T
- If the current site is a subordinate and has voted yes, it is blocked until the coordinator responds

2PC with Presumed Abort
1. Ack msgs are used to let the coordinator know when it can forget an Xact (T is kept in the Xact table until all acks are received)
2. If the coordinator fails after sending prepare msgs but before writing commit/abort log records, it aborts the Xact when it comes back up. Meanwhile, if another site inquires about T, the recovery process directs it to abort T: in the absence of information, T is presumed to have aborted
   - When the coordinator aborts T, it undoes T and removes it from the Xact table immediately, resulting in a "no information" state
   - Similarly, subordinates do not need to send acks on abort. If a subordinate had voted yes previously, a later inquiry will find no information and consequently abort. The names of subordinates are not recorded in the abort log record
3. If a subtransaction does no updates, its commit or abort status is irrelevant
   - A read-only subXact responds to the prepare msg with "reader" instead of yes/no; the coordinator subsequently ignores readers
   - If all subXacts are readers, the second phase is not needed

Commit Protocols
The three-phase commit (3PC) protocol avoids the possibility of blocking in a restricted case of possible failures:
- No network partition can occur
- At any point, at least one site must be up
- At any point, at most K participating sites can fail
(Interesting, but optional for this course)

Distributed Recovery
Two new issues compared to centralized recovery:
- New kinds of failure, e.g. links and remote sites
- If the sub-transactions of an Xact execute at different sites, all or none must commit; we need a commit protocol to achieve this
A log is maintained at each site, as in a centralized DBMS, and commit protocol actions are additionally logged


Next Lecture
Data Warehousing and OLAP Technologies