Topics
- Concepts
- Benefits & Problems
- Architectures
Why Distributed?
- Corresponds to the organizational structure of distributed enterprises (e.g. electronic commerce, manufacturing control systems)
- Economic benefits: more computing power; more discipline in smaller but autonomous groups
CS544 Module 1 — Shazia Sadiq, ITEE/UQ
- To solve complex problems by dividing them into smaller pieces and assigning those to different software groups, which work on different computers
- To produce a system that, although it runs on multiple processing elements, works effectively towards a common goal
Concepts
Distributed Computing System: a number of autonomous processing elements, not necessarily homogeneous, that are interconnected by a computer network and that cooperate in performing their assigned tasks.
Distributed Database: a collection of multiple, logically interrelated databases, distributed over a computer network.
What is distributed may be function (e.g. printing, email) or data.
Misconceptions
Distributed Computing is not ...
[Figures omitted: (a) several programs (Program 1–3) sharing a single centralized database; (b) independent databases at sites 1–4 connected only by a communications network, each accessed by its own programs.]
Benefits of DDBs
- Transparency
- Reliability
- Performance
- Expansion
Transparency
Separation of high-level semantics from low-level implementation issues; an extension of the data independence concept in centralized databases.
[Figure: a distributed application spanning Sydney and Perth sites, illustrating fragmentation and replication of data (e.g. PerthData).]
Network Transparency
Protects users from the operational details of the network: users do not have to specify where the data is located. Comprises location transparency and naming transparency.
Layers of Transparency
Replication Transparency
Replicas (copies of data) are created for performance and reliability reasons. Users should be made unaware of the existence of these copies. Replication causes update problems.
Fragmentation Transparency
Fragments are parts of a relation:
- Vertical fragment: a subset of columns
- Horizontal fragment: a subset of tuples
Fragments are also created for performance and reliability reasons. Users should be made unaware of the existence of fragments.
Reliability
Replicated components (data & software) eliminate single points of failure: failure of a single site or communication link will not bring down the entire system. Managing the reachable data requires distributed transaction support, which provides correctness in the presence of failures.

Performance
Data localization reduces contention for CPU and I/O services and reduces communication overhead.
Parallelism: inter-query and intra-query parallelism (the multiplex approach).
Expansion
Expansion by adding processing and storage power to the network: compare replacing a mainframe with adding more micro-computers to the network.
Problem Areas
- Database design
- Query processing
- Directory management
- Concurrency control
- Deadlock management
- Reliability issues
- Operating system support
- Heterogeneous databases
Database Design
How to place the database and applications: fragmentation or replication?
- Disjoint fragments
- Full replication
- Partial replication
Query Processing
Determining factors: distribution of data, communication costs, lack of locally available data.
Objective: maximum utilisation of inherent parallelism.
Design issues: fragmentation and allocation.
Directory Management
The directory contains data descriptions plus locations. The problems are similar to database design: global or local; centralised or distributed; single copy or multiple copies.
Concurrency Control
Integrity of the database as in the centralised case, plus consistency of the multiple copies of the database (mutual consistency).
Approaches: pessimistic (synchronising before execution) and optimistic (checking after execution).
Strategies: locking and timestamping.
Deadlock Management
Similar solutions as for the centralised case: prevention, avoidance, detection, recovery.
Reliability Issues
Reliability is an advantage, but does not come automatically. On failure of a site:
- The database at the operational sites must remain consistent and up to date
- The database at the failed site must recover after the failure
Heterogeneous Databases
Databases may differ in data model and data language.
Architectural Models
Architecture defines structure: components, the functionality of components, and their interactions and interrelationships.
Classification along three dimensions:
1. Autonomy
2. Distribution
3. Heterogeneity
[Figure: the classification space as three orthogonal axes — Autonomy, Distribution and Heterogeneity — shown against the external views and conceptual schema of the ANSI/SPARC architecture.]
Autonomy
Distribution of control (not data): design, communication and execution autonomy.
Alternatives: tight integration; semi-autonomous; total isolation (fully autonomous).

Distribution
Distribution of data: physical distribution over multiple sites.
Alternatives: non-distributed (centralised); client-server; peer-to-peer (fully distributed).
Heterogeneity
Different database systems, with major differences in data models (e.g. relational, hierarchical), data languages (e.g. SQL, QBE), and transaction management protocols.
Architectural Alternatives
Autonomy (A): A0 tight integration; A1 semi-autonomous; A2 total isolation.
Distribution (D): D0 non-distributed; D1 client-server; D2 peer-to-peer.
Heterogeneity (H): H0 homogeneous; H1 heterogeneous.
This gives 3 × 3 × 2 = 18 alternatives, though some are meaningless or not practical.
Architectural Alternatives
- (A0, D0, H0): a collection of logically integrated DBMSs at the same site; also called composite systems.
- (A0, D1, H0): client-server distribution.
- (A1, D0, H1): heterogeneous federated DBMSs.
- (A2, D0, H1): heterogeneous multi-databases; similar to (A1, D0, H1), but with full autonomy.
Typical Scenario (Client-Server)
1. The client parses a query, decomposes it into independent site queries, and sends each to the appropriate server.
2. Each server processes its local query and sends the result relation to the client.
3. The client combines the results of the sub-queries.
[Figure: the client sends requests to, and receives responses from, a server.]
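The three-step flow above can be sketched in code. This is an illustrative mock, not a real DDBMS: the site names, the EMP-like tuples, and the predicate-based "sub-query" are all invented for the example.

```python
# Hypothetical sketch of the client/server query scenario above.
# Each "server" holds a horizontal fragment of an employee relation.
SERVERS = {
    "site1": [("E1", "John Smith", "Manager"), ("E2", "Alex Brown", "Engineer")],
    "site2": [("E3", "Harry Potter", "Manager"), ("E4", "Jane Morgan", "Analyst")],
}

def server_query(site, predicate):
    """Step 2: each server evaluates the sub-query on its local fragment."""
    return [row for row in SERVERS[site] if predicate(row)]

def client_query(predicate):
    """Steps 1 and 3: the client sends a sub-query to every relevant
    server and unions the partial results."""
    result = []
    for site in SERVERS:
        result.extend(server_query(site, predicate))
    return result

managers = client_query(lambda row: row[2] == "Manager")
```

In a real system the client would also choose which sites to contact; here every site is queried for simplicity.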
Peer-to-peer Architecture
An extension of the ANSI/SPARC architecture to include a Global Conceptual Schema (GCS), where the GCS is the union of all local conceptual schemas (LCSs).

Multi-database Architecture
The GCS exists as a union of some LCSs only.

[Figure: schema layers — global external schemas (GES1 … GESn) defined over the GCS; local external schemas (LES11 … LESnm) defined over the local conceptual schemas (LCS1 … LCSn), each backed by a local internal schema (LIS1 … LISn); external schemas ES1 … ESn at the top.]
Readings
Lecture notes; Özsu & Valduriez, selected sections of chapters 5, 7 & 12.
The lecture notes for these topics contain many more slides than would actually be used for this lecture. These slides have been included for the benefit of the diverse student group.
Framework of Distribution
Distribution can be characterised along three dimensions:
- Sharing: no sharing; data; data + programs
- Knowledge of the access pattern: complete knowledge; partial knowledge
- Access pattern: static; dynamic
In all cases (except no sharing) the distributed environment has new problems that are not relevant in the centralized setup.
Design Strategies
Bottom-Up Strategy
[Figure: external schemas ES1 … ESn defined over a GCS that integrates the local conceptual schemas LCS1 … LCSn, each mapped to a local internal schema LIS1 … LISn. With a full GCS this yields a distributed database; with only partial integration, a multi-database.]
Bottom-Up Design
Integration of pre-existing local schemas into a global conceptual schema. The local systems will most likely be heterogeneous, raising problems of integration and interoperability.
[Figure: the distribution design flow — user input and access information feed the external schema and GCS; distribution design produces the local conceptual schemas, with feedback into the design.]
Why Fragment?
Application views are usually subsets of relations, and a fragment is a subset of a relation. If there is no fragmentation, either:
- the entire table is at one site, causing a high volume of remote data accesses, or
- the entire table is replicated, causing problems in executing updates.

Allocation
Partitioned; partially replicated; fully replicated.
Drawbacks of Fragmentation
Application views do not require mutually exclusive fragments, so queries spanning multiple fragments will suffer performance degradation (costly joins and unions). Semantic integrity control — checking dependencies among attributes that belong to several fragments — may require access to several sites.
Types of Fragmentation
- Horizontal
- Vertical
- Hybrid
Horizontal Fragmentation

PERSON
NAME             ADDRESS         PHONE      SAL
John Smith       44, Here St     3456 7890  34000
Alex Brown       1, High St      3678 1234  48000
Harry Potter     99, Magic St    9976 4321  98000
Jane Morgan      87, Riverview   8765 1237  65800
Peter Jennings   65, Flag Rd     9851 1238  23980

PERSON-FRAGMENT1
NAME             ADDRESS         PHONE      SAL
John Smith       44, Here St     3456 7890  34000
Alex Brown       1, High St      3678 1234  48000

PERSON-FRAGMENT2
NAME             ADDRESS         PHONE      SAL
Harry Potter     99, Magic St    9976 4321  98000
Jane Morgan      87, Riverview   8765 1237  65800
Peter Jennings   65, Flag Rd     9851 1238  23980

Vertical Fragmentation

PERSON (NAME, ADDRESS, PHONE)
NAME             ADDRESS         PHONE
John Smith       44, Here St     3456 7890
Alex Brown       1, High St      3678 1234
Harry Potter     99, Magic St    9976 4321
Jane Morgan      87, Riverview   8765 1237
Peter Jennings   65, Flag Rd     9851 1238

PERSON (NAME, SAL)
NAME             SAL
John Smith       34000
Alex Brown       48000
Harry Potter     98000
Jane Morgan      65800
Peter Jennings   23980

Hybrid Fragmentation

PERSON (NAME, ADDRESS, PHONE)
NAME             ADDRESS         PHONE
John Smith       44, Here St     3456 7890
Alex Brown       1, High St      3678 1234

PERSON (NAME, SAL)
NAME             SAL
Harry Potter     98000
Jane Morgan      65800
Peter Jennings   23980

Degree of Fragmentation
Ranges from no fragmentation at one extreme to individual tuples (horizontal) or individual columns (vertical) at the other.
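The horizontal and vertical splits above can be sketched as follows. This is an illustrative sketch using the PERSON data; the choice of split points (first two tuples, and NAME as the joining key) mirrors the example tables and is not an algorithm.

```python
# Hedged sketch (data mirrors the PERSON tables above): horizontal
# fragments select subsets of tuples, vertical fragments select subsets
# of columns plus the key, and both can be recombined losslessly.
PERSON = [
    ("John Smith",     "44, Here St",   "3456 7890", 34000),
    ("Alex Brown",     "1, High St",    "3678 1234", 48000),
    ("Harry Potter",   "99, Magic St",  "9976 4321", 98000),
    ("Jane Morgan",    "87, Riverview", "8765 1237", 65800),
    ("Peter Jennings", "65, Flag Rd",   "9851 1238", 23980),
]

# Horizontal fragmentation: split the tuples, as in FRAGMENT1/FRAGMENT2.
hfrag1, hfrag2 = PERSON[:2], PERSON[2:]
reconstructed_h = hfrag1 + hfrag2                 # reconstruction = union

# Vertical fragmentation: NAME acts as the key shared by both fragments.
vfrag1 = [(n, a, p) for (n, a, p, s) in PERSON]   # NAME, ADDRESS, PHONE
vfrag2 = [(n, s) for (n, a, p, s) in PERSON]      # NAME, SAL
sal = dict(vfrag2)
reconstructed_v = [(n, a, p, sal[n]) for (n, a, p) in vfrag1]  # join on NAME

assert reconstructed_h == PERSON and reconstructed_v == PERSON
```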
Correctness of Fragmentation
Rules ensure that the database does not undergo semantic change during fragmentation.

Completeness
Each data item found in a relation R is found in one or more of R's fragments R1, R2, …, Rn.

Reconstruction
The dependency constraints on the data can be preserved by ensuring that every relation R can be reconstructed from its fragments using some operator ∇, such that R = ∇ Ri for all Ri ∈ FR.

Exercise: consider the design R11 = ABC, R21 = DEF, R31 = FGH at site 1; R12 = ABD, R22 = DG, R32 = FGIJ at site 2.

Disjointness
A relation R is decomposed into disjoint horizontal fragments, such that if a data item di ∈ Rj, then di ∉ Rk for all k ≠ j. For vertical fragments, disjointness does not apply to the primary key attributes.
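The three rules can be checked mechanically for horizontal fragments. A small illustrative check, with the relation modelled as a set of invented data items:

```python
# Illustrative check of the three correctness rules for horizontal
# fragments of a toy relation R (data items are invented).
R = {1, 2, 3, 4, 5, 6}                      # data items of relation R
fragments = [{1, 2, 3}, {4, 5}, {6}]        # a proposed fragmentation

# Completeness: every item of R appears in some fragment.
complete = all(any(x in f for f in fragments) for x in R)

# Reconstruction: the union of the fragments yields R again.
reconstructable = set().union(*fragments) == R

# Disjointness: no item appears in two different fragments.
disjoint = sum(len(f) for f in fragments) == len(set().union(*fragments))

assert complete and reconstructable and disjoint
```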
[Worked example omitted: instance tables for fragments over attributes E, F, G, H (e.g. r31(F, G, H)) used to check completeness, reconstruction and disjointness of the design above.]
Versions of Horizontal Fragmentation
Primary Horizontal Fragmentation: a selection operation on the owner relation of a database schema, using a minterm predicate mi:
SELECT * FROM owner WHERE mi

Example:
DEPT
CSEE
CHEMISTRY
ARCHIT

EMP
NAME             DEPT   SAL
John Smith       CSEE   34000
Jane Morgan      CSEE   65800
Peter Jennings   CSEE   23980
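A minterm-based fragment can be sketched directly. The EMP rows below mirror the example above, with one extra invented tuple so the selection actually filters something; the minterm m1 = (DEPT = "CSEE" ∧ SAL > 30000) anticipates the application-information example later in these notes.

```python
# Sketch of primary horizontal fragmentation: each fragment is the
# result of selecting the owner relation with one minterm predicate.
EMP = [
    ("John Smith",     "CSEE",      34000),
    ("Jane Morgan",    "CSEE",      65800),
    ("Peter Jennings", "CSEE",      23980),
    ("Alex Brown",     "CHEMISTRY", 48000),   # invented extra tuple
]

def m1(t):
    # Minterm m1: DEPT = "CSEE" AND SAL > 30000
    return t[1] == "CSEE" and t[2] > 30000

frag_m1 = [t for t in EMP if m1(t)]
```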
Allocation
Replication introduces update problems.
Heuristic approaches:
- Grouping: bottom-up
- Splitting: top-down
Allocation Problem
Find the optimal distribution of fragments F = {F1, F2, …, Fn} to sites S = {S1, S2, …, Sm}, on which a set of applications Q = {q1, q2, …, qq} is running.
Optimality can be defined through:
- Minimal cost: the cost of storing and querying a fragment at a site, the cost of updating a fragment at all sites, and the cost of data communication
- Performance
Information Requirements
- For fragmentation: database information (e.g. owner/member links between relations, such as an Employees relation) and application information
- For allocation: communication network information and computer system information
Application Information
Predicates in user queries:
- Simple: p1: DEPT = "CSEE"; p2: SAL > 30000
- Conjunctive (minterm) predicates: m1 = p1 ∧ p2, i.e. DEPT = "CSEE" ∧ SAL > 30000
Minterm selectivity: the number of tuples returned for a given minterm.
Access frequency: the access frequency of user queries and/or minterms.
The usage/affinity matrix is used in vertical partitioning algorithms (clustering and partitioning algorithms).
Network Information
Channel capacities, distances and protocol overhead; the communication cost per frame between sites Si and Sj.
Distributed DBMS
- Components of a distributed DBMS
- Query processing and optimization
- Transaction management
- Concurrency control
- Reliability issues
Site Information
Knowledge of storage and processing capacity: the unit cost of storing data at site Sk, and the cost of processing one unit of work at Sk.
DDBMS Components
[Figure: the USER PROCESSOR comprises the user interface handler, semantic data controller, global query optimizer and distributed execution monitor, working against the external schema, the global conceptual schema and the global directory (GD/D). The DATA PROCESSOR comprises the local query processor, local recovery manager and runtime support processor, working against the local conceptual schema and the system log.]
Algebraic Transformations
Two equivalent formulations of the same query:
1. ΠENAME(σRESP="Manager" ∧ EMP.ENO=ASG.ENO(EMP × ASG))
2. ΠENAME(EMP ⋈ENO σRESP="Manager"(ASG))

Centralized: query 2 is better, since it avoids the costly Cartesian product.
Distributed: relational algebra is not sufficient to express execution strategies; we also need operations for exchanging data and for selecting the sites that process each piece.

Example: EMP and ASG are fragmented as
EMP1 = σENO≥"E3"(EMP), EMP2 = σENO<"E3"(EMP)
ASG1 = σENO≥"E3"(ASG), ASG2 = σENO<"E3"(ASG)
ASG1, ASG2, EMP1 and EMP2 are stored at sites 1, 2, 3 and 4, and the result is expected at site 5.
Equivalent Strategies
[Figures: two (roughly) equivalent distributed execution strategies for the query above. In one, sites 1 and 2 compute ASG'1 = σRESP="Manager"(ASG1) and ASG'2 = σRESP="Manager"(ASG2) and ship them to sites 3 and 4, which compute EMP'1 = EMP1 ⋈ENO ASG'1 and EMP'2 = EMP2 ⋈ENO ASG'2; site 5 then forms EMP'1 ∪ EMP'2. In the other, all four fragments are shipped to site 5, which evaluates the whole query. The control site holds the fragment schema and statistics on fragments; the local sites (1–4) hold the local schemas.]
Query Decomposition
Uses the same techniques as in the centralized case, since information on data distribution is not yet used.
Step 1: Normalization. Step 2: Analysis. Step 3: Elimination of redundancy. Step 4: Rewriting.

Step 1: Normalization
Transform arbitrarily complex queries into a normal form. Two normal forms:
- Conjunctive: (p11 ∨ p12 ∨ … ∨ p1n) ∧ … ∧ (pm1 ∨ pm2 ∨ … ∨ pmn)
- Disjunctive: (p11 ∧ p12 ∧ … ∧ p1n) ∨ … ∨ (pm1 ∧ pm2 ∧ … ∧ pmn)

Example:
TITLE = "Manager" AND SAL > 30000 AND (PROJ = 12 OR PROJ = 34)
→ (TITLE = "Manager" ∧ SAL > 30000 ∧ PROJ = 12) ∨ (TITLE = "Manager" ∧ SAL > 30000 ∧ PROJ = 34)
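Normalization must preserve the meaning of the qualification. A small sketch verifying, by exhaustive truth-table enumeration over the four atomic predicates, that the original condition and its disjunctive normal form above are equivalent:

```python
# Verify (by truth table) that the qualification above and its
# disjunctive normal form are logically equivalent.
from itertools import product

def original(is_mgr, sal_gt, p12, p34):
    # TITLE = "Manager" AND SAL > 30000 AND (PROJ = 12 OR PROJ = 34)
    return is_mgr and sal_gt and (p12 or p34)

def dnf(is_mgr, sal_gt, p12, p34):
    # (Mgr AND SAL>30000 AND PROJ=12) OR (Mgr AND SAL>30000 AND PROJ=34)
    return (is_mgr and sal_gt and p12) or (is_mgr and sal_gt and p34)

equivalent = all(
    original(*vals) == dnf(*vals)
    for vals in product([True, False], repeat=4)
)
assert equivalent
```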
Step 2: Analysis
Determine whether further processing of the normalized query is possible.
Type incorrectness: attributes or relation names not defined in the global schema, or operations applied to attributes of the wrong type.
Semantic incorrectness: components that do not contribute to the generation of the result; a disconnected query graph indicates incorrectness.
Query graph: nodes are the result and the operand relations; edges are joins and projections.

Example:
SELECT ENAME, DUR
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
AND RESP = "Manager"

[Figure: query graph with nodes EMP, ASG and RESULT; edges for the join EMP.ENO = ASG.ENO, the selection RESP = "Manager" on ASG, and the projections on ENAME and DUR.]
Step 4: Rewriting
Step 4.1: Transformation — transform the relational calculus query into relational algebra, represented as an operator tree. By applying transformation rules, many different but equivalent trees may be found.

Example:
SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
AND RESP = "Manager"

[Figure: operator tree — ΠENAME over EMP ⋈ENO σRESP="Manager"(ASG).]

Data Localization
Localize the query using data distribution information: an algebraic query on global relations becomes an algebraic query on physical fragments. Two-step process: localization, then reduction.
Data Localization
Step 1: map the distributed query to a fragment query using the localization program. For example, if EMP1 = σSAL>30000(EMP) and EMP2 = σSAL≤30000(EMP), then the localization program for EMP is EMP = EMP1 ∪ EMP2.
Step 2: simplify and restructure the fragment query using reduction techniques. The basic idea is to determine the intermediate results that produce empty (horizontal) or useless (vertical) relations, and remove the subtrees that produce them.

Query Optimization
Query optimization must take account of fragment characteristics such as cardinality and replication. The main objectives of the optimizer (the problem is NP-hard!) are to find a strategy close to optimal and to avoid bad strategies.
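Localization and reduction can be sketched together. This is an illustrative mock of the EMP = EMP1 ∪ EMP2 example above: each fragment records the salary interval its defining predicate allows, and a fragment whose interval cannot satisfy the query's selection is pruned (the "empty subtree" case). Fragment contents are invented.

```python
# Each fragment records the salary interval its defining predicate
# allows: EMP1 holds SAL > 30000, EMP2 holds SAL <= 30000.
FRAGMENTS = [
    ("EMP1", [("E1", 45000), ("E2", 52000)], (30001, float("inf"))),
    ("EMP2", [("E3", 21000), ("E4", 28000)], (0, 30000)),
]

def localized_select(min_sal):
    """Evaluate SELECT ... WHERE SAL > min_sal over the fragments,
    pruning fragments that can only produce an empty result."""
    out, visited = [], []
    for name, rows, (lo, hi) in FRAGMENTS:
        if hi <= min_sal:        # reduction: subtree is provably empty
            continue
        visited.append(name)
        out.extend(r for r in rows if r[1] > min_sal)
    return out, visited

result, visited = localized_select(40000)   # only EMP1 is touched
```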
Search Strategy
Explores the search space and selects the best plan using the cost model; defines how the plans are examined.

Local Optimization
Performed against the local schema; examples include the INGRES algorithm and the System R algorithm.
Response Time
Response time = TCPU × seq_#insts + TI/O × seq_#I/Os + TMSG × seq_#msgs + TTR × seq_#bytes
(seq_x denotes the number of x that must be done sequentially.)

Database Statistics
Estimate the size of the intermediate relations, based on statistical information: the length of attributes, the number of distinct values, the number of tuples in the fragment, etc.

Examples:
card(σF(R)) ≤ card(R)
card(R × S) = card(R) × card(S)
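The two cardinality rules above are easy to confirm on toy data (relation contents invented for illustration):

```python
# Toy illustration of the cardinality estimates above.
R = [(1,), (2,), (3,)]
S = [("a",), ("b",)]

# card(R x S) = card(R) * card(S)
cart = [(r, s) for r in R for s in S]
assert len(cart) == len(R) * len(S)

# A selection never returns more tuples than its input relation.
selected = [r for r in R if r[0] > 1]
assert len(selected) <= len(R)
```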
Semijoins
The dominant factor in the time required to process a query is the time spent on data transfer between sites, not the computation or the retrieval from secondary storage. Example: suppose table r is stored at site 1 and s at site 2, and we want to evaluate r ⋈ σA=a(s) at site 1. Two options:
1. Site 1 asks site 2 to transmit s, then computes the selection and join itself.
2. Site 1 transmits the selection condition to site 2, and site 2 sends only the selected records to site 1 (presumably much smaller than s).

Semijoin
Definition: let r(R) and s(S) be two relations. The semijoin of r with s, denoted r ⋉ s, is the relation ΠR(r ⋈ s); that is, r ⋉ s is the portion of r that joins with s. Some important transformations: r ⋉ s = ΠR(r ⋈ s) = r ⋈ ΠR∩S(s).
Semijoin Example

r(A, B) at site 1:        s(B, C, D) at site 2:
A  B                      B   C   D
1  4                      4   13  16
1  5                      4   14  16
1  6                      7   13  17
2  4                      10  14  16
2  6                      10  15  17
3  7                      11  14  16
3  8                      11  15  18
3  9                      12  15  16

(1) Transfer s to site 1 and compute the join there: 24 values shipped.
(2) Compute r' = ΠB(r) at site 1 and send r' (6 values) to site 2; compute s' = r' ⋈ s there and send s' back to site 1: 15 values shipped in total.
(3) We can then compute r ⋈ s as (r ⋉ s) ⋈ s.

s' (B, C, D):
4  13  16
4  14  16
7  13  17
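The cost comparison above can be reproduced directly, counting shipped attribute values for each strategy (the tuples are exactly those of the example):

```python
# Compare strategy (1), full transfer of s, with strategy (2), the
# semijoin, for the r/s example above.
r = [(1, 4), (1, 5), (1, 6), (2, 4), (2, 6),
     (3, 7), (3, 8), (3, 9)]                       # r(A,B) at site 1
s = [(4, 13, 16), (4, 14, 16), (7, 13, 17), (10, 14, 16),
     (10, 15, 17), (11, 14, 16), (11, 15, 18), (12, 15, 16)]  # s(B,C,D) at site 2

# Strategy 1: ship all of s to site 1.
cost_full = len(s) * 3                             # 8 tuples x 3 = 24 values

# Strategy 2: ship pi_B(r) to site 2, ship back only the matching tuples.
b_values = {b for (_, b) in r}                     # 6 distinct B values
s_prime = [t for t in s if t[0] in b_values]       # 3 matching tuples
cost_semi = len(b_values) + len(s_prime) * 3       # 6 + 9 = 15 values

assert cost_full == 24 and cost_semi == 15
```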
Transaction States
A transaction moves through states: Active → (end-transaction) → Partially Committed → (commit) → Committed; from Active or Partially Committed, an abort leads to Failed → Terminated.

[Figure: buffer management — a disk block containing item X is read into a main-memory buffer before the database on disk is updated.]

Interleaving (time flows downward; T1 and T2 may interleave):
T1: read-item(X); X := X - N; write-item(X); read-item(Y); Y := Y + N; write-item(Y)
T2: read-item(X); X := X + M; write-item(X)
Desirable Properties
The ACID properties — Atomicity, Consistency, Isolation, Durability — are enforced by the concurrency control and recovery methods of the DBMS.

Distributed Transactions
- Database consistency: no semantic integrity constraints are violated.
- Transaction consistency: no incorrect updates during concurrent database accesses.
- Mutual consistency: no difference in the values of a data item across all the replicas to which it belongs.
Example
Consider the following scenario (source: [MUW2002]). A chain of department stores has many individual stores; each store has a database of sales and inventory at that store. There may also be a central office database with data about employees, chain-wide inventory, credit card customers, and information about suppliers such as outstanding orders. There is a copy of all the stores' sales data in a data warehouse, which is used to analyse and predict sales.

A manager wants to query all the stores, find the inventory of toothbrushes, and have them moved from store to store in order to balance the inventory.

This transaction will have a local component Ti at the i-th store and a global component T0 at the central office (a distributed transaction consists of communicating transaction components, each at a different site):
1. Component T0 is created at the central office.
2. T0 sends a message to all stores instructing them to create their Ti.
3. Each Ti executes a query to determine its number of toothbrushes and returns the result to T0.
4. T0 takes these numbers, determines (by some algorithm) the shipments required, and sends instructions to the affected stores — for example, store 10 ships 500 toothbrushes to store 7.
5. Stores receiving these instructions update their inventory and perform the shipment.
Example: What Can Go Wrong?
- A bug in the algorithm causes store 10 to ship more toothbrushes than are available in its inventory, causing T10 to abort. However, T7 detects no problem, updates its inventory in anticipation of the incoming shipment, and commits. The atomicity of T is compromised (T10 never completed) and the database is left in an inconsistent state.
- After returning the result of its inventory count (step 3) to T0, the machine at store 10 crashes, so the instructions to ship are never received from T0. What should T10 do when its site recovers?

Problem: managing the commit/abort decision in distributed transactions (one component wants to abort while others want to commit). The answer is two-phase commit.
Classification of Transactions
- Distribution: transactions in centralized DBMSs vs. transactions in distributed DBMSs
- Duration: short-life vs. long-life (long-duration) transactions
- Processing: on-line (interactive) vs. batch transactions
- Grouping of operations: flat vs. nested transactions

Transaction Manager (TM) Operations
- Begin-transaction: basic housekeeping by the TM.
- Read: if the data item is local, the TM retrieves it; otherwise it selects one copy, in cooperation with other schedulers (SCs) and data processors.
- Write: the TM coordinates the updating of the data item's value at all sites where it resides.
- Commit: the TM coordinates the physical updating of all databases that contain a copy of each data item for which a write was issued.
- Abort: the TM ensures that no effect of the transaction is reflected in the database.
Operations are passed via the scheduler (SC) to the data processors.
Nested Transactions
A set of sub-transactions that may recursively contain other sub-transactions.

Begin-transaction Reservation
    Begin-transaction Airline ... End. {Airline}
    Begin-transaction Hotel ... End. {Hotel}
    Begin-transaction Car ... End. {Car}
End
Advantages of Nested TM
Higher level of concurrency: objects can be released after a sub-transaction completes.

A workflow model consists of:
1. A collection of actions; a graph whose nodes are either actions or one of {Abort, Complete} (the terminal nodes); and an indication of the start, called the start node.
2. The overall transaction, which can be any path from the start node to one of the terminal nodes. For example: successful path A0, A1, A2, A3; unsuccessful paths A0, A1, A4 and A0, A1, A2, A5.

Workflow tasks (sub-transactions) are allowed to commit individually (open nesting semantics), permitting partial results to be visible outside the workflow. Various concepts have been introduced to overcome the problem of dealing with sub-transaction commit: compensating tasks, critical tasks, contingency tasks.

This uses the concept of compensating transactions. A compensating transaction rolls back the effect of a committed action in a way that does not depend on what happened to the database between the time of the action and the time of the compensating transaction. If A is any action, A⁻¹ is its compensating transaction, and α, β are any sequences of legal actions and compensating transactions, then α A A⁻¹ β ≡ α β.
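The compensation idea can be sketched as a tiny saga-style runner: committed steps are undone by executing their compensating actions in reverse order when a later step fails. All action names and the failure are invented for illustration.

```python
# Illustrative saga-style execution with compensating actions.
log = []                                 # stands in for committed state

def reserve_hotel():  log.append("hotel")
def undo_hotel():     log.remove("hotel")          # compensating action
def reserve_car():    raise RuntimeError("no cars available")
def undo_car():       log.remove("car")            # never reached here

steps = [(reserve_hotel, undo_hotel), (reserve_car, undo_car)]

def run_saga(steps):
    done = []
    try:
        for action, compensate in steps:
            action()                     # each step commits individually
            done.append(compensate)
        return True
    except RuntimeError:
        for compensate in reversed(done):  # roll back committed steps
            compensate()
        return False

ok = run_saga(steps)                     # car step fails; hotel is undone
```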
Scheduler (SC) and Timestamp Ordering
The TM is responsible for assigning a timestamp to each new transaction and attaching it to each database operation passed to the scheduler. The SC is responsible for keeping track of read and write timestamps and performing serializability checks. Operations are then passed to the data processors.

Two-Phase Locking (2PL)
Phase 1 (growing): locks are acquired. Phase 2 (shrinking): locks are released.
Differences in the distributed case: the TM sends lock requests to the lock managers of ALL participating sites, and operations are passed to the data processors by local as well as REMOTE participating lock managers.
Centralized 2PL
One site is designated as the lock site (primary site) and is delegated the responsibility for lock management. Transaction managers at participating sites deal with the primary site's lock manager rather than their own.
[Figure: message flow between the coordinating TM, the primary-site lock manager, and the data processors at the participating sites.]

Distributed 2PL
There is a lock manager at each site. If there is no replication, distributed 2PL is the same as primary copy 2PL. When there is replication, a transaction must acquire a global lock (no copy of a data element is primary). When can a transaction assume it has a (shared or exclusive) global lock on a data element?
Deadlocks Revisited
Locking-based concurrency control may result in deadlocks. Deadlock: each transaction in a set (of two or more transactions) is waiting for an item that is locked by another transaction in the set.

Partial schedule S:
T1: read-lock(Y); read-item(Y); ... then requests a lock on X
T2: read-lock(X); read-item(X); ... then requests a lock on Y

[Figure: local wait-for graphs at sites A and B, and the global wait-for graph containing the cycle T1 → T2 → T1.]
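Detection amounts to finding a cycle in the (global) wait-for graph. A minimal sketch, where an edge T1 → T2 means T1 waits for an item locked by T2:

```python
# Deadlock detection as cycle detection in a wait-for graph.
def has_cycle(wait_for):
    visited, on_stack = set(), set()
    def dfs(node):
        visited.add(node); on_stack.add(node)
        for nxt in wait_for.get(node, []):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False
    return any(n not in visited and dfs(n) for n in list(wait_for))

# The partial schedule above: T1 waits for X (held by T2),
# T2 waits for Y (held by T1) -> cycle -> deadlock.
deadlocked = has_cycle({"T1": ["T2"], "T2": ["T1"]})
```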
Hierarchical Detection and Timeouts
[Figure: a hierarchy of deadlock detectors — local detectors DD21 … DD24 at sites 1–4, with higher-level detectors (e.g. DD11, DD14) above them.] An alternative to detection is the use of timeouts.
Centralized Recovery
Main techniques: deferred update and immediate update.
Concepts: disk caching, the system log, checkpointing.
[Figure: reads and writes pass through main-memory buffers; the stable log and stable database reside on secondary storage.]
Commit Protocols
One-Phase Commit (1PC)
Ready: all operations of the transaction are completed except the commit.
States: Initial → (execute) → Ready → (commit) → Committed, or → (abort) → Failed.

Recovery Protocols
The procedure a failed site has to go through to recover its state after restart. Independent recovery protocols determine how to terminate a transaction that was executing at a failed site without having to consult any other site.

Termination Protocols
The procedure that the operational sites go through when a participating site fails. Non-blocking termination protocols determine how to allow the transaction at the operational sites to terminate without waiting for the failed site to recover.

Two-Phase Commit (2PC) voting rule:
- Global abort: if even one participant votes to abort.
- Global commit: if all participants vote to commit.
[Figure: 2PC message flow — Phase 1: the coordinator sends "prepare" and the participants reply with their votes; Phase 2: the coordinator sends "global commit" (or "global abort") and the participants acknowledge with commit/abort.]
Comments on 2PC
Two rounds of communication — first voting, then termination — both initiated by the coordinator. Any site can decide to abort a transaction (Xact). Every message reflects a decision by the sender; to ensure that this decision survives failures, it is first recorded in the local log. All commit-protocol log records for an Xact contain the Xact id and the coordinator id; the coordinator's abort/commit record also includes the ids of all subordinates.
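The voting rule can be sketched as a minimal simulation. This mock omits logging, timeouts and failures entirely; participant construction and names are invented for illustration.

```python
# Minimal 2PC simulation: the coordinator collects votes (phase 1)
# and broadcasts the global decision (phase 2).
def two_phase_commit(participants):
    # Phase 1: prepare/vote. Any "no" vote forces a global abort.
    votes = [p["vote"]() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: every participant applies the global decision.
    for p in participants:
        p["decide"](decision)
    return decision

outcomes = {}
def make_participant(name, will_commit):
    return {
        "vote": lambda: will_commit,
        "decide": lambda d: outcomes.__setitem__(name, d),
    }

d1 = two_phase_commit([make_participant("s1", True),
                       make_participant("s2", True)])
d2 = two_phase_commit([make_participant("s1", True),
                       make_participant("s2", False)])
```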
[Figure: the prepare/vote exchange of Phase 1, with no Phase 2 shown.]
Blocking
If the coordinator for Xact T fails, subordinates who have voted yes cannot decide whether to commit or abort T until the coordinator recovers: T is blocked. Even if all subordinates know each other (extra overhead in the prepare message), they remain blocked unless one of them voted no.

Restart after a failure:
- If we have a prepare log record for Xact T but no commit/abort record, this site is a subordinate for T: repeatedly contact the coordinator to find the status of T, then write the commit/abort log record, redo/undo T, and write the end log record.
- If we don't have even a prepare log record for T, unilaterally abort and undo T. This site may be the coordinator; if so, subordinates may send messages.
Commit Protocols (continued)
The Three-Phase Commit (3PC) protocol avoids the possibility of blocking in a restricted case of possible failures: no network partition can occur, at any point at least one site must be up, and at any point at most K participating sites can fail. (Interesting, but optional for this course.)

Distributed Recovery
Two new issues: new kinds of failure (e.g. links and remote sites), and the requirement that if sub-transactions of an Xact execute at different sites, all or none must commit — we need a commit protocol to achieve this. A log is maintained at each site, as in a centralized DBMS, and commit-protocol actions are additionally logged.
Next Lecture