Professional Documents
Culture Documents
Systems
Bab 11
Crash Recovery
(Chap. 18 – Ramakrishnan)
Bab 11 - 1
DBMS – Arif Djunaidy – FTIF ITS
Review: The ACID properties
Bab 11 - 4
DBMS – Arif Djunaidy – FTIF ITS
Handling the Buffer Pool
Bab 11 - 5
DBMS – Arif Djunaidy – FTIF ITS
More on Steal and Force
• STEAL (why enforcing Atomicity is hard)
– To steal frame F: Current page in F (say P) is written
to disk; some Xact holds lock on P.
o What if the Xact with the lock on P aborts?
o Must remember the old value of P at steal time (to support
UNDOing the write to page P).
• NO FORCE (why enforcing Durability is hard)
– What if system crashes before a modified page is
written to disk?
– Write as little as possible, in a convenient place, at
commit time, to support REDOing modifications.
Bab 11 - 6
DBMS – Arif Djunaidy – FTIF ITS
Basic Idea: Logging
Bab 11 - 7
DBMS – Arif Djunaidy – FTIF ITS
Write-Ahead Logging (WAL)
• The Write-Ahead Logging Protocol:
Must force the log record for an update (to a stable storage)
before the corresponding data page gets to disk.
guarantee Atomicity
Must write all log records for a Xact before commit
guarantee Durability
Bab 11 - 8
DBMS – Arif Djunaidy – FTIF ITS
DB RAM
WAL & the Log LSNs pageLSNs flushedLSN
Bab 11 - 9
DBMS – Arif Djunaidy – FTIF ITS
Log Records
Possible log record types:
LogRecord fields: • Update
prevLSN • Commit
XID • Abort
type • End (signifies end of
pageID commit or abort)
update length
• Compensation Log
records offset
Records (CLRs)
only before-image
– for UNDO actions
after-image
Bab 11 - 10
DBMS – Arif Djunaidy – FTIF ITS
Other Log-Related State
• Transaction Table:
– One entry per active Xact.
– Contains XID, status
(running/commited/aborted), and lastLSN.
• Dirty Page Table:
– One entry per dirty page in buffer pool.
– Contains recLSN -- the LSN of the log record
which first caused the page to be dirty.
Bab 11 - 11
DBMS – Arif Djunaidy – FTIF ITS
An Example of Log &
Transactions Table
Bab 11 - 12
DBMS – Arif Djunaidy – FTIF ITS
Normal Execution of an Xact
Bab 11 - 13
DBMS – Arif Djunaidy – FTIF ITS
Checkpointing
• Periodically, the DBMS creates a checkpoint, in order to minimize the time
taken to recover in the event of a system crash. Write to log:
o begin_checkpoint record: Indicates when chkpt began.
o end_checkpoint record: Contains current Xact table and dirty page table. This is a
`fuzzy checkpoint’:
- Other Xacts continue to run; so these tables accurate only as of the time of the begin_checkpoint
record
- No attempt to force dirty pages to disk; effectiveness of checkpoint limited by oldest unwritten
change to a dirty page. (So it’s a good idea to periodically flush dirty pages to disk!)
o Store LSN of chkpt record in a safe place (master record).
Bab 11 - 14
DBMS – Arif Djunaidy – FTIF ITS
The Big Picture: What’s Stored
Where
LOG RAM
DB
LogRecords
prevLSN Xact Table
XID Data pages lastLSN
type each status
pageID with a
pageLSN Dirty Page Table
length
offset recLSN
before-image master record
after-image flushedLSN
Bab 11 - 15
DBMS – Arif Djunaidy – FTIF ITS
Simple Transaction Abort
Bab 11 - 16
DBMS – Arif Djunaidy – FTIF ITS
Abort, cont.
Bab 11 - 17
DBMS – Arif Djunaidy – FTIF ITS
Transaction Commit
Bab 11 - 18
DBMS – Arif Djunaidy – FTIF ITS
Crash Recovery: Big Picture
Oldest log
rec. of Xact
active at
• Start from a checkpoint
crash (found via master record).
Smallest
recLSN in
• Three phases. Need to:
dirty page – Figure out which Xacts
table after committed since checkpoint,
Analysis
which failed (ANALYSIS).
– REDO all actions.
Last chkpt o (repeat history)
Bab 11 - 20
DBMS – Arif Djunaidy – FTIF ITS
Recovery: The REDO Phase
• We repeat History to reconstruct state at crash:
– Reapply all updates (even of aborted Xacts!), redo CLRs.
• Scan forward from log rec containing smallest recLSN in
D.P.T. For each CLR or update log rec LSN, REDO the
action unless:
– Affected page is not in the Dirty Page Table, or
– Affected page is in D.P.T., but has recLSN > LSN, or
– pageLSN (in DB) ≥ LSN.
• To REDO an action:
– Reapply logged action.
– Set pageLSN to LSN. No additional logging!
Bab 11 - 21
DBMS – Arif Djunaidy – FTIF ITS
Recovery: The UNDO Phase
ToUndo={ l | l a lastLSN of a “loser” Xact}
Repeat:
– Choose largest LSN among ToUndo.
– If this LSN is a CLR and undonextLSN==NULL
o Write an End record for this Xact.
Bab 11 - 22
DBMS – Arif Djunaidy – FTIF ITS
Example of Recovery
LSN LOG
RAM 00 begin_checkpoint
05 end_checkpoint
Xact Table 10 update: T1 writes P5 prevLSNs
lastLSN 20 update T2 writes P3
status
30 T1 abort
Dirty Page Table
recLSN 40 CLR: Undo T1 LSN 10
flushedLSN 45 T1 End
50 update: T3 writes P1
ToUndo 60 update: T2 writes P5
CRASH, RESTART
Bab 11 - 23
DBMS – Arif Djunaidy – FTIF ITS
Example: Crash During
Restart!
LSN LOG
00,05begin_checkpoint, end_checkpoint
RAM 10update: T1 writes P5
20update T2 writes P3
Xact Table 30T1 abort
lastLSN
40,45CLR: Undo T1 LSN 10, T1 End
status
Dirty Page Table 50update: T3 writes P1
recLSN 60update: T2 writes P5 undonextLSN
flushedLSN
CRASH, RESTART
70CLR: Undo T2 LSN 60
ToUndo
80,85CLR: Undo T3 LSN 50, T3 end
CRASH, RESTART
90,95CLR: Undo T2 LSN 20, T2 end
Bab 11 - 24
DBMS – Arif Djunaidy – FTIF ITS
Additional Crash Issues
• What happens if system crashes during
Analysis? During REDO?
• How do you limit the amount of work in
REDO?
– Flush asynchronously in the background.
– Watch “hot spots”!
• How do you limit the amount of work in
UNDO?
– Avoid long-running Xacts.
Bab 11 - 25
DBMS – Arif Djunaidy – FTIF ITS
Summary of
Logging/Recovery
• Recovery Manager guarantees Atomicity
& Durability.
• Use WAL to allow STEAL/NO-FORCE w/o
sacrificing correctness.
• LSNs identify log records; linked into
backwards chains per transaction (via
prevLSN).
• pageLSN allows comparison of data page
and log records.
Bab 11 - 26
DBMS – Arif Djunaidy – FTIF ITS
Summary, Cont.
Bab 11 - 27
DBMS – Arif Djunaidy – FTIF ITS