Chapter-II
Transaction Processing and Concurrency Control
Transaction Examples
A transaction is a user-defined concept. For example, making an airline reservation
may involve reservations for the departure and return. To the user, the combination
of the departure and return reservations is a single transaction, not two separate
reservations. Most travelers do not want to depart without returning. The implication
for DBMSs is that a transaction is a user-defined set of database operations. A
transaction can involve any number of reads and writes to a database. To provide
the flexibility of user-defined transactions, DBMSs cannot restrict transactions to
only a specified number of reads and writes to a database.
An information system may have many different kinds of transactions, such as the
transactions of an order entry system. The following pseudocode depicts a typical
bank ATM withdrawal transaction.
BEGIN TRANSACTION
    Display greeting
    Get account number, pin, type, and amount
    SELECT account number, type, and balance
    If balance is sufficient then
        UPDATE account by posting debit
        UPDATE account by posting credit
        INSERT history record
        Display final message and issue cash
    Else
        Write error message
    End If
    On Error: ROLLBACK
COMMIT
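As a concrete illustration, the following Python sketch expresses the same logic with the
standard library's sqlite3 module. It is simplified to a single debit plus a history record,
and the accounts and history tables and their columns are illustrative assumptions, not
part of the pseudocode above.

```python
import sqlite3

def withdraw(conn, acct_no, amount):
    """Debit an account and record the withdrawal as one atomic transaction."""
    try:
        cur = conn.cursor()
        cur.execute("SELECT balance FROM accounts WHERE acct_no = ?", (acct_no,))
        row = cur.fetchone()
        if row is None or row[0] < amount:
            conn.rollback()              # Else branch: reject the request
            return "error: insufficient balance"
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE acct_no = ?",
                    (amount, acct_no))
        cur.execute("INSERT INTO history (acct_no, amount) VALUES (?, ?)",
                    (acct_no, -amount))
        conn.commit()                    # COMMIT: both changes become permanent
        return "cash issued"
    except sqlite3.Error:
        conn.rollback()                  # On Error: ROLLBACK
        raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acct_no INTEGER PRIMARY KEY, balance REAL)")
conn.execute("CREATE TABLE history (acct_no INTEGER, amount REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0)")
conn.commit()
print(withdraw(conn, 1, 40.0))    # cash issued
print(withdraw(conn, 1, 500.0))   # error: insufficient balance
```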
Transaction Properties
DBMSs ensure that transactions obey certain properties. The most important
properties are the ACID properties (Atomic, Consistent, Isolated, and Durable).
Atomic means that a transaction can’t be subdivided. Either all the work in the
transaction is completed or nothing is done. For example, the ATM transaction will
not debit an account without also crediting a corresponding account.
Consistent means that if applicable constraints are true before the transaction starts, the
constraints will be true after the transaction terminates. For example, if a user's
account is balanced before a transaction, then the account is balanced after the
transaction. Otherwise, the transaction is rejected and no changes take effect.
Isolated means that changes resulting from a transaction are not revealed to other users
until the transaction terminates. For example, your significant other will not know
that you are withdrawing money until your ATM transaction completes.
Durable means that any changes resulting from a transaction are permanent. No failure
will erase any changes after a transaction terminates. For example, if a bank's
computer experiences a failure seconds after your transaction completes, the results
of your transaction are still recorded in the bank's database.
To ensure that transactions meet the ACID properties, DBMSs provide certain
services that are transparent to database developers and programmers. The DBMS's
concurrency control and recovery managers provide these services, as described in
the following sections.
Transaction State
In the absence of failure, all transactions complete successfully. However, as we noted
earlier, a transaction may not always complete its execution successfully. Such a
transaction is termed aborted. If we are to ensure the atomicity property, an aborted
transaction must have no effect on the state of the database. Thus, any changes that
the aborted transaction made to the database must be undone. Once the changes
caused by an aborted transaction have been undone, we say that the transaction has
been rolled back. Managing transaction aborts is part of the recovery scheme.
Once a transaction has committed, we cannot undo its effects by aborting it. The
only way to undo the effects of a committed transaction is to execute a
compensating transaction. For example, if a transaction added $100 to an account, the
compensating transaction would subtract $100 from the account. However, it is not
always possible to create such a compensating transaction. Therefore, the responsibility
of writing and executing a compensating transaction is left to the user; it is not
handled by the database system.
A transaction must be in one of the following states:
Active: the initial state; the transaction stays in this state while it is executing.
Partially committed: after the final statement has been executed.
Committed: after successful completion.
Failed: after the discovery that normal execution can no longer proceed.
Aborted: after the transaction has been rolled back and the database has been restored to
its state prior to the start of the transaction.
Figure: Transaction state diagram. A transaction moves from Active to Partially
Committed to Committed, or from Active or Partially Committed to Failed and then
to Aborted.
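The state diagram can be made concrete as a small transition table. The sketch below is
illustrative only; the dictionary maps each state to the states it may legally move to.

```python
# Legal transitions of the transaction state machine shown in the figure.
ALLOWED = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "committed":           set(),   # terminal state
    "aborted":             set(),   # terminal state
}

def transition(state, new_state):
    """Move to new_state, rejecting transitions the diagram does not allow."""
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state

state = "active"
state = transition(state, "partially committed")
state = transition(state, "committed")
print(state)   # committed
```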
Concurrency Control
Interference problems
There are three problems that can result because of simultaneous access to the
database:
Lost update
Uncommitted dependency
Incorrect summary
1. Lost Update
This is the most serious interference problem because changes to the database are
inadvertently lost. In a lost update, one user's update overwrites another user's update,
as depicted in the timeline below:
Transaction A                   Time    Transaction B
Read SR (10)                    T1
                                T2      Read SR (10)
If SR > 0 then SR = SR - 1      T3
                                T4      If SR > 0 then SR = SR - 1
Write SR (9)                    T5
                                T6      Write SR (9)
The timeline shows two concurrent transactions trying to update the seats remaining (SR)
field of the same flight record. Assume that the value of SR is 10 before the
transactions begin. After time T2, both transactions have stored the value of 10 for
SR in local buffers. After time T4, both transactions have made changes to their
local copy of SR. However, each transaction changes the value to 9, unaware of the
activity of the other transaction. After time T6, the value of SR on the database is 9.
But after both transactions finish, the value should be 8, not 9: one of the changes
has been lost. Some students become confused about the lost update problem
because of the actions performed on local copies of the data. The calculations at
times T3 and T4 are performed in memory buffers specific to each transaction.
Even though transaction A has changed the value of SR, transaction B performs the
calculation with its own local copy of SR having a value of 10. The write operation
performed by transaction A is not known to transaction B unless transaction B
reads the value again. A lost update involves two or more transactions trying to
change the same part of the database.
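The problem can be reproduced outside a DBMS. The sketch below is an illustration, not
DBMS code: two Python threads play the two transactions, and the sleep call widens the
window between read and write so the interleaving above actually occurs.

```python
import threading, time

seats_remaining = 10   # the shared SR field

def reserve():
    global seats_remaining
    local_sr = seats_remaining      # T1/T2: read SR into a local buffer
    time.sleep(0.01)                # widen the window so the threads overlap
    if local_sr > 0:                # T3/T4: compute on the local copy
        local_sr -= 1
    seats_remaining = local_sr      # T5/T6: write back, overwriting the other update

t1 = threading.Thread(target=reserve)
t2 = threading.Thread(target=reserve)
t1.start(); t2.start()
t1.join(); t2.join()
print(seats_remaining)   # prints 9; the correct value is 8
```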
2. Uncommitted Dependency
An uncommitted dependency occurs when one transaction reads data written by another
transaction before the other transaction commits. It is also known as a dirty read
because it is caused by one transaction reading dirty (uncommitted) data. In the
timeline below, transaction A reads the SR field, changes its local copy of the SR
field, and writes the new value back to the database. Transaction B then reads the
changed value. Before transaction A commits, however, an error is detected and
transaction A issues a rollback. The rollback could have been issued as a result of
the user canceling the transaction or as a result of a failure. The value used by
transaction B is wrong: it reflects a change that was never committed.
Transaction A        Time    Transaction B
Read SR (10)         T1
SR = SR - 1          T2
Write SR (9)         T3
                     T4      Read SR (9)
Rollback             T5
3. Incorrect Summary
An incorrect summary occurs when one transaction calculates an aggregate while
another transaction updates some of the values involved, so the aggregate reflects
some, but not all, of the updates. In the timeline below, transaction B sums the seats
remaining on two flights. Its sum includes the value of SR1 after transaction A's
update but the value of SR2 before transaction A's update, so the sum is inconsistent
with any single state of the database.

Transaction A        Time    Transaction B
Read SR1 (10)        T1
SR1 = SR1 - 1        T2
Write SR1 (9)        T3
                     T4      Read SR1 (9)
                     T5      Sum = Sum + SR1
                     T6      Read SR2 (5)
                     T7      Sum = Sum + SR2
Read SR2 (5)         T8
SR2 = SR2 - 1        T9
Write SR2 (4)        T10
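The inconsistency is visible even in a single-threaded sketch that interleaves the
operations in the order of the timeline (an illustration, not DBMS code):

```python
sr1, sr2 = 10, 5       # seats remaining on the two flight legs

sr1 = sr1 - 1          # A: T2-T3, write SR1 (9)
total = sr1            # B: T4-T5, sum includes A's update to SR1
total += sr2           # B: T6-T7, sum misses A's later update to SR2
sr2 = sr2 - 1          # A: T8-T10, write SR2 (4)

print(total)           # 14: mixes the new SR1 (9) with the old SR2 (5)
# A consistent sum would be 15 (both old values) or 13 (both new values).
```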
Locks
Locks provide a way to prevent other users from accessing the part of the database
being used. Before accessing part of the database, a transaction must obtain a lock on it.
Other users must wait if they try to obtain a conflicting lock on the same part of the
database. A shared lock (S) must be obtained before reading the database, while an
exclusive lock (X) must be obtained before writing. Any number of users can hold a
shared lock on the same part of the database, whereas only one user can hold an
exclusive lock. The following table shows the resulting conflicts for the two kinds of
locks.

Lock requested    S lock held    X lock held
S                 compatible     conflict
X                 conflict       conflict
The concurrency control manager is the part of the DBMS responsible for
managing locks. The concurrency control manager maintains a hidden table to
record the locks held by various transactions. A lock record contains a transaction
identifier, a record identifier, a kind (S or X), and a count.
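A minimal sketch of such a lock table follows. It is illustrative only: it stores lock
records with the four fields just listed and checks S/X conflicts, but omits queuing of
waiting transactions, deadlock handling, and locking granularities.

```python
class LockTable:
    def __init__(self):
        # record id -> list of lock records [transaction id, kind, count]
        self.locks = {}

    def request(self, tx_id, rec_id, kind):
        """Grant the lock and return True, or return False if it conflicts."""
        holders = self.locks.setdefault(rec_id, [])
        for rec in holders:
            held_tx, held_kind, _ = rec
            if held_tx != tx_id and (kind == "X" or held_kind == "X"):
                return False          # S conflicts with X; X conflicts with both
        for rec in holders:
            if rec[0] == tx_id and rec[1] == kind:
                rec[2] += 1           # same lock requested again: bump the count
                return True
        holders.append([tx_id, kind, 1])
        return True

lt = LockTable()
print(lt.request("A", "flight-101", "S"))   # True: any number of S locks
print(lt.request("B", "flight-101", "S"))   # True
print(lt.request("B", "flight-101", "X"))   # False: conflicts with A's S lock
```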
Figure: Locking granularity, from coarsest to finest: database, table, index, page,
record, field.
The entire database is the coarsest lock that can be held. If an exclusive lock is held on
the entire database, no other users can access the database until the lock is released.
On the other extreme, an individual field is the finest lock that can be held. Locks
also can be held on parts of the database not generally seen by users. For example,
locks can be held on indexes and pages (physical records).
Locking granularity is a trade-off between overhead and waiting. Holding locks at a
fine level decreases waiting among users but increases system overhead because
more locks must be obtained. Holding locks at a coarser level reduces the number
of locks but increases the amount of waiting. In some DBMSs, the concurrency
control manager detects the pattern of usage and promotes locks if needed. For
example, the concurrency control manager initially can grant record locks to a
transaction in anticipation that only a few records will be locked. If the transaction
continues to request locks, the concurrency control component can promote the
record locks to a lock on the entire table.
Deadlocks
Using locks to prevent interference problems can lead to deadlocks. A deadlock is a
problem of mutual waiting: one transaction holds a resource that a second transaction
needs, while the second transaction holds a resource that the first transaction needs.
The figure below depicts a deadlock between two transactions trying to reserve seats
on a flight involving more than one leg (SR1 and SR2).
Transaction A        Time    Transaction B
XLock SR1            T1
                     T2      XLock SR2
XLock SR2 (wait)     T3
                     T4      XLock SR1 (wait)

Figure: Example deadlock problem
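The scenario can be sketched with two Python threads acquiring two locks in opposite
orders (an illustration, not a DBMS mechanism). The timeout on the second acquire
stands in for the deadlock detection or timeout a real DBMS would apply; as written,
both threads time out and report the deadlock.

```python
import threading, time

lock_sr1, lock_sr2 = threading.Lock(), threading.Lock()

def transaction_a():
    with lock_sr1:                              # T1: XLock SR1
        time.sleep(0.05)
        if not lock_sr2.acquire(timeout=1):     # T3: XLock SR2 (wait)
            print("A: timed out waiting for SR2 (deadlock)")
            return
        lock_sr2.release()

def transaction_b():
    with lock_sr2:                              # T2: XLock SR2
        time.sleep(0.05)
        if not lock_sr1.acquire(timeout=1):     # T4: XLock SR1 (wait)
            print("B: timed out waiting for SR1 (deadlock)")
            return
        lock_sr1.release()

ta = threading.Thread(target=transaction_a)
tb = threading.Thread(target=transaction_b)
ta.start(); tb.start(); ta.join(); tb.join()
```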
Two-Phase Locking (2PL)
To prevent interference problems, most DBMSs use a locking protocol known as
two-phase locking (2PL). Two-phase locking has two conditions: (1) a transaction
must acquire the appropriate lock before reading or writing a data item, and (2) after
releasing a lock, a transaction does not acquire any new locks. The first condition
follows from the usage of locks as explained previously. The second condition is
necessary because, if new locks are acquired after releasing locks, a group of
transactions can operate on different states of a data item, leading to lost update
problems. The second condition is usually simplified so that at least exclusive locks
are held until the end of the transaction. At the commit point, the locks of a
transaction are released. The figure below shows 2PL with the simplified second
condition.
Figure: Two-phase locking. The number of locks held grows from the beginning of
transaction (BOT) through the growing phase, then falls during the shrinking phase
until the end of transaction (EOT).
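A minimal sketch of the simplified protocol follows, with illustrative names: locks
accumulate during the growing phase and are released only at the commit point.

```python
import threading

class Transaction:
    def __init__(self, name):
        self.name = name
        self.held = []              # locks acquired during the growing phase

    def acquire(self, lock):
        lock.acquire()              # growing phase: new locks may be added
        self.held.append(lock)

    def commit(self):
        for lock in reversed(self.held):
            lock.release()          # shrinking phase happens only at commit
        self.held.clear()

sr1_lock, sr2_lock = threading.Lock(), threading.Lock()
t = Transaction("A")
t.acquire(sr1_lock)                 # XLock SR1
t.acquire(sr2_lock)                 # XLock SR2
# ... read and write SR1 and SR2 here ...
t.commit()                          # all locks released at the commit point
```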
Optimistic Approaches
The use of locks and 2PL is a pessimistic approach to concurrency control: locking
assumes that every transaction may conflict with others. Optimistic concurrency
control approaches assume that conflicts are rare. If conflicts are rare, it is more
efficient to check for conflicts than to use locks to force waiting. In optimistic
approaches, transactions are permitted to access the database without acquiring
locks. Instead, the concurrency control manager checks whether a conflict has
occurred. The check can be performed either just before a transaction commits or
after each read and write.
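A common way to implement the commit-time check is validation against a version
number. The sketch below is an assumption for illustration; in a real system the
validation and the write would execute as one atomic step.

```python
class VersionedItem:
    def __init__(self, value):
        self.value, self.version = value, 0

def optimistic_decrement(item):
    value, version = item.value, item.version   # read phase: no locks taken
    new_value = value - 1                       # compute on a local copy
    if item.version != version:                 # validation phase
        return False                            # conflict detected: caller retries
    item.value, item.version = new_value, version + 1   # write phase
    return True

sr = VersionedItem(10)
while not optimistic_decrement(sr):
    pass                                        # retry on conflict
print(sr.value)   # 9
```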
Recovery Management
Recovery management is a service to restore a database to a consistent state after a
failure. This section describes the kinds of failures to prevent, the tools of recovery
management, and the recovery processes that use the tools.
Data Storage Devices and Failure Types
From the perspective of database failures, volatility is an important characteristic of
data storage devices. Main memory is volatile because it loses its state if power is
lost. In contrast, a hard disk is nonvolatile because it retains its state if power is
lost. This distinction is important because DBMSs cannot depend on volatile
memory to recover data after failures. Even nonvolatile devices are not completely
reliable. For example, certain failures make the contents of a hard disk unreadable.
To achieve high reliability, DBMSs may replicate data on several kinds of
nonvolatile storage media such as hard disks, magnetic tape, and optical disks.
Using a combination of non-volatile devices improves reliability because different
kinds of devices usually have independent failure rates.
Some failures affect main memory only, while others affect both volatile and
nonvolatile memory. The following table shows four kinds of failures along with
their effect and frequency. The first two kinds of failures affect the memory of one
executing transaction.

Failure type            Effect                                   Frequency
Program-detected        Local (one transaction aborted)          Most frequent
Abnormal termination    Local (one transaction aborted)          Frequent
System failure          All active transactions (memory lost)    Infrequent
Device failure          All transactions (disk contents lost)    Rare
When writing code, one often checks for error conditions such as an invalid account
number or cancellation of the transaction by the user. A program-detected failure
usually leads to aborting the transaction with a specific message to the user. The SQL
ROLLBACK statement is used to abort a transaction if an abnormal condition
occurs. Recall that the ROLLBACK statement causes all changes made by the
transaction to be removed from the database. Program-detected failures are usually
the most common and least harmful.
Abnormal termination has an effect similar to a program-detected failure but a
different cause. The transaction aborts, but the error message is unintelligible to the
user.
Recovery Tools
The recovery manager uses redundancy and control of the timing of database writes
to restore a database after a failure. Three of the tools discussed in this section (the
transaction log, the checkpoint, and the database backup) are forms of redundancy.
The last tool, force writing, allows the recovery manager to control when database
writes are recorded.
Transaction Log
A transaction log is like a shadow of the database: any change to the database is also
recorded in the log. A typical log record contains a transaction identifier, the
database action, the time, a row identifier, a column name, and values (old and
new). The old and new values are sometimes called the before and after images,
respectively. If the database action is insert, the log only contains the new value.
Similarly, if the database action is delete, the log only contains the old value.
Besides insert, update, and delete actions, log records are created for the begin and
the end of the transaction. The log is usually stored as a hidden database table not
available to normal users.
The recovery manager can perform two operations on the log. In an undo operation,
the database reverts to a previous state by substituting the old value for whatever
value is stored in the database. In a redo operation, the recovery component
reestablishes a new state by substituting the new value for whatever value is stored
in the database. To undo (redo) a transaction, the undo (redo) operation is applied
to all log records of a specified transaction except for the begin and commit
records.
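The two operations can be sketched as follows. The table, row, column, and values are
illustrative, with the database reduced to a dictionary keyed by (table, row, column).

```python
# (table, row id, column) -> current value; stands in for the database state.
db = {("customer", 10529, "custbal"): 45.92}

# One update log record carrying the before and after images described above.
log = [
    {"action": "update", "row": ("customer", 10529, "custbal"),
     "old": 45.92, "new": 60.25},
]

def undo(record, database):
    """Revert to the previous state by restoring the old value."""
    database[record["row"]] = record["old"]

def redo(record, database):
    """Re-establish the new state by restoring the new value."""
    database[record["row"]] = record["new"]

redo(log[0], db)    # the value becomes 60.25
undo(log[0], db)    # the value reverts to 45.92
```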
Force Writing
The ability to control when data are transferred to nonvolatile storage is known as
force writing. Without the ability to control the timing of write operations to
nonvolatile storage, recovery is not possible. Force writing means that the DBMS,
not the operating system, controls when data are written to nonvolatile storage.
Normally, when a program issues a write command, the operating system puts the
data in a buffer. For efficiency, the data are not written to disk until the buffer is
full. Typically, there is some small delay between the arrival of data in a buffer and
the transferring of the buffer to disk. With force writing, the operating system
allows the DBMS to transfer the data directly to disk without the intermediate use
of the buffer.
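In Python, the same two-step distinction is visible: flush() moves data out of the
program's buffer to the operating system, and os.fsync() forces the operating system to
write it to disk. The file name and record format below are illustrative.

```python
import os

with open("transaction.log", "ab") as log_file:
    log_file.write(b"<A, update, customer, 10529, 45.92, 60.25>\n")
    log_file.flush()               # empty the program buffer to the OS
    os.fsync(log_file.fileno())    # force the OS to write to nonvolatile storage
```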
The recovery manager uses force writing at checkpoint time and the end of the
transaction. At checkpoint time, in addition to inserting a checkpoint record, all log
and sometimes all database buffers are force written to disk. This force writing can
add considerable overhead to the checkpoint process. At the end of the transaction,
the recovery manager force writes any log records of a transaction remaining in
memory.
Database Backup
A backup is a copy of all or part of a disk. The backup is used when the disk
containing the database or log is damaged. A backup is usually made on magnetic
tape because it is less expensive and more reliable than disk. Periodically, a backup
should be made for both the database and the log. To save time, most backup
schedules include less frequent massive backups to copy the entire contents of a
disk and more frequent incremental backups to copy only the changed part.
Recovery Processes
Immediate Update
In the immediate update approach, database updates are written to the disk when
they occur. Database writes occur at checkpoint time and when buffers are full. However,
it is essential that database writes occur after writes of the corresponding log records.
This usage of the log is known as the write ahead log protocol. If log records were
written after corresponding database records, recovery would not be possible if a failure
occurred between the time of writing the database records and the log records.
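The ordering requirement can be sketched as follows, with illustrative file names: the
log record is force written to disk before the corresponding database write is attempted.

```python
import os

def force_write(path, data):
    """Append data and make it stable on disk before returning."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

def wal_update(row_id, old, new):
    # 1. Log first: after this point a crash can be repaired by undo or redo.
    force_write("db.log", f"<update,{row_id},{old},{new}>\n".encode())
    # 2. Only then write the database record itself.
    force_write("db.dat", f"{row_id}={new}\n".encode())

wal_update(10529, 45.92, 60.25)
```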
Recovery from a local failure is easy because only a single transaction is affected.
All log records of the transaction are found. The undo operation is then applied to each
log record of the transaction. If a failure occurs during the recovery process, the undo
operation is applied again. The effect of applying the undo operator multiple times is the
same as applying undo one time. After completing the undo operations, the recovery
manager may offer the user the chance to restart the aborted transaction.
Recovery from a system failure is more difficult because all active users are
affected. To help you understand recovery from a system failure, the figure shows the
progress of a number of transactions with respect to the end of a transaction, most recent
checkpoint, and the failure.
Figure: Transaction timelines relative to the most recent checkpoint and the failure.
T1 finishes before the checkpoint; T2 starts before the checkpoint and commits before
the failure; T3 starts after the checkpoint and commits before the failure; T4 starts
before the checkpoint and is still active at the failure; T5 starts after the checkpoint
and is still active at the failure.
To understand the amount of work necessary, remember that log records are stable at
checkpoint time and end of transaction and database changes are stable at
checkpoint time. Although other database writes occur when a buffer fills, the
timing of other writes is unpredictable. T1 transactions require no work because
both log and database changes are stable before the failure. T2 transactions must be
redone from the checkpoint because only database changes prior to the checkpoint
are stable. T3 transactions must be redone entirely because database changes are
not guaranteed to be stable even though some changes may be recorded on disk. T4
and T5 transactions must be undone entirely because some database changes after the
checkpoint may be recorded on disk.
After a system failure, the checkpoint table and the log are used to restore
transactions to a consistent state. Using the most recent checkpoint record, the
recovery manager locates the log record written at the time of the checkpoint.
Transactions are classified as shown in the figure above. The recovery
manager applies the undo operator to all T4 and T5 transactions and the redo
operator to all T2 and T3 transactions. Finally, all T4 and T5 transactions are
restarted.
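The classification can be sketched as a simple function of each transaction's start and
commit times relative to the checkpoint; the numeric times below are illustrative.

```python
def classify(start, commit, checkpoint):
    """Classify a logged transaction; commit is None if active at the failure."""
    if commit is not None and commit <= checkpoint:
        return "T1: no restart work needed"
    if commit is not None:                       # committed after the checkpoint
        if start <= checkpoint:
            return "T2: redo from the checkpoint"
        return "T3: redo entirely"
    if start <= checkpoint:
        return "T4: undo entirely, then restart"
    return "T5: undo entirely, then restart"

CHECKPOINT = 100
print(classify(start=10,  commit=50,   checkpoint=CHECKPOINT))   # T1
print(classify(start=80,  commit=150,  checkpoint=CHECKPOINT))   # T2
print(classify(start=120, commit=180,  checkpoint=CHECKPOINT))   # T3
print(classify(start=60,  commit=None, checkpoint=CHECKPOINT))   # T4
print(classify(start=130, commit=None, checkpoint=CHECKPOINT))   # T5
```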
Deferred Update
In the deferred update approach, database updates are written to disk only after a
transaction commits. No database writes occur at checkpoint time except for already
committed transactions. The advantage of the deferred update approach is that undo
operations are not necessary. However, it may be necessary to perform more redo
operations than in the immediate update approach.
T4 and T5 transactions (not yet committed) do not require undo operations because no
database changes are written to disk until after a transaction commits. T2 and T3
transactions (committed after the checkpoint) require redo operations because it is
not known whether all database changes are stable. T2 transactions (started before
the checkpoint) must be redone from their first log record rather than just from the
checkpoint as in the immediate approach. Thus, the deferred approach requires
more restart work for T2 transactions than does the immediate update approach.
However, the deferred update approach requires no restart work for T4 and T5
transactions, while the immediate update approach must undo T4 and T5
transactions.
TP Monitor
A TP monitor (short for transaction processing monitor) is a program that monitors a transaction
as it passes from one stage in a process to another. The TP monitor's purpose is to ensure
that the transaction processes completely or, if an error occurs, to take appropriate
actions. TP monitors are especially important in three-tier architectures that employ load
balancing because a transaction may be forwarded to any of several servers. In fact, many
TP monitors handle all load-balancing operations, forwarding transactions to
different servers based on their availability.
Clients are bound, serviced, and released using stateless servers that minimize overhead.
The database sees only the controlled set of processing routines as clients. TP monitor
technology maps numerous client requests through application services routines to
improve system performance. The TP monitor technology (located as a server) can also
take over the application transaction logic from the client. This reduces the number of upgrades
required by these client platforms. In addition, TP monitor technology includes numerous
management features, such as restarting failed processes, dynamic load balancing, and
enforcing consistency of distributed data. It is easily scalable by adding more servers to
meet growing numbers of users.