
Q. no:-60. What are the various classifications of faults? Also explain the characteristics of faults.
Ans:- Cristian provides a useful classification of faults. A request to a server can change the state of a resource and may produce a result for a client.
Omission failure :- A server omits to respond to a request.
Response failure :- A server responds incorrectly to a request.
Value failure :- The server returns a wrong value.
State transition failure :- The server has the wrong effect on resources (for example, it sets wrong values in data items).
Timing failures :- Cristian’s classification also deals with timing failures. These refer to any response that is not available to a client within a specified real-time interval. A timing failure can describe a response that is either too late or too early. For example, an overloaded server may provide responses that arrive too late. Process control applications require responses from services within a specified time. Servers used in applications that require timely responses must be able to recover within the specified time; they must not depend on protocols such as the atomic commitment protocol, which can require an unbounded time to complete.
Server crash failure :- Cristian defines a server crash failure as a repeated omission failure. Most server failures cause a server to stop sending messages, so it appears to its clients to have stopped. Clients cannot distinguish with certainty between a server failure, a server that responds very slowly, and a breakdown of communication with the server. Timeouts combined with retransmission of request messages are generally used to detect server failures: when a process does not reply after a number of attempts to communicate with it, it is assumed to have failed.
Crash failure :- A repeated omission failure; a server repeatedly fails to respond to requests until it is restarted.
Amnesia crash :- A server restarts in its initial state, having forgotten its state at the time of the crash.
Pause crash :- A server restarts in the state it had before the crash.
Halting crash :- The server never restarts.
The provision of a fault-tolerant service can be simplified if it can be assumed that the servers on which it depends crash cleanly: a server either functions correctly or else it crashes. A fail-stop server is one that, when it is about to fail, changes to a state that permits other servers to detect that a failure has occurred and then stops.
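A minimal sketch of the timeout-and-retransmission detection described above is given below, assuming a simple UDP request/reply exchange; the function name, the number of attempts and the timeout value are purely illustrative, and, as noted above, a missing reply cannot distinguish a crashed server from a very slow one or from a communication breakdown.

import socket

def presumed_failed(address, request, attempts=3, timeout=2.0):
    # Send the request up to 'attempts' times, waiting 'timeout' seconds
    # for a reply each time; repeated omission is treated as a crash.
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        try:
            sock.sendto(request, address)
            sock.recvfrom(4096)          # any reply means the server is up
            return False
        except socket.timeout:
            continue                     # no reply: retransmit the request
        finally:
            sock.close()
    return True                          # repeated omission: presume failure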
Byzantine failure :- Byzantine failure behaviour is commonly used to describe the worst possible failure semantics of a server. It models a situation in which most computers work correctly but some faulty computers work as maliciously as possible. Faulty computers can, for example, send contradictory messages to different recipients or impersonate one another.
The Byzantine generals problem describes a situation in which N divisions of the Byzantine army are camped outside an enemy city. Each division is commanded by a general (that is, a server). The generals communicate with one another only by messenger. Each general has to vote yes or no to a plan of action. However, some of the generals may be bad and will try to prevent the good generals from reaching agreement. An algorithm solves the Byzantine generals problem if it gets all the good generals to agree yes or no within a bounded time. Byzantine agreement algorithms send more messages and use more active servers than an atomic commit protocol. Their task is for each server to send the same messages to all of the servers, so that the good servers will all produce the same response within a bounded time. This is equivalent to an atomic multicast that functions correctly even when some of the bad servers exhibit timing faults that cause them to delay their responses indefinitely. Consider three servers A, B and C (two good servers and one bad server): the good servers vote yes, and the bad server sends yes to one of the good servers and no to the other. One of the good servers (A) will receive (yes, yes, yes) and the other (B) will receive (yes, yes, no), but in both cases the majority vote is yes and the outcome for the good servers is yes. This shows that two good servers can compensate for one bad server. In general, 2N+1 servers can tolerate N bad servers.
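The majority-voting step in this example can be sketched as follows; the vote values and the helper name are illustrative only.

from collections import Counter

def majority(votes):
    # Return the value held by a strict majority of the votes, if any.
    value, count = Counter(votes).most_common(1)[0]
    return value if count > len(votes) // 2 else None

# Good servers A and B vote yes; the bad server C sends yes to A and no to B.
print(majority(['yes', 'yes', 'yes']))   # what A collects: yes
print(majority(['yes', 'yes', 'no']))    # what B collects: yes (2 out of 3)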

Q.62:- Define stable storage, primary and backup servers?

Ans:- Stable storage is an example of group masking at the disk block level. It is a
generic approach designed by Lampson to ensure that any essential permanent data will
be recoverable after any single system failure, including system failures during a disk
write operation and damage to any single disk block.
The stable storage service uses group masking by duplicating a careful storage service. A careful storage service is defined to have only omission failure semantics; that is, the read operation uses a checksum stored with each block to mask value failures by converting them to omission failures. The unit of storage used by stable storage is called a stable block. Each stable block is represented by two careful disk blocks that hold the contents of the stable block in duplicate. If possible, the pair of blocks is located on different disk drives to reduce the chances that both will be damaged in a single mechanical failure.
The read operation of the stable storage service reads one of the pair of representative careful blocks. If an omission failure occurs, it reads the other representative. This enables the stable storage read operation to mask the omission failures of the careful storage service.
A stable storage service guarantees that the following invariant is maintained for each
pair of blocks:
• Not more than one of the pair is bad.
• If both are good, they both have the most recent data, except during the execution
of a write operation.
The stable storage service write operation maintains this invariant by writing the data in
each of the two representative careful blocks sequentially, ensuring that the first write is
successful before commencing the second.
When a stable storage server is restarted after a crash, a recovery procedure is invoked.
At this point, the pair of blocks representing each stable block will be in one of the
following states:
1. both good and the same;
2. both good and different;
3. one good, one bad.
The recovery procedure is designed to maintain the above invariant. It inspects the pairs of blocks and does the following in each of the above cases:
1. nothing;
2. copies one block of the pair to the other block of the pair;
3. copies the good block to the bad block.
Stable storage operations are more costly in disk space and in time than conventional disk service operations.
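The read, write and recovery operations described above can be sketched in outline as follows; the in-memory representation, the use of a CRC as the checksum and the class names are assumptions made for illustration, not Lampson's actual disk layout.

import zlib

class CarefulBlock:
    # A careful block: the stored checksum converts value failures into
    # omission failures (a corrupted block simply reads as 'bad').
    def __init__(self):
        self.data, self.checksum = None, None

    def write(self, data):
        self.data, self.checksum = data, zlib.crc32(data)

    def read(self):
        if self.data is not None and zlib.crc32(self.data) == self.checksum:
            return self.data
        return None                       # omission failure

class StableBlock:
    # A stable block is represented by two careful blocks, ideally on
    # different drives, holding the contents in duplicate.
    def __init__(self):
        self.first, self.second = CarefulBlock(), CarefulBlock()

    def write(self, data):
        # Write the representatives sequentially: the second write starts
        # only after the first has succeeded, preserving the invariant.
        self.first.write(data)
        self.second.write(data)

    def read(self):
        # Read one representative; mask its omission failure, if any,
        # by reading the other.
        data = self.first.read()
        return data if data is not None else self.second.read()

    def recover(self):
        # Restore the invariant after a crash: cases as listed in the text.
        a, b = self.first.read(), self.second.read()
        if a is not None and b is not None and a != b:
            self.second.write(a)          # both good but different
        elif a is None and b is not None:
            self.first.write(b)           # first bad: copy the good block
        elif b is None and a is not None:
            self.second.write(a)          # second bad: copy the good block
        # both good and the same: nothing to do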

PRIMARY AND BACKUP SERVERS

Some systems are designed to provide fault tolerance by means of duplicated hardware. A service is represented by a pair of servers, each of which has crash failure semantics. Both servers execute all the client requests, and the reply of either server may be used. If one of the server computers crashes, the other continues to provide the service. The duplication of hardware enables recovery from a crash to be immediate. If the system is to be used for a critical real-time application (for example, controlling a reactor, monitoring a patient's heartbeat or directing an ambulance to its destination), then it is essential that recovery can be done within a specified time limit.
Most applications can tolerate some delay during recovery. If this is the case, then it is more appropriate to design the primary/backup pair so that the backup server is relatively inactive during the normal operation of the primary, which is generally most of the time. This enables the backup server's computer to be used for other productive work.
Borg describes Auragen, a fault-tolerant version of a distributed UNIX based on primary/backup pairs. It is designed for a transaction-processing environment in which occasional delays for recovery are acceptable. Checkpointing is relatively infrequent, so that backup servers use only a small fraction of the productive capacity of the computers on which they run. Checkpointing is achieved transparently, enabling any user program to become fault tolerant without recompilation.
In Auragen, each primary process has a backup running on a different computer. The
backup process has sufficient information to start executing when its primary fails.
When the primary fails, the backup first reads in the checkpoint and then executes the same messages that were executed by the primary; when it catches up, it takes over as primary. The receipt of a message by a primary can result in its sending other messages elsewhere. During recovery its backup must avoid resending such messages. To make this possible, a count of the number of messages sent by its primary since the last checkpoint must be available.
Each request is a three-way totally ordered atomic multicast. This ensures that the primary and backup receive all the same messages in the same order as one another. It also enables the sender's backup to count the number of messages its primary has sent. Each server performs an appropriate action on receipt of a message. This action is determined by its role as follows:
Primary --- Execute the operation requested and return a reply.
Backup --- Save the message in a log for subsequent use in recovery.
Sender's backup --- Count the number of messages sent by the sender since the last checkpoint.
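A sketch of these three roles follows; the class, method and role names are assumptions made for illustration, and the totally ordered atomic multicast that delivers each request to all three parties is assumed to be provided elsewhere.

class ReplicatedProcess:
    def __init__(self, role, service):
        self.role = role                    # 'primary', 'backup' or 'senders_backup'
        self.service = service              # assumed to provide execute() and restore()
        self.log = []                       # backup: requests saved for recovery
        self.sent_since_checkpoint = 0      # sender's backup: messages sent by its primary

    def on_request(self, request):
        # Action taken on receipt of a multicast request, determined by role.
        if self.role == 'primary':
            return self.service.execute(request)      # execute and return a reply
        elif self.role == 'backup':
            self.log.append(request)                  # save for subsequent recovery
        elif self.role == 'senders_backup':
            self.sent_since_checkpoint += 1           # count messages since the checkpoint

    def take_over(self, checkpoint):
        # Recovery: read in the checkpoint, replay the logged requests,
        # then continue as the new primary.
        self.service.restore(checkpoint)
        for request in self.log:
            self.service.execute(request)
        self.role = 'primary'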
Q.68:- Explain the design and implementation of distributed shared memory?

Ans:- STRUCTURE
Three main approaches have been implemented, which view DSM as being composed
respectively of bytes, shared objects or immutable data items.

Byte-oriented :- This is the view illustrated above by the Mether system. It is also the view of, for example, the Ivy system. It allows applications (and language implementations) to impose whatever data structures they want on the shared memory. Byte-oriented DSM is accessed as ordinary virtual memory.
Shared object: An advantage of viewing shared memory as a collection of objects is that synchronization can be applied at the level of the object operation. Orca views DSM as a collection of shared objects, and automatically serializes operations upon any given object.
Immutable data: Both Agora and Linda view DSM as a collection of immutable data items that all processes can read. Processes have to replace data items in the DSM instead of modifying them. For example, Linda uses the tuple space model. This comprises the notion of a tuple space (a collection of immutable tuples with typed data items in their fields) and a set of primitives for accessing tuple space that can be added to any base language. Processes may place tuples in tuple space, and read or extract them from tuple space.
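A minimal, single-process sketch of a Linda-style tuple space follows; the method names out, rd and in_ only loosely mirror Linda's primitives, and blocking, matching by type and distribution are all omitted.

class TupleSpace:
    # Tuples are immutable: to 'modify' a value, a process extracts the old
    # tuple and places a new one, as described above.
    def __init__(self):
        self.tuples = []

    def out(self, *fields):                 # place a tuple in tuple space
        self.tuples.append(tuple(fields))

    def rd(self, pattern):                  # read a matching tuple, leaving it in place
        return next((t for t in self.tuples if self._match(t, pattern)), None)

    def in_(self, pattern):                 # extract (remove) a matching tuple
        t = self.rd(pattern)
        if t is not None:
            self.tuples.remove(t)
        return t

    @staticmethod
    def _match(t, pattern):
        # None in the pattern acts as a wildcard for that field.
        return len(t) == len(pattern) and all(
            p is None or p == f for p, f in zip(pattern, t))

space = TupleSpace()
space.out('count', 10)
print(space.rd(('count', None)))            # ('count', 10)
space.in_(('count', None))                  # remove the old tuple ...
space.out('count', 11)                      # ... and replace it with a new one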

SYNCHRONIZATION MODEL

Many applications apply constraints concerning the values stored in shared memory. This is as true of applications based on DSM as it is of applications written for a shared memory multiprocessor (or indeed of any concurrent program that shares data, such as operating system kernels and multi-threaded servers). For example, if a and b are two variables stored in DSM, then a constraint might be that a = b always. If two or more processes execute the following code:
a:=a+1;
b:=b+1;
then an inconsistency may arise. Suppose a and b are initially zero, and that process 1 gets as far as setting a to 1. Before it can increment b, process 2 sets a to 2 and b to 1. The constraint has been broken. The solution is to make this code fragment into a critical section: to synchronize processes so as to ensure that only one may execute it at a time.
In order to use DSM, a distributed synchronization service needs to be provided, which includes familiar constructs such as locks and semaphores. Even when DSM is structured as a set of objects, the implementers of the objects still have to be concerned with synchronization. Synchronization constructs are implemented using message passing. Special machine instructions such as test-and-set, which are used for synchronization in shared memory multiprocessors, are in principle applicable to page-based DSM, but their operation in the distributed case is very inefficient.
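The critical-section remedy for the a = b constraint above can be sketched as follows, using a local threading lock as a stand-in for the distributed lock that a DSM synchronization service would actually provide.

import threading

a, b = 0, 0                     # shared variables, initially zero; constraint: a == b
lock = threading.Lock()         # stand-in for a distributed lock

def increment_both():
    global a, b
    with lock:                  # only one process may execute the section at a time
        a = a + 1
        b = b + 1               # the constraint holds again before the lock is released

threads = [threading.Thread(target=increment_both) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a, b)                     # 2 2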

CONSISTENCY MODEL
DSM implementations employ caching for enhanced performance. In the terminology used here, each process has a local replica manager, which holds replicas of some DSM data items. In most implementations, data items are read using local values for efficiency, but updates have to be propagated to the other replica managers. In a page-based DSM implementation, the kernels are the replica managers: they cache DSM pages, which they map into the address spaces of the local processes sharing DSM. In the Orca implementation, the processes themselves act as replica managers (when they execute library code). If DSM segments are persistent, then one or more storage servers (for example, file servers) will also act as replica managers.
In addition to caching, a DSM implementation could in principle buffer write accesses, and thus amortize communication costs by spreading them over multiple writes.

SEQUENTIAL CONSISTENCY

When a read access is made to a memory location, which write accesses to the location are candidates whose values could be supplied to the read? At the weakest extreme, the answer is: any write that was issued before the read. This model would be obtained if replica managers could delay propagating updates to their peers indefinitely. It is too weak to be useful.
At the strongest extreme, all written values are instantaneously available to all processes: a read returns the most recent write at the time that the read takes place. Unfortunately, this definition is problematic in two respects. First, neither writes nor reads take place at a single point in time, so the meaning of 'most recent' is not always clear. Each type of access has a well-defined point of issue, but it completes at some later time (for example, after message passing has taken place). Secondly, clocks cannot be synchronized sufficiently accurately in a distributed system to resolve events at this level of granularity, so the relation 'before' (in absolute time) is not a useful one for these events. A model called atomic consistency has been formulated to remove these ambiguities, but it remains impractical for a DSM system; it is described by Mosberger.
Fortunately, it is possible to define useful and practical memory consistency models logically, so that they can be reasoned about and compared with what programmers intuitively expect from memories. The strongest and most important such model is sequential consistency. Sequential consistency is similar to one-copy serializability, but it applies to memory systems at the granularity of individual memory operations rather than transactions. Under it, the result of any execution is equivalent to one in which the following conditions obtain:
• all the reads and writes issued by the individual processes concerned are satisfied in program order, as the programmers of those processes surely expect; and
• the memory operations belonging to different processes occur in some serial order.
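A small illustrative check of these two conditions is sketched below (it is not from the text): two processes each write one variable and then read the other, and enumerating every interleaving that respects program order shows that the outcome in which both reads return 0 can never arise under sequential consistency.

# P1 issues: write x=1, then read y into r1.  P2 issues: write y=1, then read x into r2.
P1 = [('write', 'x', 1), ('read', 'y', 'r1')]
P2 = [('write', 'y', 1), ('read', 'x', 'r2')]

def interleavings(p1, p2):
    # All serial orders that keep each process's operations in program order.
    if not p1:
        yield list(p2); return
    if not p2:
        yield list(p1); return
    for rest in interleavings(p1[1:], p2):
        yield [p1[0]] + rest
    for rest in interleavings(p1, p2[1:]):
        yield [p2[0]] + rest

outcomes = set()
for order in interleavings(P1, P2):
    memory, registers = {'x': 0, 'y': 0}, {}
    for op in order:
        if op[0] == 'write':
            memory[op[1]] = op[2]
        else:
            registers[op[2]] = memory[op[1]]
    outcomes.add((registers['r1'], registers['r2']))
print(outcomes)     # contains (0, 1), (1, 0) and (1, 1), but never (0, 0)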

WEAK CONSISTENCY

The weak consistency model is an attempt to avoid the costs of sequential consistency on multiprocessors. This model exploits knowledge of synchronization operations in order to relax memory consistency, while appearing to the programmer to implement sequential consistency. For example, if the programmer uses a lock to implement a critical section, then a DSM system can assume that no other process may access the data items accessed under mutual exclusion within it. It is therefore redundant for the DSM system to propagate updates to these items until the process leaves the critical section. While the items are left with inconsistent values some of the time, they are not accessed at those points; the execution appears to be sequentially consistent. Adve and Hill describe a generalization of this notion called weak ordering: '(A DSM system) is weakly ordered with respect to a synchronization model if and only if it appears sequentially consistent to all software that obeys the synchronization model.'
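The deferred propagation described above can be sketched as follows, assuming that updates made inside a critical section are buffered and only flushed when the protecting lock is released; the class and the propagate callback are illustrative stand-ins, not a real DSM interface.

class WeaklyOrderedReplica:
    def __init__(self, propagate):
        self.local = {}                  # this replica manager's copy of the data items
        self.pending = {}                # updates buffered during the critical section
        self.propagate = propagate       # callable(updates) that reaches the other replicas

    def write(self, item, value):
        # Inside the critical section no other process may access the item,
        # so propagation of the update can safely be deferred.
        self.local[item] = value
        self.pending[item] = value

    def release(self, lock):
        # Flush the buffered updates before the lock is released, so the
        # execution still appears sequentially consistent to obedient software.
        self.propagate(dict(self.pending))
        self.pending.clear()
        lock.release()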

UPDATE OPTIONS
Two main implementation choices have been devised for propagating updates made by one process to the others: write-update and write-invalidate. These are applicable to a variety of DSM consistency models, including sequential consistency. In outline, the options are as follows:
Write-update: The updates made by a process are made locally and multicast to all the other replica managers possessing a copy of the data item, which immediately modify the data read by local processes. Processes read the local copies of data items, without the need for communication. In addition to allowing multiple readers, several processes may write the same data item at the same time; this is known as multiple-reader/multiple-writer sharing.
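A minimal sketch of the write-update option follows; the class and the multicast callable are illustrative stand-ins for a replica manager and its multicast mechanism, not any particular system's interface.

class WriteUpdateReplica:
    def __init__(self, multicast):
        self.copy = {}                   # local copies of the shared data items
        self.multicast = multicast       # callable(item, value) reaching every other replica

    def read(self, item):
        return self.copy.get(item)       # reads are purely local

    def write(self, item, value):
        self.copy[item] = value          # apply the update locally ...
        self.multicast(item, value)      # ... and multicast it to the other copies

    def on_update(self, item, value):
        self.copy[item] = value          # apply an update multicast by another writer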

Write-invalidate: This is commonly implemented in the form of multiple-reader/single-writer sharing. At any time, a data item may either be accessed in read-only mode by one or more processes, or it may be read and written by a single process. An item that is currently accessed in read-only mode can be copied indefinitely to other processes. When a process attempts to write to it, a multicast message is first sent to all the other copies to invalidate them, and this is acknowledged before the write can take place; the other processes are thereby prevented from reading stale data.
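A corresponding sketch of the write-invalidate option; the invalidate callable is assumed to multicast the invalidation and block until every other copy has acknowledged it.

class WriteInvalidateReplica:
    def __init__(self, invalidate):
        self.copy = {}                   # items this replica currently holds a valid copy of
        self.invalidate = invalidate     # callable(item): multicast and wait for acknowledgements

    def read(self, item):
        return self.copy.get(item)       # valid while no other process has invalidated it

    def write(self, item, value):
        self.invalidate(item)            # other copies are invalidated and acknowledge first
        self.copy[item] = value          # this replica is now the single writer

    def on_invalidate(self, item):
        self.copy.pop(item, None)        # drop the local copy; it must be re-fetched to read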