You are on page 1of 60

Distributed Systems

Message Passing
Prof. Aniruddha S Rumale
Assistant Professor, Comp. Engg. Dept.
Introduction
•Process means program in execution.
•Two comps in DS are communicating
means two processes communicating with
each other.
•Process communication is necessary to
achieve some common goal.
•DS needs to provide InterProcess
Communication ( IPC)
IPC
IPC requires information sharing among two or more
processes.
1) Original sharing or shared data approach
information to be shared is stored in common
memory area

2) Copy Sharing or message passing


information to be shared is physically copied from
sender process’s address space to address spaces
of all receiver processes in form of message.
Shared P2
P1 common P2 P1
memory

Shared data approach Message-passing approach

•Generally Computers in N/W do not share memory. So IPC


in DS uses Message-passing over shared data
Approach.
•Message-passing system provides message based IPC
protocols by shielding the complex N/W protocols
and also by shielding the multiple heterogeneous
platforms from user.
Features of Message-passing system
1. Simplicity
should be simple and easy to use.
It must be straightforward to construct new
applications and communicate with existing
ones using systems primitives.

2. Uniform Semantics
semantics of remote communications( communicating
processes are on different nodes) should be as close
as possible to local communications( communicating
processes are on same node).
Features of Message-passing system …continue
3.Efficiency
an IPC protocol of a message-passing system can be
made efficient by reducing the number of message
exchanges, as far as practicable, during the
communication process.
Avoiding the cost of establishing and terminating
connections between same pair of processes each
and every message exchange between them.
Minimizing the cost of maintaining the connections.
Piggybacking the acknowledgement of previous
messages with the next message during a connection involving
several message exchanges between sender and
receiver.
Features of Message-passing system …continue
4. Reliabilty
DS is prone to catastrophic events.( node crash,
communication link (s) failures etc…)
A reliable IPC protocol can cope with failure problems
and guarantees the delivery of message.
Handling of lost messages involves ACKs and
retransmissions on the basis of timeout.
Duplicate messages may be sent in the event of
failures or timeouts. A reliable IPC protocol must be
capable of detecting and handling of duplicates. It
involves generating and assigning sequence numbers
to messages.
Features of Message-passing system …continue
5.Correctness
Correctness is a feature related to IPC protocols for
group communication.
Atomicity : message sent to a group of receivers will
be delivered to either all of them or none of them.
Ordered Delivery: messages arrive at all receivers in
order acceptable to an application.
Survivability: guarantees that messages will be
delivered correctly despite partial failures of
processes, machines, or communication links.
Features of Message-passing system …continue
6.Flexibilty
Not all applications require the same degree of
correctness and reliability of IPC protocols. Thus , the IPC
protocol of message passing system must be flexible
enough to cater to the various needs of different
applications.
IPC primitives must also have the flexibility to permit
any kind of control flow between the cooperating
processes, including synchronous and asynchronous
send/receive.
Features of Message-passing system …continue
7. Security
A good message-passing system must be capable of
providing a secure end-to-end communication.
A message in transit on the network should not be
accessible to any user other than those to whom it is
addressed and the sender. This involves
a) authentication of receiver(s) of a message by sender
b) authentication of sender of message by receiver(s)
c) encryption of message before sending it over N/W.
Features of Message-passing system …continue
8. Portability
the message-passing system should itself be
portable.
the applications written by using the primitives of IPC
protocols of message-passing system should be
portable.
Issues in IPC by message-passing
Message is a block of information formatted by a
sending process in such a manner that it is
meaningful to the receiving process.

Structural information Sequence Addresses


Actual
Number Sending
Data or Number Receiving
Or Process
Pointer Of bytes/ Type Process
Message Address
To data elements Address
ID

A typical message structure


Issues in IPC by message-passing…continued
• who is sender and who is receiver?
•How many receivers? One or many?
•Is the message guaranteed to have been accepted by its
receiver(s)?
•Does the sender need to wait for reply?
•How to handle catastrophic events ( node crash, link(s)
failure(s), etc…) occurring during communication?
•If receiver is not ready; what to do with message? Discard or
store in buffer? What to do if buffer is full?
•Can receiver choose order of acceptance to serve outstanding
messages?
Synchronization
Central issue in communication structure is synchronization imposed
on the communicating processes by the communication primitives.
Two semantics Blocking and non-blocking can be used.
A primitive is said to have non-blocking semantics if its invocation
does not block the execution of its invoker.
If execution of invoker is blocked, it is blocking semantics.
These semantics are primarily used for send and receive primitives.
Incase of blocking send primitive, sending process is blocked after
execution of send until it receives an ACK from receiver that the
message is received.
Incase of non-blocking send, process proceeds with its execution as
soon as the message get copied to buffer.( transferred if Null-buffer is
used)
Synchronization…continued
Incase of block-receive, receiving process is blocked until it
receives a message (ACK).
Incase of non-blocking receive, process proceeds with its
execution as soon as receive primitive is executed.
How non-blocking receive knows that message is arrived in
buffer?
1. Polling : a test primitive is provided to allow receiver to
check the buffer status. A periodic execution of test is
carried out called as polling.
2. Interrupt : when buffer get filled and becomes ready to be
used by receiving process a software interrupt notifies this
to receiver. Saves repeated unsuccessful check of polling.
Synchronization…continued
A variant to non-blocking receive primitive is conditional
receive primitive. This returns control to invoking process
immediately, either with a message or with an indicator of no-
message.
In blocking-send primitive, sending process could get
blocked forever if receiver crashes or if message loss due to
other reasons. To avoid this blocking primitives uses time-out
value ( time-stamp, waiting-time) specifying interval of time
after which the operation of blocking-primitive ( blocking-
send) is terminated with an error status.
Time-out value is either default ( system calculated ) or user
defined ( human-time) with respect to communication criteria.
Synchronization…continued Sender Receiver

When both send and receive Send (message) Receive (message)


primitives of a communication Execution Execution
between two processes use suspended suspended
blocking semantics, the
Execution
communication is said to be Blocked
message
Resumed
synchronous; otherwise it is State
said to be asynchronous.
Send (ACK)
Synchronous communication Execution ACK
comparatively easy to Resumed
Executing State
implement than asynchronous
communication. Synchronous mode of communication with both
send and receive having blocking semantics
Synchronization…continued
Synchronous communication is more reliable.
If message get lost or is undelivered, no backward error
recovery is necessary.
It limits concurrency
It is subject to communication deadlocks.
Less flexible than asynchronous communication.
Requires unnecessary waits for ACK.
And it is slower than asynchronous communication.
Buffering Sending
In communication, in some cases receiving process may Process
not be ready to receive a message.Such messages need
to be stored somewhere, usually in the buffer of receiver,
for later reception and processing. Message

Null Buffering ( No Buffering) [ Synchronous]


No temporary storage at receiver to store the message.
The message remains in the sender process’s address
space and execution of send is delayed until the receiver ACK
executes the corresponding receive.
Or message is simply discarded and time-out mechanism Receiving
Process
is used to resend the message after a time-out period.
Null Buffering
Buffering… continued
Single-Message Buffer [ Synchronous] Sender
Null buffer is not suitable for communication in DS if
receiver is not ready, it may require more than two
repeated message transfer of same message. Also
receiver need to wait for time taken to transfer the
message across the N/W. Buffer to
Store
To avoid this a buffer with a capacity to store one One
message is used at receivers end. This is because in Message
synchronous mode an application module may have at
most one message outstanding at a time.
The message buffer may either be located in the kernel’s
address space or in the receiver process’s address
space. Logical path of message transfer involves two Receiver
copy operations.
Buffering… continued
Unbounded capacity buffer [ asynchronous] Sender

In asynchronous communication sender never


wait for receiver to be ready;
causing many pending messages that yet not Buffer to
have been accepted by receiver; Store
Many
and thus requires unbounded capacity buffer to Messages
store all unreceived messages.
Unbounded capacity of buffer is practically
impossible. So in practice asynchronous
communication uses finite bound buffers. Receiver
Buffering… continued
Finite bound or Multiple-message buffer [asynchronous]
1. Unsuccessful communication :
message transfers simply fail whenever there is no more buffer
space.send normally returns an error message to sender : receiver
buffer is full, message can’t be delivered.
makes message passing less reliable.

2. Flow-controlled communication :
sender is blocked until the receiver accepts some messages.
introduces synchronization : may results in deadlocks
Communication based on Finite bound buffer implementation is more
complex to implement and use than null buffer or single message
buffer.
Multidatagram messages
All N/W has upper bound on data to be transmitted at a time,
known as Maximum Transfer Unit (MTU).
If sizeof ( message) > MTU
: fragmentMTU( sizeof (eachfragment)<=MTU)
: number each fragment serially
: if each fragment numbered
: send them in packets ( datagram)
Packet = control information + message data.
If MTU> message data to be send : single-datagram
message, else multidatagram messages.
Multidatagram messages… continued
At receiver side check packets for sequence number
If packet is numbered store it in buffer
Receive all packets with sequence numbers based on
common control information.
report any error to sender for retransmission of
missing packet
arrange packets in order
accepts packets in order
reassemble packets to form complete message
acknowledge the sender
Encoding and Decoding
A message data should be meaningful to the receiving process.
This needs preservation of program objects while transmission from
senders address space to receivers address space.
It is not easy to achieve this on homogeneous systems and it is
impossible to achieve this on heterogeneous systems.
An absolute pointer value loses its meaning when transferred from
one process address space to another.
Different program objects occupy varying amount of storage space.
And to make a message meaningful to the receiver, there must be
some way receiver to identify which program object is stored where
in the message buffer and how much space each program object
occupies.
Encoding and Decoding…continued
Due to problems in transferring program objects in their
original form, they get converted to a more suitable stream
form for transmission.
Conversion process from original to stream form taking place
at sender side is known as encoding.
Conversion of stream form of program objects in to their
original form at receiver side prior to their use is known as
Decoding.
There are two types of representation used for encoding and
decoding. Tagged and untagged representation.
Encoding and Decoding…continued
Tagged representation :
type of each program object along with its value is encoded in the
message.
Receiving process checks the type of each program object in
message due to self-describing nature of coded data.
More expensive than untagged representation.

Untagged representation :
message data only contains program objects. No information about
type of any program object is given.
Receiver must have prior knowledge to decode the received data as
coded data format is not self-describing.
Process Addressing
Addressing ( Naming ) of parties involved in communication is
an important issue in message based communication.

Explicit addressing :
the process with which communication is desired is explicitly
named as a parameter in the communication primitive used.
Eg . Send ( process_id, message)
Receive( process_id, message)
In above we are sending/receiving message to/from a process
identified using process_id
Process Addressing…continued
Implicit Addressing :
a process willing to communicate does not explicitly name a
process for communication in communication primitive used.
Useful in client server communication when client is concerned
with service and not the server from set of server-farm, who is
going to serve its purpose.
This is also known as functional addressing as address used is
of service and not of process.
Eg. Send_any( service_id, message) send message to
any process which can provide the service identified by service_id.
Receive_any(process_id, message) Receive message
from any process and return the process_id on reception of message.
Failure handling
DS is prone to following failures.
sender Receiver
sender Receiver sender Receiver
Send message Send Send message
message received
request request request
message
lost Crash
Send
lost response
Request message is
Lost either because
restarted
Of link failure or
Because receiver Response loss either due Receiver’s node
Node may be down To down sender node or Crashed before
At time of request Failed link of Receiving request
For communication comunication For communication
Failure handling…continued Client Server
To cope with these problems, IPC protocols are
designed based on the idea of internal Request
retransmissions of messages after time-outs.
And prompt return of ACK messages from
receiver.
ACK
Four-message IPC protocol for client-server:
1. Client sends request message to server. Reply
2. On reception of request server sends ACK to
client.
ACK
3. On processing of client’s request server sends
Reply containing results of processing to client. Blocked state
4. On reception of reply client sends ACK to server Executing state
Failure handling…continued Client Server
Three message reliable IPC
protocol for client server : Request
1. Client sends a request message
to server
2. On reception server processes
the request and prepares a reply
and sends it to client; meanwhile Reply
client remain blocked.
3. On reception of reply from server
client resumes its execution and ACK
sends a ACK to server. Server
remain blocked until the ACK Blocked state
from client. Executing state
Failure handling…continued Client Server
Two message IPC protocol for
client server communication : Request
1. Client sends a request message
to server and enters in block
state for time=time-out.
2. On reception of request from
client server processes it and Reply
prepares a reply and sends it to
client.
3. Servers kernel waits for ACK
from clients kernel for time=time-
out; In absence server Blocked state
retransmits the reply to client. Executing state
Failure handling…continued Unsuccessful
client server Request
Request message execution
Send request
Time-out lost
Send request Successful
Retransmit request message request
Time-out
Executions
Send request crash May produce
Retransmit request message Different results
Time-out
lost Send response
ACK
Send request
Retransmit request message

ACK Send response

Example of fault tolerant communication between client-server


Idempotency & handling duplicate request messages
Idempotency means repeatabilty.
Idempotent operation produces same results without any
side effects no matter how many times it is performed with the
same arguments. Eg. sqrt(64)=8 for any repeated execution.
Nonidempotent operations do not necessarily produce the
same results when executed repeatedly with the same
arguments. Eg. See the following code
Debit(amount) { if ((balance-amount) >=( minbalance))
{ balance=balance-amount; return(“success”,balance); }
else return(“failure”,balance); }
// produces different results for amount=100 for every operation
Nonidempotent operation Server Minbalance
client Balance=1000 =200
Debit(100)
Send request
lost
Send request Balance=1000-100
Retransmit Debit(100) =900
Time-out
Send request crash
Retransmit Debit(100)
lost Balance=900-100
ACK( Success,800) =800
Send request
Retransmit Debit(100)
Received Balance=800-100
Balance =700 ACK( Success,700) =700
Desired=900
Example of nonidempotent operation without any measures for fault detection
Idempotency & handling duplicate request messages…continued

Problem of nonidempotency can be solved using by avoiding


orphan executions ( executions of client request done at server side,
results of which won’t reach to client and so may client keep
retransmitting the same request; yielding in wrong result(s) ) of
requests from client.
This can be achieved by using exactly-once semantics, which
ensures that only one execution of server’s operation is
performed for one request.
Requires identification of orphan executions.
Primitives based on exactly-once semantics are most desired
but difficult to implement.
Exactly-once semantics
uses unique identifier for every request that a client Request
ID
makes.
Sets up a reply cache in the kernel’s address space on Execution
the server machine to cache replies. Status
Executed
Before forwarding a request to server, kernel of server /
Not
machine checks to see if a reply already exists in reply received
cache or not.
If yes, that means request is duplicate and already
Result
executed. So previously computed result is extracted obtained
from reply cache and new response is send to client.
Reply-cache
If no, request is forwarded to appropriate server by contents of Exactly-
kernel. once semantics
Exactly-once operation Minbalance Server
=200 Balance=1000
client
Debit(100)
Send request-1
lost
Send request-1 Reqest-1
Retransmit Debit(100) Balance=1000-100
Time-out
=900
Send request-1 crash
Retransmit Debit(100)
Reqest-1
lost
ACK( Success,900) already executed
Send request-1 Balance=900
Retransmit Debit(100)
Received Reqest-1
Balance =900 already executed
ACK( Success,900)
Desired=900 Balance=900
Example of exactly-once operation
Lost and out of sequence packets
Keeping track of lost and out of sequence packets is a issue in
multidatagram messages.
In multidatagram message transmission is said to be complete
iff all packets are received by a process to which it is sent.
Simple way is acknowledge each packet separately. ( stop-
and-wait protocol) . This leads to communication overhead.
Better approach is sending one acknowledgement for
complete multidatagram message when all packets get
received at receiver end.( blast protocol)
Lost and out of sequence packets…continued
In blast protocol a node crash or a link failure may lead to following
problems:
one or more packets of multidatagram message are
lost in communication
the packets are received out of sequence by the
receiver.
Efficient mechanism is to use a bitmap to identify the packets of a
message. In this approach header part of each packet consists of two
extra fields, one of which specifies the total number of packets in
multidatagram message and other is the bitmap field that specifies
the position of this packect in the complete message.
Sender Receiver Type of Message No of Packet Sequence no Rest of
address address message ID packets Or bitmap message
Lost and out of sequence packets…continued
In multidatagram message a suitable buffer is set aside by receiver
using No_of_packets field in first packet.
Bitmap field gives information where exactly a received packet must
be stored in set aside buffer for the particular message.

selective repeat :
After time-out, if all packets are not received, Bitmap ids of non-
received packets are communicated with the sender.
On receiving this information sender sends only those packets that
have not been received by receiver.
The process get repeated until transmission of multidatagram
message won’t get completed. i.e. when all packets of message get
received by receiver this retransmission of select packets stops.
Sender of multidatagram Message
Receiver of multidatagram Message
that consists of Five packets
Send request message Buffer
(M1,5,P1)
lost Time-out For 5

Create buffer for five Packets


Packets of (M1,5,P2)
The Packets and store this
(M1,5,P3) Packet in position 2
Response
Message
Place this packet 1
(M1,5,P4) In position 3
lost Place this packet 2
(M1,5,P5) In position 5
3
Missing packets info.
Retransmit request
For missing packets
4
Resend missing (M1,5,P1)
packets
Place this packet 5
(M1,5,P4) In position 1
Place this packet
In position 4 M1- Messeage ID=1
5=packets in M1
ACK Send ACK P1,P2…= Ith packet
Use of bitmap to keep track of lost and out of sequence packets in multidatagram message transmission
Group communication
Elementary form of communication is one-to-one or unicast
communication.
DS require group communication (in addition to unicast) those
are
1. One to many ( single sender and multiple receiver )
multicast ( no of receivers are predefined and
known)
broadcast( no of receivers are unknown and
undefined)
2. Many to one ( multiple sender and single receiver)
3. Many to many ( multiple sender and multiple receiver)
Group Management
In one to many communication, Receiver processes of a message
form a group; closed and open.
A closed group is one in which only the members of the group can
send a message to the group.
In close group, an outside process cannot send a message to group
as a whole, although it may send a message to an individual member
of group.
Open group is one in which any process in system can send
message to group as a whole.
Usage of close/open group is application specific and any flexible
message passing system must support both types of groups.
Facility of Dynamic creation and deletion of groups is must. And a
process must be allowed to enter or leave the group at any time.
Group Management
Simple mechanism for this is to use centralized group server
to manage groups and their membership information.
Centralized server approach suffers from the problems of
poor reliability and poor scalability common to all centralized
systems.
Replication of group servers adds communication overhead
in keeping group information of all group servers consistent.
Group addressing
Two level naming scheme is normally used.
High level group name is in ASCII string that is independent of the
location information of the processes in group.
Low level group name depends to a large extent on underlying
hardware.
Special N/W address to which multiple machines can listen is called
as multicast address, possible on some N/Ws. Packet sent to
multicast address is delivered to the machines linked to multicast
address.
N/Ws, which can not create multicast address may have broadcasting
facility by declaring a particular address such as zero as broadcast
address. Packet sent to broadcast address is delivered to all
machines in entire N/W.
Message delivery to receiver processes
User uses high level group names in programs.
Centralized group server maintains mapping between high and low
level group addresses ( names) along with the process identifiers of
all processes for each group.
Kernels of sender, receiver , and group server does appropriate
mapping and unmapping operations with rest of other operations like
encoding/decoding to deliver message to correctly to receiver.
Sender is not at all aware of either size of group or actual mechanism
used for group addressing.
Sender simply sends the message to a group by specifying its high
level name, and the OS takes the responsibility to deliver the
message to all the group members.
Buffered and unbuffered multicast
Multicasting is asynchronous communication mechanism due to following
reasons.
1. It is unrealistic to expect a sending process to wait until all the receiving
processes that belong to multicast group are ready to receive the multicast
message.
2. Sending process may not be aware of all receiving processes that belong
to the multicast group.
For an unbuffered multicast, the message is not buffered for the receiving
process and is lost if receiving process is not in a state ready to receive it.
So the message is received only by those processes of multicast group
that are ready to receive it.
For a buffered multicast, the message is buffered for the receiving processes,
so each process of multicast group eventually receive the message.
Same is true for broadcasting communication.
Send to all and bulletin-board semantics
• Send-to-all semantics : a copy of message is sent to each process of the
multicast group and message is buffered until it is accepted by the
process.
Following two factors are ignored by send-to-all semantics.
Relevance of a message to a particular receiver may depend on the
receiver’s state.
Message not accepted within a certain time after transmission may
no longer be useful; their value may depend on sender’s state.
• Bulletin-board semantics : a message to be multicast is addressed to a
channel instead of being sent to every individual process of multicast
group.
Receiving process copies message from channel instead of
removing it when it makes a receive request on the channel.
Process that have receive access right on the channel constitute the
multicast group.Thus channel acts as a bulletin-board.
Flexible reliability in multicast communication
1. The 0-reliability : no response is expected by sender from
any of the receiver.
Useful in asynchronous multicast.
2. The 1-reliability : sender expects the reply from any of
receivers.
3. m-out-of-n-reliable : the multicast group consists of n
receivers and sender expects a response from m ( 1<m<n)
of the n receivers.
Useful for consistency control of replicated information
with m=n/2.
4. All-reliable: sender expects a response message from all
the receivers of the multicast group.

You might also like