
A Scalable and Early Congestion Management Mechanism for MINs

Joan-Lluís Ferrer, Elvira Baydal, Antonio Robles, Pedro López, José Duato
Parallel Architectures Group
Universidad Politécnica de Valencia
Camino de Vera s/n, 46021 Valencia, Spain
Email: juaferpe@doctor.upv.es;{elvira,arobles,plopez,jduato}@disca.upv.es

Abstract—Several packet marking-based mechanisms have been proposed to manage congestion in multistage interconnection networks. One of them, the MVCM mechanism, obtains very good results for different network configurations and traffic loads. However, as MVCM applies full virtual output queuing at origin, its memory requirements may jeopardize its scalability. Additionally, the applied packet marking technique introduces a certain delay in detecting congestion. In this paper, we propose and evaluate the Scalable Early Congestion Management mechanism, which eliminates the drawbacks exhibited by MVCM. The new mechanism replaces the full virtual output queuing at origin by either a partial virtual output queuing or a shared buffer, in order to reduce its memory requirements, thus making the mechanism scalable. It also applies an improved packet marking technique based on marking packets at output buffers regardless of their marking at input buffers, which simplifies the marking technique and allows earlier detection of the root of a congestion tree.

Keywords—Interconnection networks, congestion management, message throttling.

I. INTRODUCTION

Congestion management is a classic problem within the framework of interconnection networks. Basically, congestion can appear when different origin hosts send information at too high an injection rate, competing for the same resources. As the capacity of queues and channels is limited, if an output link is demanded by several packets arriving simultaneously from different input links, only one packet can use it while the others have to wait. If this situation persists, packets start to accumulate in the queues of the affected switch, causing the Head-of-Line (HOL) blocking phenomenon and spreading the congestion through the network, provoking a partial or total saturation of the network.

Congestion management can be achieved by controlling the input traffic in order to avoid the saturation of the links. Notice that, if some of the packets that have to cross the saturated links were stopped at their origin hosts, saturation would be reduced and, consequently, network throughput could be improved. Therefore, it is necessary to detect the beginning of congestion correctly and quickly, in order to apply corrective actions only over the hosts responsible for that situation.

In the last few years, Multistage Interconnection Networks (MINs) have become very popular in cluster interconnects because they are able to meet the increasing demand for new services. In order to avoid congestion, there has been considerable research focused on developing new Congestion Management Mechanisms (CMMs) able to provide the best network performance. As cluster interconnects assume a lossless network model, packets cannot be dropped to deal with congestion. Most CMMs are based on detection & recovery strategies [1,2,3,4,5,8]. Basically, detection is based on different packet marking techniques, that is, input buffer [3] or output buffer [4] marking, and input buffer marking plus output buffer validation [8]. These mechanisms were originally evaluated by comparing them against the behavior of the network in the absence of any congestion control. Later, a comparison study [9] among the current proposals analyzed to what extent the CMM behavior is caused by the marking technique or, on the contrary, by the application of corrective actions. Although results showed [9] that the Marking and Validation Congestion Management (MVCM) mechanism was the most effective one in both aspects (detection and recovery) for different network configurations and traffic loads, some drawbacks still remain. In particular, MVCM applies Full Virtual Output Queuing (FVOQ) at source hosts in order to eliminate the occurrence of HOL blocking at the origin hosts. This technique requires dedicating a large memory in each origin host and hinders scalability because the number of queues depends on the number of destination hosts. Moreover, the packet marking technique, based on marking packets at input buffers and validating them at output buffers, succeeds in identifying the flows responsible for congestion, but introduces a certain delay in congestion detection that may penalize performance.

In this paper we tackle the main weaknesses found in MVCM, proposing and analyzing a new mechanism, called Scalable Early Congestion Management (SECM), that eliminates them. The SECM mechanism uses a new packet marking technique based on the independence between input and output buffer marking. This makes the marking technique simpler and faster, allowing congestion to be detected sooner. Furthermore, the possibility of using either a Shared Buffer (SB) or a Partial Virtual Output Queuing (PVOQ), instead of FVOQ at origin hosts, has been proposed and analyzed in order to make the mechanism scalable.

The rest of the paper is organized as follows. Section II describes the MVCM mechanism. In Section III, the MVCM weaknesses and the SECM mechanism are described. The simulation scenario and the evaluation results are presented in Section IV. Finally, in Section V some conclusions are drawn.
II. MVCM MECHANISM

The MVCM mechanism [8] is an end-to-end CMM based on the use of Explicit Congestion Notification (ECN). It combines a packet marking & validation technique with the application of a two-level scheme of corrective actions, at their source nodes, to the flows responsible for congestion (hot flows). Unlike other approaches [3,4], MVCM was not proposed for any standard interconnect in particular. The packet marking technique applied is based on the dependency between the packet validation at output buffers and the packet marking at input buffers. To this end, switches need to have buffers associated with both their input and output ports in order to combine them. The main goal of this packet marking strategy is to properly differentiate hot flows, in order to apply packet injection limitation only at the source nodes that are really generating congestion, and with the appropriate intensity. This packet marking technique operates in two steps. First, packets arriving at an input buffer are marked if the number of stored packets in the buffer exceeds a certain threshold. This is performed by activating the Marking Bit (MB) in the packet header. Next, when a marked packet is forwarded through a saturated output link, the mechanism validates it by activating a second bit, the Validation Bit (VB). We assume that an output link is saturated when the number of packets stored in its buffer exceeds another threshold. Once a packet arrives at the destination host, a copy of the MB and VB bits is sent back to the origin in the corresponding ACK packet, in order to inform the origin host about the network status without overloading the network with additional control packets.

When a source node receives a marked (MB=1 and VB=0) or validated (MB=1 and VB=1) ACK packet, the mechanism applies corrective actions according to the seriousness of the congestion. To this end, two levels of corrective actions are defined. Table 1 summarizes them.

TABLE 1. Actions for MVCM

  ACK bits (MB VB)    Actions
  0 0                 No actions
  0 1                 Not possible
  1 0                 Moderate-Preventive (DW)
  1 1                 Imminent-Corrective (DW+WP)

The first level is based on adjusting the packet injection rate of each flow by using a Dynamic Window (DW) only. The window size is dynamic, fluctuating between one and a defined maximum value (DWmax) depending on the congestion situation. The DW establishes the maximum number of data packets that a host can send without receiving an ACK, that is, the number of allowed outstanding packets. Notice that the DW value is reduced or increased by 1 depending on the corrective actions. If the DW reduction is not enough to stop the onset of saturation and congestion persists, a second level reduces the injection rate even further. It consists in delaying packet injection by inserting Waiting Periods (WPs) between packets, while the maximum window size is kept to 1. This technique allows the injection rate to be reduced progressively. As congestion vanishes, the waiting interval is decreased until it disappears. All in all, preventive actions can only act over the DW value, whereas imminent actions can reduce the DW value and later insert WPs. In order to eliminate the HOL blocking effect that could be caused by the injection rate limitation applied to hot flows, which would penalize the non-responsible flows (cold flows), MVCM applies a FVOQ technique at the Network Interface Controllers (NICs) of the source nodes. Fig. 1 shows the implementation. In particular, the memory size should be, in bytes, at least DWmax * #destinations * packet size.

Fig. 1 Full Virtual Output Queuing in the MVCM mechanism.
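As an aside, the following minimal sketch illustrates how a source NIC could implement the two-level scheme of Table 1. The class name, the window-recovery policy and the WP growth step are illustrative assumptions, since the paper only fixes the direction of the adjustments (reduce DW on a marked ACK, add waiting periods on a validated ACK).

    # Hedged sketch of MVCM's two-level corrective actions at a source NIC.
    # Names and recovery/growth policies are illustrative, not the authors' code.

    class SourceFlowState:
        def __init__(self, dw_max=2):
            self.dw_max = dw_max          # maximum dynamic window (DWmax)
            self.dw = dw_max              # current dynamic window (DW)
            self.wait_period = 0          # cycles inserted between injections (WP)

        def on_ack(self, mb, vb):
            if mb and vb:                 # Imminent-Corrective: DW + WP
                self.dw = 1
                self.wait_period += 1     # assumed WP growth policy
            elif mb:                      # Moderate-Preventive: reduce DW by 1
                self.dw = max(1, self.dw - 1)
            else:                         # no congestion reported: relax gradually
                if self.wait_period > 0:
                    self.wait_period -= 1
                elif self.dw < self.dw_max:
                    self.dw += 1

        def may_inject(self, outstanding, cycles_since_last_injection):
            # Inject only if the window allows it and the current WP has elapsed.
            return (outstanding < self.dw and
                    cycles_since_last_injection >= self.wait_period)

Under this sketch, a validated ACK collapses the window to one outstanding packet and starts stretching the gap between injections, which corresponds to the imminent-corrective behavior described above.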
III. SCALABLE EARLY CONGESTION MANAGEMENT MECHANISM

Although it has been shown that the MVCM mechanism achieves better results than other current proposals [9], it exhibits some features that limit its scalability and may introduce a certain delay in detecting congestion and applying corrective actions when a congestion situation appears. In order to remove the drawbacks exhibited by MVCM, we propose a new CMM referred to as SECM. In what follows, we first analyze in more detail the main weaknesses of the MVCM mechanism, in order to justify the new proposal. Next, we present the SECM mechanism by describing each of its main contributions. They focus on the organization of the injection buffers at the network interfaces (NIs) and on the packet marking technique. On the other hand, the SECM mechanism basically maintains the same scheme of corrective actions applied by MVCM, given that it has been shown to be quite effective, but extended according to the new capabilities offered by the packet marking technique.

A. MVCM Weaknesses

The main drawback of the MVCM mechanism is its lack of scalability with respect to the storage space required at the NIs of the source nodes. This is caused by the fact that MVCM assumes that a FVOQ technique is applied at the NICs in order to eliminate the HOL blocking effect. If packet injection were carried out through a single output buffer, packets (belonging to hot flows) stopped at the head of the buffer due to the reduction in their injection rate could delay the advance of the packets belonging to cold flows. This problem can arise as a consequence of the application of corrective actions to the hot flows when congestion is detected. To solve this problem, the most effective solution is to provide a separate output queue for each packet flow, that is, to apply a FVOQ technique. Although FVOQ completely eliminates HOL blocking at the source level, the required storage space depends on the number of destinations, increasing proportionally to the number of hosts. This solution could be acceptable for small system sizes, but in systems with thousands of nodes the memory requirements would be quite significant.

Another weak feature exhibited by MVCM is related to the applied packet marking technique. To properly appreciate the problem that this technique may cause, let us analyze in detail how the congestion process evolves in the network and how the packet marking is carried out. MVCM assumes the existence of switches with both input and output buffers, as shown in Fig. 2.

Fig. 2 Congestion flows in a switch.

Let us assume that hosts h0 and h2 inject traffic toward host h1. Additionally, host h3 injects traffic toward host h0. In this scenario, if the aggregate injection rate of h0 and h2 surpasses the acceptance rate of h1, packets arriving at switch S2, but not immediately transferred to host h1, will begin to accumulate at the output buffer (this occurs due to the speedup provided by switches). Notice that, in order to prevent switches from becoming a bottleneck, the internal bandwidth of the switch crossbar is usually higher than the channel bandwidth. As increasing the speedup increases the switch complexity, a maximum speedup of two is often used [6]. Those are the packets really provoking congestion, and the output link becomes the root of the congestion tree. In order to solve the problem, once it is detected, the injection rate of these flows should be reduced. Otherwise, if the situation persists, packets will begin to accumulate at the input buffers of switch S2. Some of those packets will be destined to the congested link (h0-h1 and h2-h1), that is, hot flows, but others will not (h3-h0), cold flows, which can suffer HOL blocking due to the FIFO policy applied by the buffers. MVCM tries to differentiate the flows that are really provoking the congestion from the flows merely affected by the congestion. According to the packet marking technique applied by MVCM, the MB bit of both kinds of packets will be set if the input buffer threshold is exceeded. However, only those packets belonging to the hot flows will also set the VB bit when they traverse the root of the congestion tree. Notice that packets with only their MB bit activated cross a congested area, but they may not pass through the congested link, that is, packets belonging to the flow h3-h0. As a consequence, origins can apply harder restrictions on hot flows when both bits are set. However, VB can only be set after MB has been set. So, this marking technique, named Marking&Validation Packet Marking (MVPM), produces a delay in the application of corrective actions to the flows h0-h1 and h2-h1, which are contributing to create the congestion tree. This delay depends on the time needed to fill the output buffer of switch S2 and, later, some input buffer. As a consequence of this delay in the congestion detection, the application of corrective actions will be, in turn, delayed, which may penalize the performance of the CMM.
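To make the dependency explicit, the following minimal sketch (our own illustration, not the authors' code) expresses MVPM marking inside a switch: MB is set when an input buffer exceeds its threshold, and VB can be set at an output buffer only for packets that already carry MB, which is precisely what delays detection at the congestion root.

    # Hedged sketch of MVPM (MVCM's marking & validation) inside a switch.
    # The dict-based packet representation and thresholds are illustrative.

    def mvpm_at_input(packet, input_occupancy, input_threshold):
        # Step 1: mark on input-buffer congestion.
        if input_occupancy > input_threshold:
            packet["MB"] = 1

    def mvpm_at_output(packet, output_occupancy, output_threshold):
        # Step 2: validate only packets already marked at an input buffer.
        if packet.get("MB") and output_occupancy > output_threshold:
            packet["VB"] = 1

    # Example: the output buffer is already saturated, but an unmarked packet
    # crossing it is not validated, so the root of the tree is reported late.
    pkt = {"MB": 0, "VB": 0}
    mvpm_at_output(pkt, output_occupancy=12, output_threshold=8)
    assert pkt["VB"] == 0   # not validated: MB was never set at an input buffer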
B. Providing Scalability

As commented above, MVCM applies a FVOQ technique at origin hosts to eliminate HOL blocking. However, this technique does not scale as the system size increases. Therefore, FVOQ should be replaced by an alternative mechanism able to significantly reduce the storage requirements (making its size independent from the number of hosts) without penalizing network performance. Notice that using a buffer organization different from FVOQ means that every flow no longer has a dedicated queue, which could provoke the appearance of the undesirable HOL blocking effect. Therefore, it is required to analyze to what extent an alternative mechanism, whose storage requirements are lower and independent from the system size, can be applied in such a way that network performance is hardly penalized. Two alternative approaches can be considered. One of them consists in the use of a PVOQ technique, whereas the second one is the use of a Shared Buffer (SB). Both approaches exhibit some pros and cons that we analyze in more detail in what follows.

Applying the PVOQ technique consists in using a number of queues at the NI of the source hosts smaller than the number of hosts. In particular, the number of queues could range from, at least, two queues to the number of hosts minus one. Therefore, the storage space can be significantly reduced, becoming independent from the number of hosts, which improves the scalability of the mechanism. However, when using a bounded number of queues, packets addressed to different destination hosts may have to be stored in the same queue, possibly causing HOL blocking, which could affect network performance. Notice that the mechanism chooses the first packet from the head of the queue to be injected into the network. Therefore, the key issue is to reduce the number of queues as much as possible without affecting network performance too much. The method used for mapping packet flows to queues is based on the destination address [7]. Some bits of the destination address indicate the queue in which to place the packet. This method is referred to as module mapping, because the queue to map the flow is obtained by applying the modulo operation to its destination address. As an example, for a network configuration of 512 destinations and 16 queues, the four least significant bits of the destination ID (9 bits) indicate the queue to map the flow. Additionally, the MVCM mechanism imposes the value of DWmax as the queue size in the FVOQ strategy in order to bound the number of outstanding packets. Now, in order to reduce the memory size even further, an array of counters can be applied to control the outstanding packets and, in this way, just one buffer can be defined per queue. We assume this implementation in the PVOQ technique, as shown in Fig. 3. Notice that the number of destination hosts k is larger than the number of dedicated queues j. For each flow, its associated counter is increased each time a packet belonging to it is injected into the network from the queue, whereas it is decremented when the ACK packet is received. Notice that the size of the counter array (k * round[log2(DWmax + 1)] bits) is not significant with regard to the space occupied by a single packet.

Fig. 3 Partial VOQ with an array of counters.

On the other hand, an alternative solution is to use a SB at the NI of the origin hosts. The size of the SB can be selected regardless of the system size and traffic load, which contributes to improving the scalability of the congestion management mechanism. It works as follows. The origin host generates packets and stores them in the SB as long as there is free space. The mechanism chooses the most appropriate packet from the SB to be injected into the network, depending on the time when the packets were generated and the number of outstanding packets per origin-destination pair. Since the SB is limited, its memory size has to be defined large enough to ensure that, in a congestion situation, the SB has free space to store newly generated packets destined to non-congested areas. This way, cold flows can continue to inject packets into the network while packets destined to a congested area remain in the buffer due to the corrective actions applied. Moreover, as the corrective actions applied by SECM continue to be based on a DW, the mechanism also needs to control the number of outstanding packets per origin-destination pair at all times. However, in the SB the space initially occupied by each flow is not limited. Therefore, it is mandatory to add a counter per destination host in order to keep track of the number of outstanding packets belonging to every flow, so that its allowed maximum number is not exceeded. Fig. 4 shows this implementation.

Fig. 4 SB with an array of counters.

Once a packet is injected into the network, its occupied space is released, remaining available for any other generated packet from any flow. As in the case of PVOQ, if the buffer size is not large enough, the injection of packets belonging to cold flows could be delayed. This would cause an increase in their latency. Notice that all the results in the evaluation section have been obtained assuming that the space occupied by a packet in the memory is immediately released when it is injected into the network. Again, the key issue is to minimize the buffer size as long as the latency of the packets belonging to cold flows is not penalized. Therefore, it will be necessary to evaluate the behavior of the mechanism when applying different sizes of SB.

C. Improving Packet Marking Technique

As commented before, the packet marking technique applied by MVCM introduces a certain delay in detecting congestion. This is due to the fact that packets are validated at output buffers only if they have been previously marked at input buffers. Taking into account how congestion trees spread over the network, packets will first exceed the detection threshold at output buffers before exceeding it at input buffers. Therefore, packets must wait to be marked at input buffers before being validated at output buffers, delaying congestion detection. This delay may penalize performance. Thus, we modify the marking technique in such a way that the appearance of congestion can be detected earlier than in MVCM. The new packet marking technique also assumes the existence of buffers at both the input and output sides of switches. As in MVCM, two bits are reserved in the packet headers to manage the marking technique. They will be referred to as MBin and MBout because they are associated with the marking actions carried out at input and output buffers, respectively. Like MVCM, the new packet marking technique also detects congestion when the buffer occupancy in the switches exceeds a certain threshold, but now the packet marking at output buffers is applied regardless of what happened at input buffers. Therefore, we will refer to the new packet marking technique as Input&Output Packet Marking (IOPM). As a consequence of the independence between output and input buffer marking, all the combinations of the bits MBin and MBout are possible. Table 2 summarizes the status bits and actions for the SECM mechanism.

TABLE 2. Actions for SECM

  ACK bits (MBin MBout)    Actions
  0 0                      No actions
  1 0                      Moderate-Preventive (DW)
  X 1                      Imminent-Corrective (DW+WP)

As shown, if an origin host receives an ACK packet marked only at the input buffer (MBin=1 and MBout=0), corrective actions will continue to be applied only over the DW in order to stop the onset of congestion. But, if a marked ACK packet is received with MBout=1, regardless of the value of MBin, then imminent actions will be applied (DW+WPs), in the same way as they were applied in MVCM. Going back to the example given in Fig. 2, according to IOPM, both hosts (h0 and h2) get an ACK with MBin=1 and MBout=X, which means that both hosts reduce their DW (compared to MVCM, where probably only h2 receives the ACK with an imminent correction). Notice that the IOPM strategy applied by SECM continues to be able to accurately identify the hot flows (MBin and MBout are set), as MVCM does. However, in MVCM, the hot flows are often identified in the last stages of the MIN, that is, those placed closer to the destination of the hot flows. In these stages, packets exceeding the output buffer threshold will likely also exceed the thresholds at input buffers, which allows congestion to be detected by setting the VB bit (once the MB bit has been set). Indeed, the advantage of IOPM consists in its capability to detect the beginning of a possible congestion situation at intermediate network stages, see Fig. 5. As congestion trees usually grow from the root toward the leaves (source hosts) [6], output buffers will fill before input buffers at each network stage. Therefore, a technique such as IOPM, able to set the MBout bit regardless of the MBin bit, will allow us to detect congestion sooner than the MVPM technique not only at the last stage, but also at the intermediate stages, before the hot packets reach the last stages.

Fig. 5 Early detection of the congestion roots.
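As a compact illustration of IOPM and the action selection of Table 2, the sketch below marks a packet at a switch and maps the returned bits to an action at the source. The function names, the dictionary-based packet representation and the thresholds are illustrative assumptions, not the paper's implementation.

    # Hedged sketch of SECM's IOPM marking and the resulting source actions.

    def iopm_mark(packet, input_occupancy, output_occupancy,
                  input_threshold, output_threshold):
        # Input and output marking are independent: MBout can be set even if
        # the packet was never marked at an input buffer.
        if input_occupancy > input_threshold:
            packet["MBin"] = 1
        if output_occupancy > output_threshold:
            packet["MBout"] = 1

    def secm_action(mbin, mbout):
        # Source-side decision on ACK reception, following Table 2.
        if mbout:
            return "Imminent-Corrective (DW+WP)"   # regardless of MBin
        if mbin:
            return "Moderate-Preventive (DW)"
        return "No actions"

    # Example: a packet crossing a saturated output buffer is reported at once,
    # even though no input buffer threshold has been exceeded yet.
    pkt = {"MBin": 0, "MBout": 0}
    iopm_mark(pkt, input_occupancy=3, output_occupancy=12,
              input_threshold=8, output_threshold=8)
    print(secm_action(pkt["MBin"], pkt["MBout"]))   # Imminent-Corrective (DW+WP)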
IV. PERFORMANCE EVALUATION

A. Network Configuration

The proposed improvements have been evaluated by using an interconnection network simulator. We model a generic switch-based cut-through network with point-to-point links and buffered credit-based flow control, with a link bandwidth of 1 byte/cycle. Packets are transmitted over the link if there is enough buffer space (measured in credits of 64 bytes) to store the entire packet in the next node. We have evaluated the new proposals under the following network configurations: bidirectional Perfect Shuffle 4-ary 3-fly, 4-ary 4-fly, and 4-ary 5-fly. A deterministic routing algorithm is used to forward packets in the network. Data packets have a payload between 256 and 512 bytes plus 22 bytes of control, resulting in packet lengths between 278 bytes and 534 bytes. The ACK packet size is 22 bytes. Switches have 1kB buffers associated with both their input and output ports. Three different traffic patterns were applied. These patterns provoke network congestion with different intensity levels and are intended to check the proposals under diverse traffic conditions.

TABLE 3. Evaluated traffic patterns

  #Srcs.    Pattern I (Dest.)    Pattern II (Dest.)    Pattern III (Dest.)
  448       Uniform              Unif+HS+Unif          Uniform
  64        stop+HS+stop         stop+HS+stop          stop+HS+stop+HS

First, pattern I, shown in Table 3, has been applied to obtain the results of Figs. 6 and 10. In this traffic pattern, 448 sources generate and inject packets according to a uniform distribution of message destinations. Then, 64 sources create a hot-spot in the network by injecting packets to a single destination. In particular, hosts that send uniform traffic keep injecting packets during the whole simulation. Hosts that generate hot-spot traffic remain inactive until the first 50,000 packets have been received. Then, they start injecting packets at the same injection rate as that of the other hosts, but addressed to only one destination host (the hot-spot). They stop generating packets when each one has injected 1,000 packets. The second traffic pattern, pattern II, also shown in Table 3, has been applied to obtain the graphs in Figs. 7 to 9. Hosts injecting packets to the hot-spot work as in traffic pattern I, but hosts injecting uniform traffic start with uniform traffic and, at random times, each one injects a set of 100 packets to the hot-spot and, after that, continues injecting uniform traffic. Finally, a third traffic pattern (pattern III) has been applied to obtain the results shown in Figs. 11 and 12. It is similar to pattern I, but each hot-spot source injects only 100 packets instead of 1,000, and the hot-spot pulse is doubled; the second pulse is generated before the first one has been totally consumed.

We present different types of results. Figs. 6 to 11 show results for the proposals with several injection rates, from low load till saturation. In particular, Figs. 6 to 9 present results for the 4-ary 5-fly network and Fig. 10 for the 4-ary 3-fly network. Fig. 11 shows the improvement achieved by the SECM mechanism in a 4-ary 5-fly network due to the new packet marking technique (IOPM) applied. To justify this improvement, Fig. 12 shows the packet marking actions carried out by both techniques (MVPM and IOPM) when applying a medium traffic load. For the sake of brevity we only show a subset of the results, but similar results have been achieved for the 4-ary 4-fly network. All the latencies shown are measured from packet generation (i.e., the time required to deliver a packet, including the time spent waiting at the origin node). Notice that network latency only considers the time spent traversing the network.
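Before moving to the results, the credit-based flow control described above reduces to a simple admission check; the following sketch (with an assumed helper name and constant, not the simulator's code) illustrates it.

    # Minimal sketch of the credit-based flow-control check used in the simulations.
    import math

    CREDIT_SIZE = 64  # bytes per credit

    def can_transmit(packet_length_bytes, downstream_free_credits):
        # A packet is forwarded only if the next node has enough free credits
        # to store the entire packet.
        credits_needed = math.ceil(packet_length_bytes / CREDIT_SIZE)
        return downstream_free_credits >= credits_needed

    # Example: a 278-byte packet needs 5 credits (320 bytes of buffer space).
    print(can_transmit(278, downstream_free_credits=4))  # False
    print(can_transmit(278, downstream_free_credits=5))  # True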
Fig. 6 Latency (cycles) vs. traffic (bytes/cycle) for (a) PVOQ and (b) SB, with pattern I, packet size = 278 bytes, in a 4-ary 5-fly.

B. Evaluation Results

The first results have been obtained for a fixed packet size of 278 bytes. The curves in Fig. 6 (a) show the network performance when different sizes of PVOQ are applied. The number of queues varies from 1 to 256; for comparison purposes, FVOQ (512 queues) is included. The module mapping technique has been used to allocate a newly generated packet into the available queues. On the other hand, the curves in Fig. 6 (b) show the network performance when different sizes of SB are used. The buffer size varies between 3kB (approximately 11 data packets) and 139kB. Notice that, to define a FVOQ in MVCM, at least a 278kB buffer would be needed (DWmax * #destinations * packet size), but by using an array of counters the size is reduced by half because only one buffer is required (the calculated value for DWmax is 2 [8]). If we compare the results of PVOQ and SB, for a maximum latency of 5,000 cycles (notice that this value is 10 times the average latency for a packet when the network is injecting uniform traffic near the saturation point), a maximum SB of 25kB would be enough, while PVOQ needs more than 32kB (128 queues * 278 bytes). Moreover, for a latency of 10,000 cycles (20 times the average latency), a SB size of 25kB still continues to be enough, while PVOQ needs more than 64kB (256 queues * 278 bytes) to achieve similar results. Notice that when PVOQ is used, the number of needed queues increases as congestion does. In this situation, packets belonging to different flows share the same queue. When congestion appears, packets belonging to flows responsible for congestion are stopped because of the injection restriction applied by the CMM. This causes the HOL blocking phenomenon, stopping also packets belonging to cold flows, which are not affected by injection restrictions. As traffic increases, more packets are affected and more queues are needed to keep the performance.

Fig. 7 Latency (cycles) vs. traffic (bytes/cycle) for (a) PVOQ and (b) SB, with pattern II, packet size = 278 bytes, in a 4-ary 5-fly.

These results are confirmed in Fig. 7, where pattern II is applied. The best throughput for a latency of 5,000 or 10,000 cycles continues to be achieved with a 25kB SB, while with PVOQ it is necessary to dedicate more than 32kB (128 queues * 278 bytes) in the first case and at least 64kB in the second one. Even if the hot-spot traffic of pattern II is increased or decreased by 10%, as Fig. 8 shows, SB still obtains good results with a memory of 25kB.

The PVOQ and SB techniques have also been analyzed by applying a variable packet size between 278 bytes and 534 bytes. As can be appreciated in Fig. 9, for a latency of 5,000 cycles, again the maximum throughput for SB is achieved with a 25kB buffer size, while PVOQ requires a storage space greater than 64kB (128 queues * 534 bytes). Moreover, to confirm that this performance is also achieved for other network sizes, Fig. 10 shows results for PVOQ and SB in a 4-ary 3-fly network. Again, an SB size of 3kB is enough to achieve the best results for both latency values (5,000 and 10,000 cycles), while with PVOQ more than 4kB (16 queues * 278 bytes) is needed in both cases. In conclusion, SB is a cost-effective solution when compared to the alternative of applying FVOQ or PVOQ in order to remove the HOL blocking effect at origin. This is because the required memory size is reduced, it does not depend on the traffic load, and it is a fixed parameter that can be defined at network implementation time. Notice that these sizes of SB and PVOQ correspond to a high injection rate; if a medium injection rate is applied (as in a normal network situation), it is enough to define a smaller memory (SB or PVOQ), while the MVCM mechanism always keeps the full FVOQ.
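As a quick sanity check of the storage figures quoted in this subsection, the following sketch reproduces the arithmetic for the 512-host, 278-byte configuration with DWmax = 2. The helper functions are our own illustration of the formulas stated in the text, not part of the mechanisms themselves.

    # Worked example of the storage requirements discussed above
    # (512 destinations, 278-byte packets, DWmax = 2).

    def fvoq_bytes(dw_max, destinations, packet_size):
        # MVCM's FVOQ: DWmax * #destinations * packet size
        return dw_max * destinations * packet_size

    def counted_queues_bytes(num_queues, packet_size):
        # With the per-destination counter array, one packet buffer per queue
        # is enough, so the space is #queues * packet size.
        return num_queues * packet_size

    print(fvoq_bytes(2, 512, 278))          # 284672 bytes, i.e. the ~278 kB quoted
    print(counted_queues_bytes(512, 278))   # 142336 bytes, i.e. the ~139 kB half
    print(counted_queues_bytes(128, 278))   # 35584 bytes, the "more than 32 kB" PVOQ case

By contrast, the SB size is a single fixed parameter (25kB in the experiments above), independent of the number of destinations.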
Fig. 8 Latency (cycles) vs. traffic (bytes/cycle) for SB with pattern II, packet size = 278 bytes, in a 4-ary 5-fly: (a) +10% hot-spot packets, (b) -10% hot-spot packets.

Fig. 9 Latency (cycles) vs. traffic (bytes/cycle) for (a) PVOQ and (b) SB, with pattern II, packet size 278-534 bytes, in a 4-ary 5-fly.

Fig. 10 Latency (cycles) vs. traffic (bytes/cycle) for (a) PVOQ and (b) SB, with pattern II, packet size = 278 bytes, in a 4-ary 3-fly.

Next, we analyze the effect of applying the new packet marking technique provided by SECM. Fig. 11 shows a comparison between the two packet marking techniques applied by the MVCM and SECM mechanisms, as they have been proposed. The curves show the network performance when the new packet marking technique (IOPM) is applied in conjunction with a SB (25kB), as proposed by SECM, and with a FVOQ, in order to analyze its influence separately. The improvement produced with respect to MVCM (MVPM+FVOQ) when the IOPM technique is applied (both with FVOQ and SB) is due to the early marking of packets at output buffers. This is possible because output buffers fill up earlier than input buffers when congestion appears, and packets can be marked at output buffers (MBout=1) regardless of the value of MBin. This way, the SECM mechanism can apply corrective actions sooner. With the MVCM mechanism, packets will not be validated (VB=1) until they have been marked (MB=1). So, MVCM introduces a delay in applying the first and most significant corrective action (the DW decrease). To justify this behavior, Fig. 12 shows the performance of the cold flows when no corrective actions are applied. The lower graphs show the packet marking processes carried out by MVPM and IOPM, respectively. It can be observed that, in MVPM, packet validation is carried out in an effective way, that is, inside the congestion pulses (b,c). Although a few marking actions take place before the congestion period (a), causing a DW reduction, this does not affect the injection rate because the initial value of DWmax is immediately recovered if the next generated packet is injected into an empty queue. Due to the independence between input and output packet marking, actions over the DW are carried out sooner in IOPM (d) and in a more progressive way (e) than for MVPM, producing a continuous attenuation effect and maintaining a prevention situation. Moreover, if congestion is not so severe and the DW can reduce the congestion effects in time, as is the case here, it is not necessary to insert WPs, because a DW reduction is enough to manage the congestion situation. Only for IOPM have a few WPs been inserted (f), but they do not have any significant effect.
Fig. 11 MVCM vs. SECM performance with pattern III: latency (cycles) vs. traffic (bytes/cycle) for MVPM+FVOQ, IOPM+SB, and IOPM+FVOQ.

Fig. 12 MVPM vs. IOPM with pattern III.

V. CONCLUSIONS

In this paper, we have proposed and evaluated a new CMM, named Scalable Early Congestion Management, that solves the MVCM weaknesses that jeopardize its scalability. In particular, the possibility of replacing the FVOQ at origins by either a PVOQ or a SB has been evaluated. Results show that it is not necessary to dedicate a FVOQ at origin hosts to eliminate HOL blocking. Despite the fact that applying a PVOQ contributes to improving scalability, the use of a SB is the best choice because its size is independent from the applied traffic load and it is a fixed parameter that can be defined at network implementation time. We have shown that a small SB is enough to maintain a performance level similar to that achieved when applying a FVOQ. On the other hand, the new congestion management mechanism applies a new packet marking technique based on making the packet marking at output buffers independent from the marking at input buffers. This independence simplifies the packet marking technique and reduces the delay introduced by the usual process of marking and validation. Furthermore, it allows an early detection of the congestion process at any stage of the network. Thus, origin hosts are warned in advance and can apply corrective actions sooner.

ACKNOWLEDGMENT

This work was supported by the Spanish program CONSOLIDER-INGENIO 2010 under Grant CSD2006-00046, by the Spanish CICYT under Grant TIN2006-15516-C04-01, and by the European Commission in the context of the SARC integrated project #27648 (FP6).

REFERENCES

[1] InfiniBand Trade Association, http://www.infinibandta.org
[2] M. Thottethodi, A. R. Lebeck, S. S. Mukherjee, Self-Tuned Congestion Control for Multiprocessor Networks, Proc. Int. Symp. on High-Performance Computer Architecture (HPCA), 2001.
[3] J. Renato Santos, Y. Turner, G. Janakiraman, End-to-End Congestion Control for InfiniBand, Proc. IEEE INFOCOM, 2003.
[4] G. Pfister et al., Solving Hot Spot Contention Using InfiniBand Architecture Congestion Control, Proc. Int. Symp. on HPI-DC, 2005.
[5] J. Duato, I. Johnson, J. Flich, F. Naven, P. Garcia, T. Nachiondo, A New Scalable and Cost-Effective Congestion Management Strategy for Lossless Multistage Interconnection Networks, Proc. Int. Symp. on High-Performance Computer Architecture (HPCA), 2005.
[6] P. J. García, J. Flich, J. Duato, I. Johnson, F. J. Quiles, F. Naven, Dynamic Evolution of Congestion Trees: Analysis and Impact on Switch Architecture, Proc. Int. Conf. on HiPEAC, 2005.
[7] T. Nachiondo, J. Flich, J. Duato, Efficient Reduction of HOL Blocking in Multistage Networks, Proc. Int. Parallel and Distributed Processing Symposium (IPDPS), 2005.
[8] J. Ll. Ferrer, E. Baydal, A. Robles, P. López, J. Duato, Congestion Management in MINs through Marked & Validated Packets, Proc. 15th Euromicro Int. Conf., 2007.
[9] J. Ll. Ferrer, E. Baydal, A. Robles, P. López, J. Duato, On the Influence of the Packet Marking and Injection Control Schemes in Congestion Management for MINs, Proc. 14th Int. Euro-Par Conf., 2008.

You might also like