Competitive Brief:
DCX Architecture and Performance
Reviews the Brocade DCX design and refutes erroneous
claims from Cisco about its architecture & performance
COMPETITIVE BRIEF
INTRODUCTION
Cisco has made numerous claims about the Brocade platform architecture and performance. This
paper responds to their marketing glossy "Performance Testing on Brocade 48k" (June 2006) and
similar claims they have made regarding the Brocade DCX Backbone.
The most obvious flaw in the Cisco testing is that it was not independent. Even if there were no other
problems with their process and results, this would be an issue. It is easy for a vendor to contrive test
conditions to artificially produce a desired result, and then further manipulate statistical claims based
on that already-manipulated result.1 Independent testing is one way to prevent such blatant cheating.
Cisco could have chosen to allow independent testing, or to participate in an open bake-off. It appears
that they did not believe that they would fare well in a fair test, and chose to avoid any testing
environment which would allow comparisons to be conducted under experimentally valid conditions.
Along the same lines, when Cisco makes negative statements about the Brocade DCX system, they do
not provide essential configuration details such as software and firmware versions, hardware
revisions, or cabling and traffic flow information sufficient to duplicate their testing. From a scientific
standpoint, this means that there is no evidentiary value to their claims, because it is not possible to
determine what they did to achieve their claimed results. Based on some of the results they typically
report, their testing appears to have been performed with defective equipment and/or using non-production hardware/software/firmware on the Brocade platforms. Unlike a bake-off, the Cisco testing
was conducted without any configuration assistance from Brocade or a qualified Brocade partner, so
the Cisco internal personnel configuring the test bed were unqualified to perform Brocade DCX
installation or configuration.
Some of the known problems with their methodology are detailed throughout this paper.
The Cisco claims have little if any technical validity, and in some cases are simply and demonstrably
direct falsehoods. Such claims can therefore be classified as marketing FUD (Fear, Uncertainty, and
Doubt) rather than as a technical comparison. Their intent appears to be to deflect attention from
their own architectural shortcomings, rather than to actually compare the platforms.
This whitepaper reviews the Brocade DCX architecture, refutes the worst of their erroneous claims,
and provides some level of discussion about SAN performance measurement methods and their
applicability to real-world performance. Where it is necessary to debunk a Cisco claim, this paper tries
to provide a balanced view of the matter, including the potential cause of the error on the part of the
marketing personnel at Cisco who created the claims in question. For instance, if it is possible that
Cisco merely lacked knowledge or understanding, or used faulty equipment, then this paper will
indicate that this is a possibility rather than assuming that every incorrect claim was an intentional
falsehood designed to mislead their customers. The idea is not to follow Cisco down the road to a
mud-slinging match. Rather, it is to give the reader enough information to reach technically valid
conclusions about the Cisco claims, and about the Brocade platform characteristics.
1 Mark Twain said, "There are three kinds of lies: lies, damn lies, and statistics."
The evidence they provide for this is that the uncoordinated forwarding decisions made by DCX
port blades can result in imbalances in the allocation of bandwidth. There are numerous problems
with this theory.
For instance, it is possible to create an imbalanced scenario in any network device. The DCX can
exhibit imbalances, sure, but so can the Cisco directors. In fact, if you build a core/edge network of
Cisco directors, and the same network of DCX chassis, it is even possible to create an identical
imbalance if you load the networks with the same traffic pattern.
This leads to the second major problem with their theory. From the point of view of an application, it
doesn't matter if an imbalance occurs within a switch, or between switches in a network. The
application doesn't even actually care about imbalances per se. (If applications crashed when
bandwidth imbalances occurred, then there wouldn't be any functioning networks anywhere in the
world, because they all have imbalances in them.) What applications care about is whether or not they
get enough bandwidth to (a) keep their storage connections from timing out, and (b) meet their
application performance goals.
Now, it is critical to understand that imbalanced apportioning of bandwidth only occurs on links with
sustained congestion. Uncongested links provide each application with 100% of the requested
bandwidth. Even if the link is running at 99% of capacity, there will be no imbalance. Links which
congest only briefly may experience a slight variation in latency during the transient congestive state,
but this has no impact on application-level performance. Network imbalances such as the ones Cisco
is describing only occur and potentially impact applications when a congestive condition persists for
extended periods of time.
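To illustrate this point, here is a toy Python model of bandwidth sharing on a single link. It is an illustrative sketch only, not Brocade or Cisco code, and the max-min fair-sharing policy is an assumption for demonstration: so long as total demand fits the link, every flow gets 100% of what it asks for, and no imbalance is even possible.

```python
# Toy model (not vendor code): sharing one link among several flows.
def allocate(capacity, demands):
    """Per-flow bandwidth under simple max-min fair sharing."""
    alloc = [0.0] * len(demands)
    remaining = list(range(len(demands)))
    cap = float(capacity)
    while remaining:
        share = cap / len(remaining)
        # Flows demanding no more than the fair share are fully satisfied.
        satisfied = [i for i in remaining if demands[i] <= share]
        if not satisfied:
            # Everyone left wants more than the fair share: split evenly.
            for i in remaining:
                alloc[i] = share
            break
        for i in satisfied:
            alloc[i] = demands[i]
            cap -= demands[i]
            remaining.remove(i)
    return alloc

# Uncongested: total demand (7.9G) fits an 8G link -> everyone gets 100%.
uncongested = allocate(8.0, [4.0, 2.0, 1.9])
# Sustained congestion: three 4G flows on an 8G link -> each gets a fraction.
congested = allocate(8.0, [4.0, 4.0, 4.0])
```

The model shows why the imbalance question only arises at all on links with sustained congestion: in the uncongested case there is nothing to apportion.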
The significance of this is that, if you have links with continual congestion, you already have an
application level problem regardless of the balancing of the resource. You can design a congested
Cisco network which experiences imbalances. The fix is to add more ISLs, thus eliminating the
congestion, in which case the imbalance will also go away. The same scenario applies in the same way
to DCX networks.
It is also worth noting that many customers want an imbalanced allocation of bandwidth on congested
links. That is the whole point of QoS software: high priority flows get a bigger share of the bandwidth
than the lower priority flows. Brocade provides a number of mechanisms for deliberately creating
imbalances in order to favor higher priority applications, including QoS, ingress rate limiting, traffic
isolation zones, and local switching. But for most customers, less is more with these features. The
best option is to design a network which doesn't experience sustained congestion in the first place.
Cisco Claim:
The DCX has internal ISLs which cannot support 8Gbit
Reality:
It is important to note that the internal ASIC connections in a Brocade DCX are not E_Ports connecting
an internal network of switches via ISLs. The entire platform is a single domain, and a single hop in a
Fibre Channel network. When a port blade is removed, a fabric reconfiguration is not sent across the
network. Back-end connections use the same frame format as front-end ports to maximize efficiency,
but because they are contained within a single switch, there is no need to run any of the higher layer
(service) FC protocols across these connections.
The Brocade DCX features an internal Channeled Central Memory Architecture (CCMA) fabric of
purpose-built ASICs capable of switching at 320 Gbit/sec per chip. (640 Gbit/sec cross-sectional.) The
DCX is powered by a matrix of these Condor2 ASICs, which delivers up to 256 Gbit/sec per slot, net
of local switching. In addition, the high-density blades can take advantage of local switching to achieve
384 Gbit/sec per slot. This yields 3 Tbit/sec (6 Tbit/sec full duplex) for a single platform, not counting
the Inter-Chassis Links. If two platforms are connected via ICLs, the overall system delivers 6 Tbits/sec
(12 Tbits/sec cross-sectional).
The links between the ASICs within a DCX are CCMA links, not ISLs. ISLs carry traffic for a number of
standards-defined FC services such as FSPF, name server, zoning updates, and so on. None of this
protocol overhead is present on the DCX backplane, so all of the bandwidth described above is
available for application data. Since the term ISL is defined in the FC standards in a specific way,
and the inter-ASIC links inside the DCX do not match the definition of ISL, this specific Cisco claim is
a technical falsehood rather than simply a misleading bit of spin. The inter-ASIC links are only similar
to FC ISLs in one respect: they carry Fibre Channel frames. Of course, the same can be said about the
backplane of the Cisco platform. Carrying FC frames is, after all, the whole point of those links.
Besides, let's say for the sake of argument that Cisco was telling the truth, and the DCX was a "network
in a can" with ISLs on the backplane. Is Cisco saying that networks are bad? Cisco? Unless they are
willing to admit that their own products cannot, or at least should not, ever be networked together, then
their assertion that the internal characteristics of the DCX are like a network should be answered with a
resounding "So what?"
Beyond that, there were fundamental flaws in the method Cisco used to "prove" that the DCX
backplane traces are not of high enough performance to support 8Gbit. To reach this conclusion, they
disabled the backplane trace balancing mechanism. That is, they adjusted features on their FC frame
generator and/or turned off features on the DCX to artificially create hot spots on the backplane.
This is disingenuous for several reasons. Like the uncoordinated forwarding decisions discussion in
the previous section, it is possible, for example, to design a network of Cisco chassis and get the exact same
behaviors using the exact same test equipment settings. (Except, of course, with Cisco you need to
use 4Gbit FC test equipment.) Like Brocade, Cisco uses FC exchange boundaries when balancing
links. If you turn off exchange ID rotation on the frame generator (thus making it behave radically
differently than any real FC device) then you will get hot spots inside the DCX, but you would also
create hot spots in the Cisco network.
In any case, this is academic. Because real hosts and storage devices do change exchange IDs on a
very, very frequent basis, balancing IO on exchange boundaries works quite well. Which is why both
Brocade and Cisco use this method.
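The behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not either vendor's ASIC logic: the field names and Python's hash() are hypothetical stand-ins for the real hardware hash function.

```python
# Illustrative sketch (not vendor code): per-exchange balancing picks an
# egress link from a deterministic hash of the exchange identity, so all
# frames of one exchange stay in order on one link.
def pick_link(s_id, d_id, ox_id, n_links):
    return hash((s_id, d_id, ox_id)) % n_links

# Real initiators open new exchanges constantly, so traffic spreads across
# the available links:
links = [pick_link(0x010100, 0x020200, ox, 4) for ox in range(10_000)]

# A frame generator with exchange ID rotation disabled reuses one OX_ID, so
# every frame lands on the same link: an artificial hot spot that no real
# FC device would produce.
fixed = {pick_link(0x010100, 0x020200, 0x1234, 4) for _ in range(1_000)}
```

Because any hash-based scheme behaves this way, turning off exchange ID rotation on the tester creates the same hot spot on any vendor's platform that balances on exchange boundaries.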
The Brocade DPS feature does not require the use of adjacent ports, and balances IO on a per-exchange basis, much like the feature Cisco calls "trunking." When DPS is combined with Brocade
frame-level trunking, the DCX can produce a 64-link balanced pipe. Here is how that works:
Each Condor2 ASIC's ports can be combined into virtual interfaces, or frame-level trunks, of up to
64 Gbit/sec each. It is also possible to balance IO between multiple trunk groups to create a pipe of up
to 512 Gbit/sec between different platforms in a fabric by using DPS. Since Cisco is limited to just
the DPS-equivalent feature, lacks 8Gbit ports, has no true frame-level trunking feature at all, and is
limited to 16-link pipes, it is clear that the Brocade feature is actually considerably faster and more
flexible. The maximum Cisco trunk bandwidth is 64 Gbit/sec, vs. 512 Gbit/sec for the DCX.
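The arithmetic behind these figures can be checked directly from the numbers in the text. The counts of eight ports per frame-level trunk and eight trunk groups per DPS pipe are implied by the 64 Gbit and 512 Gbit totals quoted above:

```python
# Trunk bandwidth arithmetic, per the figures quoted in the text.
port_speed_gbit = 8                      # 8Gbit/sec FC ports
frame_trunk_gbit = port_speed_gbit * 8   # frame-level trunk: 8 ports -> 64 Gbit/sec
dps_pipe_gbit = frame_trunk_gbit * 8     # DPS across 8 trunk groups -> 512 Gbit/sec

# Cisco maximum per the text: a 16-link pipe of 4Gbit ports.
cisco_pipe_gbit = 4 * 16                 # 64 Gbit/sec
```

On these numbers, the DCX pipe is eight times wider than the maximum Cisco pipe.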
For reference, here is a feature comparison chart:

Brocade DCX Feature: Advanced Trunking
Cisco Equivalent: DPS ("pseudo-trunking")
Brocade Advanced Trunking provides superior performance to DPS or to Cisco's pseudo-trunking.
Advanced Trunking implements true load balancing by "spraying" frames across all the links in the
trunk, while preserving in-order delivery. The other advantage over DPS is that frames are not dropped
when a member link goes down. No re-routing takes place either. The only frames that might be
dropped are those physically in flight on the link which failed. With DPS (or Cisco pseudo trunking) it is
necessary to re-route the group, which is disruptive.
DPS or Cisco Pseudo-Trunking merely implement load sharing, not true balancing, allowing some
links to be congested and some underutilized. This could happen, for instance, when multiple
exchanges hash to the same value and therefore end up on the same link. If several high traffic, long-lived exchanges are directed to the same link, that link becomes congested, even while
other links with low traffic exchanges have bandwidth to spare.
For a more detailed description of DPS and frame-level trunking, see Chapter 8 in the book Principles
of SAN Design, which is now in its second edition. (This edition was released in September 2007; go to
http://www.bbotw.com and search for SAN Design to purchase this book.)
To be charitable, it is possible that the Cisco marketing personnel writing the document might not
have understood what a CRC is, how it is performed, what it is intended to accomplish, or how cut-through switching works.
The purpose of a Cyclic Redundancy Check in an FC switch is to detect alterations of data during
transmission, and to cause defective frames to be discarded before they reach the end-point
application or LUN. The device dropping the bad frame could be a switch, router, HBA, or storage
controller. As long as the bad frames are dropped before they can become bad data at the application
level, then the CRC did its job.
To perform a CRC on a frame within an FC switch, it is necessary to run a formula against all of the bits
in the frame, then compare the result of the formula against the CRC which was written into the frame
by the originating device. The definition of cut-through switching is that the switch will start
transmitting a frame out its destination port before the entire frame has been received. It is a logical
impossibility to perform a CRC on a frame until the entire frame is received because of the
mathematical nature of CRC formulas. Since the frame has already been largely transmitted before
the switch has enough information to entirely calculate the CRC, it is certainly the case that a Brocade
platform can deliver a frame with a CRC error. Well, sort of.
The problem with Cisco's argument is that, before an FC frame can be considered delivered by a
switch port, the transmitting switch needs to append a valid 4-byte End of Frame (EoF) marker after all of
the data has been sent. This can be thought of as the "green flag" to the receiving device, which tells it
that all of the frame's data was properly sent, as well as telling that device if the frame was the last
frame in a sequence.
Contrary to Cisco's claims about the lack of CRC checking in a Brocade platform, it turns out that the
transmitting port will in fact have determined whether or not the CRC is valid before the time comes to
transmit the EoF marker. If there was an error in the frame, the Brocade platform simply does not
transmit the "green flag" signal to the receiving device, but rather marks the EoF as bad, and
standards mandate that the receiver discard the bad frame. Of course, the receiver could also
determine that the frame was bad by performing its own CRC calculation. In reality, FC end-point
devices do both: they perform their own CRC check and look for bad or missing EoF markers, so there
are actually two standards-based rules which prevent a node from accepting a bad frame.
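The cut-through EoF rule described above can be sketched in a few lines of Python. This is an illustrative simplification, not Brocade ASIC logic: zlib's CRC-32 stands in for the hardware CRC, and the EoF byte values are placeholders rather than real FC ordered sets.

```python
import zlib

# Illustrative sketch of cut-through egress: the frame body streams out
# first, and the EoF marker, chosen only once the full frame has been
# checked against its CRC trailer, is appended last.
EOF_GOOD, EOF_BAD = b"EOFn", b"EOFa"  # placeholder markers, not real FC codes

def transmit(payload, trailer_crc):
    """Cut-through egress: payload goes out first; EoF is selected last."""
    computed = zlib.crc32(payload) & 0xFFFFFFFF
    eof = EOF_GOOD if computed == trailer_crc else EOF_BAD
    return payload + eof

good = b"SCSI data block"
frame_ok = transmit(good, zlib.crc32(good) & 0xFFFFFFFF)

# Corrupt one byte "in flight": CRC-32 detects any burst error this short,
# so the recomputed CRC cannot match the original trailer.
corrupted = bytearray(good)
corrupted[0] ^= 0xFF
frame_bad = transmit(bytes(corrupted), zlib.crc32(good) & 0xFFFFFFFF)
```

A standards-compliant receiver discards any frame ending in a bad EoF, which is exactly the behavior the paragraph above describes.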
It turns out that this is the only way to handle CRC in a cut-through architecture. Again, it is
mathematically impossible to calculate a CRC until the entire frame is received, and the definition of cut-through is that the switch begins transmitting the frame before the entire frame is received. It also
turns out that this method works exactly 100% of the time. Brocade has been using this method of
handling CRCs since its first FC switch, and therefore it is implemented in over ten million ports of
production SAN switches and routers installed throughout the world. The method has been vetted with
all Brocade OEMs, and there has not been one single case of a frame with a bad CRC landing bad
data onto an application or LUN. For that to happen, the receiving device would need to violate FC
standards with respect to the EoF, and fail to perform its own CRC calculation. It turns out that no such
device exists, ever has existed, or is anticipated to exist in the future in any production datacenter.
Remember that the purpose of a CRC is to prevent bad data from reaching the endpoint in a network.
Brocade ASICs accomplish that goal. This means that the Brocade cut-through CRC method does, in
fact, accomplish the intended purpose.
It is also worth noting that CRC errors are (a) very rare, and (b) always indicate some kind of failure in
the fabric. For example, this could be a failing SFP, a bad cable, or a malfunctioning HBA. In a
working FC fabric, there should be essentially zero CRC errors. As a percentage of all SAN traffic on
working ports ever analyzed by Brocade support, traffic with CRC errors constitutes a vanishingly
small fraction, i.e. the number is so close to zero that it is statistically indistinguishable from zero, and
would therefore be dropped out of any statistical calculation related to SAN traffic. Many CRC errors
have been observed, but only on catastrophically failing links.
This implies two things. (1) That Cisco's argument is irrelevant, even if their claim were true, which it
isn't. (2) That Cisco substantially mis-configured their test equipment or used defective gear. A
working FC fabric does not have CRC errors. The fact that Cisco was detecting CRC errors in their
testing should have told their personnel that their fabric was not working, and therefore had mis-configured and/or defective equipment in the test bed, e.g. a malfunctioning testing device, numerous
bad SFPs, bad cables, etc.
Since the document appears to have been written by marketing personnel rather than by engineers, it
may not be surprising that they did not know the significance of CRC errors and thus failed to correct
the defective equipment in their test configuration, but in any case, this issue alone would be more
than enough to invalidate their testing even if no other issues existed. Results from a known defective
test bed simply cannot be used to form any scientifically valid conclusions.
1. The Cisco claim conflicts with testing performed by Brocade, which implies that Cisco had
malfunctioning equipment, did not know how to operate the gear, or falsified their results.
Given their CRC problems, the faulty equipment scenario seems likely. In any case, Brocade
has not observed a similar result, and cannot duplicate Cisco's result based on the
information in their claims. This applies to their claims about latency, performance
decreasing over time, lost frames, dead internal ISLs, VC_RDY credit loss, and needing to
reboot the chassis to clear errors. For such claims, Brocade's response must be "the DCX
does not work the way they indicate, so they appear to be lying or mistaken." Or:
2. The Cisco claim is based on a contrived traffic pattern, which bears no resemblance to IO patterns
ever observed by Brocade in any SAN. This case deserves more discussion. For example:
Cisco Claim:
Brocade exhibits lower performance when switching continual streams of 60-byte frames.
Reality:
The standard FC frame size is a bit more than 2k bytes: 30 times larger than the frames used in this
test. Most FC testing is conducted with 2k frames, since most frames in a real-world fabric are that
size. Brocade has been selling FC switches for more than ten years: over three times longer than
Cisco. Brocade is therefore in a good position to understand the traffic patterns of a fabric perhaps a
little better than Cisco does. This claim may simply be a mistaken understanding on their part about
what kinds of traffic patterns are actually going to traverse a fabric.
It turns out that there are usually between 5x and 20x more 2k frames in a typical FC SAN than all
other frame sizes combined. This has to do with the way that FC nodes interact with filesystem block
sizes, SCSI drivers, and HBAs. Small frames are typically only used by nodes at the beginning or end of
a conversation and all intermediate frames are full size. When Brocade conducts testing of its
switches and tunes its ASIC designs for performance, the focus is on testing and tuning things for
performance in real-world scenarios, which means optimizing for mostly large frames.
It is possible to conduct a test using nothing but 60-byte frames, even though no SAN application ever
created behaves this way. Cisco did so. Not surprisingly, the throughput of the platform is reduced
when running a continual stream of 60-byte frames. In one test, throughput dropped from over
800Mbytes/sec down to ~600Mbytes/sec. Brocade testing yielded the same result.
Brocade does not consider this to be an issue for several reasons.
(1) This traffic pattern has never been observed in a real world SAN deployment. Optimizing a product
for a non-existent traffic pattern would not seem productive. It may be that Cisco believes this case to
be important because they are too new to the industry to understand SAN traffic patterns yet, or it
could be that they contrived this case to mislead their customers. Either way, it doesn't seem valid.
(2) To a large extent, lower performance on small frames is a "laws of physics" problem, i.e. there isn't
even a theoretical way to get equal performance on this test. That is because the ratio of header,
trailer, inter-frame gap, and other overhead vs. payload size is different with small frames. With a 60-byte frame, almost half of the total frame is overhead. Indeed, the theoretical max throughput for 2k
frames is 840 Mbytes/sec, but for 60-byte frames it is only ~640 Mbytes/sec. It is simply going to go slower.
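The overhead ratio can be worked out as follows. The 36 bytes of framing come from the FC frame format (SOF, header, CRC, EoF); the inter-frame gap value is an assumption for illustration, so the resulting percentages are approximate rather than official throughput figures.

```python
# Back-of-the-envelope overhead math (illustrative assumptions).
OVERHEAD = 4 + 24 + 4 + 4   # SOF + header + CRC + EOF bytes around the payload
IFG = 24                    # assumed minimum inter-frame gap, in bytes

def wire_efficiency(payload_bytes):
    """Fraction of wire time spent carrying the frame itself."""
    frame = payload_bytes + OVERHEAD
    return frame / (frame + IFG)

eff_full = wire_efficiency(2112)   # full-size FC payload
eff_tiny = wire_efficiency(24)     # a "60-byte frame": 24B payload + 36B framing
```

With full-size frames, roughly 99% of wire time carries the frame; with 60-byte frames, a large fraction of wire time is pure overhead, so no switch of any architecture can match its large-frame throughput on this test.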
(3) Even if an application was trying to generate a continual stream of 60-byte frames (which doesn't
happen) it would almost certainly not be able to do so, because the sending node would become IOPS
bound. That is, end points in a fabric generally have a limited number of IO operations per second that
they can support, e.g. because of CPU constraints, and a sustained 4Gbit/sec stream of 60-byte
frames would exceed the IOPS limit of even the fastest nodes. It is one thing to drive this IO pattern on
SAN testing hardware. It is quite another to drive it from a SAN-attached application.
(4) To create an application with this characteristic, it would be necessary to use a block size on the
order of 30 bytes. It seems likely that a block size that small would have performance problems no
matter what the SAN fabric did with the frames. Perhaps nobody on the Cisco team has any filesystem
experience outside of NFS or CIFS, but it seems likely that customers deploying SANs will know quite
a bit about the subject. Brocade would be most interested in discussing the use case if any customer
has a technical need for a filesystem with a 30-byte block size. In any case, Brocade most often sees
filesystems with block sizes considerably larger than the FC maximum frame size, and in that case it is
simply impossible for an application to generate IO similar to the contrived Cisco test case.
(5) Finally, it is worth noting that because of reason #2 (above), Cisco also exhibits lower performance
on this kind of contrived test, so Brocade cannot really view this as a serious competitive issue.
Note that this line of reasoning applies to most of the test cases claimed by Cisco. For instance, the
head-of-line blocking (HoLB) test that they claim Brocade "failed" was contrived by using two separate continual streams of
60-byte frames from one source. Brocade has never seen an application which generates one
continual stream of 60-byte frames, much less two at the same time. The bottom line is that there is
no real world applicability to any of the Cisco results contrived using streams of 60-byte frames,
which turns out to be most of the cases in which they claim a superior result.
because that port is already busy. The new frame must be stored in a buffer, and wait until the frames
in line ahead of it drain out of their buffers. This is shown in Figure 1.
In this example, three ingress ports (Sources #1 thru #3) are trying to transmit streams of frames to a
single destination. If each source is attempting to transmit at line rate, and the destination has the
same rate as the sources, then congestion will occur. Each source will receive a portion of its line rate
in sustained bandwidth. Since FC interfaces are serial, only one frame can be transmitted at a time, so
the switch will need to hold frames in buffer memory while previously-received frames are transmitted.
If a new frame enters the switch as shown in the upper-right of the diagram, it will have to wait for
other frames to be transmitted before it can get time on the shared interface. In this figure, depending
on how the output queue is prioritized, it would have to wait for at least seven frames (the other
source #3 traffic) or possibly more than 20 frames (adding in the source #1 and #2 traffic) before
being served by the transmission logic.
First of all, as extensive testing has confirmed, the Brocade platform exhibits best-in-class latency
when interfaces are running even slightly below line rate. For locally switched traffic, latency is
measured in nanoseconds. For non-local traffic, latency is around 2.4 microseconds between 8Gbit
interfaces. That's ten or more times faster than a Cisco platform, and indicates the true switching
latency which will be experienced in real-world scenarios, because real applications do not actually
sustain 100% of line rate. At most, they might sustain a percentage of line rate in the mid 90s, and
even that is rare.
Second, it is possible to prove that the increase in latency observed at full line rate is related to the
number of output buffers on a port. Brocade platforms have the ability to tweak the number of
buffers allocated to a given port. This feature was designed for long distance applications, but it can
also be used in intra-datacenter scenarios. When this feature is used to increase the buffers on a
congested output port, the latency reported by fabric test equipment goes up, showing that the test
equipment is measuring buffer depth rather than switching delay.
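The relationship between buffer depth and reported latency is simple arithmetic: on a fully congested port, the newest frame waits behind every frame already queued, so measured delay is queue depth times per-frame serialization time. The frame size (2,148 bytes on the wire) and ~800 Mbytes/sec line rate below are illustrative assumptions for an 8Gbit link, not measured figures.

```python
# Why deeper buffers raise the latency a layer 2 tester reports: measured
# delay on a saturated port is queue depth times serialization time.
def queued_latency_us(queue_depth, frame_bytes=2148, link_mb_per_s=800):
    serialization_us = frame_bytes / link_mb_per_s  # bytes / (bytes per us)
    return queue_depth * serialization_us

shallow = queued_latency_us(8)   # few buffers allocated to the port
deep = queued_latency_us(64)     # buffers increased, e.g. long-distance mode
```

Allocating eight times the buffers produces eight times the reported "latency" with no change whatsoever to the switching logic, which is exactly why the test equipment is measuring buffer depth, not switching delay.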
At this point, it may not be clear what distinction is being made. After all, why would an application
care if delay was caused by queue depth, or by slow switching logic? The answer has to do with the
way that Cisco rigged their own results in this test case.
Since serialized FC interfaces cannot be made to transmit frames in parallel, all vendors will queue
frames in this scenario. Latency caused by queue depth is therefore a fundamental mathematical and
physical phenomenon; it isn't possible to move frames across a congested interface any faster than
Brocade does it. So how can Cisco be showing lower latency in this test case? The answer is that they
moved the delay to the far side of the tester interface from where the test equipment is measuring it.
This is where Cisco performs the sleight of hand. When their platform is saturated, it stops accepting
frames even if it has the memory to store them. This does not in any way whatsoever improve
application performance. It prevents frames from entering the congested switch, which means that
fewer frames are waiting in line for a congested port, which means that test equipment will show a
lower latency result. But it achieves that by preventing nodes from sending IO into the fabric, so
queuing is still occurring: it is just moved out of the fabric and into the application.
Figure 2 illustrates the same scenario as in the previous example, except that the switch is handling
the streams differently. Instead of letting frames enter the switch and handling queuing at the point of
congestion, the switch is now preventing source ports from sending data into the fabric in the first
place. Publicly available documentation from Cisco confirms that this is how their platform works.3
Frame-level testing equipment will not measure this effect at all because it has no application layer
capabilities. Still, the effect on application layer latency is equal to or greater than the latency caused
by buffering within the switch. This latency is generated artificially by congestion control
mechanisms, which push back on a node port. The effect of this on the application is that, instead of
waiting for frames to propagate through a congestion point in the fabric, the IO sits inside the host
waiting for the fabric to accept frames in the first place. In other words, it moves the latency from the
switch within the fabric to the application within the host. This makes layer 2 test results look better by
hiding the delay from the test equipment, but it does not make the application run faster. The same
number of frames are still waiting in line; they still take the same amount of time to get to the head of
the line. Cisco just relocated the line.
3 For example, their web site has a PDF entitled "Introduction to Storage Area Networking" which
describes Forward Congestion Control this way: "When a switch detects a congestion port (sic) in the
network, the switch generates an edge quench message to the sources as an alert to reduce the rate
at which frames should be injected into the fabric to avoid further head-of-line blocking." In addition to
admitting that they are solving downstream congestion by artificially producing congestion on
upstream ports, this is a public admission that their architecture is subject to head of line blocking.
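A toy model makes the "relocated line" point explicit. The frame counts and per-frame drain time below are illustrative assumptions, not measurements; the point is structural, not numeric.

```python
# Toy model of the "relocated line": whether frames wait inside the switch
# or are held back in the host by congestion control, the application sees
# the same total delay, because the congested link drains at the same rate
# either way. Only the tester's view changes.
def latencies(frames_in_switch, frames_in_host, drain_us_per_frame):
    """Return (tester_measured_us, application_seen_us) for the last frame."""
    measured = frames_in_switch * drain_us_per_frame
    application = (frames_in_switch + frames_in_host) * drain_us_per_frame
    return measured, application

# 24 frames queued: all in the fabric, vs. mostly pushed back into the host.
queue_in_fabric = latencies(24, 0, 2.7)
queue_in_host = latencies(2, 22, 2.7)
```

The application-level delay is identical in both cases; only the number visible to a layer 2 tester shrinks when the queue is pushed back into the host.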
[Figure: Sources #1 through #3 converging on a single destination; the circle marks the point at which layer 2 latency is measured by the Agilent test equipment.]
In order to perform the "latency under load" test correctly, Cisco should have run the load up only to,
say, 99% of line rate. At 100% of line rate, the test would be more properly called "queue depth
detection" or "latency under congestive loads."
In any case, this category of test result is only applicable to customers who are running their FC
infrastructure at 100% of line rate for extended periods of time, instead of, for example, 99.9% of line
rate or less. If links are even slightly below the congestion point, then Cisco's Forward Congestion
Control does not do anything at all. In that case, their store and forward mechanism is a tenth or less
as fast as the Brocade cut-through approach. At 100% of line rate, Brocade will exhibit greater layer
2 delay than Cisco, but Cisco will exhibit greater application layer delay than Brocade, because it will
stop source ports from inserting frames into the fabric and cause application traffic to back up
inside hosts. The bottom line is that the Cisco approach only provides a performance benefit for
customers who are using FC test equipment instead of using, for example, real applications.
Local Switching
In addition to supplying more backplane bandwidth per slot than Cisco, Brocade can deliver 8 Gbit/sec
bandwidth per port even on oversubscribed blades through a process called "local switching."
In the Brocade DCX, each port blade ASIC exposes some ports for connectivity and other ports connect
to the backplane. If the destination port is on the same ASIC as the source, the chip can switch the
traffic without needing to leave the blade. On the Brocade 16- and 32-port blades, local switching is
performed within 16 port groups. On the 48-port blade, traffic can be localized in 24-port groups.
Even if the traffic in question is running on an oversubscribed blade, the localized traffic does not use
the oversubscribed resource. Since localized traffic does not use backplane bandwidth, it does not
count against the subscription ratio, and it cannot impact or be impacted by traffic from other devices.
Because they never cross the backplane, locally switched devices are guaranteed 8 Gbit/sec bandwidth. This enables every port on
a Brocade DCX high-density blade to communicate at a full 8 Gbit/sec speed with port-to-port latency
of just 800 nanoseconds, about twenty times faster than the MDS. This is an important feature for
high-density/high-performance environments because it allows oversubscribed blades to achieve full
non-congested line rate.
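The locality rule described above can be sketched as follows. The group sizes (16 ports on the 16- and 32-port blades, 24 ports on the 48-port blade) come from the text; the assumption that groups are contiguous port-number ranges is illustrative, not a statement about the actual blade layout.

```python
# Sketch of the local-switching rule: traffic between two ports in the same
# group stays on the local ASIC and never touches the backplane.
def is_local(src_port, dst_port, blade_ports):
    """True if both ports fall within the same local-switching group."""
    group = 16 if blade_ports in (16, 32) else 24  # 48-port blade: 24-port groups
    return src_port // group == dst_port // group

# On a 48-port blade, ports 0-23 form one group and 24-47 another:
same_group = is_local(3, 20, 48)    # switched on the local ASIC
cross_group = is_local(3, 30, 48)   # must cross the backplane
```

A SAN designer who knows which hosts talk to which storage ports can use a rule like this to place both ends of a flow in the same group and keep that flow off the oversubscribed backplane entirely.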
Cisco Claim:
Taking advantage of local switching is difficult.
Reality:
Sometimes it is difficult. Sometimes it is easy. It turns out that not all customers are trying to do the
exact same thing with their SANs.
For example, when a customer is initially migrating Directly Attached Storage (DAS) systems onto a
SAN, they already know the host-to-storage connectivity patterns since DAS traffic is 100% localized by
nature. In a DAS to SAN migration, it is not only possible to maintain locality, it is actually very easy.
Similarly, many customers have a small number of mission-critical systems which have a known
relationship with their storage ports, and, because they are critical, these systems tend to get changed
rarely. In such cases, it is easy to understand which hosts talk to which storage ports, and simply
plug them into the same port groups. This gets the availability and performance benefits of localization
for the systems which need it most, without needing to attempt to localize all flows in the entire fabric.
On the other hand, some SAN-native deployments (e.g. large VMware clusters) are architecturally
incompatible with the concept of localized traffic. Localization isn't hard in that case; it's impossible.
Brocade has never mandated the use of locality. It is offered for cases in which it makes sense.
Brocade supports local switching, but does not require it. Think of it as just one more tool in the
Brocade SAN design kit, which isn't present in the Cisco kit. Their claim that local switching isn't
needed would be more credible if it weren't for the fact that they are architecturally incapable of
delivering the feature. As it is, this appears to be a case of sour grapes on their part.
CONCLUSION
The Cisco marketing FUD document entitled "Performance Testing on Brocade 48k" contains
numerous direct and seemingly deliberate falsehoods. The similar claims they make about the DCX
fall into the same category. The pseudo-technical content is misleading at best, and intentionally
dishonest at worst. They appear to have been using out of date, defective, and/or mis-configured test
equipment, and did not appear to understand basic SAN terminology or protocol characteristics. Since
they decided to conduct their testing behind closed doors rather than participating in open third-party
testing, and did not even include many test details in their marketing glossy, the document and
related claims cannot credibly be viewed as having any technical merit.
Based on extensive testing, it is clear that the Brocade DCX is the lowest-latency / highest-performing
platform on the market. The only way to illustrate other results involves concocting smoke and
mirrors test cases, which obscure the flaws in the MDS architecture, and / or deliberately avoid using
the Brocade platform in realistic ways. If Cisco wishes to participate in a neutral, third-party refereed
bake-off test, rather than hiding their methodology in the shadows, then Brocade would be happy to
meet them head to head. But until they are willing to have competitive testing performed in the open,
there can be no evidentiary value assigned to their biased and unsubstantiated claims.