Throughput Performance: Comparing Apples to Apples
First Published Wednesday, 15th July 2009 07:16 pm from Real-Time Innovations (RTI) : Rick Warren
The opinions expressed by this blogger and those providing comments are theirs alone, this does not reflect the opinion of Automated Trader or any employee thereof. Automated Trader is not responsible for the accuracy of any of the information supplied by this article.
If you're serious about
the performance of your distributed system, you probably read
with interest the performance claims made by network middleware
vendors. And if you're a network middleware vendor,
you've probably published your share of performance
claims. (RTI has comprehensive performance numbers available for
both our DDS (http://www.rti.com/products/dds/benchmarks-cpp-linux.html)
and JMS (http://www.rti.com/products/jms/latency-throughput-benchmarks.html)
APIs.) But in order to know which claims are meaningful
- and more importantly, which are useful to
you - it's important to understand
what you're reading. In the words of one of my
coworkers, "many apples are compared to
rhinoceroses."
First of all, there are three
primary axes along which people tend to measure network
performance:
- Latency: end-to-end, round-trip,
and loaded latency; the amount of time it takes to send a certain
amount of data from somewhere to somewhere else under various
conditions
- Latency jitter: the amount of variation in latency
measurements
- Throughput: the number of data
quanta (either raw bytes or fixed-size samples/messages)
transmitted over a given amount of time
The whitepaper "The Data-Centric Future"
(http://www.rti.com/mk/future.html) on the RTI website has a good
overview of general performance considerations, so I
won't reiterate that material here. What I'll
focus on today is throughput, and specifically some of the
subtleties you'll want to keep in mind when
you're evaluating a networking middleware
product.
Size Matters.
The size of the data samples you send in a
throughput test has a huge impact on the throughput you measure.
At small data sizes (dozens to hundreds of bytes), performance is
dominated by the expense of traversing the layers of software in
between your application and the network: your middleware, if
any; your operating system's network stack, including
any system calls; and your network driver. As the data size
increases, these fixed costs become less significant relative to
the cost of actually copying the data (including the
"copy" of the data across the
network).
This difference in the throughput
profile among different data sizes means that if you're
reading a performance report, and you see
"bytes" but your application deals with
"samples" or "messages,"
it's important to understand the sample size(s) used to
generate that report. It's not sound to generate some
data with one sample size, add up the total number of bytes, and
then divide by another sample size, because you're not
correctly accounting for per-sample constant factors.
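To see why that division misleads, here is a minimal sketch of a toy cost model in which every sample pays a fixed overhead plus a per-byte cost. The two constants are made-up illustrative values, not measurements of any product, and the function names are hypothetical. The sketch compares the rate the model predicts for 16-byte samples against the rate you would get by measuring 256-byte samples, converting to bytes, and dividing by 16.

```python
# Toy cost model: each sample pays a fixed overhead plus a per-byte cost.
# Both constants are hypothetical illustrative values, not measurements.
FIXED_COST_US = 10.0      # assumed per-sample overhead, in microseconds
PER_BYTE_COST_US = 0.01   # assumed per-byte copy/transmit cost, in microseconds


def samples_per_second(sample_size_bytes: int) -> float:
    """Samples per second the toy model predicts for one sample size."""
    cost_us = FIXED_COST_US + PER_BYTE_COST_US * sample_size_bytes
    return 1_000_000 / cost_us


def naive_extrapolation(measured_size: int, target_size: int) -> float:
    """The unsound approach: total bytes/s divided by a different sample size."""
    bytes_per_second = samples_per_second(measured_size) * measured_size
    return bytes_per_second / target_size


modeled = samples_per_second(16)
extrapolated = naive_extrapolation(256, 16)
print(f"modeled 16-byte rate:       {modeled:,.0f} samples/s")
print(f"extrapolated from 256-byte: {extrapolated:,.0f} samples/s")
print(f"overestimate factor:        {extrapolated / modeled:.1f}x")
```

The larger the fixed cost is relative to the per-byte cost, the closer the overestimate gets to the full ratio of the two sample sizes.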
One vendor - an enterprise software vendor
newly entering the messaging market, who shall remain nameless
- made a series of throughput measurements for 256-byte
samples. This vendor then declared that what their customers
really cared about was 16-byte samples, and so immediately
multiplied their measurements by 16 (= 256 / 16) and published
those extrapolated results! Depending on how they were planning
on sending those 16-byte samples, they were assuming either that
(a) the cost of performing 16x the number of
network sends is zero or (b) the cost of
packing and unpacking 16-byte samples into and out of 256-byte
chunks is zero. Of course, both of these costs are emphatically
non-zero. But that brings me to my next
point:
Understand data batching.
Because of the high fixed
cost of a network send, especially relative to the cost of
copying a small data sample, it is a common practice to batch
multiple data samples together and send them together as a
unit.
Suppose you're publishing
64-byte samples. Remember that each time you send a packet on the
network, you're also sending a couple dozen bytes of IP
header data and whatever meta-data your middleware requires. That
adds up to a 30-100% space penalty - added to the time
penalty discussed above. But if you can amortize these costs over
many samples, they become much less important. In fact, batching
data can increase your effective throughput by more than an order
of magnitude in some cases.
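The space penalty can be sketched with some simple arithmetic. The example below assumes a 20-byte IPv4 header and an 8-byte UDP header per packet, plus a hypothetical 16 bytes of middleware metadata; the real figure depends on the middleware and transport, and many implementations also carry some per-sample metadata inside a batch, which this sketch ignores.

```python
# Assumed per-packet overhead: 20-byte IPv4 header + 8-byte UDP header,
# plus a hypothetical 16 bytes of middleware metadata.
IP_UDP_HEADER_BYTES = 28
MIDDLEWARE_HEADER_BYTES = 16   # hypothetical value for illustration
SAMPLE_BYTES = 64


def overhead_fraction(samples_per_packet: int) -> float:
    """Per-packet overhead bytes as a fraction of the payload bytes."""
    overhead = IP_UDP_HEADER_BYTES + MIDDLEWARE_HEADER_BYTES
    payload = SAMPLE_BYTES * samples_per_packet
    return overhead / payload


for batch in (1, 4, 16):
    print(f"{batch:>2} samples/packet: {overhead_fraction(batch):.1%} overhead")
```

Under these assumptions the unbatched overhead falls inside the 30-100% range quoted above, and it shrinks in proportion to the batch size.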
In our experience,
you can send 50,000-80,000 smallish packets per second using
commodity OS, computing, and networking components. When you see
samples-per-second-style throughput numbers much higher than
that, it means that those samples are being batched under the
hood.
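One way to apply that rule of thumb is to ask how many samples per packet a claimed rate implies. The sketch below takes the upper end of the packet-rate range above as its ceiling; the function name is hypothetical, and the result is only a lower bound on the average batch size, not a statement about any particular product.

```python
import math

# Packet-rate ceiling: the upper end of the rule-of-thumb range in the text.
MAX_PACKETS_PER_SECOND = 80_000


def implied_min_batch(claimed_samples_per_second: int,
                      packets_per_second: int = MAX_PACKETS_PER_SECOND) -> int:
    """Smallest average samples-per-packet consistent with a claimed rate."""
    return max(1, math.ceil(claimed_samples_per_second / packets_per_second))


print(implied_min_batch(60_000))     # fits in the packet budget unbatched
print(implied_min_batch(1_000_000))  # requires batching
```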
Note that data batching is an intrinsic
part of the TCP protocol, so any middleware implementation that
relies on TCP batches data all the time.
Differentiate between one-to-n and aggregate throughput.
There are two ways to look at throughput: an
application-centric view and a network-centric view. Which one of
these a given community cares about governs which one gets
measured and reported by vendors that market into that community.
It means that you need to be aware of what you're
reading when you see "n samples
per second," especially when dealing with a new
vendor.
- Applications typically care about
the number of samples that can be sent/received to/from
particular destinations using particular data producing and
consuming objects. For example, suppose I'm publishing
sensor data. I know that my device has a new value available
10,000 times every second. If I try to send that much data, will
it work? This viewpoint is relevant to applications with
individual data streams that place a significant burden on the
network all by themselves. Streaming media, high-rate sensor
data, real-time command and control, and other similar domains
are in this category. Throughput data is typically reported from
one (the data producer) to n (the number of
data consumers).
- Other systems take a
network-centric view, measuring the total
number of samples in flight across an entire system. This view is
relevant when both of the following are true:
(a) individual data streams may not be
demanding by themselves, but there are many of them, and
(b) all of those streams have a common choke
point. Enterprise integration and web services often fall into
this category, as services are invoked on human time scales, and
middleware implementations typically include central message
broker components. Network-centric throughput is typically
reported in aggregate, from n (producers) to
m (consumers), where all
(n + m) entities
bottleneck through a common broker. The goal of the test, in such
a case, is to measure the limits of the broker itself, not of the
applications that use it.
When
you're evaluating a throughput claim, be sure you know
which one of these scenarios you're looking at! I can
tell you that there was a flurry of activity at RTI when a
competing vendor started touting 6 million samples per second
- until we read further into that vendor's
testing methodology and discovered that the result was an
aggregate across 60 applications. For the record, the throughput
numbers you will find on RTI's website - showing over
1 million samples per second
(http://www.rti.com/products/dds/benchmarks-cpp-linux.html) - are measured
1-to-n. That's one publisher on
one box publishing to one destination.
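To put the two figures above on the same footing, a rough sketch: divide the aggregate number by the number of applications it was summed across. This assumes the load was spread evenly and that all 60 applications contributed to the total, neither of which the claim spells out, so treat the result as an order-of-magnitude comparison only. The function name is hypothetical.

```python
def per_application_rate(aggregate_rate: float, applications: int) -> float:
    """Average samples/s per application, assuming an even split."""
    if applications <= 0:
        raise ValueError("applications must be positive")
    return aggregate_rate / applications


aggregate_claim = 6_000_000   # samples/s summed across the whole system
application_count = 60
one_to_n_claim = 1_000_000    # samples/s from a single publisher

average = per_application_rate(aggregate_claim, application_count)
print(f"average per application: {average:,.0f} samples/s")
print(f"1-to-n figure vs. that average: {one_to_n_claim / average:.0f}x")
```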
Understand the architecture.
To me, 1-to-n numbers are the
honest numbers when you have a peer-to-peer solution, as RTI
does. That's because, assuming your switch can keep up,
there's nothing to test other than the so-called
"client" applications themselves. We can
saturate a gigabit link for data sizes not much over 100 bytes,
and come close even for very small sizes. Do you have more data
to send than can fit over a single link? Then add another link.
At that point, you're no longer testing the performance
of the RTI infrastructure; you're testing the
performance of your switch.
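For a sense of what saturating a gigabit link means in samples per second, here is a back-of-the-envelope sketch. It assumes UDP over IPv4 on Ethernet, with 38 bytes of framing per packet (14-byte header, 4-byte frame check sequence, 8-byte preamble, 12-byte inter-frame gap) plus 28 bytes of IP and UDP headers. Middleware metadata and MTU limits are ignored, so the result is an upper bound rather than a prediction.

```python
LINK_BITS_PER_SECOND = 1_000_000_000   # gigabit Ethernet

# Assumed per-packet overhead in bytes: Ethernet framing (14 header + 4 FCS
# + 8 preamble + 12 inter-frame gap) plus 20 IPv4 + 8 UDP.
# Middleware metadata is deliberately left out of this sketch.
PER_PACKET_OVERHEAD_BYTES = 38 + 28


def max_samples_per_second(sample_bytes: int, samples_per_packet: int = 1) -> float:
    """Upper bound on samples/s that fit on the link for a given sample size."""
    packet_bytes = sample_bytes * samples_per_packet + PER_PACKET_OVERHEAD_BYTES
    packets_per_second = LINK_BITS_PER_SECOND / (8 * packet_bytes)
    return packets_per_second * samples_per_packet


unbatched = max_samples_per_second(100)
batched = max_samples_per_second(100, samples_per_packet=10)
print(f"100-byte samples, 1 per packet:  {unbatched:,.0f} samples/s")
print(f"100-byte samples, 10 per packet: {batched:,.0f} samples/s")
```

Under these assumptions, filling the link with unbatched 100-byte samples would take far more packets per second than the 50,000-80,000 figure quoted earlier, which is one more sign that batching is at work whenever a link is saturated with small samples.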
Knowledge is Power, or, Forewarned is
Forearmed.
Now, hopefully, you have a
better understanding of how throughput performance data
is measured and reported, what to expect from that data, and what
to look out for. You're ready to enter the wide and
wild world of performance evaluation. (Care to try your own hand?
Pick up a copy of RTI Data Distribution Service
(https://www.rti.com/downloads/dds.html) or RTI Message Service
(https://www.rti.com/downloads/jms.html) and run the comprehensive
performance test you can find in RTI's Knowledge Base
(https://www.rti.com/kb/index.html) - search for "Example
Performance Test.")
Of course,
there's a lot more to network performance besides
throughput. No doubt I (and/or someone else) will be returning to
talk about latency, loaded latency, jitter, competitive analysis,
and other topics in the future - stay tuned.



