Throughput Performance: Comparing Apples to Apples

First Published Wednesday, 15th July 2009 07:16 pm from Real-Time Innovations (RTI) : Rick Warren

The opinions expressed by this blogger and those providing comments are theirs alone, this does not reflect the opinion of Automated Trader or any employee thereof. Automated Trader is not responsible for the accuracy of any of the information supplied by this article.


If you're serious about

the performance of your distributed system, you probably read

with interest the performance claims made by network middleware

vendors. And if you're a network middleware vendor,

you've probably published your share of performance

claims. (RTI has comprehensive performance numbers available for

both our href="http://www.rti.com/products/dds/benchmarks-cpp-linux.html">DDS

and href="http://www.rti.com/products/jms/latency-throughput-benchmarks.html">JMS

APIs.) But in order to know which claims are meaningful

- and more importantly, which are useful to

you - it's important to understand

what you're reading. In the words on one of my

coworkers, "many apples are compared to

rhinoceroses."

First of all, there are three

primary axes along which people tend to measure network

performance:

  • Latency: end-to-end, round-trip,

    and loaded latency; the amount of time it takes to send a certain

    amount of data from somewhere to somewhere else under various

    conditions

  • Latency

    jitter: the amount of variation in latency

    measurements

  • Throughput: the number of data

    quanta (either raw bytes or fixed-size samples/messages)

    transmitted over a given amount of time

The whitepaper " href="http://www.rti.com/mk/future.html">The Data-Centric

Future" on the RTI website has a good

overview of general performance considerations, so I

won't reiterate that material here. What I'll

focus on today is throughput, and specifically some of the

subtleties you'll want to keep in mind when

you're evaluating a networking middleware

product.

Size Matters.

/> The size of the data samples you send in a

throughput test has a huge impact on the throughput you measure.

At small data sizes (dozens to hundreds of bytes), performance is

dominated by the expense of traversing the layers of software in

between your application and the network: your middleware, if

any; your operating system's network stack, including

any system calls; and your network driver. As the data size

increases, these fixed costs become less significant relative to

the cost of actually copying the data (including the

"copy" of the data across the

network).

This difference in the throughput

profile among different data sizes means that if you're

reading a performance report, and you see

"bytes" but your application deals with

"samples" or "messages,"

it's important to understand the sample size(s) used to

generate that report. It's not sound to generate some

data with one sample size, add up the total number of bytes, and

then divide by another sample size, because you're not

correctly accounting for per-sample constant factors.

One vendor - an enterprise software vendor

newly entering the messaging market, who shall remain nameless

- made a series of throughput measurements for 256-byte

samples. This vendor then declared that what their customers

really cared about was 16-byte samples, and so immediately

multiplied their measurements by 16 (= 256 / 16) and published

those extrapolated results! Depending on how they were planning

on sending those 16-byte samples, they were assuming either that

(a) the cost of performing 16x the number of

network sends is zero or (b) the cost of

packing and unpacking 16-byte samples into and out of 256-byte

chunks is zero. Of course, both of these costs are emphatically

non-zero. But that brings me to my next

point:

Understand data

batching.

Because of the high fixed

cost of a network send, especially relative to the cost of

copying a small data sample, it is a common practice to batch

multiple data samples together and send them together as a

unit.

Suppose you're publishing

64-byte samples. Remember that each time you send a packet on the

network, you're also sending a couple dozen bytes of IP

header data and whatever meta-data your middleware requires. That

adds up to a 30-100% space penalty - added to the time

penalty discussed above. But if you can amortize these costs over

many samples, they become much less important. In fact, batching

data can increase your effective throughput by more than an order

of magnitude in some cases.

In our experience,

you can send 50-80,000 smallish packets per second using

commodity OS, computing, and networking components. When you see

samples-per-second-style throughput numbers much higher than

that, it means that those samples are being batched under the

hood.

Note that data batching is an intrinsic

part of the TCP protocol, so any middleware implementation that

relies on TCP batches data all the time.

Differentiate between

one-to-n and aggregate throughput.

/> There are two ways to look at throughput: an

application-centric view and a network-centric view. Which one of

these a given community cares about governs which one gets

measured and reported by vendors that market into that community.

It means that you need to be aware of what you're

reading when you see "n samples

per second," especially when dealing with new

vendor.

  • Applications typically care about

    the number of samples that can be sent/received to/from

    particular destinations using particular data producing and

    consuming objects. For example, suppose I'm publishing

    sensor data. I know that my device has a new value available

    10,000 times every second. If I try to send that much data, will

    it work? This viewpoint is relevant to applications with

    individual data streams that place a significant burden on the

    network all by themselves. Streaming media, high-rate sensor

    data, real-time command and control, and other similar domains

    are in this category. Throughput data is typically reported from

    one (the data producer) to n (the number of

    data consumers).

  • Other systems take a

    network-centric view, measuring the total

    number of samples in flight across an entire system. This view is

    relevant when both of the following are true:

    (a) individual data streams may not be

    demanding by themselves, but there are many of them, and

    (b) all of those streams have a common choke

    point. Enterprise integration and web services often fall into

    this category, as services are invoked on human time scales, and

    middleware implementations typically include central message

    broker components. Network-centric throughput is typically

    reported in aggregate, from n (producers) to

    m (consumers), where all

    (n + m) entities

    bottleneck through a common broker. The goal of the test, in such

    a case, is to measure the limits of the broker itself, not of the

    applications that use it.

When

you're evaluating a throughput claim, be sure you know

which one of these scenarios you're looking at! I can

tell you that there was a flurry of activity at RTI when a

competing vendor started touting 6 million samples per second

- until we read further into that vendor's

testing methodology and discovered that the result was an

aggregate across 60 applications. For the record, the throughput

numbers you will find on RTI's website -

showing href="http://www.rti.com/products/dds/benchmarks-cpp-linux.html">over

1 million samples per second - are measured

1-to-n. That's one publisher on

one box publishing to one destination.

Understand the architecture.

To me, 1-to-n numbers are the

honest numbers when you have a peer-to-peer solution, as RTI

does. That's because, assuming your switch can keep up,

there's nothing to test other than the so-called

"client" applications themselves. We can

saturate a gigabit link for data sizes not much over 100 bytes,

and come close even for very small sizes. Do you have more data

to send than can fit over a single link? Then add another link.

At that point, you're no longer testing the performance

of the RTI infrastructure; you're testing the

performance of your switch.

Knowledge is Power, or, Forewarned is

Forearmed.

Now, hopefully, you have a

better understanding of how throughput performance numbers data

is measured and reported, what to expect from that data, and what

to look out for. You're ready to enter the wide and

wild world of performance evaluation. (Care to try your own hand?

Pick up a copy of href="https://www.rti.com/downloads/dds.html">RTI Data

Distribution Service or href="https://www.rti.com/downloads/jms.html">RTI Message

Service and run the comprehensive performance test you

can find in RTI's href="https://www.rti.com/kb/index.html">Knowledge

Base - search for "Example

Performance Test.")

Of course,

there's a lot more to network performance besides

throughput. No doubt I (and/or someone else) will be returning to

talk about latency, loaded latency, jitter, competitive analysis,

and other topics in the future - stay tuned.

href="http://feeds.wordpress.com/1.0/gocomments/rtidds.wordpress.com/125/"> alt="" border="0"

src="http://feeds.wordpress.com/1.0/comments/rtidds.wordpress.com/125/"

/>

href="http://feeds.wordpress.com/1.0/godelicious/rtidds.wordpress.com/125/"> alt="" border="0"

src="http://feeds.wordpress.com/1.0/delicious/rtidds.wordpress.com/125/"

/>

href="http://feeds.wordpress.com/1.0/gostumble/rtidds.wordpress.com/125/"> alt="" border="0"

src="http://feeds.wordpress.com/1.0/stumble/rtidds.wordpress.com/125/"

/>

href="http://feeds.wordpress.com/1.0/godigg/rtidds.wordpress.com/125/"> alt="" border="0"

src="http://feeds.wordpress.com/1.0/digg/rtidds.wordpress.com/125/"

/>

href="http://feeds.wordpress.com/1.0/goreddit/rtidds.wordpress.com/125/"> alt="" border="0"

src="http://feeds.wordpress.com/1.0/reddit/rtidds.wordpress.com/125/"

/>

src="http://stats.wordpress.com/b.gif?host=blogs.rti.com&blog=7350090&post=125&subd=rtidds&ref=&feed=1"

/>

  • Copyright © Automated Trader Ltd 2013 - The Gateway to Algorithmic and Automated Trading

click here to return to the top of the page