Reliability isn’t just for getting everything that was sent….

First Published Friday, 25th May 2012 02:31 pm from Real-Time Innovations (RTI) : rtihoward

The opinions expressed by this blogger and those providing comments are theirs alone, this does not reflect the opinion of Automated Trader or any employee thereof. Automated Trader is not responsible for the accuracy of any of the information supplied by this article.


I got an email from a user that basically stated that "as a general rule, sending data with BEST_EFFORT Reliability QoS (i.e., using nominal UDP semantics) should provide better performance than sending data with RELIABLE Reliability QoS on a stable, clean and thus relatively lossless network".

Hmm, that sounds reasonable enough… To be reliable, the delivery protocol has to send and process additional network packets such as heartbeats and ACK/NACKs, which consumes both network bandwidth and CPU cycles. This additional overhead should make the "performance" of a reliable connection worse than that of a "best effort" connection. If not "worse", it certainly shouldn't make it better… or would it?

Well, we should define some terms first. What is "performance"? Is it maximum throughput? Or the latency of the data (the time it takes to get from sender to receiver)? Or the resources (CPU, network bandwidth, memory) consumed while sending at a particular data rate?

In general, when you send data at less than the network bandwidth over a "stable, lossless transport", Best Effort adds no overhead in CPU, network bandwidth or memory. Of course, if you use the Reliable mode, you'll get the same throughput performance but at a higher "price" (overhead).

So, no, you do not get better throughput or latency from Best Effort than from Reliable QoS when sending data below the network bandwidth limit on a network that does not lose data packets. You'll get the same throughput/latency performance… just for lower "cost".

If there is a chance that data packets can be lost on the network no matter what the network load is, then there is an obvious performance difference between Best Effort and Reliable… not necessarily in terms of throughput and latency, but in terms of determinism versus the guaranteed receipt of all data in the order it was sent.

With Best Effort, you may not receive all of the data, but whatever data does get through arrives with minimal latency (more deterministically), and no additional overhead is incurred even if there is data loss.

With Reliable, the protocol detects and repairs lost packets so that all of the data sent is received in the order sent, at the expense of additional network packets (HB, ACK/NACK), not to mention increased CPU and memory usage. One could argue that the "performance" of the Reliable connection is better, if less deterministic (i.e., there may be unpredictable delays in receiving data while the system repairs missing data).

That's all well and good when the data rates are well below the network bandwidth…

However, when you send data faster than the system can handle, data can still be lost even if the network itself is "lossless" (e.g., shared memory)… by DDS or the OS, if not by the network hardware.

It's easy to send data faster than the network can handle. Data rate is calculated as (amount of data / time). You can overwhelm a network by sending small (1-200 byte) samples too fast, or by trying to send a MB in a single write() call.

When an application tries to send data faster than the network can handle, data packets are lost.

In Best Effort mode, DDS does not try to detect that it is being asked to send data faster than the network can handle, and there is no mechanism to stop DDS from pushing data through the network stack even when the network is saturated. So the network stack and/or the physical network will throw away the data packets that exceed the network bandwidth.

So just because the "network" is lossless doesn't mean that there isn't a place between App and App where data can be thrown out. The physical network may never see a packet because the OS throws it out when the network reports that it can't handle any more. The packet isn't lost by the physical network, but intentionally dropped by the OS or device driver layer.

For example, the send socket buffer is full, which causes the OS to throw out the data being sent before it reaches the Ethernet card.

Or, more likely, since it usually takes more CPU to process incoming data than to send outgoing data, sending apps can usually send much faster than receiving apps can process. The receive socket buffer (or shared memory buffer) fills up while the CPU is busy processing received packets… and then the Ethernet device or the OS shared memory driver has no choice but to drop the data packets it receives.

So how fast is too fast? Well, assuming a "clean network", it's when the sender tries to send more than the total amount of data that can be buffered in the "system" in one go… without any delay between sends. The "system" is the combination of the send network stack, the network itself (including buffers in switches/routers) and the receive network stack. The main places where significant amounts of data can be stored are the send buffer and the receive buffer.

For RTI's shared memory driver, there are no separate send, receive and network buffers; there is only one shared memory buffer. So if you send more data than the shared memory buffer can hold in one go, some part of the data will probably be lost.

Let's take the case of sending "large data". Large data is defined as data that is larger than the MTU (maximum transmission unit) of the physical transport. The largest user data packet that can be sent by UDP is 64K. So sending 1 MB of data in a single write() call requires some mechanism, either RTI DDS's built-in large-data fragmentation feature or a user-level software layer, to break the large data up into smaller (MTU-sized) chunks and send the fragments individually through the physical network.

And the data fragments are usually sent consecutively without any delay… which, with today's CPU speeds, can easily exceed the maximum bandwidth of most networks.

For example, sending 1 MB in the 1 ms that it takes a CPU to break up and send 1 MB in 64K chunks through a UDP socket requires a network that can handle 8 Gbps (8 megabits in 1 ms). A 1 Gbps network would not be able to transmit the data that fast.
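
The arithmetic behind that figure can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes 1 MB means 1,000,000 bytes so that the result matches the round 8 Gbps number.

```python
def required_bandwidth_bps(num_bytes: int, seconds: float) -> float:
    """Bits per second needed to move num_bytes within the given time."""
    return num_bytes * 8 / seconds


# Assumption: 1 MB = 1,000,000 bytes, burst pushed out in 1 ms.
burst_bps = required_bandwidth_bps(1_000_000, 0.001)
link_bps = 1e9  # a 1 Gbps network

print(f"burst needs {burst_bps / 1e9:.1f} Gbps")
print(f"that is {burst_bps / link_bps:.0f}x what the link can carry")
```

The data rate that matters is the peak rate during the burst, not the average over a second.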

With other networks, if the large data being sent is more than the network can buffer, then data fragments could be lost.

For example, take 1 MB of data. In 64K chunks, 16 data fragments are sent. But if the shared memory buffer only holds 512 KB of data, and the send side sends much faster than the receive side can process (which is likely), then up to 8 data fragments could be "lost" (in the case that the send side sends so fast that all of the fragments are "sent" before DDS on the receive side has a chance to take even one packet from the network).
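
Here is a minimal sketch of that worst case, assuming the receiver drains nothing while the sender bursts. The function and its parameters are hypothetical and only model a fixed-size buffer; they are not part of any DDS API.

```python
def burst_into_buffer(sample_bytes: int, fragment_bytes: int, buffer_bytes: int):
    """Return (fragments_sent, fragments_lost) for a burst with no receiver draining."""
    total = -(-sample_bytes // fragment_bytes)  # ceiling division
    buffered = 0
    lost = 0
    remaining = sample_bytes
    for _ in range(total):
        size = min(fragment_bytes, remaining)
        remaining -= size
        if buffered + size <= buffer_bytes:
            buffered += size  # fragment fits in the buffer
        else:
            lost += 1  # buffer is full, fragment is dropped
    return total, lost


KIB = 1024
sent, lost = burst_into_buffer(1024 * KIB, 64 * KIB, 512 * KIB)
print(sent, lost)
```

With these numbers the sketch reports 16 fragments sent and 8 lost, which matches the example above.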

The situation that I just described is exactly what happens if you try to send too much data in Best Effort mode. There is no throttle. DDS will push the data to the network as fast as the application sends it. And if the application gives DDS large data (e.g., 1 MB), DDS will send all of the data in fragments without delay. If the "network" loses data, then you'll see your effective throughput either drop to zero (i.e., the network always loses the last parts of a large data sample) or fall well short of high performance (i.e., every now and then you get lucky and all of the fragments of a large data sample make it through).

So, what can you do? Put in a mechanism to limit the rate at which DDS pushes packets onto the network to something that the network can handle. You can do this open loop, i.e., put arbitrary delays between sends at the application layer and/or use the RTI DDS FlowControl mechanism, or closed loop, i.e., use feedback from the receiving side to let the send side know when it's OK to send more data.
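
As a rough illustration of the open-loop, application-layer option, the sketch below paces a send loop so that the average rate stays under a chosen limit. Everything here is an assumption for illustration: `send_paced`, the `send` callback and the 1 Gbps limit are made-up names and values, not RTI DDS calls.

```python
import time


def pacing_delay_s(fragment_bytes: int, max_rate_bps: float) -> float:
    """Seconds to wait after a fragment so the average rate stays at or below max_rate_bps."""
    return fragment_bytes * 8 / max_rate_bps


def send_paced(fragments, send, max_rate_bps, sleep=time.sleep):
    """Send each fragment, then pause long enough to respect the rate limit."""
    for fragment in fragments:
        send(fragment)
        sleep(pacing_delay_s(len(fragment), max_rate_bps))


# Usage: record the sends and delays instead of touching a real socket or clock.
fragments = [bytes(64 * 1024)] * 16
sent, delays = [], []
send_paced(fragments, sent.append, max_rate_bps=1e9, sleep=delays.append)
print(f"{len(sent)} fragments, {sum(delays) * 1000:.2f} ms of pacing")
```

The delay is fixed in advance, so this only works if you know (or guess conservatively) what the end-to-end system can handle.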

The closed-loop mechanism is basically what you get with the Reliable mode. With a limited-size send queue, the reliable mechanism blocks DDS from sending any more data when the send queue is full, and only when feedback (ACKs) comes back from the receive side (indicating that it's able to process more packets) is DDS allowed to send more data. This is also known as "throttling".
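
The effect of a limited send queue can be seen in a toy simulation. This is a sketch under simplifying assumptions, not the actual DDS reliability protocol: time advances in ticks, the receiver processes and ACKs one fragment per tick, repair of dropped fragments is not modeled, and all names are hypothetical.

```python
def simulate(total, recv_buffer_max, send_queue_max=None):
    """Return (delivered, dropped, ticks). send_queue_max=None models Best Effort."""
    in_flight = 0  # fragments sent and buffered but not yet ACKed
    buffered = 0   # fragments waiting in the receive buffer
    sent = delivered = dropped = ticks = 0
    while delivered + dropped < total:
        ticks += 1
        # Sender: push until the data runs out or the send queue is full.
        while sent < total and (send_queue_max is None or in_flight < send_queue_max):
            sent += 1
            if buffered < recv_buffer_max:
                buffered += 1
                in_flight += 1
            else:
                dropped += 1  # receive buffer full, fragment is thrown away
        # Receiver: process one fragment per tick and ACK it.
        if buffered:
            buffered -= 1
            in_flight -= 1
            delivered += 1
    return delivered, dropped, ticks


best_effort = simulate(total=16, recv_buffer_max=8)
throttled = simulate(total=16, recv_buffer_max=8, send_queue_max=4)
print("best effort:", best_effort)
print("throttled:  ", throttled)
```

In this model the unthrottled sender loses half of the fragments, while the sender with a send queue smaller than the receive buffer delivers all 16, just over more ticks.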

Yes, this adds some overhead… but using the Reliable protocol to throttle the send rate, and thus not losing any data to excessive data rates, costs only the receiving/processing of HBs and ACKs. That is a small price to pay compared to sending data so fast that data is lost and then having to use the same Reliable protocol to repair the lost packets.

So, even when using the Reliable protocol, it's still better to tune the protocol to never send faster than the end-to-end network can handle.

In short, for large data, you're almost guaranteeing that DDS will try to send it faster than the network (even shared memory) can handle, and thus data will be lost. If you're using RTI DDS's internal large data algorithm, the data rate can be throttled using the Reliable protocol. If your own code is breaking up the large data, you can use arbitrary delays in your send loop. Another open-loop approach is to use the RTI DDS FlowControl mechanism, which can be configured to limit the maximum send rate of a DataWriter to a specified data rate. The FlowController can also be used by the RTI DDS internal large data algorithm.
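
To give a feel for what limiting a writer to a specified data rate means, here is a generic token-bucket limiter driven by a simulated clock. It is an assumption that a token bucket is a fair stand-in for this kind of rate limiting; the class, its parameters and the 1 Gbps figure are invented for illustration and this is not the RTI FlowController API.

```python
class TokenBucket:
    """Allow a send only when enough byte tokens have accumulated."""

    def __init__(self, bytes_per_us: int, capacity_bytes: int):
        self.bytes_per_us = bytes_per_us
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes  # start with a full bucket
        self.last_us = 0

    def try_send(self, now_us: int, size: int) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now_us - self.last_us
        self.last_us = now_us
        self.tokens = min(self.capacity, self.tokens + elapsed * self.bytes_per_us)
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


FRAGMENT = 64 * 1024
bucket = TokenBucket(bytes_per_us=125, capacity_bytes=FRAGMENT)  # 125 B/us = 1 Gbps
send_times_us = []
now_us = 0  # simulated clock in microseconds
while len(send_times_us) < 16:
    if bucket.try_send(now_us, FRAGMENT):
        send_times_us.append(now_us)
    now_us += 1
print(f"last fragment released after {send_times_us[-1]} us")
```

Instead of leaving the application in one burst, the 16 fragments are spread over roughly 8 ms, which is about what a 1 Gbps link needs to carry 1 MB.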

For those of you who have used TCP to transfer MBs and MBs of data without ever having to worry about this issue… well, TCP internally breaks data up into MTU-sized chunks, and uses a reliable protocol and limited buffer (queue) sizes so that it doesn't send data faster than the network can handle. You don't actually get to choose whether to send Best Effort or Reliable; it's always Reliable. And it's hard to tune TCP to work under abnormal conditions. The MTU size is usually based on the MTU of Ethernet (around 1500 bytes).

So, in conclusion, sending data using Best Effort QoS may not provide the best performance… especially if peak data rates are greater than the network data rate. You can see this on the highways of California… during rush hour, metering lights at the on-ramps regulate when cars get onto the highway. With the metering lights, the network, aka the highway, can be run at higher effective throughput. Without this type of regulation, driving in the SF Bay Area or LA during rush hour would be more of a madhouse than it already is.


  • Copyright © Automated Trader Ltd 2013 - The Gateway to Algorithmic and Automated Trading
