Low latency optimisation using Software Defined Radio technology
Published in Automated Trader Magazine Issue 43 Q2 2017
High frequency traders are using radio links to reduce latency. Simply buying the fastest transceivers and repeaters is not enough: design and configuration are critical. We show how Software Defined Radio (SDR) can be used to optimise a network and to provide advantages that hardware alone does not.
Designing and bringing up a fast and reliable radio link poses considerable challenges. Further optimising the link for latency only adds to the challenge. Even with the fastest hardware, an appropriate balance needs to be struck between channel reliability, capacity and bandwidth. A careful consideration of the relationship between these factors is important for successfully optimising the functional latency (the actual latency required to send a single quantum of information) of a radio link. We begin by taking a closer look at exactly what these factors mean and their consequences for performance, before looking at some of the considerations for choosing a specific modulation scheme. A good understanding of these factors will help us balance the latency impact imposed by practical concerns including multipath, fading and intersymbol interference.
Current point-to-point microwave radio hardware in the high frequency trading (HFT) sector is focused on delivering performance gains by reducing latency. The consequence of this focus on latency is that potential gains in reliability are often overlooked and traded against increases in transmission power. Low latency microwave providers have turned to custom silicon to deliver the fastest analogue amplifiers and regenerators. In addition to purchasing the fastest silicon available, network designers carefully plan routes to minimise the number of towers required to complete a loop. When a direct line of sight connection is not possible, analogue amplifiers provide the fastest mechanism to compensate for path loss and ensure sufficient signal strength at the repeater. As amplifiers add noise to the signal, very long routes may also have signal regenerators installed. These regenerators demodulate the digital waveform and then remodulate it, providing in the process a new 'clean' signal that is comparable to the original. Regenerators are fairly simple devices, but carry a higher latency cost than pure amplifiers.
Nevertheless, in the quest for speed, minimising the number of towers required to link a given span is generally considered to be the safest strategy. The easiest way to close the gap between reliability and latency is by increasing transmitter output power. However, regulatory agencies have been increasingly resistant to approving higher transmission powers. Here we look at using Software Defined Radio schemes to dynamically adjust the modulation scheme. This helps to reduce the number of towers required to span a given link, and thereby minimises latency. SDR provides unique opportunities for the HFT market, which traditional amplifiers and regenerators cannot offer. Additionally, the incorporation of field-programmable gate arrays (FPGAs) into the signal processing start and end points provides another mechanism for users to reduce end point latency, the time to convert a signal from wired to wireless (or vice versa) on the way to the destination network.
Traditional radio links are designed to maximise the amount of information sent over a given period of time. This article deals with those links designed to minimise the specific amount of time required to send a specific amount of information. We begin by providing the background necessary to recognise the design choices that differentiate low latency links. The most logical place to start is by discussing exactly how we measure the rate at which we send information over a network.
The fundamental utility of a radio link relates to its capacity to help us reliably communicate information. This is called its 'channel capacity' and is measured in terms of bits per second. Channel capacity is directly proportional to bandwidth and is theoretically bounded by our symbol modulation scheme and the signal-to-noise ratio (SNR), as shown in Figure 01. The modulation scheme's relationship with channel capacity is fairly straightforward. Higher order modulation schemes are capable of supporting greater transfer rates, but are also more likely to result in errors due to the presence of noise or interference. Note that most practical radio links use the signal-to-interference-plus-noise ratio (SINR) as a figure of merit instead of SNR because it includes sources of interference.
Figure 01: Shannon capacity of a given channel based on modulation scheme and signaltonoise ratio
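As a quick sketch of the Shannon-Hartley bound behind Figure 01 (the 56 MHz channel and 30 dB SNR below are illustrative values, not figures from the article):

```python
import math

def shannon_capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon-Hartley channel capacity in bits per second.

    snr_db is the signal-to-noise power ratio expressed in decibels.
    """
    snr_linear = 10 ** (snr_db / 10)          # convert dB to a linear power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

# A hypothetical 56 MHz microwave channel at 30 dB SNR:
capacity = shannon_capacity_bps(56e6, 30.0)
print(f"{capacity / 1e6:.1f} Mbps")
```

This is an upper bound on reliable throughput, not a rate any concrete modulation scheme achieves.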
It is important to note that reliability is a critical aspect of channel capacity. By reliability we mean the highest rate at which we can transmit information with an arbitrarily small error. This helps quantify an implicit measure of utility: That the receiver is able to correctly receive the transmitted information without error. We can therefore use the bit error rate (BER), which reflects the probability that a single bit is erroneously interpreted, as a proxy for reliability.
Link reliability presents significant challenges and is adversely impacted by a number of factors, including atmospheric conditions, signal interference, multipath and fading. Reliability is generally addressed by ensuring sufficient margin to sustain the minimum SNR required to maintain a specified BER under the worst case expected across a specified availability window. The specific link margin is generally dictated by the type of link and the associated capacity requirements. Additionally, a number of published tables help associate link margin with reliability. A reasonable margin may fall between 25 and 50 dB, with extreme conditions requiring margins in excess of 60 dB.
Reliability, Throughput and Latency
An obvious and first order requirement for the operation of any radio link is reliability: There is not much utility in a link that is unable to accurately transmit information between transmitting and receiving nodes. As a practical matter, the general aim is to place receiver and transmitter pairs as far away from each other as possible, while ensuring a minimum specified SNR between two nodes. This minimises the time, complexity, delay and cost associated with introducing additional towers or hardware into the link. However, longer links imply greater path loss and are generally more susceptible to other sources of degradation, including interference, scattering and absorption. These can adversely impact the maximum achievable SNR of the link and, in turn, the available link margin and reliability. How we encode information impacts the SNR required to maintain a given BER. The relationship for various orders of quadrature amplitude modulation (QAM) encoding is shown in Figure 02.
Figure 02: Bit error rate for various QAM modulation orders M, as a function of SNR
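The shape of the curves in Figure 02 can be reproduced with the standard nearest-neighbour approximation for Gray-coded square M-QAM in additive white Gaussian noise; a sketch, assuming the SNR given is the symbol-level Es/N0:

```python
import math

def qam_ber(m: int, snr_db: float) -> float:
    """Approximate bit error rate for Gray-coded square M-QAM in AWGN.

    snr_db is Es/N0 (symbol SNR) in decibels. Uses the standard
    nearest-neighbour approximation, valid at moderate-to-high SNR.
    """
    k = math.log2(m)                       # bits per symbol
    snr = 10 ** (snr_db / 10)              # linear Es/N0
    # Q(x) = 0.5 * erfc(x / sqrt(2))
    q = 0.5 * math.erfc(math.sqrt(3 * snr / (m - 1)) / math.sqrt(2))
    ser = 4 * (1 - 1 / math.sqrt(m)) * q   # symbol error rate approximation
    return ser / k                         # Gray coding: ~1 bit error per symbol error

for m in (4, 16, 64):
    print(m, qam_ber(m, 20.0))
```

At a fixed SNR, each step up in modulation order costs orders of magnitude in BER, which is the tradeoff the figure illustrates.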
We can also achieve reliability through our encoding scheme. In the simplest case, the use of repetition codes sees us retransmit messages multiple times to ensure that at least one uncorrupted message is communicated. As we increase the number of times, N, that we retransmit a message, we also increase the likelihood that the message will be correctly received, though at the cost of reducing our channel throughput by a factor of N. In effect, while we have improved reliability, we have substantially reduced channel capacity and, by extension, increased the time required to successfully decode a message.
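The repetition-code tradeoff can be sketched as follows; the 10% bit-flip probability and message length are purely illustrative:

```python
import random

def repeat_encode(bits, n):
    """Repeat each bit n times (a rate-1/n repetition code)."""
    return [b for b in bits for _ in range(n)]

def repeat_decode(coded, n):
    """Majority-vote decode each group of n received bits."""
    return [int(sum(coded[i:i + n]) > n / 2) for i in range(0, len(coded), n)]

def noisy_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob."""
    return [b ^ (rng.random() < flip_prob) for b in bits]

rng = random.Random(42)
payload = [rng.randint(0, 1) for _ in range(1000)]
for n in (1, 3, 5):
    received = repeat_decode(noisy_channel(repeat_encode(payload, n), 0.1, rng), n)
    errors = sum(a != b for a, b in zip(payload, received))
    # Reliability rises with n, but throughput falls by the same factor n.
    print(f"n={n}: {errors} bit errors")
```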
A substantial improvement to this scheme was proven by Claude Shannon, who demonstrated that it is possible to encode messages such that we can send information through a channel at the greatest possible rate with an arbitrarily small error. However, practical implementations of such codes achieve this by increasing the block size, that is, the minimum size of a message. This echoes the tradeoff seen in repetition codes: In order to reliably send a message in the presence of noise, we generally see an increase in the time required to successfully receive and decode a message. This is of particular significance when designing low latency networks. As these types of networks generally operate on a near-real-time basis, late packets aren't useful and are generally dropped. By determining the maximum acceptable message latency, along with the desired reliability goals, we can infer exactly how much time we have (measured as a fraction of the channel capacity) to spend on error correction. This allows us to optimise link reliability across noisy or adverse channels.
Calculating Latency
In discussing the latency of a radio system, we first need to identify all the contributing sources of latency. For a practical radio system linking two network nodes, we can use Equation 01 to help isolate the different sources of latency.
Equation 01:

\[ \tau_{lat,sys} = \tau_{deframe} + \tau_{frame} + \tau_{mod} + \tau_{demod} + \tau_{RF,Tx} + \tau_{RF,Rx} + \tau_{path} + \tau_{msg} \]

where:

\(\tau_{lat,sys}\): Total system latency
\(\tau_{deframe}\): Time required to deframe messages from the wired network
\(\tau_{frame}\): Time required to buffer messages before going into the wired network
\(\tau_{mod}\): Time to modulate the digital message into an analogue waveform suitable for transmission over the air
\(\tau_{demod}\): Time to demodulate the digital message
\(\tau_{RF,Tx}\): Radio frequency (RF) or feed line propagation delays on the transmit side
\(\tau_{RF,Rx}\): Radio frequency (RF) or feed line propagation delays on the receive side
\(\tau_{path}\): Time required for the start of the message to travel across the distance between the two nodes
\(\tau_{msg}\): Duration of the actual message
Note that within this formula, \(\tau_{msg}\) represents the time required to send the entire message. This message may comprise multiple bits or symbols and does not differentiate between user payload and additional coding overheads such as error correction, hashing or protocol. For a latency-critical system, the overhead associated with any of these features directly impacts \(\tau_{msg}\) by increasing the time required to send a message, in favour of improved reliability. As we will discuss later, dynamically balancing this overhead is critical to the optimisation of a latency-critical channel.
Most high performance systems are already optimised to reduce \(\tau_{RF,Tx}\) and \(\tau_{RF,Rx}\). We can also assume that the RF delays caused by feed lines, filter banks and amplifiers remain constant across the various systems or have already been optimised. The modem modulation and demodulation times, \(\tau_{mod}\) and \(\tau_{demod}\) respectively, may also be significantly reduced using time-memory tradeoffs.
Note: On the transmit side, we can speed up modulation by storing raw waveforms and using the incoming payload data to select the waveform to be transmitted. Similarly, on the receive side, cross-correlation of received samples against stored waveforms representing ideal symbols can be used to immediately decode the received waveform into its binary components. This significantly reduces the latencies associated with the conversion between raw bits and transmissible waveforms.
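A minimal sketch of this time-memory tradeoff, assuming a toy Gray-coded QPSK mapping and 16 samples per symbol (both illustrative choices, not values from the article):

```python
import math

SAMPLES_PER_SYMBOL = 16
SYMBOL_PHASES = {0b00: 45, 0b01: 135, 0b11: 225, 0b10: 315}  # Gray-coded QPSK, degrees

# Transmit side: precompute one ideal waveform per symbol so modulation
# becomes a table lookup instead of per-sample trigonometry.
WAVE_TABLE = {
    sym: [math.cos(2 * math.pi * i / SAMPLES_PER_SYMBOL + math.radians(ph))
          for i in range(SAMPLES_PER_SYMBOL)]
    for sym, ph in SYMBOL_PHASES.items()
}

def modulate(symbol: int) -> list:
    return WAVE_TABLE[symbol]          # O(1) lookup at transmit time

def demodulate(samples: list) -> int:
    """Receive side: cross-correlate against each stored ideal waveform
    and pick the symbol with the highest correlation."""
    def corr(ref):
        return sum(s * r for s, r in zip(samples, ref))
    return max(WAVE_TABLE, key=lambda sym: corr(WAVE_TABLE[sym]))

# Round-trip check over all four QPSK symbols:
for sym in SYMBOL_PHASES:
    assert demodulate(modulate(sym)) == sym
```

In a real modem this lookup and correlation would live in FPGA fabric; the structure, not the Python, is the point.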
Finally, most radio links have adopted one of two solutions. The first is to optimise the interface between transmitting and receiving networks (\(\tau_{deframe}\) and \(\tau_{frame}\)). The second is to have a fixed wired connection whose bandwidth and capacity so far exceed those of the wireless link that the associated framing and deframing latency becomes negligible.
Note: Consider a 10 Gbps network interfaced to a 100 Mbps wireless link: The framing and deframing of a message on the 10 Gbps side is generally far faster than the time required to modulate, transmit and demodulate a message over the radio link.
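To put rough numbers on this, consider the serialisation time alone (the 64-byte frame size is a hypothetical choice):

```python
def serialization_time_ns(n_bits: int, rate_bps: float) -> float:
    """Time to clock n_bits onto a link at rate_bps, in nanoseconds."""
    return n_bits / rate_bps * 1e9

frame_bits = 8 * 64                                  # a hypothetical 64-byte frame
wired = serialization_time_ns(frame_bits, 10e9)      # 10 Gbps wired side
wireless = serialization_time_ns(frame_bits, 100e6)  # 100 Mbps radio link
print(f"wired: {wired:.1f} ns, wireless: {wireless:.1f} ns")
```

The wired side is two orders of magnitude faster, so its framing contribution is negligible against the radio hop.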
This leaves us to consider the time required to send a message, \(\tau_{msg}\). There are two aspects to consider here. First and foremost, the majority of practical and ongoing uses of high performance and low latency networks require the speedy transmission of more than a single bit of data. Therefore, when calculating latency, we need to consider the time required to transfer enough bits across to ensure the complete message is sent.
Note: The problem is made more complex when considering variable message length encoding schemes such as Huffman encoding. In these cases, we must weigh the probability that a given message will be transmitted against its value or importance. For some applications, low probability messages may carry greater value than high probability ones. The challenge in effectively implementing such algorithms is to weight messages according to value, which may be hard to define, and which may also merit the use of lower order encoding schemes.
Second, we need to recognise the difference between the symbol rate (or baud rate) and the bit rate. Depending on the type of modulation used, a given symbol, the unit of transmission, may comprise multiple bits. Generally speaking, as spectral efficiency increases, channel capacity and throughput also increase while the symbol rate remains constant, at the cost of greater modem complexity.
To illustrate this, consider the time required to send a single message, \(\tau_{msg}\), consisting of an N-bit payload (\(N_{b/pay}\)) using a modulation scheme with N bits per symbol (\(N_{b/sym}\)) and a symbol rate of \(f_s\). The total time required to send this message is the number of symbols required to make up the complete payload, multiplied by the time required to send each symbol (Equation 02).
Equation 02:

\[ \tau_{msg} = \left\lceil \frac{N_{b/pay}}{N_{b/sym}} \right\rceil \cdot \frac{1}{f_s} \]
Recognising that the symbol rate is simply the channel capacity, \(C\), in bits per second, over the number of bits per symbol (Equation 03), we can apply Hartley's Law to determine the maximum channel capacity (Equation 04).
Equation 03:

\[ f_s = \frac{C}{N_{b/sym}} \]
Equation 04:

\[ C = 2B \cdot N_{b/sym} \]
Substituting Equations 03 and 04 into Equation 02, we obtain Equation 05.
Equation 05:

\[ \tau_{msg} = \frac{1}{2B} \left\lceil \frac{N_{b/pay}}{N_{b/sym}} \right\rceil \]
This has the practical result that, even though the theoretical symbol rate is fixed for a given bandwidth, the number of bits per symbol can, for a fixed payload size, dramatically reduce the time required to send a complete message. To illustrate the importance of this relationship for latency, we can contrast a quadrature phase shift keying (QPSK) modulation scheme, where \(N_{b/sym,QPSK} = 2\), with a 16-QAM scheme, where \(N_{b/sym,16QAM} = 4\). Assuming we want to send a fixed payload size \(N_{b/pay}\) of 1 byte (8 bits), we can substitute into Equation 05 and find \(\tau_{msg,1B,QPSK} = 2/B\) and \(\tau_{msg,1B,16QAM} = 1/B\). That is, a 16-QAM link can send a fixed payload twice as fast as a QPSK link.
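Equation 05 is easy to check numerically; the 50 MHz bandwidth below is purely illustrative:

```python
import math

def message_time(payload_bits: int, bits_per_symbol: int, bandwidth_hz: float) -> float:
    """Time to send a payload: symbols needed (rounded up) divided by
    the Nyquist symbol rate f_s = 2B, as in Equation 05."""
    symbols = math.ceil(payload_bits / bits_per_symbol)
    return symbols / (2 * bandwidth_hz)

B = 50e6  # hypothetical 50 MHz channel
for name, n in (("QPSK", 2), ("16-QAM", 4), ("32-QAM", 5)):
    print(f"{name}: {message_time(8, n, B) * 1e9:.0f} ns")
```

Note that 32-QAM ties 16-QAM for a 1-byte payload: the ceiling function absorbs the fifth bit per symbol, foreshadowing the caveat discussed below.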
Optimising for Latency
Based on the above analysis, it would seem that maximising channel capacity is the best way to start improving payload latency. We can apply the Shannon-Hartley theorem to determine the optimal number of bits per symbol (Equation 06).
Equation 06:

\[ N_{opt,b/sym} = \log_2\sqrt{1 + \left(\frac{S}{N}\right)^2} \]
Here, \(N_{opt,b/sym}\) is the largest number of bits per symbol in the presence of noise, and \(S/N\) is the voltage ratio of the root mean square (RMS) signal amplitude to the standard deviation of noise. As a practical matter, however, in our calculations we still need to consider reliability requirements, including fade margins, which would reduce the maximum SNR we can use. In addition, it is not always the case that improvements in channel efficiency lead to latency gains.
Equation 05 also illustrates this caveat when calculating the latency gain associated with higher order modulation schemes. As our unit of transmission is a symbol, we can only encode and decode \(N_{b/sym}\) bits at a time. This limits the latency advantage associated with higher order encoding. Applying the same equation to 32-QAM, with \(N_{b/sym,32QAM} = 5\), the ceiling function limits the latency gain associated with the transmission of a fixed 1 byte payload to \(\tau_{msg,1B,32QAM} = 1/B\). This is identical to the case of \(\tau_{msg,1B,16QAM}\) and means our message transmission latency has not improved, despite an increase in throughput and channel capacity.
Note: This is largely because we are using two symbols, capable of encoding 10 bits, to send 8 bits of information. Though these additional bits could be variously applied to error detection or correction to improve reliability, doing so would not have any latency impact.
We can generally avoid this situation by ensuring that we match payload size with modulation scheme. Optimal results occur when \(N_{b/pay}/N_{b/sym} = 1\), with subsequent lesser maxima occurring at integer values of this ratio greater than one, \(N_{b/pay}/N_{b/sym} \in \mathbb{Z},\ N_{b/pay}/N_{b/sym} \gt 1\).
Optimising the system for latency therefore requires the ongoing monitoring and evaluation of channel SNR. Once we know the SNR of a given channel, we can use our BER requirements to determine the best modulation scheme which also meets our reliability requirements and then determine an optimal payload size.
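A sketch of that selection step, using the standard Gray-coded square M-QAM BER approximation from earlier (the SNR, fade margin and BER target below are illustrative inputs, not values from the article):

```python
import math

def qam_ber(m: int, snr_db: float) -> float:
    """Approximate BER for Gray-coded square M-QAM in AWGN (Es/N0 in dB)."""
    snr = 10 ** (snr_db / 10)
    q = 0.5 * math.erfc(math.sqrt(3 * snr / (m - 1) / 2))
    return 4 * (1 - 1 / math.sqrt(m)) * q / math.log2(m)

def pick_modulation(snr_db: float, fade_margin_db: float, ber_target: float):
    """Highest-order square QAM whose predicted BER stays under target,
    evaluated at the worst-case (faded) SNR. Returns None if even the
    lowest order fails."""
    worst_snr = snr_db - fade_margin_db
    best = None
    for m in (4, 16, 64, 256):
        if qam_ber(m, worst_snr) <= ber_target:
            best = m               # more bits per symbol -> lower tau_msg
    return best

print(pick_modulation(snr_db=35.0, fade_margin_db=10.0, ber_target=1e-6))
```

Once the scheme is fixed, the payload size can be chosen to keep \(N_{b/pay}/N_{b/sym}\) at an integer, per the previous section.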
Hardware Support
Actively ensuring a network is optimised therefore requires coordination between transmitting and receiving nodes, and flexible transceivers. Unlike traditional radio broadcast equipment, which is limited to specific bands of operation and modulation schemes, modern Software Defined Radio allows many of these parameters to be controlled entirely by application software. By adjusting fundamental radio and messaging parameters such as frequency, transmission power, front-end sensitivity and even encoding schemes, applications can intelligently optimise network parameters in the face of changing environmental conditions.
This is of particular importance for high performance and low latency networks: Traditional links are designed to preserve connectivity and reliability in a worst case environment. In the absence of feedback, only the receiving node has evidence of channel noise and reliability, and it has no mechanism to communicate that information to the transmitter. By incorporating feedback, either in-band or out-of-band, this information can be shared with the transmitter and used to adjust radio properties to better favour throughput or latency goals without compromising reliability. This is key to ensuring consistent link quality in the face of daily and seasonal variations in climate and propagation characteristics, and it allows for easy implementation of Adaptive Coding and Modulation (ACM) schemes and Automatic Transmit Power Control (ATPC), which can be tuned to provide better latency performance.
More importantly, these schemes can be managed autonomously, thus reducing network maintenance overhead, and can be evaluated against quantitative requirements to ensure consistent performance. The complexity and utility of these applications are limited by hardware capabilities. High performance systems generally benefit from large instantaneous bandwidth, high clock rates and ample FPGA or processing resources, which provide a strong foundation for implementing application specific optimisations.
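As a sketch of how such an ACM policy might be driven by receiver feedback (the scheme table and SNR thresholds below are illustrative placeholders, not measured figures):

```python
# Minimal ACM control-loop sketch. A real SDR stack would read the
# reported SNR from an in-band or out-of-band feedback channel.
MODULATION_TABLE = [            # (scheme, bits/symbol, illustrative SNR floor in dB)
    ("BPSK", 1, 7.0),
    ("QPSK", 2, 10.0),
    ("16-QAM", 4, 17.0),
    ("64-QAM", 6, 23.0),
]

def select_scheme(reported_snr_db: float, margin_db: float = 3.0):
    """Pick the highest-order scheme whose SNR floor, plus a safety
    margin, the receiver-reported SNR still clears. Falls back to the
    most robust scheme when nothing else qualifies."""
    chosen = MODULATION_TABLE[0]
    for scheme in MODULATION_TABLE:
        if reported_snr_db >= scheme[2] + margin_db:
            chosen = scheme
    return chosen

# Receiver feedback drives the transmitter's modulation choice:
for snr in (26.5, 18.0, 9.5):   # e.g. clear sky, light fade, heavy rain fade
    print(snr, "->", select_scheme(snr)[0])
```

The same loop structure extends naturally to ATPC by adjusting transmit power instead of (or alongside) the modulation order.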
The author gratefully acknowledges the assistance and editing of B. Malatest and C. Wollesen in the preparation of this article.