NAG improves gridding algorithm for the Square Kilometre Array Radio Telescope

First Published 10th November 2017

Working in collaboration with the University of Oxford, the Numerical Algorithms Group investigated methods for improving the performance of a convolution gridding algorithm. The findings show that the algorithm runs fastest on the NVIDIA P100 and that marked improvements can be made with considered code changes.

The Numerical Algorithms Group (NAG), providers of algorithms, software and HPC, were asked by the Scientific Computing Group at the University of Oxford's e-Research Centre to investigate methods for improving the performance of a convolution gridding algorithm used in radio astronomy for processing fringe visibilities, targeting Intel Knights Landing (Xeon Phi) and NVIDIA P100 GPU. During the investigation, NAG experts used simulated Square Kilometre Array (SKA) data to observe how the effect of the algorithm enhancements differed between the two hardware choices.

Although the Square Kilometre Array (SKA) Radio Telescope is not due to begin collecting data until 2020, work is already under way to design and implement the software needed to process the vast amounts of data that the project will produce, which is why NAG were asked to examine the algorithm.

NAG are sharing some of the initial comparative performance figures from this work on optimizing a signal processing code for large data sets for the SKA project, and will publish a Technical Poster on the subject at the Supercomputing Trade Show and Conference (SC17) in Denver next week.

Some study findings:

NAG found that the convolution gridding algorithm studied is not entirely suited to either the Intel Knights Landing or the NVIDIA P100 GPU, for three reasons. The spatial distribution of visibilities in the data leads to random memory access patterns and poor reuse of cached data; race conditions arise when the grid is updated in parallel; and complex memory access patterns during the convolution steps inhibit efficient vectorization.
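To make the race condition concrete, here is a minimal sketch of convolution gridding in plain Python. It is an illustration only and not NAG's code: the triangular kernel, the function names `make_kernel` and `grid_visibilities`, and the assumption that visibility coordinates are already in grid-cell units are all hypothetical choices made for the example.

```python
def make_kernel(support):
    """Hypothetical separable triangular kernel with half-width `support`.

    A real gridder would use a proper convolution function; this
    stand-in only needs to spread each visibility over several cells.
    """
    size = 2 * support + 1
    w = [1.0 - abs(i - support) / (support + 1) for i in range(size)]
    return [[w[j] * w[i] for i in range(size)] for j in range(size)]


def grid_visibilities(vis, grid_size, support):
    """Scatter each visibility onto a square grid through the kernel.

    vis: list of (u, v, value) tuples, with u and v assumed to be
    expressed in grid-cell units already.
    """
    kernel = make_kernel(support)
    grid = [[0j] * grid_size for _ in range(grid_size)]
    for u, v, value in vis:
        cu, cv = round(u), round(v)  # nearest grid cell
        for dj in range(-support, support + 1):
            for di in range(-support, support + 1):
                x, y = cu + di, cv + dj
                if 0 <= x < grid_size and 0 <= y < grid_size:
                    # If the outer loop ran in parallel over visibilities,
                    # two of them with overlapping footprints could update
                    # this cell at the same time: this is the racy write.
                    grid[y][x] += value * kernel[dj + support][di + support]
    return grid


# Two neighbouring visibilities whose 3x3 footprints overlap.
example = grid_visibilities([(2.0, 2.0, 1 + 0j), (3.0, 2.0, 2 + 0j)], 6, 1)
```

Because the grid cells touched depend on where each visibility happens to fall, consecutive visibilities can write to unrelated parts of the grid, which is the scattered access pattern described above.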

To tune the algorithm effectively, NAG team members decomposed the computational domain into tiles to promote data reuse and implemented methods to enforce contiguous access to convolution data. The initial performance results suggest that the tiling was important for Intel Knights Landing performance because it removed the need for slow atomic operations when updating the grid. The code ran fastest on the NVIDIA P100, partly because of the GPU's hardware atomics but also because of the large number of registers available to store frequently accessed data.
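The tiling idea can be sketched as follows. This is a hedged illustration under stated assumptions, not the implementation NAG describe: the binning scheme, the triangular `kernel_weight` function and the name `grid_tiled` are hypothetical. Each visibility is assigned to every tile that its kernel footprint overlaps, and each tile then writes only to its own cells, so tiles could be processed by separate threads without atomic updates.

```python
def kernel_weight(di, dj, support):
    """Hypothetical separable triangular kernel (stand-in for a real one)."""
    return (1.0 - abs(di) / (support + 1)) * (1.0 - abs(dj) / (support + 1))


def grid_tiled(vis, grid_size, support, tile):
    """Grid visibilities tile by tile; each tile writes only its own cells.

    vis: list of (u, v, value) tuples, with u and v assumed to be in
    grid-cell units. `tile` is the tile edge length in cells.
    """
    # Step 1: bin every visibility into each tile its footprint overlaps.
    bins = {}
    for u, v, value in vis:
        cu, cv = round(u), round(v)
        tx_lo = max(cu - support, 0) // tile
        tx_hi = min(cu + support, grid_size - 1) // tile
        ty_lo = max(cv - support, 0) // tile
        ty_hi = min(cv + support, grid_size - 1) // tile
        for ty in range(ty_lo, ty_hi + 1):
            for tx in range(tx_lo, tx_hi + 1):
                bins.setdefault((tx, ty), []).append((cu, cv, value))

    grid = [[0j] * grid_size for _ in range(grid_size)]

    # Step 2: tiles are independent, so this loop is the natural place
    # to parallelise. No two tiles ever write to the same grid cell.
    for (tx, ty), members in bins.items():
        x0, x1 = tx * tile, min((tx + 1) * tile, grid_size)
        y0, y1 = ty * tile, min((ty + 1) * tile, grid_size)
        for cu, cv, value in members:
            # Clip the kernel footprint to this tile's cells.
            for y in range(max(y0, cv - support), min(y1, cv + support + 1)):
                for x in range(max(x0, cu - support), min(x1, cu + support + 1)):
                    grid[y][x] += value * kernel_weight(x - cu, y - cv, support)
    return grid


# The result should not depend on the tile size chosen.
vis = [(2.0, 2.0, 1 + 0j), (3.0, 2.0, 2 + 0j)]
small_tiles = grid_tiled(vis, 6, 1, 2)
one_tile = grid_tiled(vis, 6, 1, 6)
```

The trade-off in this sketch is that a visibility near a tile boundary is visited once per overlapping tile, exchanging some repeated work for conflict-free writes and better locality within each tile.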

For details of the code changes implemented by NAG, and the results from the Technical Poster, presented by NAG and the University of Oxford's e-Research Centre, please click here.