IFIBA   22255
INSTITUTO DE FISICA DE BUENOS AIRES
Executing Unit (UE)
Articles
Title:
GPU Parallelization of a Hybrid Pseudospectral Geophysical Turbulence Framework Using CUDA
Author(s):
MININNI, PABLO D.; REDDY, RAGHU; ROSENBERG, DUANE; POUQUET, ANNICK
Journal:
Atmosphere
Publisher:
MDPI
References:
Year: 2020, vol. 11
Abstract:
An existing hybrid MPI-OpenMP scheme is augmented with a CUDA-based fine grain parallelization approach for multidimensional distributed Fourier transforms, in a well-characterized pseudospectral fluid turbulence code. Basics of the hybrid scheme are reviewed, and heuristics provided to show a potential benefit of the CUDA implementation. The method draws heavily on the CUDA runtime library to handle memory management and on the cuFFT library for computing local FFTs. The manner in which the interfaces to these libraries are constructed, and ISO bindings utilized to facilitate platform portability, are discussed. CUDA streams are implemented to overlap data transfer with cuFFT computation. Testing with a baseline solver demonstrated significant aggregate speed-up over the hybrid MPI-OpenMP solver by offloading to GPUs on an NVLink-based test system. While the batch streamed approach provided little benefit with NVLink, we saw a performance gain of 30% when tuned for the optimal number of streams on a PCIe-based system. It was found that strong GPU scaling is nearly ideal, in all cases. Profiling of the CUDA kernels shows that the transform computation achieves 15% of the attainable peak FlOp-rate based on a roofline model for the system. In addition to speed-up measurements for the fiducial solver, we also considered several other solvers with different numbers of transform operations and found that aggregate speed-ups are nearly constant for all solvers.
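The "attainable peak" in the abstract refers to a roofline model, in which a kernel's performance is bounded by the smaller of the device's peak compute rate and the product of its memory bandwidth and the kernel's arithmetic intensity. The Python sketch below illustrates that bound only. The function names and every hardware figure in it are hypothetical placeholders chosen for illustration; they are not taken from the paper, which does not state these values in its abstract.

```python
def roofline_attainable(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Attainable FLOP/s under the basic roofline model.

    peak_flops:           device peak compute rate, in FLOP/s
    mem_bandwidth:        device memory bandwidth, in bytes/s
    arithmetic_intensity: kernel FLOPs performed per byte moved
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def fraction_of_attainable(measured_flops, peak_flops, mem_bandwidth,
                           arithmetic_intensity):
    """Ratio of a measured FLOP rate to the roofline bound."""
    bound = roofline_attainable(peak_flops, mem_bandwidth, arithmetic_intensity)
    return measured_flops / bound


# Hypothetical device and kernel figures (assumptions, not from the paper).
PEAK_FLOPS = 7.0e12        # FLOP/s
MEM_BANDWIDTH = 9.0e11     # bytes/s
INTENSITY = 2.0            # FLOP/byte, a memory-bound regime for this device

bound = roofline_attainable(PEAK_FLOPS, MEM_BANDWIDTH, INTENSITY)
# A kernel measured at 15% of the bound, mirroring the ratio quoted above.
measured = 0.15 * bound
ratio = fraction_of_attainable(measured, PEAK_FLOPS, MEM_BANDWIDTH, INTENSITY)
print(f"roofline bound: {bound:.3e} FLOP/s, fraction achieved: {ratio:.2f}")
```

With these placeholder values the kernel sits on the bandwidth-limited slope of the roofline, so the bound is set by memory bandwidth rather than by peak compute; a low arithmetic intensity of this kind is typical of FFT-dominated workloads.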