Intro -- Preface -- Organization -- Contents -- Porting Scientific Applications to Heterogeneous Architectures Using Directives -- GPU Implementation of a Sophisticated Implicit Low-Order Finite Element Solver with FP21-32-64 Computation Using OpenACC -- 1 Introduction -- 2 Baseline Solver on CPU-based Computers -- 2.1 The Target Problem -- 2.2 The Solver Algorithm -- 2.3 Implementation of Solver for CPU Systems -- 3 GPU Implementation Using OpenACC -- 3.1 Baseline Implementation -- 3.2 Introduction of Lower-Precision Data Types -- 3.3 Miscellaneous Optimizations in the Solver --
4 Performance Measurement -- 4.1 Performance Evaluation of FP21 Computation -- 4.2 Performance Evaluation of the Entire Solver -- 5 Conclusions -- References -- Acceleration in Acoustic Wave Propagation Modelling Using OpenACC/OpenMP and Its Hybrid for the Global Monitoring System -- 1 Introduction -- 2 Computing Environment -- 3 3D-SSFPE -- 3.1 Overview -- 3.2 Implementation -- 3.3 Performance Evaluation and Conclusion -- 4 Global Acoustic Simulation with FDM -- 4.1 Overview -- 4.2 Formulation -- 4.3 Yin-Yang Grid -- 4.4 Computational Schemes -- 4.5 Performance Optimization --
4.6 Software Evaluation -- 5 Conclusion -- References -- Accelerating the Performance of Modal Aerosol Module of E3SM Using OpenACC -- 1 Introduction -- 2 OpenACC Programming Model -- 2.1 Parallelizing Loops -- 2.2 Data Transfer -- 3 Experimental Platforms and Approach -- 4 MAM Algorithms and Kernels -- 5 Offloading Computations to GPUs -- 5.1 Kernel: subgrid_mean_updraft -- 5.2 Kernel: hetfrz_classnuc_cam_calc -- 5.3 Kernel: ccncalc -- 5.4 Kernel: nsubmix -- 6 MAM Kernel Performance Discussion -- 6.1 Multi-Process Service (MPS) -- 6.2 Scaling Results -- 7 Summary and Conclusion -- References --
Evaluation of Directive-Based GPU Programming Models on a Block Eigensolver with Consideration of Large Sparse Matrices -- 1 Introduction -- 2 Background and Related Work -- 3 Methodology -- 3.1 The LOBPCG Algorithm -- 3.2 Baseline CPU Implementation -- 3.3 A GPU Implementation of LOBPCG -- 3.4 Tiling LOBPCG Kernels to Fit in GPU Memory Capacity -- 3.5 Hardware and Software Environment -- 3.6 Experiments -- 4 Results -- 4.1 Performance of the LOBPCG Solver -- 4.2 Performance of XTY and SpMM Kernels for Large Matrices -- 4.3 Performance of Tiled and Unified Memory Versions of SpMM -- 5 Discussion --
6 Conclusions -- References -- Directive-Based Programming for Math Libraries -- Performance of the RI-MP2 Fortran Kernel of GAMESS on GPUs via Directive-Based Offloading with Math Libraries -- 1 Introduction -- 2 RI-MP2 Kernel of GAMESS -- 2.1 RI-MP2 Kernel -- 2.2 Inputs for the RI-MP2 Kernel from GAMESS -- 3 Employed Systems -- 3.1 Summit System at Oak Ridge Leadership Computing Facility -- 3.2 JLSE System at Argonne Leadership Computing Facility -- 4 Programming Environments -- 4.1 Employed Compilers -- 4.2 Math Libraries -- 5 Offloading the RI-MP2 Kernel
Summary
This book constitutes the refereed post-conference proceedings of the 6th International Workshop on Accelerator Programming Using Directives, WACCPD 2019, held in Denver, CO, USA, in November 2019. The 7 full papers presented were carefully reviewed and selected from 13 submissions. The papers share knowledge and experience in programming emerging complex parallel computing systems. They are organized in three sections: porting scientific applications to heterogeneous architectures using directives; directive-based programming for math libraries; and performance portability for heterogeneous architectures.