Volume 26, Issue 3

Helmholtz Solving and Performance Optimization in Global/Regional Assimilation and Prediction System

Jianqiang Huang, Wei Xue, Haodong Bian, Wenxin Yan, Xiaoying Wang, and Wenguang Chen
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
Department of Computer Technology and Applications, Qinghai University, Xining 810016, China.

Abstract

Although the physical parameterization in the Global/Regional Assimilation and Prediction System (GRAPES) is solved with efficient parallelism, the solution of the Helmholtz equation in the dynamic core struggles to achieve sufficient parallelism as resolution increases, because it involves heavy communication and irregular memory access. Optimizing the Helmholtz solver for better performance and higher efficiency has therefore become an urgent task. This paper proposes an optimization scheme for the parallel solution of the Helmholtz equation. Specifically, a geometric multigrid strategy is designed that exploits the anisotropy of the data at grid points near the poles and its isotropy at grid points near the equator, and an Incomplete LU (ILU) decomposition preconditioner is adopted to accelerate the convergence of an improved Generalized Conjugate Residual (GCR) method, effectively reducing both the number of iterations and the computation time. Overall solver performance is further improved through thread-level parallelism, vectorization, and cache data reuse. Experimental results show that the proposed scheme effectively removes the Helmholtz equation as a bottleneck in solving speed. In tests on a 10-node two-way server, the Helmholtz solver is 100× faster than the original serial version, and the number of iterations is reduced by one-third.
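To make the preconditioning idea concrete, the following is a minimal sketch, not the authors' GRAPES code, of restarted GCR with right preconditioning by ILU(0), the zero-fill variant of incomplete LU. It runs on an assumed constant-coefficient 2D five-point model problem (c - ax*d2/dx2 - ay*d2/dy2)u = f with zero Dirichlet boundaries. The names helmholtz_2d, ilu0, ilu0_solve, and gcr, the parameter values, and the dense storage are hypothetical choices made for brevity; a real solver would use sparse storage and the 3D operator of the dynamic core, and the paper's "improved" GCR and ILU variant may differ.

```python
import numpy as np


def helmholtz_2d(n, c=0.1, ax=1.0, ay=1.0):
    """Dense matrix of (c - ax*d2/dx2 - ay*d2/dy2) on an n-by-n grid.

    Hypothetical model problem: 5-point stencil, unit spacing, zero
    Dirichlet boundaries, row-major ordering (i, j) -> i*n + j.
    """
    N = n * n
    A = np.zeros((N, N))
    for i in range(n):
        for j in range(n):
            k = i * n + j
            A[k, k] = c + 2.0 * ax + 2.0 * ay
            if j > 0:
                A[k, k - 1] = -ax
            if j < n - 1:
                A[k, k + 1] = -ax
            if i > 0:
                A[k, k - n] = -ay
            if i < n - 1:
                A[k, k + n] = -ay
    return A


def ilu0(A):
    """ILU(0): incomplete LU restricted to the sparsity pattern of A.

    Returns one array holding unit-lower L (strictly below the diagonal)
    and U (on and above the diagonal).
    """
    LU = A.astype(float)  # astype returns a copy, so A is left untouched
    pattern = A != 0
    for i in range(1, A.shape[0]):
        for k in np.nonzero(pattern[i, :i])[0]:
            LU[i, k] /= LU[k, k]
            # Update only entries that already exist in row i (no fill-in).
            cols = np.nonzero(pattern[i, k + 1:])[0] + k + 1
            LU[i, cols] -= LU[i, k] * LU[k, cols]
    return LU


def ilu0_solve(LU, r):
    """Apply M^{-1} r with M = L U: forward, then backward substitution."""
    n = r.shape[0]
    y = np.empty(n)
    for i in range(n):
        y[i] = r[i] - LU[i, :i] @ y[:i]
    z = np.empty(n)
    for i in range(n - 1, -1, -1):
        z[i] = (y[i] - LU[i, i + 1:] @ z[i + 1:]) / LU[i, i]
    return z


def gcr(A, b, precond=None, tol=1e-8, restart=30, maxiter=500):
    """Restarted, right-preconditioned GCR(m). Returns (x, iterations)."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    bnorm = np.linalg.norm(b)
    P, AP = [], []  # search directions p_j and their images A p_j
    for it in range(1, maxiter + 1):
        z = precond(r) if precond is not None else r.copy()
        p, Ap = z, A @ z
        # Modified Gram-Schmidt: make A p orthogonal to earlier A p_j.
        for pj, Apj in zip(P, AP):
            beta = Ap @ Apj
            p = p - beta * pj
            Ap = Ap - beta * Apj
        nrm = np.linalg.norm(Ap)
        p, Ap = p / nrm, Ap / nrm
        alpha = r @ Ap  # step that minimizes ||r - alpha * A p||
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * bnorm:
            return x, it
        P.append(p)
        AP.append(Ap)
        if len(P) == restart:  # discard directions to bound memory
            P, AP = [], []
    return x, maxiter


if __name__ == "__main__":
    A = helmholtz_2d(16)
    b = np.ones(A.shape[0])
    LU = ilu0(A)
    _, it_plain = gcr(A, b)
    _, it_ilu = gcr(A, b, precond=lambda r: ilu0_solve(LU, r))
    print(f"GCR: {it_plain} iterations; ILU(0)-GCR: {it_ilu} iterations")
```

On this toy problem the ILU(0)-preconditioned run should need noticeably fewer iterations than plain GCR, because L U approximates A and the preconditioned operator is closer to the identity; each iteration, however, pays for two triangular solves, whose row-by-row dependencies are hard to parallelize. The counts say nothing about GRAPES itself.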

Keywords: Global/Regional Assimilation and Prediction System (GRAPES), Helmholtz equation, Generalized Conjugate Residual (GCR), performance optimization, Incomplete LU (ILU)


Publication history

Received: 10 October 2019
Accepted: 17 October 2019
Published: 12 October 2020
Issue date: June 2021

Copyright

© The author(s) 2021.

Acknowledgements

This paper was partially supported by the Open Project of State Key Laboratory of Plateau Ecology and Agriculture, Qinghai University (No. 2020-ZZ-03), the Qinghai Province High-End Innovative Thousand Talents Program Leading Talents, the National Natural Science Foundation of China (Nos. 61762074 and 61962051), and the National Natural Science Foundation of Qinghai Province (No. 2019-ZJ-7034).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
