The growing demand for semiconductor device simulation poses a major challenge for large-scale electronic structure calculations. Among the available methods, the linearly scaling three-dimensional fragment (LS3DF) method exhibits excellent scalability in large-scale simulations. Combining algorithmic and system-level optimizations, we propose a highly scalable and highly efficient implementation of LS3DF on the Sugon supercomputer, a domestically developed Chinese supercomputer equipped with deep computing units (DCUs). On the algorithmic side, the original all-band conjugate gradient algorithm is refined to converge faster, and mixed-precision computing is adopted to raise overall efficiency. On the system side, the original two-layer parallel structure is replaced by a coarse-grained parallel method, and optimization strategies such as multi-stream execution, kernel fusion, and removal of redundant computation are proposed to make fuller use of the computational power of the heterogeneous machines. As a result, the optimized LS3DF scales to a system of 10 million silicon atoms and attains a peak performance of 34.8 PFLOPS (21.2% of the theoretical peak). All of these improvements can be adapted to next-generation supercomputers for even larger simulations.
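The abstract takes the LS3DF fragment decomposition as given. In the original LS3DF scheme, the system is divided into an m1 × m2 × m3 grid of cells. At every grid corner, eight fragments that are 1 or 2 cells wide along each axis are solved independently. Their charge densities are then patched together as ρ_total = Σ_F α_F ρ_F, where each α_F is +1 or −1 so that the artificial fragment surfaces cancel. This independence is what allows fragments to be distributed across nodes. The Python sketch below, a minimal illustration under those assumptions, checks only the bookkeeping behind the formula. With α_F = (−1)^(number of axes on which the fragment is one cell wide), every cell of a periodic grid ends up with a total weight of exactly 1. The function names are hypothetical, and the sketch does not solve any fragment problem.

```python
from itertools import product

# Fragment extents (in grid cells) along each axis; LS3DF-style fragments are 1 or 2 cells wide.
SIZES = (1, 2)


def fragment_sign(shape):
    """Hypothetical helper: alpha_F = -1 per axis on which the fragment is one cell wide."""
    return (-1) ** sum(1 for s in shape if s == 1)


def coverage_weights(m1, m2, m3):
    """Sum alpha_F over every fragment covering each cell of a periodic m1 x m2 x m3 grid.

    Assumes m1, m2, m3 >= 2 so a 2-cell fragment never wraps onto itself.
    """
    dims = (m1, m2, m3)
    weights = {cell: 0 for cell in product(range(m1), range(m2), range(m3))}
    for corner in list(weights):
        # Eight fragment shapes (1x1x1 ... 2x2x2) are anchored at each grid corner.
        for shape in product(SIZES, repeat=3):
            alpha = fragment_sign(shape)
            # Add alpha to every cell this fragment covers (periodic wrap-around).
            for offset in product(*(range(s) for s in shape)):
                cell = tuple((c + o) % m for c, o, m in zip(corner, offset, dims))
                weights[cell] += alpha
    return weights


if __name__ == "__main__":
    w = coverage_weights(4, 3, 5)
    print(set(w.values()))  # prints {1}: each cell is counted exactly once
```

The check works because the signed count factorizes by axis. Along each axis, a cell is covered by two 2-wide fragments (+1 each) and one 1-wide fragment (−1), which gives 2 − 1 = 1 per axis and 1 × 1 × 1 = 1 overall. This is why the patched density reproduces each interior region once while the fragment boundary contributions cancel.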