Perspective

Unified Programming Models for Heterogeneous High-Performance Computers

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Abstract

Unified programming models can effectively improve program portability across various heterogeneous high-performance computers. Existing unified programming models devote considerable effort to code portability but remain far from achieving good performance portability. In this paper, we present a preliminary design for a performance-portable unified programming model comprising four aspects: programming language, programming abstraction, compilation optimization, and scheduling system. Specifically, domain-specific languages introduce domain knowledge to decouple the optimizations for different applications and architectures. The unified programming abstraction unifies the common features of different architectures to support common optimizations. Multi-level compilation optimization enables comprehensive performance optimization based on multi-level intermediate representations. A resource-aware, lightweight runtime scheduling system improves the resource utilization of heterogeneous computers. This is a perspective paper presenting our viewpoints on programming models for emerging heterogeneous systems.
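The "unified programming abstraction" described above is in the spirit of existing C++ portability layers such as Kokkos. As an illustration only (a minimal sketch of the general idea, not the model proposed in this paper), the following Kokkos-style fragment expresses one parallel pattern that a single source file can retarget to CPU or GPU backends at build time:

#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int N = 1 << 20;

    // Views abstract over memory placement: the same declaration maps to
    // host or device memory depending on the backend selected at build
    // time (OpenMP, CUDA, HIP, SYCL, ...).
    Kokkos::View<double*> x("x", N);
    Kokkos::View<double*> y("y", N);

    // One portable parallel pattern; the backend decides how iterations
    // map onto threads, vector lanes, or GPU thread blocks.
    Kokkos::parallel_for("axpy", N, KOKKOS_LAMBDA(const int i) {
      y(i) = 2.0 * x(i) + y(i);
    });
    Kokkos::fence();  // wait for asynchronous backend execution to finish
  }
  Kokkos::finalize();
  return 0;
}

The point of such an abstraction is that the loop body is written once against portable views and parallel patterns, while backend-specific mapping and memory placement are left to the lower compilation and scheduling layers, which is where the performance-portability effort concentrates.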

Electronic Supplementary Material

Video
JCST-2210-12888-video.mp4
JCST-2210-12888-Highlights.pdf (732.4 KB)

Journal of Computer Science and Technology
Pages 211-218
Cite this article:
Ma Z-X, Jin Y-Y, Tang S-Z, et al. Unified Programming Models for Heterogeneous High-Performance Computers. Journal of Computer Science and Technology, 2023, 38(1): 211-218. https://doi.org/10.1007/s11390-023-2888-4


Received: 05 October 2022
Revised: 28 October 2022
Accepted: 10 January 2023
Published: 28 February 2023
© Institute of Computing Technology, Chinese Academy of Sciences 2023