Regular Paper

High Performance MPI over the Slingshot Interconnect

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, U.S.A.

Abstract

The Slingshot interconnect designed by HPE/Cray is becoming more relevant in high-performance computing with its deployment on the upcoming exascale systems. In particular, it is the interconnect powering Frontier, the first exascale and highest-ranked supercomputer in the world. It offers features such as adaptive routing, congestion control, and workload isolation. The deployment of a new interconnect raises questions about performance, scalability, and potential bottlenecks, since the interconnect is a critical factor in how well these systems scale across nodes. In this paper, we delve into the challenges the Slingshot interconnect poses for current state-of-the-art MPI (Message Passing Interface) libraries. In particular, we look at scalability when communicating across nodes over Slingshot. We present a comprehensive evaluation of various MPI and communication libraries, including Cray MPICH, OpenMPI + UCX, RCCL, and MVAPICH2, on CPUs and GPUs on the Spock system, an early-access cluster deployed with Slingshot-10, AMD MI100 GPUs, and AMD EPYC Rome CPUs to emulate the Frontier system. We also evaluate preliminary CPU-based support of MPI libraries on the Slingshot-11 interconnect.


Journal of Computer Science and Technology
Pages 128-145
Cite this article:
Khorassani KS, Chen C-C, Ramesh B, et al. High Performance MPI over the Slingshot Interconnect. Journal of Computer Science and Technology, 2023, 38(1): 128-145. https://doi.org/10.1007/s11390-023-2907-5


Received: 16 October 2022
Revised: 29 October 2022
Accepted: 05 January 2023
Published: 28 February 2023
© Institute of Computing Technology, Chinese Academy of Sciences 2023