High Performance MPI over the Slingshot Interconnect

Kawthar Shafie Khorassani; Chen-Chun Chen; Bharath Ramesh; Aamir Shafi; Hari Subramoni; Dhabaleswar K. Panda

doi:10.1007/s11390-023-2907-5

Regular Paper

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, U.S.A.

Abstract

The Slingshot interconnect, designed by HPE/Cray, is becoming more relevant in high-performance computing with its deployment on upcoming exascale systems. In particular, it is the interconnect powering Frontier, the first exascale and highest-ranked supercomputer in the world. It offers features such as adaptive routing, congestion control, and workload isolation. The deployment of newer interconnects raises questions about performance, scalability, and potential bottlenecks, since the interconnect is a critical element of scaling across nodes on these systems. In this paper, we delve into the challenges the Slingshot interconnect poses for current state-of-the-art MPI (Message Passing Interface) libraries. In particular, we look at scalability when using Slingshot across nodes. We present a comprehensive evaluation of various MPI and communication libraries, including Cray MPICH, Open MPI + UCX, RCCL, and MVAPICH2, on CPUs and GPUs on the Spock system, an early-access cluster deployed with Slingshot-10, AMD MI100 GPUs, and AMD EPYC Rome CPUs to emulate the Frontier system. We also evaluate preliminary CPU-based support of MPI libraries on the Slingshot-11 interconnect.

Keywords

AMD GPU; interconnect technology; MPI (message passing interface); Slingshot

Electronic Supplementary Material

JCST-2210-12907-Highlights.pdf (586.1 KB)

Journal of Computer Science and Technology

Volume 38, Issue 1, February 2023

Pages 128-145

DOI: 10.1007/s11390-023-2907-5

Cite this article:

Khorassani KS, Chen C-C, Ramesh B, et al. High Performance MPI over the Slingshot Interconnect. Journal of Computer Science and Technology, 2023, 38(1): 128-145. https://doi.org/10.1007/s11390-023-2907-5

Received: 16 October 2022

Revised: 29 October 2022

Accepted: 05 January 2023

Published: 28 February 2023