Regular Paper

Analyzing and Optimizing Packet Corruption in RDMA Network

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210046, China
Huawei Technologies Co. Ltd, Nanjing 210012, China
Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100190, China

Abstract

Remote direct memory access (RDMA) has become one of the state-of-the-art high-performance network technologies in datacenters. The reliable transport of RDMA is designed for a lossless underlying network and cannot tolerate a high packet loss rate. However, besides switch buffer overflow, there is another source of packet loss in RDMA networks, namely packet corruption, which has not been discussed in depth. Packet corruption incurs long application tail latency by causing timeout retransmissions. Solving packet corruption in RDMA networks faces two challenges: 1) packet corruption is inevitable regardless of the remedial mechanisms deployed, and 2) RDMA hardware is not programmable. This paper proposes designs that can guarantee the expected tail latency of applications in the presence of packet corruption. The key idea is to control the probability of timeout events caused by packet corruption by transforming timeout retransmissions into out-of-order retransmissions. We build a probabilistic model to estimate the occurrence probabilities and real effects of the corruption patterns. We implement the two proposed mechanisms with the help of programmable switches and the zero-byte message feature of RDMA. We build an ns-3 simulation and implement the optimization mechanisms on our testbed. The simulation and testbed experiments show that the optimizations can decrease the flow completion time by several orders of magnitude with less than 3% bandwidth cost at different packet corruption rates.
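To make the key idea concrete, the following is a minimal sketch and not the probabilistic model of the paper. It assumes that each packet is corrupted independently with probability `p`, that a lost data packet is recovered by an out-of-order retransmission whenever any later packet reaches the receiver, and that a timeout occurs only when nothing arrives after the loss. Under these assumptions, appending `trailing_probes` extra packets after a message (for example zero-byte messages) lowers the timeout probability from `p` to `p ** (1 + trailing_probes)`. The function names, the parameter `trailing_probes`, and the loss model are all hypothetical and chosen for illustration; corruption of retransmitted packets is ignored.

```python
import random


def timeout_probability(p: float, trailing_probes: int = 0) -> float:
    """Closed form under the stated assumptions.

    A timeout needs the last data packet and every trailing probe
    to be corrupted, so that no later arrival exposes the loss.
    """
    return p ** (1 + trailing_probes)


def simulate_timeout_rate(p: float, n_packets: int, trailing_probes: int,
                          trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity (hypothetical loss model)."""
    rng = random.Random(seed)
    timeouts = 0
    for _ in range(trials):
        # True means the packet arrived intact; probes follow the data packets.
        arrived = [rng.random() >= p
                   for _ in range(n_packets + trailing_probes)]
        # If the last data packet and all probes are lost, the receiver
        # never sees a later packet, so the sender must wait for a timeout.
        if not any(arrived[n_packets - 1:]):
            timeouts += 1
    return timeouts / trials


if __name__ == "__main__":
    for probes in (0, 1, 2):
        analytic = timeout_probability(0.01, probes)
        print(f"probes={probes} analytic timeout probability={analytic:.2e}")
```

In this simplified model the message length does not affect the result, because only the tail of the message determines whether a loss goes undetected; every extra trailing packet multiplies the timeout probability by `p` at the cost of a small amount of additional bandwidth.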

Electronic Supplementary Material

2123_ESM.pdf (157.1 KB)

Journal of Computer Science and Technology
Pages 743-762

Cite this article:
Gao Y-X, Tian C, Chen W, et al. Analyzing and Optimizing Packet Corruption in RDMA Network. Journal of Computer Science and Technology, 2022, 37(4): 743-762. https://doi.org/10.1007/s11390-022-2123-8


Received: 31 December 2021
Revised: 02 July 2022
Accepted: 05 July 2022
Published: 25 July 2022
©Institute of Computing Technology, Chinese Academy of Sciences 2022