Open Access | Just Accepted

Accelerating Distributed Training of Large Concurrent-Branch Models through Bidirectional Pipeline Coordination

Zan Zong1, Yuyang Chen2, Qi Zhang1, Daming Zhao1, Jianjiang Li3, Yijun Jing4, Jidong Zhai1 (✉)

1 Department of Computer Science and Technology, Tsinghua University, Beijing, China

2 Shanghai AI Laboratory, Shanghai, China

3 School of Computer & Communication Engineering, University of Science and Technology Beijing, China

4 Mathematical School, Sun Yat-sen University, China


Abstract

Large models have been widely used in fields such as natural language processing and information retrieval. As large models have developed, not only has the parameter scale increased, but the model architecture has also become more complex. For example, multi-modal transformer-based models typically contain concurrent branches, which we denote as concurrent-branch models (CBMs). Many CBMs have grown to tens of billions of parameters and require distributed resources for training. Existing distributed training systems cannot fully handle this type of model architecture because there are interactions between branches. Inspired by the unbalanced resource usage of pipeline parallelism, we organize different branches with a fine-grained bidirectional pipeline schedule of communication and computation. However, improper coordination between branches leads to idle computation time and low training efficiency. In this paper, we present Flexpipe, a pipeline engine for concurrent-branch models. We first introduce branch-aware pipeline parallelism to make full use of the concurrent characteristics of the model architecture. Then, based on a multi-branch pipeline simulator, we propose an adaptive interaction coordinator, which enables low-overhead branch interactions during distributed model training. We evaluate our approach on popular concurrent-branch models combined with modern training systems. Compared with Chimera, the experimental results show that our method improves end-to-end training throughput by 20% on average.
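
The bidirectional mapping idea summarized in the abstract can be illustrated with a toy experiment. The following Python sketch is not the authors' Flexpipe implementation; it is a minimal, self-contained simulation under simplifying assumptions of my own (two branches, unit compute time per stage, forward passes only, zero communication cost, and a greedy earliest-micro-batch-first list scheduler) showing how mapping the second branch onto the devices in the reverse pipeline direction lets one branch's bubbles be partly filled by the other branch's work.

# Illustrative sketch only -- NOT the authors' Flexpipe code. Assumptions (mine):
# two branches, unit compute time per stage, forward passes only, zero
# communication cost, and a greedy earliest-micro-batch-first list scheduler.

def simulate(num_devices: int, num_microbatches: int, bidirectional: bool) -> int:
    """Makespan (in unit time steps) of running two pipeline branches, each with
    `num_devices` stages and `num_microbatches` micro-batches, on the same devices."""

    def device_of(branch: int, stage: int) -> int:
        # In the bidirectional mapping, branch 1 traverses the devices in reverse.
        if bidirectional and branch == 1:
            return num_devices - 1 - stage
        return stage

    total_tasks = 2 * num_microbatches * num_devices
    done = set()       # finished (branch, micro-batch, stage) tasks
    running = {}       # device -> (task, end_time)
    t, makespan = 0, 0
    while len(done) < total_tasks:
        # Retire tasks finishing at or before time t.
        for dev, (task, end) in list(running.items()):
            if end <= t:
                done.add(task)
                del running[dev]
        # Launch one ready task per idle device, earliest micro-batch first.
        for dev in range(num_devices):
            if dev in running:
                continue
            ready = []
            for branch in range(2):
                for m in range(num_microbatches):
                    for s in range(num_devices):
                        if device_of(branch, s) != dev:
                            continue
                        task = (branch, m, s)
                        dep_done = (s == 0) or (branch, m, s - 1) in done
                        if task not in done and dep_done:
                            ready.append((m, branch, task))
            if ready:
                ready.sort()
                running[dev] = (ready[0][2], t + 1)
                makespan = max(makespan, t + 1)
        t += 1
    return makespan

if __name__ == "__main__":
    devices, microbatches = 4, 8
    print("unidirectional makespan:", simulate(devices, microbatches, False))
    print("bidirectional  makespan:", simulate(devices, microbatches, True))

In this simplified forward-only model the reverse mapping already yields a slightly shorter makespan, because the devices at both ends of the pipeline receive ready work from time step zero; the paper targets the richer setting with forward and backward passes and inter-branch interactions, where coordination of the two directions matters considerably more.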

Tsinghua Science and Technology
Cite this article:
Zong Z, Chen Y, Zhang Q, et al. Accelerating Distributed Training of Large Concurrent-Branch Models through Bidirectional Pipeline Coordination. Tsinghua Science and Technology, 2025, https://doi.org/10.26599/TST.2024.9010233


Received: 30 July 2024
Revised: 29 September 2024
Accepted: 11 November 2024
Available online: 03 April 2025

© The author(s) 2025

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
