Review | Open Access

A Survey on Accelerated Technologies for Mixture-of-Experts Model Training Systems

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China, and also with School of Computer Science and Engineering, Qinghai University, Xining 810000, China

Abstract

Mixture-of-Experts (MoE) models have emerged as a transformative paradigm for scaling Large Language Models (LLMs), enabling unprecedented model capacity while maintaining computational efficiency through sparse activation mechanisms. However, the unique architectural characteristics of MoE models introduce significant system-level challenges that differ fundamentally from those of traditional dense models and necessitate specialized system optimizations tailored to MoE's distinctive properties. This survey systematically analyzes accelerated technologies for MoE training systems, discussing recent advances across four critical optimization dimensions: hybrid parallel computing, comprehensive memory management, fine-grained communication scheduling, and adaptive load balancing. Our analysis reveals a paradigm shift from computation-centric to workload-centric optimization strategies. Moreover, we identify emerging research directions, including machine learning-guided load balancing, cross-layer optimization frameworks, and hardware-software co-design for MoE training workloads. This work aims to provide researchers and system engineers with a comprehensive technical reference for designing more efficient and scalable next-generation MoE training systems.
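
To make the sparse activation mechanism mentioned in the abstract concrete, below is a minimal PyTorch sketch of top-k expert gating. It is illustrative only: the function topk_gate and its parameter names are hypothetical, not an implementation from any system covered by the survey.

```python
# Minimal sketch of top-k sparse gating (illustrative; not from any surveyed system).
import torch
import torch.nn.functional as F

def topk_gate(x: torch.Tensor, gate_weight: torch.Tensor, k: int = 2):
    """Route each token to its k highest-scoring experts.

    x:           (num_tokens, d_model) token representations
    gate_weight: (d_model, num_experts) learned router matrix
    Returns the chosen expert indices and their normalized routing
    weights; only k of num_experts experts run per token, which is
    why MoE capacity can grow without a matching growth in FLOPs.
    """
    logits = x @ gate_weight                      # (num_tokens, num_experts)
    topk_vals, topk_idx = logits.topk(k, dim=-1)  # keep the k best experts per token
    weights = F.softmax(topk_vals, dim=-1)        # renormalize over the chosen experts
    return topk_idx, weights
```

In a distributed setting, topk_idx determines which tokens are dispatched to which devices, which is exactly where the parallelism, communication-scheduling, and load-balancing challenges surveyed in this paper arise.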


Cite this article:
Zhang Q, Zhai J, Zheng W. A Survey on Accelerated Technologies for Mixture-of-Experts Model Training Systems. Tsinghua Science and Technology, 2026, 31(3): 1411-1439. https://doi.org/10.26599/TST.2025.9010169

Received: 27 July 2025
Revised: 7 September 2025
Accepted: 13 October 2025
Published: 19 December 2025
© The author(s) 2026.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).