Survey

A Survey of Quantization in LLM: Unlocking Potential Hardware Efficiency

Yi-Dong Chen¹, Kai-Jun Zheng¹, Zhen-Hua Guo², Qi-Hao Zhang¹, Yong-Hua Zhang¹, Ji-Dong Zhai¹

¹ Department of Computer Science and Technology, Tsinghua University, Beijing 100190, China

² Inspur Electronic Information Industry Co., Ltd., Beijing 100190, China

Abstract

Large language models (LLMs) have achieved remarkable progress in natural language processing, but their immense scale incurs significant computational and storage overheads, limiting their deployment in resource-constrained environments. Model quantization, an effective model compression technique, reduces LLMs' memory footprint and computational requirements by lowering the numerical precision of model parameters and/or activations while striving to keep performance loss minimal. This survey comprehensively reviews the latest advances in LLM quantization, covering techniques from the pretraining phase to the inference phase. We examine state-of-the-art quantization during pretraining, post-training quantization and quantization-aware training in quantization fine-tuning, and various quantization methods during inference. Through in-depth analysis of these methods, this survey seeks to give researchers and engineers a comprehensive understanding of LLM quantization techniques, to help identify future research directions, and to offer insight into how to generate high-performance low-precision kernels on different chips.
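As a rough illustration of what "lowering the numerical precision of model parameters" can mean in practice, the sketch below applies symmetric per-tensor INT8 quantization to a short list of weights. It is a generic textbook scheme chosen for illustration, not a method taken from the survey; the function names `quantize_int8` and `dequantize` and the sample weights are assumptions made for this example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127].

    Illustrative sketch only: one scale for the whole tensor, no zero point.
    """
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [x * scale for x in q]


# Hypothetical weights, used only to exercise the functions.
weights = [0.5, -1.27, 0.0, 0.3]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Worst-case rounding error is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

Each weight is stored in 8 bits instead of 16 or 32, at the cost of a rounding error of at most half the scale per element. Practical LLM schemes typically refine this with per-channel or per-group scales.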

Keywords

quantization, large language model (LLM), mixed-precision, kernel generation

Journal of Computer Science and Technology

Volume 41, Issue 1, April 2026

Pages 341-358

DOI: 10.1007/s11390-026-5979-1

Cite this article:

Chen Y-D, Zheng K-J, Guo Z-H, et al. A Survey of Quantization in LLM: Unlocking Potential Hardware Efficiency. Journal of Computer Science and Technology, 2026, 41(1): 341-358. https://doi.org/10.1007/s11390-026-5979-1

Received: 24 September 2025

Accepted: 29 December 2025

Published: 30 April 2026