Survey

A Survey of Quantization in LLM: Unlocking Potential Hardware Efficiency

Department of Computer Science and Technology, Tsinghua University, Beijing 100190, China
Inspur Electronic Information Industry Co., Ltd., Beijing 100190, China

Abstract

Large language models (LLMs) have achieved remarkable progress in natural language processing, but their immense scale incurs significant computational and storage overhead, which limits their deployment in resource-constrained environments. Model quantization is an effective compression technique that reduces the memory footprint and computational requirements of LLMs by lowering the numerical precision of model parameters and/or activations while keeping performance loss minimal. This survey comprehensively reviews recent advances in LLM quantization, covering techniques from the pretraining phase to the inference phase. We examine state-of-the-art quantization during pretraining, post-training quantization and quantization-aware training during quantization fine-tuning, and quantization methods applied during inference. Through in-depth analysis of these methods, the survey aims to give researchers and engineers a comprehensive understanding of LLM quantization techniques, to help identify future research directions, and to offer insight into how to generate high-performance low-precision kernels on different chips.
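
To make the idea of "lowering the numerical precision" concrete, the following is a minimal illustrative sketch of symmetric per-tensor INT8 quantization in plain Python. It is not a method taken from the survey: the function names (`quantize_int8`, `dequantize`) and the sample weights are hypothetical, and practical LLM quantizers typically use per-channel or per-group scales and additional calibration. The sketch only shows the basic trade: each value is stored as one 8-bit integer plus a shared scale instead of a 32-bit float, at the cost of a rounding error bounded by half the scale.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any positive scale would work.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the representable range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Worst-case reconstruction error introduced by rounding.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Because the largest magnitude maps exactly to code 127, no value is clipped here, so the only error source is rounding. Storing one byte per value instead of four is where the roughly 4x memory reduction relative to FP32 comes from in this simplified setting.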

Journal of Computer Science and Technology
Pages 341-358

Cite this article:
Chen Y-D, Zheng K-J, Guo Z-H, et al. A Survey of Quantization in LLM: Unlocking Potential Hardware Efficiency. Journal of Computer Science and Technology, 2026, 41(1): 341-358. https://doi.org/10.1007/s11390-026-5979-1


Received: 24 September 2025
Accepted: 29 December 2025
Published: 30 April 2026
© Institute of Computing Technology, Chinese Academy of Sciences 2026