Open Access | Just Accepted

A Multi-Objective Optimization Framework for Data Cleaning Using Large Language Models

Tianze Hu1, Jiacheng Wang1, Wenqi Pu1, Jiajun Li1, Ruixin Gu2, Xin Bi3, Haijun Yin4, Yu-Ping Wang1 (corresponding author)

1 School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China

2 College of Software Engineering, Beijing University of Technology, Beijing 100124, China

3 School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China

4 Engineering Department, Shenyang Aircraft Industry (Group) Co., LTD., Shenyang 110000, China


Abstract

The use of Large Language Models (LLMs) in data cleaning tasks has demonstrated impressive capabilities. However, the high inference costs associated with LLMs pose significant challenges, particularly when processing large-scale datasets within constrained budgets. While many studies focus on directly reducing inference costs, we propose a novel framework that alleviates the high inference costs of LLMs by transforming the task into a multi-objective optimization problem. The framework first decomposes the complex data cleaning task into smaller, well-defined sub-tasks. For each sub-task, the most appropriate method is selected from a range of options, such as rule-based tools, code generation methods, smaller pretrained language models, or LLMs, depending on the trade-off between cost and effectiveness. This enables a systematic balance between cost and quality, allowing high-quality data cleaning to be completed within budget constraints. Experimental results validate the effectiveness of this approach: the framework significantly reduces inference costs while maintaining high-quality data processing, offering a practical pathway to optimizing LLM-based data cleaning methods. Future work could explore dynamic adaptation to evolving sub-tasks or deeper integration with explainable AI and human-in-the-loop approaches to enhance trust and interpretability in data cleaning pipelines.
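The per-sub-task method selection described in the abstract can be read as a multiple-choice knapsack: assign exactly one method (rule-based tool, code generation, small language model, or LLM) to each sub-task so that total expected quality is maximized under a total cost budget. The sketch below is an illustration only, not the paper's algorithm; the sub-task names, cost units, and quality scores are invented for the example.

```python
def select_methods(subtasks, budget):
    """Pick one method per sub-task, maximizing total quality within budget.

    subtasks: list of option lists, each option a (name, cost, quality) tuple
              with integer cost. Returns (quality, [chosen names]) or
              (None, None) if no assignment fits the budget.
    """
    # dp maps total cost spent -> best (total_quality, chosen_methods) at that cost
    dp = {0: (0.0, [])}
    for options in subtasks:
        nxt = {}
        for spent, (q, chosen) in dp.items():
            for name, cost, quality in options:
                c = spent + cost
                if c > budget:
                    continue
                cand = (q + quality, chosen + [name])
                if c not in nxt or cand[0] > nxt[c][0]:
                    nxt[c] = cand
        dp = nxt
        if not dp:  # every option for this sub-task blew the budget
            return None, None
    best_cost = max(dp, key=lambda c: dp[c][0])
    return dp[best_cost]

# Hypothetical cost/quality estimates (all numbers illustrative):
subtasks = [
    [("rule_based", 1, 0.70), ("small_lm", 5, 0.85), ("llm", 20, 0.95)],  # deduplication
    [("code_gen", 8, 0.80), ("llm", 25, 0.97)],                           # format repair
    [("rule_based", 2, 0.60), ("llm", 30, 0.98)],                         # entity resolution
]
quality, plan = select_methods(subtasks, budget=40)
# With this toy data the optimum spends the LLM only where cheap methods
# are weakest: ["rule_based", "code_gen", "llm"] at cost 39.
```

The dynamic program enumerates reachable cost levels rather than all method combinations, so it scales linearly in the number of sub-tasks; how the actual framework estimates per-method cost and quality is, of course, specific to the paper.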

Big Data Mining and Analytics


Cite this article:
Hu T, Wang J, Pu W, et al. A Multi-Objective Optimization Framework for Data Cleaning Using Large Language Models. Big Data Mining and Analytics, 2025, https://doi.org/10.26599/BDMA.2025.9020074

623 Views · 63 Downloads · 0 Crossref · 0 Web of Science · 0 Scopus · 0 CSCD

Received: 31 December 2024
Revised: 15 April 2025
Accepted: 13 June 2025
Available online: 10 October 2025

© The author(s) 2026.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).