Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external knowledge, leading to significant improvements in both factual accuracy and task performance. However, existing dense retrievers face considerable challenges when handling numerical constraints, particularly in queries requiring precise filtering conditions. To systematically explore these issues, we introduce Numerical Constraint Question (NumConQ), a comprehensive multi-domain benchmark dataset that contains more than 6,500 queries covering healthcare, finance, education, sports, and movies. Empirical analysis reveals that state-of-the-art dense retrievers achieve only 16.3% accuracy in numerical constraint satisfaction, significantly underperforming relative to their semantic matching capabilities. To address these limitations, we propose the Numerical Constraint-aware Retriever (NC-Retriever), which features: (1) a two-phase contrastive learning framework that combines in-batch negative sampling with progressively introduced hard negatives, and (2) a hybrid numerical representation scheme for consistent tokenization. Extensive experiments show that NC-Retriever achieves a relative improvement of 65.84% in recall@10 and a 78.28% increase in precision@10 compared to current state-of-the-art methods. The code and benchmark dataset are available at https://github.com/Tongji-KGLLM/NumConQ.
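The in-batch negative sampling mentioned in the abstract can be illustrated with a minimal InfoNCE-style sketch: each query in a batch treats its paired document as the positive and every other document in the batch as a negative. This is a generic illustration of the technique, not NC-Retriever's actual training code; the function name, temperature value, and plain-Python embedding lists are assumptions for the sketch.

```python
import math


def in_batch_contrastive_loss(q_embs, d_embs, temperature=0.05):
    """Illustrative InfoNCE loss with in-batch negatives.

    q_embs[i] and d_embs[i] form a positive (query, document) pair;
    for query i, every d_embs[j] with j != i acts as a negative.
    This is a generic sketch, not the NC-Retriever implementation.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    losses = []
    for i, q in enumerate(q_embs):
        # Similarity of query i against every document in the batch.
        logits = [dot(q, d) / temperature for d in d_embs]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the positive (diagonal) document.
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)
```

With identical, indistinguishable embeddings the loss reduces to log(batch_size); well-separated positive pairs drive it toward zero, which is what a second training phase with harder negatives would then tighten further.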
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).