Open Access

Proxy-Based Embedding Alignment for RGB-Infrared Person Re-Identification

Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Baidu Inc., Beijing 100084, China


RGB-Infrared person re-IDentification (re-ID) aims to match RGB and infrared (IR) images of the same person. However, the modality discrepancy between RGB and IR images poses a significant challenge for re-ID. To address this issue, this paper proposes a Proxy-based Embedding Alignment (PEA) method that aligns the RGB and IR modalities in the embedding space. PEA introduces modality-specific identity proxies and leverages sample-to-proxy relations to train the model. Specifically, PEA focuses on three types of alignment: intra-modality alignment, inter-modality alignment, and cycle alignment. Intra-modality alignment aligns sample features and proxies of the same identity within a modality. Inter-modality alignment aligns sample features and proxies of the same identity across different modalities. Cycle alignment requires that a proxy be aligned with itself after it is traced along a cross-modality cycle (e.g., IR→RGB→IR). By integrating these alignments into the training process, PEA effectively mitigates the impact of modality discrepancy and learns discriminative features across modalities. We conduct extensive experiments on several RGB-IR re-ID datasets, and the results show that PEA outperforms current state-of-the-art methods. Notably, on the SYSU-MM01 dataset, PEA achieves 71.0% mAP under the multi-shot setting of the indoor-search protocol, surpassing the previous best-performing method by 7.2%.
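The abstract names the three alignments but does not give their loss formulas. The following is a minimal Python sketch of one plausible reading, not the authors' implementation. It assumes that each alignment is a softmax cross-entropy over cosine similarities between embeddings and identity proxies, and that cycle alignment is a soft two-step walk between the two proxy sets. The function names, the temperature value, and the toy data are all hypothetical.

```python
import math

def normalize(v):
    # L2-normalise a vector; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def proxy_probs(vector, proxies, temperature=0.1):
    """Softmax over cosine similarities between one vector and every proxy."""
    v = normalize(vector)
    sims = [sum(a * b for a, b in zip(v, normalize(p))) / temperature
            for p in proxies]
    return softmax(sims)

def sample_to_proxy_loss(features, labels, proxies, temperature=0.1):
    """Mean negative log-probability of each sample's own identity proxy."""
    losses = [-math.log(proxy_probs(f, proxies, temperature)[y] + 1e-12)
              for f, y in zip(features, labels)]
    return sum(losses) / len(losses)

def cycle_loss(start_proxies, via_proxies, temperature=0.1):
    """Soft walk start -> via -> start; penalise proxies that fail to return."""
    back = [proxy_probs(q, start_proxies, temperature) for q in via_proxies]
    losses = []
    for i, p in enumerate(start_proxies):
        forward = proxy_probs(p, via_proxies, temperature)
        # Probability of landing on proxy i again after the round trip.
        p_return = sum(forward[k] * back[k][i] for k in range(len(via_proxies)))
        losses.append(-math.log(p_return + 1e-12))
    return sum(losses) / len(losses)

# Toy data (made up): 2 identities, 2-D embeddings, one proxy set per modality.
rgb_proxies = [[1.0, 0.0], [0.0, 1.0]]
ir_proxies = [[0.9, 0.1], [0.1, 0.9]]
rgb_feats, rgb_labels = [[0.8, 0.2], [0.2, 0.7]], [0, 1]
ir_feats, ir_labels = [[1.0, 0.1], [0.0, 1.0]], [0, 1]

# Intra-modality: samples against proxies of their own modality.
intra = (sample_to_proxy_loss(rgb_feats, rgb_labels, rgb_proxies)
         + sample_to_proxy_loss(ir_feats, ir_labels, ir_proxies))
# Inter-modality: samples against proxies of the other modality.
inter = (sample_to_proxy_loss(rgb_feats, rgb_labels, ir_proxies)
         + sample_to_proxy_loss(ir_feats, ir_labels, rgb_proxies))
# Cycle: IR -> RGB -> IR and RGB -> IR -> RGB over the proxies.
cycle = cycle_loss(ir_proxies, rgb_proxies) + cycle_loss(rgb_proxies, ir_proxies)
total = intra + inter + cycle
```

In this sketch the three terms are summed with equal weight, which is also an assumption. In a real training loop the proxies would be learnable parameters updated together with the feature extractor.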



Tsinghua Science and Technology
Pages 1112-1124
Cite this article:
Dou Z, Sun Y, Li Y, et al. Proxy-Based Embedding Alignment for RGB-Infrared Person Re-Identification. Tsinghua Science and Technology, 2025, 30(3): 1112-1124.














Received: 06 April 2023
Revised: 26 September 2023
Accepted: 28 September 2023
Published: 08 April 2024
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (
