Research Article | Open Access

A Precise Framework for Rice Leaf Disease Image–Text Retrieval Using FHTW-Net

Hongliang Zhou (1), Yufan Hu (1), Shuai Liu (1, corresponding author), Guoxiong Zhou (1, corresponding author), Jiaxin Xu (1), Aibin Chen (1), Yanfeng Wang (2), Liujun Li (3), Yahui Hu (4)
(1) College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
(2) National University of Defense Technology, Changsha 410015, Hunan, China
(3) Department of Soil and Water Systems, University of Idaho, Moscow, ID 83844, USA
(4) Plant Protection Research Institute, Academy of Agricultural Sciences, Changsha 410125, Hunan, China

Abstract

Cross-modal retrieval for rice leaf diseases is crucial for disease prevention, providing agricultural experts with data-driven decision support to address disease threats and safeguard rice production. To overcome the limitations of current crop leaf disease retrieval frameworks, we focused on four common rice leaf diseases and established the first cross-modal rice leaf disease retrieval dataset (CRLDRD). We brought cross-modal retrieval to the domain of rice leaf disease retrieval and proposed FHTW-Net, a framework for rice leaf disease image–text retrieval. To address the challenge of matching diverse image categories with complex text descriptions during retrieval, we first employed ViT and BERT to extract fine-grained image and text feature sequences enriched with contextual information. Next, two-way mixed self-attention (TMS) was introduced to enhance both the image and text feature sequences, with the aim of uncovering important semantic information in both modalities. We then developed a false-negative elimination–hard negative mining (FNE-HNM) strategy to facilitate in-depth exploration of the semantic connections between modalities. This strategy helps select challenging negative samples while eliminating false negatives, constraining the model through the triplet loss function. Finally, we introduced the warm-up bat algorithm (WBA) for learning rate optimization, which improves the model's convergence speed and accuracy. Experimental results demonstrated that FHTW-Net outperforms state-of-the-art models. In image-to-text retrieval, it achieved R@1, R@5, and R@10 accuracies of 83.5%, 92%, and 94%, respectively, while in text-to-image retrieval, it achieved accuracies of 82.5%, 98%, and 98.5%, respectively. FHTW-Net offers advanced technical support and algorithmic guidance for cross-modal retrieval of rice leaf diseases.
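The R@1, R@5, and R@10 figures quoted in the abstract are recall-at-K scores: the percentage of queries whose correct match appears among the top K retrieved items. The following is a minimal sketch of how such a score is commonly computed from a similarity matrix, not the authors' evaluation code. The function name `recall_at_k`, the toy matrix, and the assumption that query i has exactly one correct candidate at index i are all illustrative choices that do not come from the paper.

```python
import numpy as np


def recall_at_k(similarity, ks=(1, 5, 10)):
    """Return {K: percentage of queries whose true match is in the top K}.

    similarity[i, j] is the score between query i and candidate j.
    Assumption (not from the paper): the only correct candidate for
    query i is candidate i, so the matrix is square.
    """
    similarity = np.asarray(similarity, dtype=float)
    n = similarity.shape[0]
    # Score each query assigns to its own ground-truth candidate.
    true_scores = similarity[np.arange(n), np.arange(n)]
    # Zero-based rank: how many candidates score strictly higher than the truth.
    ranks = (similarity > true_scores[:, None]).sum(axis=1)
    return {k: float((ranks < k).mean() * 100.0) for k in ks}


# Toy example with 4 queries and 4 candidates (hypothetical scores).
sim = np.array([
    [0.9, 0.1, 0.2, 0.0],  # true match ranked 1st
    [0.8, 0.7, 0.1, 0.2],  # true match ranked 2nd
    [0.3, 0.9, 0.5, 0.6],  # true match ranked 3rd
    [0.1, 0.2, 0.3, 0.4],  # true match ranked 1st
])
scores = recall_at_k(sim, ks=(1, 2, 3))
print(scores)
```

For image-to-text retrieval the rows would hold image queries and the columns text candidates; transposing the matrix gives the text-to-image direction. If a dataset pairs each image with several descriptions, the ground-truth indexing would need to change accordingly.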

Plant Phenomics
Article number: 0168
Cite this article:
Zhou H, Hu Y, Liu S, et al. A Precise Framework for Rice Leaf Disease Image–Text Retrieval Using FHTW-Net. Plant Phenomics, 2024, 6: 0168. https://doi.org/10.34133/plantphenomics.0168


Received: 12 November 2023
Accepted: 13 March 2024
Published: 25 April 2024
© 2024 Hongliang Zhou et al. Exclusive licensee Nanjing Agricultural University. No claim to original U.S. Government Works.

Distributed under a Creative Commons Attribution License 4.0 (CC BY 4.0).
