Regular Paper

Multimodal Dependence Attention and Large-Scale Data Based Offline Handwritten Formula Recognition

School of Computer Science and Technology, University of Science and Technology of China, Hefei 230022, China

Abstract

Offline handwritten formula recognition is a challenging task due to the wide variety of handwritten symbols and the two-dimensional structure of formulas. Recently, deep neural network recognizers based on the encoder-decoder framework have brought great improvements to this task. However, existing work still performs unsatisfactorily on formulas with long \LaTeX strings, and a lack of sufficient training data further limits the capability of these recognizers. In this paper, we design a multimodal dependence attention (MDA) module that helps the model learn visual and semantic dependencies among symbols in the same formula, improving the recognition of formulas with long \LaTeX strings. To alleviate overfitting and further improve recognition performance, we also propose a new dataset, the Handwritten Formula Image Dataset (HFID), which contains 25620 handwritten formula images collected from real life. Extensive experiments demonstrate the effectiveness of the proposed MDA module and the HFID dataset, achieving state-of-the-art expression accuracies of 63.79% on CROHME 2014 and 65.24% on CROHME 2016.
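To make the idea of attending over paired visual and semantic information concrete, the following is a minimal NumPy sketch of one dependence-attention step. This is an illustration only, not the paper's MDA implementation: the function name, the toy projection matrix, and the decision to score concatenated visual/semantic pairs against the current decoder state are all assumptions made for exposition.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def dependence_attention(visual_feats, semantic_hist, query):
    """Illustrative attention step (hypothetical, not the authors' code).

    Scores each past decoding step's concatenated visual/semantic pair
    against the current decoder query, then returns a context vector as
    the attention-weighted sum of those pairs.

    visual_feats:  (T, d) visual contexts from previous decoding steps
    semantic_hist: (T, d) decoder hidden states (semantic) from previous steps
    query:         (d,)   current decoder hidden state
    """
    # Pair visual and semantic information per past step: (T, 2d).
    pairs = np.concatenate([visual_feats, semantic_hist], axis=1)
    # Toy projection mapping a (2d)-pair into the query space; a real
    # model would learn this matrix.
    W = np.ones((pairs.shape[1], query.shape[0])) / pairs.shape[1]
    scores = pairs @ W @ query          # (T,) one score per past step
    weights = softmax(scores)           # normalized dependence weights
    context = weights @ pairs           # (2d,) multimodal context vector
    return weights, context
```

The key point the sketch conveys is that each past symbol contributes through both modalities at once, so a symbol that is visually ambiguous can still be weighted correctly via its semantic history.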

Electronic Supplementary Material

JCST-2110-11987-Highlights.pdf (145.1 KB)

Journal of Computer Science and Technology
Pages 654-670
Cite this article:
Liu H-C, Dong L-F, Zhang X-M. Multimodal Dependence Attention and Large-Scale Data Based Offline Handwritten Formula Recognition. Journal of Computer Science and Technology, 2024, 39(3): 654-670. https://doi.org/10.1007/s11390-022-1987-y


Received: 21 October 2021
Accepted: 27 April 2022
Published: 26 June 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024