Research Article | Open Access

Beyond amplitude: Phase integration in bird vocalization recognition with MHAResNet

Jiangjian Xie a,b,c,d,1, Zhulin Hao a,1, Chunhe Hu a,c,d, Changchun Zhang a,c,d, Junguo Zhang a,b,c (corresponding author)
a School of Technology, Beijing Forestry University, Beijing, 100083, China
b State Key Laboratory of Efficient Production of Forest Resources, Beijing Forestry University, Beijing, 100083, China
c Key Laboratory of National Forestry and Grassland Administration on Forestry Equipment and Automation, Beijing, 100083, China
d Research Center for Biodiversity Intelligent Monitoring, Beijing Forestry University, Beijing, 100083, China

1 These two authors contributed equally to this work.

Peer review under the responsibility of Editorial Office of Avian Research.


Abstract

Bird vocalizations are pivotal for ecological monitoring, providing insights into biodiversity and ecosystem health. Traditional recognition methods often neglect phase information, resulting in incomplete feature representation. In this paper, we introduce a novel approach to bird vocalization recognition (BVR) that integrates both amplitude and phase information, leading to enhanced species identification. We propose MHAResNet, a deep learning (DL) model that employs residual blocks and a multi-head attention mechanism to capture salient features from logarithmic power (POW), instantaneous frequency (IF), and group delay (GD) extracted from bird vocalizations. Experiments on three bird vocalization datasets demonstrate our method's superior performance, achieving accuracy rates of 94%, 98.9%, and 87.1%, respectively. These results indicate that our approach provides a more effective representation of bird vocalizations, outperforming existing methods. This integration of phase information in BVR is innovative and significantly advances automatic bird monitoring technology, offering valuable tools for ecological research and conservation efforts.
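The three input features named in the abstract can all be derived from a complex spectrogram: POW from its magnitude, IF as the time derivative of the phase, and GD as the negative frequency derivative of the phase. The following is a minimal NumPy sketch of that idea; the function names, window, and hop parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive STFT: Hann-windowed frames, one-sided spectrum, shape (freq, time)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.asarray(frames), axis=1).T

def phase_features(x, n_fft=512, hop=128, eps=1e-10):
    """Return (POW, IF, GD) feature maps from a 1-D signal."""
    S = stft(x, n_fft, hop)
    pow_ = np.log(np.abs(S) ** 2 + eps)              # logarithmic power
    phase = np.angle(S)
    # Instantaneous frequency: finite difference of unwrapped phase over time
    if_ = np.diff(np.unwrap(phase, axis=1), axis=1)
    # Group delay: negative finite difference of unwrapped phase over frequency
    gd = -np.diff(np.unwrap(phase, axis=0), axis=0)
    return pow_, if_, gd
```

Note that the finite differences shorten one axis by a single bin; in practice the three maps would be cropped or padded to a common shape before being stacked as network input.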

Avian Research
Cite this article:
Xie J, Hao Z, Hu C, et al. Beyond amplitude: Phase integration in bird vocalization recognition with MHAResNet. Avian Research, 2025, 16(1). https://doi.org/10.1016/j.avrs.2025.100229


Received: 26 September 2024
Revised: 22 January 2025
Accepted: 14 February 2025
Published: 15 February 2025
© 2025 The Authors.

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
