Performance of Text-Independent Automatic Speaker Recognition on a Multicore System

Rand Kouatly; Talha Ali Khan

doi:10.26599/TST.2023.9010018


Open Access


Rand Kouatly¹, Talha Ali Khan¹

¹ Faculty of Tech and Software Engineering, University of Europe for Applied Sciences, Potsdam 14469, Germany


Abstract

This paper studies a high-speed, text-independent Automatic Speaker Recognition (ASR) algorithm based on the Gaussian Mixture Model (GMM) and running on a multicore system. The high speed is achieved through a parallel implementation of the feature extraction and aggregation methods during the training and testing procedures. Shared-memory parallel programming techniques using both the OpenMP and PThreads libraries are developed to accelerate the code and improve the performance of the ASR algorithm. The experimental results show a speed-up of around 3.2 on a personal laptop with an Intel i5-6300HQ processor (2.3 GHz, four cores without hyper-threading) and 8 GB of RAM. In addition, a speaker recognition accuracy of 100% is achieved.

Keywords

OpenMP; Automatic Speaker Recognition (ASR); Gaussian Mixture Model (GMM); shared memory parallel programming; PThreads


Tsinghua Science and Technology

Volume 29, Issue 2, April 2024

Pages 447-456

DOI: 10.26599/TST.2023.9010018

Cite this article:

Kouatly R, Khan TA. Performance of Text-Independent Automatic Speaker Recognition on a Multicore System. Tsinghua Science and Technology, 2024, 29(2): 447-456. https://doi.org/10.26599/TST.2023.9010018


Received: 26 September 2022

Revised: 10 March 2023

Accepted: 18 March 2023

Published: 22 September 2023

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).