Robust Automatic Speech Recognition Using PD-MEEMLIN

Hernández, Igmar; García, Paola; Nolazco, Juan; Buera, Luis; Lleida, Eduardo

doi:10.1007/978-3-540-72849-8_1

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4478)

Included in the following conference series:

Iberian Conference on Pattern Recognition and Image Analysis

Abstract

This work presents a robust normalization technique that cascades a speech enhancement method with a feature vector normalization algorithm. Speech enhancement is provided by the Spectral Subtraction (SS) algorithm, which reduces the effect of additive noise by subtracting an estimate of the noise spectrum from the complete speech spectrum. Separately, an empirical feature vector normalization technique known as PD-MEMLIN (Phoneme-Dependent Multi-Environment Models based LInear Normalization) has also been shown to be effective. PD-MEMLIN models the clean and noisy spaces with Gaussian Mixture Models (GMMs) and estimates a set of linear compensation transformations that are used to clean the signal. The proper integration of both approaches is studied, and the final design, PD-MEEMLIN (Phoneme-Dependent Multi-Environment Enhanced Models based LInear Normalization), confirms and improves the effectiveness of both approaches. The results show that for highly degraded speech PD-MEEMLIN outperforms SS by between 11.4% and 34.5%, and PD-MEMLIN by between 11.7% and 24.84%. Furthermore, at moderate SNR, i.e. 15 or 20 dB, PD-MEEMLIN performs as well as the PD-MEMLIN and SS techniques.
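
As an illustration of the spectral subtraction step described in the abstract, the following is a minimal sketch in Python, not the authors' implementation. It assumes power spectra are already available per frame, and that the noise spectrum can be estimated by averaging a few leading frames that contain no speech; the abstract does not say how the noise estimate is obtained. The function names, the over-subtraction factor `alpha`, and the spectral floor `beta` are hypothetical choices made for this example.

```python
import numpy as np


def estimate_noise_power(noisy_power, n_noise_frames=5):
    """Average the first frames as a noise estimate.

    Assumption for this sketch: the leading frames contain noise only.
    noisy_power has shape (frames, bins).
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    return noisy_power[:n_noise_frames].mean(axis=0)


def spectral_subtraction(noisy_power, noise_power, alpha=1.0, beta=0.01):
    """Subtract a noise power estimate from every frame.

    noisy_power: shape (frames, bins), power spectrum of the noisy speech.
    noise_power: shape (bins,), estimated noise power spectrum.
    alpha: over-subtraction factor (hypothetical default).
    beta: spectral floor, as a fraction of the noisy power, that keeps
          the result from becoming negative (hypothetical default).
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # Broadcasting subtracts the same noise estimate from each frame.
    cleaned = noisy_power - alpha * noise_power
    # Clamp each bin to the floor so no power value drops below it.
    return np.maximum(cleaned, beta * noisy_power)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a power spectrogram: 20 frames, 8 bins.
    noisy = rng.uniform(0.5, 2.0, size=(20, 8))
    noise = estimate_noise_power(noisy, n_noise_frames=5)
    enhanced = spectral_subtraction(noisy, noise)
    print(enhanced.shape)
```

The floor is one common way to avoid negative power values after subtraction; other variants of the method handle this differently.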

Author information

Authors and Affiliations

Computer Science Department, Tecnológico de Monterrey, Campus Monterrey, México
Igmar Hernández, Paola García & Juan Nolazco
Communications Technology Group (GTC), I3A, University of Zaragoza, Spain
Luis Buera & Eduardo Lleida

Editor information

Joan Martí, José Miguel Benedí, Ana Maria Mendonça, Joan Serrat

About this paper

Cite this paper

Hernández, I., García, P., Nolazco, J., Buera, L., Lleida, E. (2007). Robust Automatic Speech Recognition Using PD-MEEMLIN. In: Martí, J., Benedí, J.M., Mendonça, A.M., Serrat, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2007. Lecture Notes in Computer Science, vol 4478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72849-8_1

DOI: https://doi.org/10.1007/978-3-540-72849-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72848-1
Online ISBN: 978-3-540-72849-8
eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science
