Speech timer estimator

1/9/2024

In this paper, a speech enhancement method based on a correlation-canceling approach combined with the log minimum mean-square-error (log-MMSE) estimator is presented.

Keyword recognition is the basis of speech recognition, and its applications are growing rapidly in keyword spotting, robotics, and smart home surveillance. Because of these advanced applications, improving the accuracy of keyword recognition is crucial. In this paper, we propose voice conversion (VC)-based augmentation to enlarge a limited training dataset, together with a fusion of a convolutional neural network (CNN) and a long short-term memory (LSTM) model for robust speaker-independent isolated keyword recognition. Collecting and preparing a sufficient amount of voice data for speaker-independent speech recognition is tedious and labor-intensive. To overcome this, we generated new raw voices from the original voices using an auxiliary classifier conditional variational autoencoder (ACVAE) method. In this study, the main aim of voice conversion is to obtain many varied, human-like keyword utterances that are not identical to the pronunciation of the source or target speakers. Parallel VC was used to preserve the linguistic content accurately. We examined the performance of the proposed voice conversion augmentation techniques using robust deep neural network algorithms. The original training data, without generated voices but with other data augmentation and regularization techniques applied, served as the baseline. The results showed that incorporating voice conversion augmentation into the baseline augmentation techniques and applying the CNN-LSTM model improved the accuracy of isolated keyword recognition.

To further improve the detection distance and sensitivity of bio-radar, a 94 GHz asymmetric antenna radar sensor is employed to detect speech signals. However, radar speech is often mixed with various kinds of noise, which seriously degrades the quality and intelligibility of the speech signal. Therefore, a novel method based on variational mode decomposition (VMD) and an improved threshold strategy (ITS) is proposed in this paper to improve the quality and intelligibility of radar speech. VMD is an adaptive decomposition method that overcomes the mode aliasing and end-effect problems of empirical mode decomposition (EMD). ITS overcomes the limitations of the traditional wavelet threshold and achieves the best compromise between speech intelligibility and noise reduction. First, EMD is applied to determine the number of decomposition levels, and the radar speech is then decomposed by VMD into several band-limited intrinsic mode functions. Second, ITS is employed to remove noise from the useful modes, which are identified using the Pearson correlation coefficient (PCC). The performance of the proposed method is evaluated using perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and composite measures (CMs). The experimental results show that the radar sensor can detect speech signals over long distances and that the proposed method effectively improves the quality and intelligibility of the radar speech signal. Given this performance, the proposed method offers a promising alternative for applications involving the enhancement of radar speech and traditional microphone speech signals.
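The mode-selection and thresholding steps described above can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the PCC cutoff of 0.2, and the toy "modes" are all assumptions made for illustration, the modes would in practice come from a VMD implementation (not shown here), and plain soft thresholding is used only as a stand-in because the text does not define the improved threshold strategy (ITS).

```python
import numpy as np


def select_useful_modes(modes, signal, pcc_cutoff=0.2):
    """Return indices of modes whose |PCC| with the signal reaches the cutoff.

    The cutoff value is a hypothetical choice, not taken from the source.
    """
    pcc = np.array([np.corrcoef(m, signal)[0, 1] for m in modes])
    return np.flatnonzero(np.abs(pcc) >= pcc_cutoff), pcc


def soft_threshold(x, thr):
    """Classic soft thresholding, used here as a stand-in for ITS."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)


def reconstruct(modes, signal, pcc_cutoff=0.2, thr=0.0):
    """Threshold the useful modes and sum them into an enhanced signal."""
    useful, _ = select_useful_modes(modes, signal, pcc_cutoff)
    out = np.zeros(len(signal), dtype=float)
    for k in useful:
        out += soft_threshold(np.asarray(modes[k], dtype=float), thr)
    return out


# Toy example: two sinusoidal "modes" plus one low-level noise "mode".
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
modes = [
    np.sin(2 * np.pi * 5 * t),
    0.5 * np.sin(2 * np.pi * 40 * t),
    0.05 * rng.standard_normal(t.size),
]
signal = np.sum(modes, axis=0)

useful, pcc = select_useful_modes(modes, signal)
enhanced = reconstruct(modes, signal, thr=0.0)
print("useful mode indices:", useful.tolist())
print("PCC per mode:", np.round(pcc, 3).tolist())
```

In this toy setup the noise-only mode correlates weakly with the mixture and is dropped, while the two sinusoidal modes are kept. With real radar speech the cutoff and the threshold rule would need to be chosen and validated against measures such as PESQ and STOI.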
