1- Department of Electronics & Communication Engineering, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Ramapuram Campus, Chennai, India.
2- Department of Computer Science Engineering, Easwari Engineering College, Anna University, Chennai, India. manojkud1@srmist.edu.in
3- Department of Computer Science Engineering, Easwari Engineering College, Anna University, Chennai, India.
Abstract:
Background: External factors often interfere with speech, causing it to lose important components. Traditional algorithms and deep learning (DL) methods struggle to remove background noise from noisy signals, especially under non-stationary or non-causal conditions. The auto-associative property of the Wiener filter can be exploited to map distinguishing features, such as the estimated SNR, to a gain applied to the input waveform or its spectrum. Enhancing noisy speech signals is essential in medical and assistive applications beyond traditional speech communication, including hearing aids, telemedicine, speech-based pathological diagnosis, and biomedical acoustic signal analysis. Improved intelligibility and clarity in these systems are crucial for accurate clinical assessments and human–machine interaction in healthcare settings.
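The SNR-to-gain mapping described above can be illustrated with a minimal sketch. The estimator below is a generic maximum-likelihood-style one, not necessarily the paper's; the toy signal, the known noise PSD, and all variable names are assumptions for illustration.

```python
import numpy as np

# Minimal sketch of the classic Wiener rule: map an a-priori SNR
# estimate per frequency bin to a spectral gain G = SNR / (SNR + 1).
def wiener_gain(noisy_psd, noise_psd, floor=1e-10):
    """Return a gain in [0, 1] for each frequency bin."""
    # ML-style a-priori SNR: a-posteriori SNR minus one, clipped at zero
    snr = np.maximum(noisy_psd / np.maximum(noise_psd, floor) - 1.0, 0.0)
    return snr / (snr + 1.0)

# Toy example: sinusoid plus white noise; the noise PSD is assumed known.
rng = np.random.default_rng(0)
n = 256
clean = np.sin(2 * np.pi * np.arange(n) / 16)
noisy = clean + 0.5 * rng.standard_normal(n)

spec = np.fft.rfft(noisy)
noise_psd = np.full(spec.shape, 0.25 * n)    # E|N(k)|^2 = N * sigma^2
gain = wiener_gain(np.abs(spec) ** 2, noise_psd)
enhanced = np.fft.irfft(gain * spec, n=n)    # gain applied to noisy spectrum
```

In practice the noise PSD is not known and must be tracked from noise-only frames, which is exactly where a learned model can improve on the fixed statistical rule.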
Methods: The proposed work introduces a fusion technique called the Wiener-based recurrent neural network (WRNN), which integrates the Wiener filter with an enhanced variant of the recurrent neural network (RNN) referred to as the bi-directional gated recurrent unit (Bi-GRU). This hybrid model improves speech quality and eliminates background noise from noisy input signals using both statistical filtering and temporal learning features.
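To make the temporal-learning component concrete, the sketch below implements a minimal bi-directional GRU pass over a sequence of feature frames (e.g. spectral features of Wiener-filtered speech). This is an illustrative toy, not the authors' WRNN architecture: the weight layout, dimensions, and initialization are all hypothetical.

```python
import numpy as np

# One GRU step: update gate z, reset gate r, candidate state h_cand.
def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(x @ Wz + h @ Uz)             # update gate
    r = sigmoid(x @ Wr + h @ Ur)             # reset gate
    h_cand = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate hidden state
    return (1.0 - z) * h + z * h_cand

# Bi-directional pass: run the sequence forward and backward, then
# concatenate the two hidden states for every frame.
def bi_gru(frames, fwd_params, bwd_params, hidden):
    h = np.zeros(hidden); fwd = []
    for x in frames:                         # left-to-right pass
        h = gru_step(x, h, *fwd_params); fwd.append(h)
    h = np.zeros(hidden); bwd = []
    for x in frames[::-1]:                   # right-to-left pass
        h = gru_step(x, h, *bwd_params); bwd.append(h)
    return np.concatenate([np.array(fwd), np.array(bwd[::-1])], axis=1)

rng = np.random.default_rng(1)
T, D, H = 10, 8, 16                          # frames, feature dim, hidden size
make = lambda: tuple(rng.standard_normal(s) * 0.1
                     for s in [(D, H), (H, H)] * 3)
frames = rng.standard_normal((T, D))
out = bi_gru(frames, make(), make(), H)      # shape (T, 2*H)
```

The backward pass is what makes the model non-causal: the output at each frame depends on future frames as well as past ones, which matches the non-causal noise conditions the paper targets.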
Results: The proposed WRNN achieved the following results on babble noise:
For the STOI parameter, the TIMIT dataset with babble noise yielded scores of 85.4% and 91.5%.
For the PESQ parameter, babble noise from the WSJ corpus at -5 dB and -2 dB SNR yielded scores of 2.98 and 3.15, respectively, while the TIMIT dataset with the same type of noise resulted in scores of 2.58 and 2.91. In the evaluated settings, the WRNN consistently outperforms baseline methods such as RNN, RNN-IRM, RNN-TCS, and ARN in both STOI and PESQ.
Conclusion: The suggested Wiener filter–Bi-GRU (WRNN) fusion framework demonstrates its capacity to enhance speech signals in environments with non-stationary and non-causal noise. The model shows significant promise for improving medical signals in addition to general speech enhancement. It can aid in better understanding heart sounds, breathing signals, and pathological speech even in the presence of substantial noise. The performance metrics examined—short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ)—validate the WRNN’s ability to maintain intelligibility and perceptual quality in both synthetic and real-world environments.
Keywords: Speech enhancement, Noise removal, Wiener filter, Bi-GRU, PESQ, STOI, Deep learning (DL), Speech quality enhancement, Medical signal processing, Hearing aids, Telemedicine
Type of Study: Original Article | Subject: Artificial Intelligence
Received: 2025/08/22 | Accepted: 2025/10/26 | Published: 2025/12/31