Ethics code: 0000
1- Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India, manojkud1@srmist.edu.in
2- Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India
3- Department of Computer Science and Engineering, Easwari Engineering College, Chennai, India
Abstract:
Background: Speech signals captured in real environments are frequently corrupted by external noise, which can obscure perceptually important components. Traditional algorithms and Deep Learning (DL) methods face limitations in removing background noise from noisy signals, particularly under non-stationary or non-causal conditions. The auto-associative property of the Wiener filter can be exploited to map distinguishing features such as the estimated SNR and the gain of the input source waveforms or their spectra. Beyond conventional speech communication, the enhancement of noisy speech signals is essential in medical and assistive applications such as hearing aids, telemedicine, speech-based pathological diagnosis, and biomedical acoustic signal analysis, where improved intelligibility and clarity are critical for accurate clinical assessment and human–machine interaction in healthcare settings.
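As a point of reference for the Wiener gain mapping mentioned above, the sketch below shows the classic gain function G = SNR / (1 + SNR), driven by a rough a priori SNR estimate obtained from power spectra; the function and variable names are illustrative only and are not taken from the paper.

import numpy as np

def wiener_gain(noisy_psd, noise_psd, eps=1e-12):
    # A priori SNR approximated by spectral subtraction of the noise PSD.
    snr_prio = np.maximum(noisy_psd - noise_psd, 0.0) / (noise_psd + eps)
    # Classic Wiener gain: G = SNR / (1 + SNR), bounded in [0, 1].
    return snr_prio / (1.0 + snr_prio)

# The gain is applied per time-frequency bin to the noisy magnitude
# spectrum, while the noisy phase is reused for signal reconstruction.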
Methods: The proposed work introduces a fusion technique called the Wiener-based Recurrent Neural Network (WRNN), which integrates the Wiener filter with an enhanced variant of the Recurrent Neural Network (RNN), the Bi-directional Gated Recurrent Unit (Bi-GRU). By combining statistical filtering with temporal learning, the hybrid model removes background noise from noisy input signals and improves speech quality.
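Since the abstract does not specify the network architecture, the following is a minimal, hypothetical PyTorch sketch of the kind of Wiener filter and Bi-GRU fusion described: a Bi-GRU estimates a per-bin gain that refines a Wiener-filtered magnitude spectrum. The layer sizes, the two-layer GRU, and all identifiers are assumptions made for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class WienerBiGRU(nn.Module):
    """Bi-GRU that refines a Wiener-filtered magnitude spectrum with a
    learned per-bin gain (all names and sizes are illustrative)."""
    def __init__(self, n_bins=257, hidden=256):
        super().__init__()
        self.bigru = nn.GRU(n_bins, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.gain = nn.Sequential(nn.Linear(2 * hidden, n_bins), nn.Sigmoid())

    def forward(self, wiener_mag):              # (batch, frames, n_bins)
        h, _ = self.bigru(wiener_mag)           # temporal context in both directions
        return self.gain(h) * wiener_mag        # gated refinement of the Wiener output

# Example: a batch of 4 utterances, 100 STFT frames, 257 frequency bins.
# enhanced = WienerBiGRU()(torch.rand(4, 100, 257))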
Results: On babbling noise, the proposed WRNN achieves the following: TIMIT, STOI of 85.4% (−5 dB) and 91.5% (−2 dB) with PESQ of 2.58 (−5 dB) and 2.91 (−2 dB); WSJ, STOI of 92.1% (−5 dB) and 94.9% (−2 dB) with PESQ of 2.98 (−5 dB) and 3.15 (−2 dB). Across all evaluated settings, WRNN consistently outperforms baseline methods such as RNN, RNN-IRM, RNN-TCS, and ARN in both STOI and PESQ.
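For context on how STOI and PESQ scores such as those above are typically computed, the open-source pystoi and pesq packages can be used along the following lines; the file names are placeholders and 16 kHz wideband audio is assumed.

# pip install soundfile pystoi pesq  (file names below are placeholders)
import soundfile as sf
from pystoi import stoi
from pesq import pesq

clean, fs = sf.read("clean_utterance.wav")        # 16 kHz reference signal
enhanced, _ = sf.read("enhanced_utterance.wav")   # enhanced output, same length

print("STOI:", stoi(clean, enhanced, fs, extended=False))  # intelligibility, 0..1
print("PESQ:", pesq(fs, clean, enhanced, "wb"))            # wideband MOS-LQO score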
Conclusion: The proposed Wiener filter–Bi-GRU (WRNN) fusion framework demonstrates improved speech enhancement in environments with non-stationary and non-causal noise. Beyond general speech enhancement, the model shows strong potential for medical signal enhancement, supporting clearer interpretation of heart sounds, respiratory signals, and pathological speech under heavy noise. The performance metrics examined, Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ), validate the WRNN's ability to maintain intelligibility and perceptual quality in both synthetic and real-world environments.
Keywords: Speech Enhancement, Noise Removal, Wiener Filter, Bi-GRU, PESQ, STOI, Deep Learning, Speech Quality Enhancement, Medical Signal Processing, Hearing Aids, Telemedicine.
Type of Study: Original Article | Subject: Artificial Intelligence
Received: 2025/08/22 | Accepted: 2025/10/26 | Published: 2025/10/26