Noise robust voice activity detection based on periodic to aperiodic component ratio
DOI: 10.1016/j.specom.2009.08.003
Title: Noise robust voice activity detection based on periodic to aperiodic component ratio
Journal Title: Speech Communication
Volume: 52
Issue: 1
Publication Date: January 2010
Start Page: 41
End Page: 60
Published online: 28 August 2009
ISSN: 0167-6393
Affiliations:

  • a NTT Communication Science Laboratories, NTT Corporation, Hikaridai 2-4, Seikacho, Sourakugun, Kyoto 619-0237, Japan

  • b NTT Cyber Space Laboratories, NTT Corporation, Hikarino-oka 1-1, Yokosuka City, Kanagawa 239-0847, Japan
  • Abstract: This paper proposes a noise robust voice activity detection (VAD) technique called PARADE (PAR based Activity DEtection) that employs the periodic component to aperiodic component ratio (PAR). Conventional noise robust features for VAD are still sensitive to non-stationary noise, which yields variations in the signal-to-noise ratio, and sometimes require a priori noise power estimations, although the characteristics of environmental noise change dynamically in the real world. To overcome this problem, we adopt the PAR, which is insensitive to both stationary and non-stationary noise, as an acoustic feature for VAD. By considering both periodic and aperiodic components simultaneously in the PAR, we can mitigate the effect of the non-stationarity of noise. PARADE first estimates the fundamental frequencies of the dominant periodic components of the observed signals, decomposes the power of the observed signals into the powers of its periodic and aperiodic components by taking account of the power of the aperiodic components at the frequencies where the periodic components exist, and calculates the PAR based on the decomposed powers. Then it detects the presence of target speech signals by estimating the voice activity likelihood defined in relation to the PAR. Comparisons of the VAD performance for noisy speech data confirmed that PARADE outperforms the conventional VAD algorithms even in the presence of non-stationary noise. In addition, PARADE is applied to front-end processing for automatic speech recognition (ASR) that employs a robust feature extraction method called SPADE (Subband based Periodicity and Aperiodicity DEcomposition). Comparisons of the ASR performance for noisy speech show that the SPADE front-end combined with PARADE achieves significantly higher word accuracies than those achieved by MFCC (Mel-frequency Cepstral Coefficient) based feature extraction, which is widely used for conventional ASR systems, the SPADE front-end without PARADE, and other standard noise robust front-end processing techniques (ETSI ES 202 050 and ETSI ES 202 212). This result confirms that PARADE can improve the performance of front-end processing for ASR.
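    The power decomposition described in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation. It assumes that the fundamental frequency is already known and that the aperiodic power is spread evenly over frequency, so the power observed near the harmonics is the periodic power plus a fixed share (eta) of the aperiodic power. The function name, the Hann window, and the 1.5-bin harmonic band width are illustrative choices that do not come from the source.

```python
import numpy as np


def periodic_aperiodic_ratio(frame, fs, f0):
    """Simplified PAR of one frame, given its fundamental frequency f0 in Hz.

    Assumption (not from the paper): aperiodic power is flat over frequency.
    """
    n = len(frame)
    spectrum = np.fft.rfft(frame * np.hanning(n))
    power = np.abs(spectrum[1:]) ** 2           # drop the DC bin
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)[1:]

    # Mark bins that lie within a narrow band around each harmonic of f0.
    bin_width = fs / n
    harmonic_index = np.round(freqs / f0)
    distance = np.abs(freqs - harmonic_index * f0)
    is_harmonic = (harmonic_index >= 1) & (distance <= 1.5 * bin_width)

    eta = is_harmonic.mean()                    # share of bins in harmonic bands
    if eta >= 1.0:
        raise ValueError("f0 is too low for this frame length")

    total = power.sum()
    harmonic = power[is_harmonic].sum()

    # Solve: total = periodic + aperiodic, harmonic = periodic + eta * aperiodic.
    aperiodic = max((total - harmonic) / (1.0 - eta), 1e-12)
    periodic = max(total - aperiodic, 0.0)
    return periodic / aperiodic
```

    Under these assumptions, a frame dominated by harmonics of f0 gives a ratio well above 1, while a frame of white noise gives a ratio near 0. A full detector would also need the fundamental frequency estimation and the voice activity likelihood described in the abstract, and this sketch covers neither.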
    Accepted: 20 August 2009
    Received: 28 May 2008
    Revised: 8 July 2009
    Keywords: Voice activity detection; Robustness; Periodicity; Aperiodicity; Noise robust front-end processing for automatic speech recognition
    Tel: +81 774 93 5216
    Fax: +81 774 93 1945
    Email: ishizuka@cslab.kecl.ntt.co.jp nak@cslab.kecl.ntt.co.jp masakiyo@cslab.kecl.ntt.co.jp miyazaki.nob
