ICRA Noises

The ICRA noises were developed for the International Collegium of Rehabilitative Audiology (ICRA) by the HACTES working group (Hearing Aid Clinical Test Environment Standardisation). The purpose was to establish a collection of noise signals to be used as background noise in clinical tests of hearing aids and, possibly, for measuring the characteristics of non-linear instruments. By composing signals with well-defined spectral and temporal characteristics similar to those typically found in real-life speech and babble noise, the HACTES group hoped that these signals would eventually become an international de facto standard for both purposes.

Specifications:
The signals are based on live English speech from the EUROM database (Chan, 1995), in which a female speaker explains the system of arithmetical notation. Two signals were generated using essentially the same process, resulting in speech-spectrum-shaped noises corresponding to female and male speech.

The speech signal was sampled at 44.1 kHz. The process first split the signal into three bands with crossover frequencies of 850 Hz and 2500 Hz, using IIR filters with slopes exceeding 100 dB/octave and more than 50 dB attenuation outside the passband. The crossover frequencies were chosen so that the first formant of vowels falls within the low band, the second formant within the mid band and the unvoiced fricatives within the high band.

Next, each of the three bands was scrambled according to a process described by Schroeder (1968): the sign of each sample is, at random and with a probability of 0.5, either reversed or kept unaltered. Since the magnitude of every sample is preserved by this process, each modified signal has the same modulation properties as the original speech, but is completely unintelligible and has a flat, white spectrum. The Schroeder-processed signals were then filtered again with the same filters that originally separated them, and the three signals were scaled to have the same spectral density level. The three bands were then added together, forming one signal with a white spectrum in which the original modulation is preserved in each of the three frequency ranges.

To obtain the desired spectrum, this signal was filtered, resulting in signals with spectra corresponding to male and female speech in close accordance with the long-term average speech spectrum (LTASS; Byrne et al., 1996) and the ANSI S3.5 (1997) standard for the calculation of the Speech Intelligibility Index (SII). Because the resulting signals had an unpleasant, scratchy sound, their phase was smoothed in a 512-point FFT procedure: the phase was randomised and, after an inverse FFT, the segments were overlap-added with 7/8 overlap. The resulting signals have long-term spectra in accordance with the LTASS and modulation characteristics like those of natural speech.
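Two of the per-band steps described above, Schroeder's sign scrambling and scaling each band to a common spectral density, can be sketched in a few lines of Python. This is a minimal illustration, not the HACTES implementation: the function names `schroeder_scramble`, `lag1_autocorrelation` and `scale_to_density` are hypothetical, the band-splitting filters are omitted, a 50 Hz tone stands in for one band of speech, and spectral density is assumed to be mean power divided by the band's width in hertz. The demo checks the two properties the text relies on: every sample keeps its magnitude (so the envelope is unchanged), while the lag-1 autocorrelation drops to roughly zero, as expected for a white spectrum.

```python
import math
import random


def schroeder_scramble(samples, rng=None):
    """Reverse the sign of each sample with probability 0.5 (after Schroeder, 1968).

    Magnitudes, and hence the envelope, are untouched; only the signs change.
    """
    if rng is None:
        rng = random.Random()
    return [-x if rng.random() < 0.5 else x for x in samples]


def lag1_autocorrelation(samples):
    """Normalised autocorrelation at a lag of one sample (1.0 = fully correlated)."""
    energy = sum(x * x for x in samples)
    cross = sum(a * b for a, b in zip(samples, samples[1:]))
    return cross / energy


def scale_to_density(samples, low_hz, high_hz, target_density):
    """Scale a band signal so its mean power per Hz of bandwidth equals target_density.

    Assumption: the band's power is spread evenly between low_hz and high_hz,
    as it roughly is after scrambling and refiltering.
    """
    power = sum(x * x for x in samples) / len(samples)
    gain = math.sqrt(target_density * (high_hz - low_hz) / power)
    return [gain * x for x in samples]


if __name__ == "__main__":
    fs = 44100  # sampling rate of the source speech
    # Hypothetical stand-in for the low band: a 50 Hz tone, one second long.
    band = [math.sin(2 * math.pi * 50 * n / fs) for n in range(fs)]
    scrambled = schroeder_scramble(band, random.Random(1))

    # Sample magnitudes are identical, so the modulation survives scrambling.
    print(all(abs(a) == abs(b) for a, b in zip(band, scrambled)))
    # The tone is strongly correlated; the scrambled version is nearly uncorrelated.
    print(lag1_autocorrelation(band), lag1_autocorrelation(scrambled))

    # Scale the low band (0 to 850 Hz) to a hypothetical target density.
    low_band = scale_to_density(scrambled, 0, 850, target_density=1e-4)
    print(sum(x * x for x in low_band) / len(low_band) / 850)
```

Matching power per hertz rather than total power is what makes the recombined three-band signal white: the 2500 Hz to 22050 Hz band is far wider than the other two, so it ends up carrying proportionally more power.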

These signals are more representative of normal speech than filtered stationary Gaussian noise, since both the spectrum and the modulation of speech are preserved. Furthermore, signals representative of raised and loud voices were generated by including the spectral differences between normal, raised and loud speech given in ANSI S3.5 (1997) in the filter characteristic.
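Including the difference between vocal efforts in the filter characteristic amounts to adding a per-band level offset in the dB domain. The sketch below is a minimal illustration under stated assumptions: the filter is represented as a gain in dB at a few band centre frequencies, and the function names and every numeric level are hypothetical placeholders, not the values tabulated in ANSI S3.5 (1997).

```python
def effort_offsets_db(normal_levels_db, target_levels_db):
    """Per-band level difference (target effort minus normal effort), in dB."""
    if normal_levels_db.keys() != target_levels_db.keys():
        raise ValueError("both spectra must use the same bands")
    return {f: target_levels_db[f] - normal_levels_db[f] for f in normal_levels_db}


def apply_offsets(filter_gain_db, offsets_db):
    """Add the effort offsets to a filter characteristic given in dB per band."""
    return {f: filter_gain_db[f] + offsets_db[f] for f in filter_gain_db}


def db_to_amplitude(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


if __name__ == "__main__":
    # HYPOTHETICAL band levels in dB, keyed by band centre frequency in Hz.
    # Placeholders only; take real values from the ANSI S3.5 (1997) tables.
    normal = {500: 30.0, 1000: 27.0, 4000: 18.0}
    raised = {500: 35.0, 1000: 33.0, 4000: 25.0}
    shaping_filter = {500: 0.0, 1000: -2.0, 4000: -10.0}  # hypothetical normal-effort filter

    raised_filter = apply_offsets(shaping_filter, effort_offsets_db(normal, raised))
    for freq, gain in raised_filter.items():
        print(freq, gain, round(db_to_amplitude(gain), 3))
```

Keeping the characteristic in dB makes the effort offsets simply additive; the linear amplitude factor is only needed when the response is actually applied to the signal.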

The content (including the WAV files) is kindly provided by Prof. em. W. Dreschler.
