Correlational Analysis of Speech Intelligibility Tests and Metrics for Speech Transmission

An analysis of the common methods for evaluating speech quality and its limitations could serve as a resource for users of standard speech intelligibility measurement methods.

Speech is a form of language (communication code) that uses vocally produced sounds to convey thoughts, meanings, and feelings. To communicate by speech, speech sounds must be both produced and perceived. Speech production refers to the process by which predetermined vocalized sounds are produced by the talker and organized in sequences forming communication signals. Speech perception is the process by which the listener is able to hear and interpret (understand) the message encoded in the speech signals.

The effective design and use of audio communication systems require some knowledge of the physical properties of speech and of the rules that govern human speech perception. The two main physical descriptors of a speech signal are its sound intensity and its spectral content.
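Because the levels discussed in this article are decibel values, it may help to see how a sound intensity is expressed as a level. The following is a minimal Python sketch, assuming the conventional reference intensity of 10^-12 W/m² for airborne sound. The function names are illustrative only, and the computation is unweighted: the dB(A) values quoted below also apply an A-frequency weighting, which this sketch does not model.

```python
import math

# Conventional reference intensity for airborne sound, in W/m^2
# (approximately the threshold of hearing at 1 kHz).
I_REF = 1e-12


def intensity_level_db(intensity_w_m2: float) -> float:
    """Return the sound intensity level L_I = 10*log10(I / I_REF) in dB.

    Unweighted: an A-weighted level (dB(A)) would additionally require
    frequency weighting of the signal's spectrum, which is not modeled here.
    """
    if intensity_w_m2 <= 0:
        raise ValueError("intensity must be positive")
    return 10.0 * math.log10(intensity_w_m2 / I_REF)


def intensity_from_level(level_db: float) -> float:
    """Invert intensity_level_db: return the intensity in W/m^2 for a level in dB."""
    return I_REF * 10.0 ** (level_db / 10.0)


if __name__ == "__main__":
    # An intensity of 1e-6 W/m^2 corresponds to a level of 60 dB.
    print(f"{intensity_level_db(1e-6):.1f} dB")
```

The logarithmic scale is what makes the level differences in the following paragraphs compact: each 10 dB step is a tenfold change in intensity.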

The long-term average sound intensity levels of phonated speech produced with various levels of vocal effort are as follows:

However, the individual phonemic components of speech vary greatly in intensity, with vowels carrying much more energy than consonants. The strongest vowel, /aw/ as in “all,” is about 28 dB more intense than the weakest consonant, /th/ as in “thin.” Whispered (unphonated) speech levels are on the order of 40 dB(A), but this kind of speech is not used in formal communication.
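To put the 28 dB vowel-to-consonant difference in physical terms, a level difference can be converted into an intensity ratio (10^(ΔL/10)) or a sound-pressure ratio (10^(ΔL/20)). This is a small sketch using standard decibel arithmetic; the helper names are hypothetical.

```python
def level_difference_to_intensity_ratio(delta_db: float) -> float:
    """Intensity ratio implied by a level difference: 10**(delta/10)."""
    return 10.0 ** (delta_db / 10.0)


def level_difference_to_pressure_ratio(delta_db: float) -> float:
    """Sound-pressure ratio implied by the same difference: 10**(delta/20)."""
    return 10.0 ** (delta_db / 20.0)


if __name__ == "__main__":
    # /aw/ ("all") vs. /th/ ("thin"): about 28 dB apart, per the text.
    delta = 28.0
    print(f"intensity ratio ~ {level_difference_to_intensity_ratio(delta):.0f}")  # ~ 631
    print(f"pressure ratio  ~ {level_difference_to_pressure_ratio(delta):.1f}")   # ~ 25.1
```

In other words, the strongest vowel carries roughly 600 times the acoustic intensity of the weakest consonant, which helps explain why consonants are the first speech cues to be masked by noise.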

A person’s vocal effort depends on visual and auditory cues about the distance (real or perceived) to the listener and on the emotional state of the talker. In noisy environments, vocal effort is naturally higher (raised, loud, or shouted) than in quiet (normal) environments, because talkers involuntarily raise their voices to the level needed to hear themselves. Conversely, talkers wearing hearing protectors reduce their vocal effort by about 3 dB compared with when they are unprotected, if the background noise level exceeds 75 dB(A) (ISO 2003).
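The hearing-protector effect described above can be written as a simple rule. The sketch below is hypothetical: it treats the 75 dB(A) figure as a sharp threshold and the 3 dB figure as a fixed offset, and it assumes no change in quieter backgrounds, since the text does not report a reduction there.

```python
# Values taken from the text (ISO 2003); treating them as a sharp
# threshold and a fixed offset is a simplifying assumption.
NOISE_THRESHOLD_DBA = 75.0
PROTECTOR_REDUCTION_DB = 3.0


def expected_level_with_protectors(unprotected_level_db: float,
                                   noise_level_dba: float) -> float:
    """Hypothetical helper: estimate a talker's speech level when wearing
    hearing protectors, given the level the same talker would produce
    unprotected in the same noise.

    Above the threshold, subtract the ~3 dB reduction reported in the text.
    At or below it, return the level unchanged (an assumption: the text
    does not report a reduction for quieter backgrounds).
    """
    if noise_level_dba > NOISE_THRESHOLD_DBA:
        return unprotected_level_db - PROTECTOR_REDUCTION_DB
    return unprotected_level_db


if __name__ == "__main__":
    print(expected_level_with_protectors(80.0, 85.0))  # 77.0
    # A 3 dB drop roughly halves the radiated intensity: 10**(-0.3) ~ 0.50
    print(f"{10 ** (-PROTECTOR_REDUCTION_DB / 10):.2f}")
```

A 3 dB reduction may look small, but it corresponds to about half the acoustic intensity reaching the listener, at exactly the moment when the background noise is already high.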

The speech levels referenced above are measured in front of the talker’s mouth. However, the voice is quite directional, and levels measured behind the talker may be as much as 5–7 dB lower. This difference is relatively small at low and middle frequencies but increases sharply at higher frequencies, where much of the consonant energy lies.

This work was done by Tomasz R. Letowski and Angelique A. Scharine for the Army Research Laboratory. For more information, see the Technical Support Package (TSP), reference ARL-0229.
