Variable-Data-Rate Speech Encoder

This encoder could supplant older encoders that operate at diverse fixed rates.

A variable-data-rate (VDR) speech encoder has been designed to be interoperable with, and eventually to supplant, the many different voice encoders now used in military communication systems. These older systems were designed for specific radio links with fixed, limited channel capacities, so they use many different voice-compression algorithms operating at various fixed rates. The resulting incompatibility is an obstacle to interoperability. Emerging net-centric communication systems promise connectivity for all military users, but compatible encoding will be necessary for interoperability, and encryption will be necessary for secure communications.

Table: The seven operating modes of the VDR voice encoder are characterized by different average data rates. Mode 1, with a fixed rate of 2.4 kb/s, is the same as the mode of the Federal-standard MELP encoder for narrow-band speech.

The VDR voice encoder is designed to provide both interoperability and security in net-centric voice communications. It can operate at any or all of the data rates of older military speech encoders. Notably, it can operate over a range of data rates up to 26 kb/s and is backward-compatible with the Mixed-Excitation Linear Prediction (MELP) voice encoder, a Federal-standard encoder that operates at 2.4 kb/s. The VDR speech encoder is interoperable at all of these rates simultaneously. The rate setting can be changed dynamically (that is, during operation) without disrupting operation, even when encryption is used. Hence, without compromising security, the encoder can be adjusted on the fly to make efficient use of network bandwidth as network traffic conditions change.
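Because the rate can be changed during operation, a sender could pick, frame by frame, the highest mode rate that fits the bandwidth currently available. The sketch below assumes a hypothetical ladder of mode rates (`MODE_RATES_BPS`); apart from the 2.4 kb/s MELP-compatible floor, these values are illustrative and are not the actual VDR mode rates, and the selection policy is an assumption rather than the method described in this brief.

```python
import bisect

# Hypothetical ladder of mode rates in bits per second (illustrative only).
# Only the 2.4 kb/s floor (the MELP-compatible mode) comes from the brief.
MODE_RATES_BPS = [2400, 4000, 8000, 16000]


def select_rate(available_bps: int) -> int:
    """Return the highest mode rate that fits the available bandwidth.

    Never drops below the 2.4 kb/s floor, so a MELP-compatible stream is
    always produced even when bandwidth is very scarce.
    """
    i = bisect.bisect_right(MODE_RATES_BPS, available_bps)
    return MODE_RATES_BPS[max(i - 1, 0)]


def adapt(bandwidth_trace: list[int]) -> list[int]:
    """Choose a rate for each bandwidth measurement (e.g., one per frame)."""
    return [select_rate(bw) for bw in bandwidth_trace]


if __name__ == "__main__":
    print(adapt([20000, 9000, 5000, 1500, 16000]))
    # prints [16000, 8000, 4000, 2400, 16000]
```

A sorted list with `bisect` keeps the lookup simple; any policy that respects the fixed set of mode rates would serve equally well.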

The heart of the VDR voice encoder is a multirate voice processor in which a single voice algorithm generates multiple data streams at rates ranging from 2.4 kb/s up to an average of about 23 kb/s for input speech in the 0-to-4-kHz band. The algorithm provides seven different operating modes (see table). Adding a few more kb/s of data from the 4-to-8-kHz band makes it possible to encode wide-band speech with quality comparable to that of standard frequency-modulation (FM) broadcasting.
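The 2.4 kb/s rate and the limit of 44 rate switches per second quoted in this brief appear consistent with MELP's frame structure: MELP encodes 54 bits per 22.5 ms frame (180 samples at 8 kHz), which works out to 2,400 b/s and about 44.4 frames per second. The sketch below works through that arithmetic, assuming (this is not confirmed by the brief) that VDR uses the same 22.5 ms frame period at every rate.

```python
# Assumed frame period: MELP's 22.5 ms frame, in microseconds so the integer
# arithmetic stays exact. That VDR shares this period is an assumption.
FRAME_US = 22_500
SAMPLE_RATE_HZ = 8_000  # Nyquist rate for the 0-to-4-kHz band


def frames_per_second() -> float:
    return 1_000_000 / FRAME_US


def samples_per_frame() -> int:
    return SAMPLE_RATE_HZ * FRAME_US // 1_000_000


def bits_per_frame(rate_bps: float) -> float:
    """Average number of bits carried by one frame at the given rate."""
    return rate_bps * FRAME_US / 1_000_000


if __name__ == "__main__":
    print(f"{frames_per_second():.1f} frames/s, {samples_per_frame()} samples/frame")
    for rate in (2_400, 23_000, 26_000):
        print(f"{rate} b/s -> {bits_per_frame(rate)} bits/frame")
```

At 2.4 kb/s this gives 54 bits per frame; at the 23 kb/s average it gives 517.5 bits per frame, a fractional value that only makes sense as an average over frames of varying size.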

The VDR bit stream has an embedded structure in which each higher-rate voice data frame contains the successively lower-rate frames as subsets. Deleting a portion of the superset (the higher-rate data, which typically represent higher audio frequencies) reduces the data rate, even in the presence of encryption. Because of this embedded structure, all VDR data rates are interoperable, and the rate can be switched as often as 44 times per second, even while speech is present. Because the speech waveforms produced at all VDR rates are synchronous, switching rates does not introduce undesirable sounds such as clicks or warbles.
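A minimal sketch of the embedded (prefix) structure, assuming that a frame is simply the concatenation of layers in order of increasing rate. The layer sizes in `LAYER_BITS`, the bit-string representation, and the 22.5 ms frame period are illustrative assumptions. The brief does not give the actual VDR frame layout, and how encryption is arranged so that truncation still works is not modeled here.

```python
# Hypothetical layer sizes, in bits per 22.5 ms frame (assumed values).
# Layer 0 is a 54-bit core (2.4 kb/s); each further layer adds higher-rate,
# typically higher-frequency, detail.
LAYER_BITS = [54, 36, 90, 180]
FRAME_US = 22_500


def build_frame(layers: list[str]) -> str:
    """Concatenate per-layer bit strings, lowest-rate layer first."""
    if [len(bits) for bits in layers] != LAYER_BITS:
        raise ValueError("layer sizes do not match LAYER_BITS")
    return "".join(layers)


def truncate_frame(frame: str, keep_layers: int) -> str:
    """Keep only the first `keep_layers` layers.

    The result is itself a valid lower-rate frame, because every lower-rate
    frame is a prefix (subset) of the higher-rate one.
    """
    return frame[: sum(LAYER_BITS[:keep_layers])]


def rate_bps(frame: str) -> int:
    """Data rate implied by sending one such frame every FRAME_US."""
    return len(frame) * 1_000_000 // FRAME_US


if __name__ == "__main__":
    layers = [str(i % 2) * n for i, n in enumerate(LAYER_BITS)]
    full = build_frame(layers)
    for k in range(1, len(LAYER_BITS) + 1):
        sub = truncate_frame(full, k)
        print(k, rate_bps(sub), full.startswith(sub))
```

With these assumed sizes, keeping one to four layers yields 2,400, 4,000, 8,000, and 16,000 b/s, and a relay can make the cut without decoding anything.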

It must be emphasized that the multirate voice processor in the VDR voice encoder is a single processor running a single algorithm, as opposed to either (1) a collection of separate processors operating at different rates or (2) a processor running multiple speech-compression algorithms. Prior voice encoders that use multiple compression algorithms do not perform well when the algorithm is switched while speech is present: because different algorithms can have different internal delays, speech waveforms are sometimes cropped at the switch. Such cropping degrades speech quality and is annoying to listeners.
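The cropping effect can be reproduced with a toy model in which each algorithm simply delays its input by a fixed number of samples. The delays, sample counts, and the `played_indices` and `cropped` helpers below are hypothetical. Real codecs introduce delay through framing and look-ahead, but the bookkeeping is the same.

```python
def played_indices(n_out: int, switch_at: int,
                   delay_before: int, delay_after: int) -> list[int]:
    """Input-sample indices heard at the decoder output, one per output tick.

    Output sample n reproduces input sample n - delay, where the delay is
    that of whichever algorithm is active at time n.
    """
    heard = []
    for n in range(n_out):
        delay = delay_before if n < switch_at else delay_after
        if n - delay >= 0:
            heard.append(n - delay)
    return heard


def cropped(heard: list[int]) -> list[int]:
    """Input samples that were skipped (never reproduced)."""
    if not heard:
        return []
    present = set(heard)
    return [i for i in range(min(heard), max(heard) + 1) if i not in present]


if __name__ == "__main__":
    # Hypothetical: 8-kHz audio, algorithm switch at output sample 100.
    multi = played_indices(200, 100, delay_before=40, delay_after=10)
    single = played_indices(200, 100, delay_before=25, delay_after=25)
    print(len(cropped(multi)), "samples cropped with two algorithms")
    print(len(cropped(single)), "samples cropped with one algorithm")
```

Switching from a 40-sample delay to a 10-sample delay skips the 30 input samples in between (3.75 ms at 8 kHz), whereas equal delays before and after the switch, as with a single algorithm, lose nothing. A switch to a longer delay instead repeats samples.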

The VDR speech encoder exploits the variable nature of the speech waveform, using higher or lower data rates as needed (e.g., higher rates for vowels, lower rates for consonants). Unlike some prior speech processors, the VDR speech processor does not eliminate gaps in speech for the sake of efficiency. Eliminating speech gaps could be harmful in military communications because the ambient sounds in those gaps often help listeners gauge battlefield conditions at the transmitter site. In the VDR speech encoder, speech gaps are encoded at appropriately low data rates that still convey audible information.
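The effect of variable-rate coding on the average data rate can be illustrated with a per-frame bit budget that depends on the kind of sound in each frame. The class labels, the bit budgets in `FRAME_BITS`, and the 22.5 ms frame period are hypothetical. The brief does not say how VDR classifies speech or how many bits each class receives. The point of the sketch is that gap frames still receive a small, nonzero budget instead of being dropped.

```python
# Hypothetical bits per 22.5 ms frame for each kind of segment (assumed values).
FRAME_BITS = {"vowel": 360, "consonant": 180, "gap": 54}
FRAME_US = 22_500


def average_rate_bps(labels: list[str]) -> float:
    """Average data rate over a sequence of labeled frames."""
    total_bits = sum(FRAME_BITS[label] for label in labels)
    duration_us = len(labels) * FRAME_US
    return total_bits * 1_000_000 / duration_us


if __name__ == "__main__":
    utterance = ["gap"] * 4 + ["vowel"] * 4 + ["consonant"] * 2
    # Gap frames are still encoded (54 bits each), so ambient sound survives.
    print(average_rate_bps(utterance))  # prints 8960.0
```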

This work was done by Thomas M. Moran, David A. Heide, and Yvette T. Lee of the Naval Research Laboratory and George S. Kang of ITT Industries.



This Brief includes a Technical Support Package (TSP).
The TSP, Variable-Data-Rate Speech Encoder (reference NRL-0019), is currently available for download from the TSP library.
