Two-Way, Freeform, Speech-to-Speech Translation Systems for Tactical Use

Speech-to-speech translation technologies enable English speakers to quickly communicate with the local population without an interpreter.

The Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program is developing and fielding freeform, two-way translation systems that enable speakers of different languages to communicate with one another in real-world tactical situations without an interpreter. To date, several prototype systems have been developed for the traffic control point, facilities inspection, civil affairs, medical screening, and combined operations domains in Iraqi Arabic (IA), Mandarin, Farsi, Pashto, Dari, and Thai. Systems have been demonstrated on platforms of various sizes, ranging from personal digital assistants (PDAs) to laptop-grade machines.

Four English-to-Pashto and Pashto-to-English translation systems, each developed by a separate team, were evaluated. The teams’ system architectures are similar in that each features three principal components: (a) automated speech recognition (ASR), (b) machine translation (MT), and (c) text-to-speech (TTS). When a person speaks, the ASR converts the spoken input into source-language text. Next, the MT translates the source text into target-language text. Finally, the TTS produces spoken output from the target-language text. The same process runs in the opposite direction, so the technology translates both to and from English, enabling English and Pashto speakers to converse with one another.
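The three-stage flow described above can be sketched in a few lines of Python. This is only an illustration of the ASR, MT, TTS chaining and of how one pipeline per direction yields two-way translation; it is not any team's implementation. All names (`Direction`, `TwoWaySystem`, `build_demo_system`) are hypothetical, the stand-in components simply pass bytes and text around, and the `<ps:...>` tokens are placeholders rather than real Pashto.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical component signatures; real systems would wrap actual engines.
Recognizer = Callable[[bytes], str]   # ASR: audio -> source-language text
Translator = Callable[[str], str]     # MT: source text -> target text
Synthesizer = Callable[[str], bytes]  # TTS: target text -> audio


@dataclass
class Direction:
    """One translation direction: ASR, then MT, then TTS."""
    asr: Recognizer
    mt: Translator
    tts: Synthesizer

    def translate(self, audio: bytes) -> bytes:
        source_text = self.asr(audio)       # spoken input -> source text
        target_text = self.mt(source_text)  # source text -> target text
        return self.tts(target_text)        # target text -> spoken output


@dataclass
class TwoWaySystem:
    """Two pipelines, one per direction, so either speaker can talk."""
    english_to_pashto: Direction
    pashto_to_english: Direction

    def speak(self, audio: bytes, speaker: str) -> bytes:
        if speaker == "english":
            return self.english_to_pashto.translate(audio)
        if speaker == "pashto":
            return self.pashto_to_english.translate(audio)
        raise ValueError(f"unknown speaker: {speaker!r}")


def build_demo_system() -> TwoWaySystem:
    """Wire up toy components: 'audio' is just UTF-8 text in this sketch."""
    fake_asr = lambda audio: audio.decode("utf-8")
    fake_tts = lambda text: text.encode("utf-8")
    en_to_ps = {"hello": "<ps:hello>"}  # placeholder tokens, not real Pashto
    ps_to_en = {value: key for key, value in en_to_ps.items()}
    return TwoWaySystem(
        english_to_pashto=Direction(
            fake_asr, lambda t: en_to_ps.get(t, "<unknown>"), fake_tts
        ),
        pashto_to_english=Direction(
            fake_asr, lambda t: ps_to_en.get(t, "<unknown>"), fake_tts
        ),
    )
```

Keeping each direction as its own object mirrors the description of the process occurring in reverse: the same three stages run with the source and target languages swapped.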

Evaluations prior to this test event featured the TRANSTAC technologies operating on laptop-based systems and rugged mobile computer platforms. The test event described here marked the first evaluation in which the translation software ran solely onboard Nexus One smartphones. Because the translation software was packaged entirely on the phone, these systems required no wireless or cellular connectivity. Although the smartphones featured visual interfaces, test subjects used the eyes-free mode, in which they operated the technology with buttons that were either built into the device or connected through its external ports.

Because the technologies were tested in both the heavily controlled lab and more-realistic field-like environments, the teams provided the test subjects with various system configurations that included numerous microphone and headphone options. Each system used the Nexus One’s internal microphones to capture speech. In addition, one team offered a configuration with a headset microphone. Some teams used the phone’s built-in speaker for speech output, while others added an external speaker.

For both the live Lab and Field evaluations, tactically relevant scenarios were developed to gauge the test subjects’ use of the TRANSTAC systems. The first 17 scenarios were performed during the Lab evaluations in controlled conference room environments across three days of testing. The remaining scenarios were performed outdoors during the Field evaluations, which took one day and followed the Lab evaluations. In both the Lab and Field evaluations, Marines played the role of the English-speaking test subjects and conversed with Pashto speakers, each in their native language, using the TRANSTAC technologies. The goal of each conversation was for the speaker pair to accurately convey to one another as many concepts relevant to their motivations as possible in their allotted time. Analysts compared what the human speakers said with what the technologies translated. Numerous quantitative metrics were then calculated from this bilingual analysis.
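The brief does not name the specific metrics that were calculated. As a purely illustrative sketch, the stated goal (accurately conveying as many concepts as possible in the allotted time) suggests two simple quantities: the fraction of attempted concepts that were conveyed accurately, and the number conveyed per minute. The class, field names, and both formulas below are assumptions made for illustration, not the evaluation's actual definitions.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    """Hypothetical per-scenario tallies derived from the bilingual analysis."""
    concepts_attempted: int  # concepts the speakers tried to convey
    concepts_conveyed: int   # concepts judged accurately translated
    minutes: float           # time used in the scenario


def transfer_accuracy(result: ScenarioResult) -> float:
    """Fraction of attempted concepts conveyed accurately (assumed metric)."""
    if result.concepts_attempted == 0:
        return 0.0
    return result.concepts_conveyed / result.concepts_attempted


def concepts_per_minute(result: ScenarioResult) -> float:
    """Accurately conveyed concepts per minute of conversation (assumed metric)."""
    if result.minutes <= 0:
        raise ValueError("scenario duration must be positive")
    return result.concepts_conveyed / result.minutes
```

For instance, a hypothetical 20-minute scenario with 20 attempted and 15 accurately conveyed concepts would give an accuracy of 0.75 and a rate of 0.75 concepts per minute under these definitions.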

The Lab evaluations assessed the TRANSTAC systems in a heavily controlled environment that featured no background noise and stationary participants (either sitting or standing). This type of venue enables the technology developers to gauge the best the systems can perform at their current state of maturity. The test subjects performed 17 spontaneous Lab scenarios, each within a 15- to 25-minute timeframe. Depending upon the scenario, the speakers were seated across from one another at a table or stood across from one another, with the English speaker holding and controlling the Nexus One. The English-speaking Marines were assigned specific scenarios based upon their deployment experiences. All of the Pashto speakers had experience as interpreters in Afghanistan and/or as role players in training exercises.

The goal of the Field evaluations was to assess the TRANSTAC systems in a more realistic environment. The Field evaluations introduced uncontrolled ambient background noise, sunlight, and wind. The Marines carried the TRANSTAC systems, some of which featured external, wearable speakers, and were allowed to move around within their scenario station. Three unique scenario stations were simulated: a truck to support vehicle checkpoint and forward operating base entry control point scenarios; an area simulating a local national’s home to support census and medical conversations; and an area simulating a facility to support facility inspection and combined operations planning dialogues. The Marines and Pashto speakers performed eight spontaneous scenarios in the field, with a scribe employed in the same manner as in the Lab evaluations.

This work was done by Brian A. Weiss and Craig I. Schlenoff of the National Institute of Standards and Technology for DARPA. DARPA-0012



This Brief includes a Technical Support Package (TSP). Two-Way, Freeform, Speech-to-Speech Translation Systems for Tactical Use (reference DARPA-0012) is currently available for download from the TSP library.
