Tech Briefs

Speech-to-speech translation technologies enable English speakers to quickly communicate with the local population without an interpreter.

The Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) is developing and fielding freeform, two-way translation systems that enable speakers of different languages to communicate with one another in real-world tactical situations without an interpreter. To date, several prototype systems have been developed for the traffic control point, facilities inspection, civil affairs, medical screening, and combined operations domains in Iraqi Arabic (IA), Mandarin, Farsi, Pashto, Dari, and Thai. Systems have been demonstrated on platforms of various sizes, ranging from personal digital assistants (PDAs) to laptop-grade computers.

Four English-to-Pashto and Pashto-to-English translation systems, each developed by a separate team, were evaluated. The teams' system architectures are similar in that each features three principal components: (a) automated speech recognition (ASR), (b) machine translation (MT), and (c) text-to-speech (TTS). When a person speaks, the ASR converts the spoken input into source-language text. The MT then translates the source text into target-language text, and finally the TTS produces spoken output of that text. The same process runs in the reverse direction, allowing the technology to translate both to and from English so that English and Pashto speakers can converse with one another.
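The three-stage pipeline can be pictured as a simple chain of functions. The sketch below is purely illustrative: the function names, the stub lexicon, and the transliterated Pashto output are assumptions for demonstration, not the actual TRANSTAC components.

```python
# Illustrative sketch of the ASR -> MT -> TTS chain described above.
# All bodies are stubs; real systems run statistical recognizers,
# translation engines, and synthesizers on the device.

def asr(audio: bytes, lang: str) -> str:
    """Automated speech recognition: spoken input to source text (stub)."""
    return "where is the market"  # pretend the recognizer decoded this

def mt(text: str, src: str, tgt: str) -> str:
    """Machine translation: source text to target text (stub lookup)."""
    lexicon = {
        # Transliterated Pashto placeholder, not a vetted translation.
        ("en", "ps", "where is the market"): "bazaar cherta dai",
    }
    return lexicon.get((src, tgt, text), text)

def tts(text: str, lang: str) -> bytes:
    """Text-to-speech: target text to audio (stub encoding)."""
    return text.encode("utf-8")  # placeholder for synthesized speech

def translate_utterance(audio: bytes, src: str, tgt: str) -> bytes:
    """Run one direction of the pipeline; swap src and tgt arguments
    to translate in the reverse direction."""
    source_text = asr(audio, src)
    target_text = mt(source_text, src, tgt)
    return tts(target_text, tgt)

# English -> Pashto direction; ("ps", "en") would reverse it.
out = translate_utterance(b"...", "en", "ps")
```

Chaining the stages this way is what makes the system two-way: the same three components serve both directions, with only the language arguments swapped.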

Evaluations prior to this test event featured the TRANSTAC technologies operating on laptop-based systems and rugged mobile computer platforms. The test event described here marked the first evaluation in which the translation software operated solely onboard Nexus One smartphones. Because the software was packaged entirely on the phone, the systems required no wireless or cellular connectivity. Although the smartphones featured visual interfaces, test subjects used an eyes-free mode, operating the technology through buttons that were either built into the device or connected through its external ports.
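The eyes-free, button-driven interaction can be thought of as a small push-to-talk state machine. The sketch below is hypothetical (the class, button events, and language pairing are assumptions, not the actual TRANSTAC interface code); it shows only the control logic, with the translation pipeline itself left abstract.

```python
# Hypothetical eyes-free control loop: each speaker holds a hardware
# button while talking; releasing it hands the captured utterance to
# the ASR -> MT -> TTS chain. No screen interaction is required.

IDLE, RECORDING = "idle", "recording"

class EyesFreeController:
    def __init__(self):
        self.state = IDLE
        self.direction = None  # ("en", "ps") or ("ps", "en")

    def button_down(self, speaker: str):
        """A speaker presses their push-to-talk button; start capture
        and remember which direction to translate."""
        self.state = RECORDING
        self.direction = ("en", "ps") if speaker == "en" else ("ps", "en")

    def button_up(self) -> tuple:
        """Button released: stop capture and report the (src, tgt)
        pair so the pipeline can translate this utterance."""
        self.state = IDLE
        return self.direction

ctrl = EyesFreeController()
ctrl.button_down("en")        # English speaker holds the talk button
direction = ctrl.button_up()  # ("en", "ps"): translate into Pashto
```

Keeping the state machine this small is what lets the device be operated entirely by feel, which matters in tactical settings where looking down at a screen is impractical.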

Because the technologies were tested both in a heavily controlled lab and in more realistic, field-like environments, the teams provided the test subjects with various system configurations offering several microphone and headphone options. Every system used the Nexus One's internal microphones to capture speech, and one team also offered a configuration with a headset microphone. Some teams relied on the phone's built-in speaker for speech output, while others added an external speaker.

For both the live Lab and Field evaluations, tactically relevant scenarios were developed to gauge the test subjects' use of the TRANSTAC systems. The first 17 scenarios were performed during the Lab evaluations, held in controlled conference-room environments over three days of testing. The remaining scenarios were performed outdoors during the Field evaluation on the day following the Lab. In both settings, Marines played the role of the English-speaking test subjects, conversing with Pashto speakers in their native languages through the TRANSTAC technologies. The goal of each conversation was for the speaker pair to accurately convey to one another, within the allotted time, as many concepts relevant to their motivations as possible. Analysts compared what the human speakers said with what the technologies translated, and from the bilingual speakers' analysis calculated numerous quantitative metrics.
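One concrete form such an analysis could take is a concept-transfer rate: the fraction of concepts a speaker intended to convey that the bilingual analyst judged correctly conveyed by the translated output. The metric, the function, and the sample concept labels below are illustrative assumptions; the brief does not specify which metrics were actually computed.

```python
# Illustrative concept-transfer metric, not necessarily one of the
# metrics used in the TRANSTAC evaluation.

def concept_transfer_rate(intended: set, conveyed: set) -> float:
    """Fraction of the speaker's intended concepts judged (by the
    bilingual analyst) to have been correctly conveyed."""
    if not intended:
        return 0.0
    return len(intended & conveyed) / len(intended)

# Hypothetical annotation of one conversation turn.
intended = {"checkpoint_location", "curfew_time", "id_required"}
conveyed = {"checkpoint_location", "curfew_time"}
rate = concept_transfer_rate(intended, conveyed)  # 2 of 3 concepts
```

A per-conversation score like this can then be averaged across scenarios, speaker pairs, and translation directions to compare the four teams' systems quantitatively.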