RG 3-01 'Multilingual Speech Recognition'


Translation of Lectures

Talks and lectures often offer valuable content for a large audience. However, because of the language barrier, many talks cannot reach all potential listeners. Human translation is expensive and is therefore a feasible option in only very few cases.

Speech-to-speech translation technology can fill this gap and offer a solution for the many situations in which human interpreters and translators would be too expensive. Speech-to-speech translation systems combine three technologies: automatic speech recognition (ASR), machine translation (MT), and speech synthesis.

Many interesting research aspects can be found in the area of automatic speech recognition:


Topic Adaptation

Speech recognition systems deliver their best performance when they are adapted as closely as possible to the style and content of the speech to be recognized. Differences between the topic on which the models have been trained and the topic of the speech to be recognized can lead to large drops in performance, due to out-of-vocabulary words and wrongly estimated n-gram probabilities within the language model of the recognition system.
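The effect of a topic mismatch on the vocabulary can be illustrated with a minimal sketch that measures the out-of-vocabulary (OOV) rate, i.e. the fraction of running words in a lecture that the recognizer's vocabulary does not contain. The function name oov_rate, the whitespace tokenization, and the toy data are assumptions made for illustration only and are not taken from the system described here.

```python
def oov_rate(vocabulary, tokens):
    """Return the fraction of running words in `tokens` missing from `vocabulary`."""
    if not tokens:
        return 0.0
    vocab = set(vocabulary)
    missing = sum(1 for tok in tokens if tok not in vocab)
    return missing / len(tokens)


# Hypothetical toy data: a vocabulary from one topic, a lecture on another.
train_vocab = {"the", "market", "price", "rises", "and", "falls"}
lecture = "the eigenvalue of the matrix rises".split()

# "eigenvalue", "of" and "matrix" are unknown: 3 of 6 running words.
print(oov_rate(train_vocab, lecture))  # 0.5
```

A word that is out of vocabulary cannot be recognized at all, so a high OOV rate on a new topic is a direct indicator that the vocabulary and language model need adaptation.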

Run-Time and Latency

Especially in the case of simultaneous translation, speech recognition has to be performed in real time, and the result has to be delivered with as little latency as possible. Only then will the listeners of the talk be able to follow the translation and to relate the content of the lecture to the lecturer's interaction with the audience. Latency also raises interesting questions about the interface to the machine translation component: in order to translate correctly, the translation component needs a certain amount of linguistic context to be available, and waiting for that context delays the output.
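The two quantities discussed above can be made concrete with a small sketch. The real-time factor is the ratio of processing time to audio duration, so a value below 1 means the recognizer keeps up with the speaker. The second function is a deliberately simplified model, assumed here for illustration, in which recognized words are handed to the translation component in fixed-size chunks: each word waits until the last word of its chunk has been spoken. The names real_time_factor and chunk_waiting_times and the fixed chunking are hypothetical and do not describe the actual interface of the system.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def chunk_waiting_times(word_end_times, chunk_size):
    """Seconds each word waits until its chunk is complete and can be translated.

    Assumption: the translation component receives fixed-size chunks of
    `chunk_size` words; the final chunk may be shorter.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    waits = []
    for start in range(0, len(word_end_times), chunk_size):
        chunk = word_end_times[start:start + chunk_size]
        release = chunk[-1]  # chunk is released when its last word has ended
        waits.extend(release - t for t in chunk)
    return waits


# Hypothetical numbers: 60 s of audio decoded in 30 s.
print(real_time_factor(30.0, 60.0))  # 0.5

# One word per second; more context per chunk means longer waits.
word_ends = [1.0, 2.0, 3.0, 4.0]
print(chunk_waiting_times(word_ends, 2))  # [1.0, 0.0, 1.0, 0.0]
print(chunk_waiting_times(word_ends, 4))  # [3.0, 2.0, 1.0, 0.0]
```

The example shows the trade-off in its simplest form: larger chunks give the translation component more context but increase the delay before the first words of a chunk can be translated.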