This week, Tilde's latest research results in neural machine translation were presented at the TSD 2017 conference. The findings show promising improvements in the translation quality of named entities and technical texts.
Tilde's research paper “Neural Machine Translation for Morphologically Rich Languages with Improved Sub-word Units and Synthetic Data” analyses the problems that arise when byte pair encoding (BPE) splits rare and unknown words into sub-word units for neural machine translation. It proposes two methods that improve the quality of this splitting: the first uses linguistic guidance for byte pair encoding, and the second limits the splitting of unknown words. The authors report a significant improvement in translation quality over the baseline systems in all reported experiments.
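For readers unfamiliar with byte pair encoding, here is a minimal sketch of the standard, unmodified algorithm: merges are learned greedily from word frequencies and then applied in order to split a new word into sub-word units. It assumes a toy vocabulary and the common `</w>` end-of-word marker. It does not implement either of Tilde's proposed methods, and the function names `learn_bpe`, `apply_bpe` and `merge_pair` are hypothetical.

```python
from collections import Counter

# Plain BPE for illustration only; NOT the linguistically guided variant
# or the unknown-word splitting limit described in the paper.

def merge_pair(symbols, pair):
    """Replace every occurrence of the adjacent pair with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_counts, num_merges):
    """Greedily learn up to num_merges merges from a {word: count} dict."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break
        # Most frequent adjacent pair; ties go to the pair seen first.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        new_vocab = {}
        for symbols, count in vocab.items():
            key = merge_pair(symbols, best)
            new_vocab[key] = new_vocab.get(key, 0) + count
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Split a (possibly unseen) word by applying merges in learned order."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = tuple(word) + ("</w>",)
    while len(symbols) > 1:
        pairs = list(zip(symbols, symbols[1:]))
        # Choose the pair whose merge was learned earliest (lowest rank).
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no learned merge applies any more
        symbols = merge_pair(symbols, best)
    return list(symbols)


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3)
print(merges)                        # [('e', 's'), ('es', 't'), ('est', '</w>')]
print(apply_bpe("lowest", merges))   # ['l', 'o', 'w', 'est</w>']
```

With only three merges learned from the toy vocabulary, the unseen word "lowest" keeps the frequent suffix `est</w>` as one unit while the rest falls back to single characters. This fragmentation of rare and unknown words is the kind of splitting behaviour the paper examines.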
We expect the proposed methods to improve the translation of named entities and technical texts in production systems, which often receive data not represented in the training corpus.
The 20th International Conference on Text, Speech and Dialogue (TSD 2017) took place in Prague from August 27 to August 31. The conference covers a wide range of topics in speech and natural language processing.
This year, TSD celebrates its 20th anniversary. The conference dates back to 1997, when it was first held in Mariánské Lázně. The idea behind it was to establish a scientific meeting platform that would act as a bridge between East and West.