Automated Translation

Tilde is at the forefront of research and innovation in neural, statistical, and hybrid machine translation. In the research community, Tilde is recognized for its expertise in automated translation technologies and solutions for complex, less-resourced languages. Its cutting-edge research enabled Tilde to release the world’s first neural machine translation systems for smaller languages.

Automated Translation research

Internationally acknowledged research for under-resourced languages

Tilde’s research in automated translation has been internationally acknowledged, particularly its research on translation into morphologically rich, under-resourced languages.

  • Developing machine translation (MT) solutions for smaller, morphologically rich, and highly inflected languages is more complex because of their relatively free word order and the large number of surface forms. Many of these languages also have limited resources (parallel and monolingual corpora), which further complicates development. These characteristics call for specially designed methods that minimise their negative effect on MT system quality.
  • Because neural machine translation (NMT) research has mainly focused on widely used languages, Tilde’s researchers aim to develop methods and algorithms for successfully integrating neural networks into MT solutions, as well as methods for building end-to-end NMT systems for complex, less-resourced languages.

PROJECTS

Ongoing projects

Quality Translation 21

The project aims to develop substantially improved statistical and machine-learning-based translation models for challenging languages and resource scenarios.

Open Data Incubator for Europe (ODINE)

As part of its ODINE incubator project, Tilde will gather, create, and contribute new multilingual open data sets for EU languages that will enable the language technology community to develop key services such as machine translation systems.

European Language Resource Coordination

The objective of the project is to identify and gather language and translation data relevant to public administration across all 30 European countries.

Completed projects

ACCURAT (FP7 project) – Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation

The aim of the ACCURAT project was to research methods and techniques to overcome one of the central problems of machine translation (MT): the lack of linguistic resources for under-resourced areas of MT. The main goal was to find, analyze, and evaluate novel methods that exploit comparable corpora in order to compensate for the shortage of linguistic resources and, ultimately, to significantly improve MT quality for under-resourced languages and narrow domains.

EASTIN CL (ICT PSP programme) – Crosslingual and Multimodal Search in a Portal for Support of Assisted Living

The project supported the e-inclusion of disabled and elderly people by providing crosslingual and multimodal support for accessing information bases on assistive tools and technology. Previous efforts had linked national assistive technology information bases into a European portal called EASTIN. The objective of EASTIN-CL was to enhance this portal by creating a front end that makes it more accessible through language technology: multilingual technology that allows users to search the data in their native language, and multimodal technology that allows them to access the portal not only in writing but also through speech.

LetsMT! (ICT PSP project) – Platform for Online Sharing of Training Data and Building User Tailored Machine Translation

To fully exploit the potential of existing open statistical machine translation (SMT) technologies, the project proposed building an innovative online collaborative platform for sharing data and building MT systems. This platform supports uploading public as well as proprietary MT training data and building multiple MT systems, public or proprietary, by combining and prioritizing this data.

MATT (EUREKA project) – Web-based Multilingual Automated Terminology Translation System

The goal of the MATT project was to develop a new web-based system for automated translation of multilingual terminology that bridges the gap between traditional local (desktop) translation tools and terminology data on the Internet. This translation technology is intended both for professional translators using specialised translation environments (for example, SDL Trados, Wordfast, Kilgray memoQ) and for experts and other users who need easy access to high-quality term resources from standard office environments.

MLi (FP7 project) – Towards a MultiLingual Data & Services Infrastructure

The MLi Support Action worked to deliver the strategic vision and operational specifications needed to build a comprehensive European MultiLingual data & services Infrastructure, along with a multiannual plan for its development and deployment, and to foster multi-stakeholder alliances ensuring its long-term sustainability.

SAFE (EUROSTARS project) – Social Analytics for Financial Engineering

The project’s result is a web-based news service that reports real-time social sentiment about a set of financial products. The news is drawn from multilingual (Latvian, Swedish, German, Dutch, Polish, French) social media sources (blogs, feeds). Tilde provided multilingual social media translation for social sentiment analysis using mature SMT (statistical machine translation) systems specially adapted to the social network and financial domains. The news feed will be available as a free version listing only the sentiment and as a paid subscription feed offering added services (links to the originating news message, personalization, and archive functionality).

TRIPOD (FP6 project) – TRI-Partite Multimedia Object Description

The Tripod project aimed to automatically build rich, multi-faceted text and semantic descriptions of the landscape and permanent man-made features pictured in a photograph, and to create a more advanced image search engine. Tripod augmented images with spatial data to compute contextual information about the location and features of the actual landscape pictured. Using 3D models, the buildings and landscape features contained in the image were identified and located within the picture. Techniques from Web search and text summarisation were applied to automatically create textual descriptions of the photographs, producing rich, readable, multifaceted captions that go well beyond mere location to encompass culturally encoded notions, such as socially connoted language of place (e.g., “suburb”, “west end”).

Publications

2016

Jan-Thorsten Peter, Tamer Alkhouli, Hermann Ney, Matthias Huck, Fabienne Braune, Alexander Fraser, Aleš Tamchyna, Ondřej Bojar, Barry Haddow, Rico Sennrich, Frédéric Blain, Lucia Specia, Jan Niehues, Alex Waibel, Alexandre Allauzen, Lauriane Aufrant, Franck Burlot, Elena Knyazeva, Thomas Lavergne, François Yvon, Stella Frank, and Mārcis Pinnis (Tilde). 2016. The QT21/HimL Combined Machine Translation System. Proceedings of the ACL 2016 First Conference on Machine Translation (WMT16), 344-355.

Mārcis Pinnis, Rihards Kalniņš, Raivis Skadiņš, and Inguna Skadiņa. 2016. What Can We Really Learn from Post-editing? Proceedings of the Twelfth Conference of the Association for Machine Translation in the Americas (AMTA 2016), vol. 2: MT Users' Track, 86-91.