Automated Translation
Tilde is at the forefront of research and innovation in neural, statistical, and hybrid machine translation. In the research community, Tilde is recognized for its expertise in technologies and solutions for automated translation of complex, less-resourced languages. Cutting-edge research results allowed Tilde to release the world’s first neural machine translation systems for smaller languages.
Internationally acknowledged research for under-resourced languages
Tilde’s research in automated translation has been internationally acknowledged, particularly its research on translation into morphologically rich, under-resourced languages.
- Developing machine translation (MT) solutions for smaller, morphologically rich and highly inflected languages is more complex because of their relatively free word order and richness of surface forms. Many of these languages also have limited resources (parallel and monolingual corpora), which further complicates development. This combination of characteristics calls for specially designed methods that minimise their negative effect on MT system quality.
- As research on neural machine translation has focussed mainly on widely used languages, Tilde’s researchers aim to research and develop methods and algorithms for successfully integrating neural networks into MT solutions, as well as methods for developing end-to-end neural machine translation systems for complex, less-resourced languages.
PROJECTS
Ongoing projects

Quality Translation 21
The project aims to develop substantially improved statistical and machine-learning-based translation models for challenging languages and resource scenarios.
Open Data Incubator for Europe (ODINE)
As part of its ODINE incubator project, Tilde will gather, create, and contribute new Multilingual Open Data sets for EU languages, which enable the language technology community to develop key services such as machine translation systems.
European Language Resource Coordination
The objective of the project is to identify and gather language and translation data relevant to public administration across all 30 European countries.
Completed Projects
ACCURAT (FP7 project) – Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation
The aim of the ACCURAT project was to research methods and techniques for overcoming one of the central problems of machine translation (MT): the lack of linguistic resources for under-resourced areas of machine translation. The main goal was to find, analyze, and evaluate novel methods that exploit comparable corpora in order to compensate for the shortage of linguistic resources, and ultimately to significantly improve MT quality for under-resourced languages and narrow domains.
EASTIN-CL (ICT PSP programme) – Crosslingual and Multimodal Search in a Portal for Support of Assisted Living
The project supports the e-inclusion of disabled and elderly people by providing crosslingual and multimodal support for accessing information bases on assistive tools and technology. Recent efforts have linked national assistive technology information bases into a European portal called EASTIN. The objective of EASTIN-CL was to enhance this portal by creating a front-end that makes it more accessible through language technology: multilingual technology that allows users to search the data in their native language, and multimodal technology that allows them to access the portal through spoken as well as written communication.
LetsMT! (ICT PSP project) – Platform for Online Sharing of Training Data and Building User Tailored Machine Translation
To fully exploit the huge potential of existing open statistical machine translation (SMT) technologies, the project proposed to build an innovative online collaborative platform for data sharing and MT building. This platform supports the upload of public as well as proprietary MT training data and the building of multiple MT systems, public or proprietary, by combining and prioritizing this data.
MATT (EUREKA project) – Web-based Multilingual Automated Terminology Translation System
The goal of the MATT project was to develop a new web-based system for automated translation of multilingual terminology that bridges the gap between traditional local (desktop) translation tools and terminology data on the Internet. This translation technology is meant both for professional translators using specialised translation environments (for example, SDL Trados, Wordfast, MemoQ) and for various experts and other users who require easy access to high-quality term resources from standard office environments.
MLi (FP7 project) – Towards a MultiLingual Data & Services Infrastructure
The MLi Support Action is working to deliver the strategic vision and operational specifications needed for building a comprehensive European MultiLingual data & services Infrastructure, along with a multiannual plan for its development and deployment, and to foster multi-stakeholder alliances that ensure its long-term sustainability.
SAFE (EUROSTARS project) – Social Analytics for Financial Engineering
The project result is a web-based news service that reports real-time social sentiment about a set of financial products. The news is drawn from multilingual (Latvian, Swedish, German, Dutch, Polish, French) social media sources (blogs, feeds). Tilde provided translation of the multilingual social media content for sentiment analysis, using mature statistical machine translation (SMT) systems specially adapted to social networks and the financial domain. The news feed will be available as a free version listing the sentiment only, and as a paid subscription-based feed offering added services (links to the originating news message, personalization, and archive functionality).
TRIPOD (FP6 project) – TRI-Partite Multimedia Object Description
The TRIPOD project aimed to automatically build rich, multi-faceted text and semantic descriptions of the landscape and permanent man-made features pictured in a photograph, and to create a more advanced image search engine. TRIPOD augmented images with spatial data to compute contextual information about the location and features of the actual landscape pictured. Using 3D models, the buildings and landscape features contained in an image were identified and located within the picture. Techniques from Web search and text summarisation were applied to automatically create textual descriptions of the photographs, producing rich, readable, multifaceted captions that go well beyond mere location and encompass culturally encoded notions, such as the socially connoted language of place (suburb, west end, etc.).