Linguistic Analysis

Methods for automated linguistic analysis are the foundation of robust multilingual solutions that help cross language barriers. Tilde’s researchers have deep knowledge of developing written language technology for complex, highly inflected languages.

Tilde’s long-term research on written language processing and linguistic analysis for highly inflected languages has resulted in proofing tools and natural language processing technologies (e.g. morphological analyzers and taggers, syntactic parsers, and named entity recognizers) for the Baltic languages.

Linguistic analysis research


Linguistic analysis for better language technology

The quality of basic linguistic analysis tools for written text processing plays a crucial role in the development of high-level, cutting-edge language technology solutions. Tilde's team of researchers is therefore constantly looking for novel methods to improve these tools. The methods used for linguistic analysis include knowledge-based, data-driven, and hybrid approaches. Recently our researchers have started to investigate neural network models for three written text analysis tasks: syntactic analysis, assessment of grammaticality, and grammar correction.

Tilde’s research in syntactic analysis and grammar checking has been internationally acknowledged and received a best paper award (3rd place) in 2014.


Ongoing projects

Quality Translation 21

The project aims to develop substantially improved statistical and machine-learning-based translation models for challenging languages and resource scenarios.

Open Data Incubator for Europe (ODINE)

As part of its ODINE incubator project, Tilde will gather, create, and contribute new Multilingual Open Data sets for EU languages, which enable the language technology community to develop key services such as machine translation systems.

European Language Resource Coordination

The objective of the project is to identify and gather language and translation data relevant to public administration across all 30 European countries.

Completed Projects

CLARITY (FP5 project) – Cross-Language Information Retrieval and Organisation of Text and Audio Documents

The aim of the CLARITY project was to develop cross-lingual information retrieval (CLIR) techniques from English to Finnish, Swedish, Latvian, and Lithuanian, i.e. low-density languages with minimal translation resources, and to investigate techniques for organising and presenting documents in concept hierarchies and by document genres and filters. CLARITY was a fully fledged retrieval system that supported the user throughout query formulation, text retrieval, and document browsing.


TTC (FP7 project) – Terminology Extraction, Translation Tools and Comparable Corpora

The TTC project aimed at leveraging machine translation tools (MT tools), computer-assisted translation tools (CAT tools), and multilingual content management tools by automatically generating bilingual terminologies from comparable corpora in several European languages (i.e. English, French, German and Latvian) as well as in Chinese and Russian. Terms in different languages are aligned based on the similarity of the words that occur in their immediate vicinity in the corpora; this approach is known as lexical context analysis. The system generates candidate translations for single- and multi-word terms. The approach relies on a one-to-one relation between terms and concepts.
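The idea behind lexical context analysis can be illustrated with a minimal sketch. This is not the TTC implementation: the function names, the window size, the use of cosine similarity, the seed dictionary, and the toy English and Latvian sentences are all assumptions made for illustration. Each term is represented by counts of the words around it, the source-language counts are mapped into the target language through a small seed dictionary, and target candidates are ranked by how similar their contexts are.

```python
import math
from collections import Counter


def context_vector(tokens, term, window=2):
    """Count the words within `window` positions of each occurrence of `term`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def translate_vector(vec, seed_dict):
    """Map context words into the target language; drop words missing from the seed dictionary."""
    out = Counter()
    for word, n in vec.items():
        if word in seed_dict:
            out[seed_dict[word]] += n
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(src_tokens, tgt_tokens, src_term, tgt_candidates, seed_dict, window=2):
    """Rank target-language candidates by context similarity to the source term."""
    src_vec = translate_vector(context_vector(src_tokens, src_term, window), seed_dict)
    scored = [(c, cosine(src_vec, context_vector(tgt_tokens, c, window))) for c in tgt_candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy comparable "corpora" and seed dictionary (hypothetical data).
english = "the wind turbine generates electric power".split()
latvian = "vēja turbīna ražo elektrisko jaudu . vecais suns ēd gaļu".split()
seed = {"wind": "vēja", "generates": "ražo", "electric": "elektrisko", "power": "jaudu"}

for candidate, score in rank_candidates(english, latvian, "turbine", ["turbīna", "suns"], seed):
    print(candidate, round(score, 3))
```

A real system would work on much larger corpora, weight context words by association strength rather than raw counts, and handle multi-word terms; the sketch only shows why terms with similar neighbours end up aligned.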



Georg Rehm, Stelios Piperidis, Kalina Bontcheva, Jan Hajic, Victoria Arranz, Andrejs Vasiļjevs (Tilde), Gerhard Backfried, Jose Manuel Gomez-Perez, Ulrich Germann, Rémi Calizzano, Nils Feldhus, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Julian Moreno-Schneider, Dimitris Galanis, Penny Labropoulou, Miltos Deligiannis, Katerina Gkirtzou, Athanasia Kolovou, Dimitris Gkoumas, Leon Voukoutis, Ian Roberts, Jana Hamrlova, Dusan Varis, Lukas Kacena, Khalid Choukri, Valérie Mapelli, Mickaël Rigault, Julija Melnika (Tilde), Miro Janosik, Katja Prinz, Andres Garcia-Silva, Cristian Berrio, Ondrej Klejch and Steve Renals. 2021. European Language Grid: A Joint Platform for the European Language Technology Community. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, 221–230.


Ēriks Ajausks (Tilde), Victoria Arranz, Laurent Bié, Aleix Cerdà-i-Cucó, Khalid Choukri, Montse Cuadros, Hans Degroote, Amando Estela, Thierry Etchegoyhen, Mercedes García-Martínez, Aitor García-Pablos, Manuel Herranz, Alejandro Kohan, Maite Melero, Mike Rosner, Roberts Rozis (Tilde), Patrick Paroubek, Artūrs Vasiļevskis (Tilde) and Pierre Zweigenbaum. 2020. The Multilingual Anonymisation Toolkit for Public Administrations (MAPA) Project. Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, 471–472.