Linguistic Analysis
Methods for automated linguistic analysis are the foundation of robust multilingual solutions that allow users to cross language barriers. Tilde’s researchers have deep expertise in developing written language technology for complex, highly inflected languages.
Tilde’s long-term research on written language processing and linguistic analysis for highly inflected languages has resulted in exceptional proofing tools and natural language processing technologies for the Baltic languages, including morphological analyzers and taggers, syntactic parsers, and named entity recognizers.
Linguistic analysis for better language technology
The quality of basic linguistic analysis tools for written text processing plays a crucial role in the development of high-level, cutting-edge language technology solutions. Tilde's team of researchers is therefore constantly looking for novel methods to improve these tools. The methods used for linguistic analysis include knowledge-based, data-driven, and hybrid approaches. Recently, our researchers have started to investigate neural network models for three written text analysis tasks: syntactic analysis, assessment of grammaticality, and grammar correction.
Tilde’s research in syntactic analysis and grammar checking has been internationally acknowledged and received a best paper award (3rd place) in 2014.
PROJECTS
Ongoing projects

Quality Translation 21
The project aims to develop substantially improved statistical and machine-learning-based translation models for challenging languages and resource scenarios.
Open Data Incubator for Europe (ODINE)
As part of the ODINE incubator project, Tilde will gather, create, and contribute new multilingual open data sets for EU languages, enabling the language technology community to develop key services such as machine translation systems.
European Language Resource Coordination
The objective of the project is to identify and gather language and translation data relevant to public administration across all 30 European countries.
Completed Projects
CLARITY (FP5 project) – Cross-Language Information Retrieval and Organisation of Text and Audio Documents
The aim of the CLARITY project was to develop cross-lingual information retrieval (CLIR) techniques from English to Finnish, Swedish, Latvian, and Lithuanian, i.e. low-density languages with minimal translation resources, and to investigate techniques for organising and presenting documents in concept hierarchies and by document genres and filters. CLARITY was a fully-fledged retrieval system that supported the user throughout the whole process of query formulation, text retrieval, and document browsing.
TTC (FP7 project) – Terminology Extraction, Translation Tools and Comparable Corpora
The TTC project aimed to improve machine translation tools (MT tools), computer-assisted translation tools (CAT tools), and multilingual content management tools by automatically generating bilingual terminologies from comparable corpora in several European languages (English, French, German, and Latvian) as well as in Chinese and Russian. Terms in different languages were aligned based on the similarity of the words that appear next to them in the corpora (their immediate vicinity), an approach known as lexical context analysis. The system generated candidate translations for single- or multi-word terms. The approach relies on a one-to-one relation between terms and concepts.
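The general idea behind lexical context analysis can be illustrated with a minimal Python sketch. It is not the TTC implementation, whose details are not described here. The window size, the small seed dictionary, the toy English and French token lists, and the use of cosine similarity are all assumptions made for illustration, and only single-word terms are handled.

```python
from collections import Counter
from math import sqrt


def context_vectors(tokens, terms, window=2):
    """Count the words occurring within `window` tokens of each term."""
    vectors = {term: Counter() for term in terms}
    for i, token in enumerate(tokens):
        if token in vectors:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[token][tokens[j]] += 1
    return vectors


def project(vector, seed_dict):
    """Map a source-language context vector into the target language,
    keeping only context words covered by the (assumed) seed dictionary."""
    projected = Counter()
    for word, count in vector.items():
        if word in seed_dict:
            projected[seed_dict[word]] += count
    return projected


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(src_term, src_vectors, tgt_vectors, seed_dict):
    """Rank target terms by the similarity of their contexts to src_term's."""
    projected = project(src_vectors[src_term], seed_dict)
    scores = [(cosine(projected, vec), tgt) for tgt, vec in tgt_vectors.items()]
    return sorted(scores, reverse=True)


# Toy comparable corpora (hypothetical, not real project data).
en_tokens = "the turbine converts wind into energy and the battery stores energy".split()
fr_tokens = "la turbine convertit vent en energie et la pile stocke energie".split()

# Hypothetical seed dictionary of general-language context words.
seed = {"converts": "convertit", "wind": "vent", "stores": "stocke", "energy": "energie"}

en_vectors = context_vectors(en_tokens, ["turbine", "battery"])
fr_vectors = context_vectors(fr_tokens, ["turbine", "pile"])

best_for_battery = rank_candidates("battery", en_vectors, fr_vectors, seed)[0][1]
best_for_turbine = rank_candidates("turbine", en_vectors, fr_vectors, seed)[0][1]
```

In this toy setting the English term "battery" is paired with the French "pile" because both occur near words for storing and energy. A realistic system would need much larger corpora, term extraction for multi-word terms, and some weighting of context words.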