Linguistic Analysis

Methods for automated linguistic analysis are the foundation of robust multilingual solutions that help users cross language barriers. Tilde’s researchers have deep expertise in developing written language technology for complex, highly inflected languages.

Tilde’s long-term research on written language processing and linguistic analysis for highly inflected languages has resulted in exceptional proofing tools and natural language processing technologies for Baltic languages, including morphological analyzers and taggers, syntactic parsers, and named entity recognizers.

Linguistic analysis research

 

Linguistic analysis for better language technology

The quality of basic linguistic analysis tools for written text processing plays a crucial role in the development of high-level, cutting-edge language technology solutions. Tilde's team of researchers is therefore constantly looking for novel methods to improve these tools. The methods used for linguistic analysis include knowledge-based, data-driven, and hybrid approaches. Recently, our researchers have started investigating neural network models for three types of written text analysis tasks: syntactic analysis, assessment of grammaticality, and grammar correction.

Tilde’s research in syntactic analysis and grammar checking has been internationally acknowledged, receiving a third-place best paper award in 2014.

Publications

2024

Martins Kronis, Askars Salimbajevs, and Mārcis Pinnis. 2024. Code-Mixed Text Augmentation for Latvian ASR. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 3469–3479.

Georg Rehm, Stelios Piperidis, Khalid Choukri, Andrejs Vasiļjevs (Tilde), Katrin Marheinecke, Victoria Arranz, Aivars Bērziņš (Tilde), Miltos Deligiannis, Dimitris Galanis, Maria Giagkou, Katerina Gkirtzou, Dimitris Gkoumas, Annika Grützner-Zahn, Athanasia Kolovou, Penny Labropoulou, Andis Lagzdiņš (Tilde), Elena Leitner, Valérie Mapelli, Hélène Mazo, et al. 2024. Common European Language Data Space. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 3579–3586.