Knowledge Management

By connecting Big Data and language technologies, Tilde’s research and development team empowers companies to improve the management of their knowledge base. The development of cloud-based collaborative terminology services and the META-SHARE repository of language resources are just a few examples of our successful work in the field.


Knowledge Management research

Bridging Big Data and Language Technologies

As a leading European SME specialising in language/multilingual data technologies, Tilde bridges two communities, Big Data and Language Technologies, through its research and development activities, facilitating knowledge and technology transfer at the European level.

  • Tilde investigates how a focus on Big Data in ICT research elides important issues about the information environment we live in. With its competences in language/multilingual data technologies, Tilde examines one of the greatest challenges for Big Data: the analysis and processing of multilingual content in unstructured texts.
  • Tilde is revolutionising terminology research. Tilde’s researchers have developed methods and services for term identification, for crawling the Web and extracting domain-specific terms from parallel and comparable corpora, and for integrating custom terminology into machine translation. Tilde Terminology services keep terminology organised by identifying and extracting terms in documents, finding relevant translations, and assembling term glossaries in the cloud. Tilde Terminology provides access to rich terminology data and services in many domains for all EU languages (>2.6M manually created terms in 133 term resources and 33 languages).


Ongoing projects

Quality Translation 21

The project aims to develop substantially improved statistical and machine-learning-based translation models for challenging languages and resource scenarios.

Big Data Value ecosystem

The mission of BDVe is to support the Big Data Value PPP in realising a vibrant data-driven EU economy; in other words, BDVe supports the successful implementation of the PPP.

European Language Resource Coordination

The objective of the project is to identify and gather language and translation data relevant to public administration across all 30 European countries.

Completed projects

EuroTermBank (eContent project) – Collection of Pan-European Terminology Resources through Cooperation of Terminology Institutions

The EuroTermBank project focused on harmonising and consolidating terminology work in the new EU member states, transferring experience from other European Union terminology networks and pooling the competencies and efforts of the accession countries. The project resulted in a centralised online terminology bank for the languages of the new EU member countries, interlinked with other terminology banks and resources.

eRMIONE (eTEN project) – E-Learning Resource Management Service for Interoperability Networks in the European Cultural Heritage Domain

The eRMIONE project aimed to make available, over the Internet, a range of services that support e-learning and improve knowledge acquisition for actors operating in the cultural heritage domain all over Europe. The final output of the project was an e-learning resource management service that delivers European cultural heritage material online to courses, enriching cultural exchange among students at higher education institutions in different countries.

MATT (EUREKA project) – Web-based Multilingual Automated Terminology Translation System

The goal of the MATT project was to develop a new web-based system for automated translation of multilingual terminology that bridges the gap between traditional local (desktop) translation tools and terminology data on the Internet. This translation technology is meant both for professional translators using specialised translation environments (for example, SDL Trados, Wordfast, MemoQ) and for various experts and other users requiring easy access to high-quality term resources from standard office environments (Microsoft Word, Microsoft PowerPoint, OpenOffice Writer, etc.). The platform for multilingual terminology translation is also made available to machine translation technologies.

META-NORD (ICT PSP project) – Baltic and Nordic Parts of the European Open Linguistic Infrastructure

The META-NORD project aimed to establish an open linguistic infrastructure in the Baltic and Nordic countries to serve the needs of the industry and research communities. The project focused on 8 European languages - Danish, Estonian, Finnish, Icelandic, Latvian, Lithuanian, Norwegian and Swedish - that each have fewer than 10 million speakers. The project assembled, linked across languages, and made widely available language resources of different types, used by different user communities in academia and industry to create products and applications that facilitate linguistic diversity in the EU.

MIAUCE (FP6 project) – Multi Modal Interaction Analysis and Exploration of Users within a Controlled Environment

The project aimed to investigate and develop techniques to analyse the multi-modal behaviour of users within the context of real applications. The multi-modal behaviour takes the form of eye gaze/fixation, eye blinks and body movement. The techniques were developed and validated within the context of three different application domains: security, customised marketing, and interactive web TV.

MLi (FP7 project) – Towards a MultiLingual Data & Services Infrastructure

The MLi Support Action worked to deliver the strategic vision and operational specifications needed to build a comprehensive European MultiLingual data & services Infrastructure, along with a multiannual plan for its development and deployment, and to foster multi-stakeholder alliances ensuring its long-term sustainability.

SEMO

Retrieving metadata from various documents and converting it into another format is one of the most significant problems faced by document processing systems. The goal of the SEMO project was to develop a novel intelligent technology that retrieves metadata from documents in both paper and electronic format, regardless of their type, structure and language. The project resulted in a universal technology suitable for use in various document processing systems.

SOLIM (EUROSTARS project) – Spatial Ontology Language for Multimedia Information Modelling

The objective of the SOLIM project was to improve context-aware information analysis by extending state-of-the-art ontology languages, and their support for automated reasoning, with a spatial dimension. This enables semantic systems to venture beyond a static world and add the concepts of space and change.

TaaS (FP7 project) – Terminology as a Service

The TaaS project addressed the need for instant access to the most up-to-date terms, user participation in the acquisition and sharing of multilingual terminological data, and efficient solutions for reusing terminology resources. The cloud-based TaaS platform developed in the project provides the following online core terminology services:

  1) Automatic extraction of monolingual term candidates from user-uploaded documents using state-of-the-art terminology extraction techniques.
  2) Automatic recognition of translation equivalents for the extracted terms in user-defined target language(s) from different public and industry terminology databases.
  3) Automatic acquisition of translation equivalents for terms not found in term banks from parallel/comparable web data using state-of-the-art terminology extraction and bilingual terminology alignment methods (MS2: Prototype bilingual term extraction system/M12).
  4) Facilities for cleaning up (i.e., revising: editing, deleting) automatically acquired terminology by users.
  5) Facilities for terminology sharing and reuse: APIs and export tools for sharing the resulting terminological data with major term banks and for reuse in different user applications (MS3: TaaS platform and integrated core services).
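To make services 1 and 2 more concrete, here is a minimal Python sketch of the general idea: pull frequent word sequences out of a document as term candidates, then look them up in a term bank. It is not the TaaS implementation. The frequency-based extraction is a simple stand-in for the state-of-the-art techniques mentioned above, and the function names, the stopword list, the `term_bank` dictionary and the sample data are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real resource).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for",
             "is", "are", "on", "with", "by", "from"}

def term_candidates(text, max_len=3, min_freq=2):
    """Return (candidate, count) pairs for word n-grams of 1..max_len words
    that neither start nor end with a stopword and occur >= min_freq times."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(t, c) for t, c in counts.most_common() if c >= min_freq]

def lookup_translations(candidates, term_bank, target_lang):
    """Split candidates into those with a known translation in target_lang
    and those that are missing from the (hypothetical) term bank."""
    found, missing = {}, []
    for term, _count in candidates:
        entry = term_bank.get(term, {})
        if target_lang in entry:
            found[term] = entry[target_lang]
        else:
            missing.append(term)
    return found, missing

# Toy data, invented for this example.
text = ("Machine translation needs terminology. "
        "Machine translation systems reuse terminology from term banks.")
term_bank = {
    "machine translation": {"lv": "mašīntulkošana"},
    "terminology": {"lv": "terminoloģija"},
}

cands = term_candidates(text)
found, missing = lookup_translations(cands, term_bank, "lv")
print(found)    # candidates with a Latvian equivalent in the toy term bank
print(missing)  # candidates that would go on to service 3 (acquisition from web data)
```

In this sketch, the candidates left in `missing` correspond to the terms that service 3 would try to translate from parallel or comparable web data, and a human would still review the result as in service 4.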

TRIPOD (FP6 project) – TRI-Partite Multimedia Object Description

The TRIPOD project aimed to automatically build rich, multi-faceted text and semantic descriptions of the landscape and permanent man-made features pictured in a photograph, and to create a more advanced image search engine. TRIPOD augmented images with spatial data to compute contextual information about the location and features of the actual landscape pictured. Using 3D models, buildings and landscape features contained in the image are identified and located within the picture. Techniques from Web search and text summarisation were applied to automatically create textual descriptions of the photographs, producing a rich, readable and multifaceted caption that goes well beyond mere location to encompass culturally encoded notions, such as socially connoted terms for place (suburb, west end, etc.).

TTC (FP7 project) – Terminology Extraction, Translation Tools and Comparable Corpora

The TTC project aimed to leverage machine translation tools (MT tools), computer-assisted translation tools (CAT tools) and multilingual content management tools by automatically generating bilingual terminologies from comparable corpora in several European languages (i.e. English, French, German and Latvian) as well as in Chinese and Russian. Terms in different languages are aligned based on the similarity of the words next to them in the corpora (their immediate vicinity); this approach is known as lexical context analysis. The system generates candidate translations for single- or multi-word terms. The approach relies on the one-to-one relation between terms and concepts.
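The context-based alignment described above can be illustrated with a small Python sketch. It is a simplified illustration of the general technique, not the TTC system: it handles single-word terms only, uses raw co-occurrence counts and cosine similarity, and assumes a small seed dictionary for mapping context words between languages. The function names, the window size and the toy English/French data are all assumptions.

```python
from collections import Counter
from math import sqrt

def context_vector(term, sentences, window=2):
    """Count the words occurring within `window` tokens of each occurrence of `term`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == term:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts

def translate_vector(vec, seed_dict):
    """Map source-language context words into the target language via a seed dictionary;
    words without a dictionary entry are dropped."""
    out = Counter()
    for word, n in vec.items():
        if word in seed_dict:
            out[seed_dict[word]] += n
    return out

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_candidates(term, src_sents, tgt_sents, tgt_candidates, seed_dict, window=2):
    """Rank target-language candidates by how similar their contexts are to the source term's."""
    src_vec = translate_vector(context_vector(term, src_sents, window), seed_dict)
    scored = [(cosine(src_vec, context_vector(c, tgt_sents, window)), c)
              for c in tgt_candidates]
    return sorted(scored, reverse=True)

# Toy comparable corpora (invented): English source, French target.
en = [["the", "turbine", "blade", "rotates", "fast"],
      ["each", "turbine", "blade", "needs", "inspection"]]
fr = [["la", "pale", "de", "turbine", "tourne", "vite"],
      ["le", "moteur", "de", "voiture", "démarre"]]
seed = {"turbine": "turbine", "rotates": "tourne", "inspection": "inspection"}

ranked = rank_candidates("blade", en, fr, ["pale", "moteur"], seed)
print(ranked[0][1])  # "pale" ranks first: it shares the context word "turbine"
```

Real systems typically replace the raw counts with association measures and extend the idea to multi-word terms, but the ranking step keeps the same shape: translate the context, then compare.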




Jose Manuel Gómez-Pérez, Andrés García-Silva, Cristian Berrio, German Rigau, Aitor Soroa, Christian Lieske, Johannes Hoffart, Felix Sasaki, Daniel Dahlmeier, Inguna Skadiņa (Tilde), Aivars Bērziņš (Tilde), Andrejs Vasiļjevs (Tilde) and Teresa Lynn. 2023. Deep Dive Text Analytics and Natural Language Understanding. European Language Equality, Springer, 313–336.