Tilde Data Library – one of the world's largest repositories of multilingual data

Data is the driving force behind language technology. Language technologies such as machine translation need data that is multilingual, plentiful, and reliable.

Boost system quality

Data is the raw material for any MT system. Tap our resources to improve your system's accuracy and quality.

Make your systems smarter

Linguistic components can make your MT system smarter by helping it process complex phrases and inflections.

Broaden your knowledge

Adding terminology to your language technology systems can broaden their scope and range.

How can you benefit from our data library?

Data for Machine Translation

For customers who don’t have enough data to build an MT system, we can draw on our rich resources to boost the system’s capabilities. Tilde Data Library includes 12.35 billion parallel sentences and 23.85 billion monolingual sentences in 124 languages.

Terminology Data

Terminology integration has been shown to improve an MT system's quality by over 30%. Tilde Data Library includes over 4 million authoritative terms and 13 million automatically extracted terms in over 25 languages.

EuroTermBank

Linguistic Components

Tilde Data Library features sophisticated linguistic components that provide linguistic knowledge for language technology systems.

Language Resources

Tilde maintains the Nordic master node of META-SHARE, the open language resource exchange facility.