Training of translation models

Best translation quality for companies

All sources as a training basis: websites, glossaries, databases, documents, TMS exports

Extraction of training data automatically and independently

Validation, cleansing and processing of language data

Training and fine-tuning of language models

Evaluation of translation results in your own environment

Training of translation models

In a nutshell

Your previous translations are a treasure trove of data that you should use to your advantage. 

Through past investments in translation and proofreading, your company has created a large amount of language data that is of very good quality.

We can make this data usable for you and thus achieve the quality of human translation – with the efficiency of machine translation.

Gain speed in global rollouts of content and campaigns and make the work of your local subsidiaries easier.

What steps does the training go through? 

Training machine translation in your corporate language consists of 5 steps:

1. Collection – Collect language data from websites, glossaries, language databases, documents, TMS exports
2. Extraction – Extracting the language pairs from the collected data
3. Processing – Validating, cleaning and processing the language data for the trainings
4. AI-Training– Training the language models on the translation knowledge
5. Evaluation – Evaluation of the language models by the customer in the model blind test

How do you train translation models? 

Translate learns corporate language like a translator.

1. Collection:

You would explain language rules to a new translator. You might show them a directory of product names. There may be a glossary in which you have defined technical terms and translations. In some cases, you can even include a document on corporate language and tone-of-voice. Or you don’t have a structure and instruct the translator to look at your publications and keep that style. We do this part of the collection fully automatically as soon as we receive the information.

2. Extraction:

From all the sources and data that are made available to us, we have to find pairs of sentences.
Sentence pairs are the currency of translation model training.

For example, an example of a training set at KWS Saat is:

Deutsch: “Und so widerstandsfähig wie Roggen ist kaum ein anderes Getreide.”
Englisch: “And there is hardly any other cereal that is so resistant as rye.”

This sentence contains both grammar and the linguistic tonality that KWS speaks of “cereal” while generic translation models would translate  “grain”.

3. Processing:

The data we receive is usually unsorted. Especially in published documents and websites, we find texts that have not been translated 1:1 in the same place and appropriately. That’s why we have developed our own technology to find and process these pairs of sentences. 
This makes training economically and time-wise.

4. AI Training:

We bring together the processed language data in a training corpus. Of course, with a control group and optimally prepared for the model to be trained. The training cycles are then run through until the results are good enough for evaluation.

5. Rating:

In the evaluation phase, we make the results available in a separate area. There, the translation managers can validate and check the results of the trained business models with those of generic models or previous translations.

And how does AI continue to learn? 

Translate learns with every proofreading and publication.

During operation, we provide you with your trained translation models via API and web frontend. With each translation, you can add changes from the proofreading. In addition, we periodically use the training sources and check which changes have taken place in the vocabulary in order to train them back into the model.

And if we have ad-hoc changes and, for example, release a new product? In our training center, you can manage the changes directly and determine that translations have to be changed immediately. We also regularly include your adjustments in the training center in the model training.

Why train translation models?

Summary:

Learning the special terms, tonality and specific naming coventions of the company – for the highest translation quality.

Continuous learning and improvement in operation – for future best results.

Processing of all data sources for the training – whether websites, TMS, glossaries, documents or other sources – for the use of all language data that has been generated so far.
How much does it cost:

We offer the complete process from the collection to the provision of the evaluation environment and the evaluation of the training results – at a flat rate of €5,000 per language.

How long does it take to train an AI? 

Teaching your corporate language to Translate is possible very quickly.

We were able to complete the data exchange, training and evaluation phases in just a few working days. A common value is about 3-4 weeks. The critical path here is access to the data – especially in the case of third-party systems such as translation memory systems. And the availability of those responsible for translation for evaluation.

In der Evaluation werden etwa 200 Satzpaare geprüft, Sie sollten mit etwa 16 Stunden Aufwand rechnen.

Phase Plan Training & Operations

From data to translation

01

Data exchange

wonk.ai will receive the list of languages to be translated and access to the data sources available for training.

02

Data Checkup

wonk.ai checks the quality of the language data and the number of language pairs to be achieved and provides feedback on feasibility.

03

Specifying the testers

The client determines which stakeholders and experts in the company will review the language models and evaluate the results – compared to previous translations or alternative solutions.

04

Training of language models

wonk.ai extracts the language data, validates and cleans the training set, and trains the language models with mathematical evaluation.

05

Evaluation of the results

The customer’s testers evaluate the language models within their own assessment environment and provide feedback on individual results.

05

Commissioning

Once the language models have been initially approved, they can be put into operation directly and can be used via the customer’s own web environment. The trained models can also be integrated into third-party systems via the API and can thus be used in the entire system landscape.

What can Translate do?

Translate combines the economics of machine translation with the quality of human translation.

Website Projects and Content – . XLIFF

.xml, .xlif and other formats that a CMS can export, Translate avoids and translates the language true to format.

Downloads – .PDF

Translate translates already set documents in .pdf format into the desired languages.

Office documents – Word, Excel, PowerPoint

Classic documents that arise in daily office work can be translated with Translate. And keep your formatting and design.

Own corporate language

Within a training project, we train Translate based on existing content from websites, translation memory systems (TMS), documents of any form and existing glossaries.

Catalogs, brochures and printed matter – .IDML

This content is usually created in Indesign in .idml format. Translate translates the content so that it remains editable and in Indesign format. 

100% DSVGO compliant

Integrated through API into Microsoft Azure. Data in EU & Germany – or on-premise if desired

Our customers

How about it?

Automate your translations in high-quality.

The training for translate is clear…

what else to know?