Which AI translates best? DeepL, Google Translate, Azure Translate, ChatGPT?

The question "which AI delivers the best translations?" keeps coming up. The challenge of judging translation quality is best understood through a small self-experiment.

The task: can you tell which translation comes from which source?

Here is a typical sentence from a German consulting firm, in the original:

„Nur 16 Prozent der im Rahmen der Studie befragten Führungskräfte rechnen mit einem Stellenabbau von mehr als 5 Prozent bis Ende 2025.“

The five English translations

Only one of them is by the company's own translators, with proofreading and everything that comes with it. The other four are from generic machine translation services: Google Translate, DeepL, Azure Translate and ChatGPT.

1. "Only 16 percent of the managers surveyed as part of the study expect a reduction in headcount of more than 5 percent by the end of 2025."
2. "Only 16 percent of the executives surveyed in the study expect job cuts of more than 5 percent by the end of 2025."
3. "Only 16% of the executives surveyed in the study plan to cut jobs by 5% or more by the end of 2025."
4. "Only 16 percent of the managers surveyed as part of the study expect job cuts of more than 5 percent by the end of 2025."
5. "Only 16 percent of the executives surveyed in the study expect a reduction of more than 5 percent in jobs by the end of 2025."

Can you spot the "real" one?

The answer

Number 4 is not it: that one is from Google Translate.
Number 2 sounds suspiciously human, but it's from Azure Translate.
Number 1 is from DeepL.
The version that is actually live on the website, signed off by the company's own translators, is Number 3.
Number 5, finally, is from ChatGPT.

What sets the translations apart?

The company itself uses "executives", writes percentages with the "%" symbol rather than the word, and prefers "to cut jobs" over "job cuts". These are many small decisions, and no generic model reproduces all of them on its own.
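
One rough way to make these differences visible is to compare each machine output with the published version word by word. The sketch below uses Python's standard difflib module; the helper name word_diff is made up for illustration, and word-level overlap is only a crude proxy, not a measure of translation quality.

```python
import difflib

# The published translation (number 3) and the four machine outputs quoted above.
REFERENCE = (
    "Only 16% of the executives surveyed in the study plan to cut jobs "
    "by 5% or more by the end of 2025."
)

CANDIDATES = {
    "DeepL": (
        "Only 16 percent of the managers surveyed as part of the study expect "
        "a reduction in headcount of more than 5 percent by the end of 2025."
    ),
    "Azure Translate": (
        "Only 16 percent of the executives surveyed in the study expect "
        "job cuts of more than 5 percent by the end of 2025."
    ),
    "Google Translate": (
        "Only 16 percent of the managers surveyed as part of the study expect "
        "job cuts of more than 5 percent by the end of 2025."
    ),
    "ChatGPT": (
        "Only 16 percent of the executives surveyed in the study expect "
        "a reduction of more than 5 percent in jobs by the end of 2025."
    ),
}


def word_diff(reference: str, candidate: str) -> tuple[float, list[str]]:
    """Return a word-level similarity ratio and the candidate words
    that have no matching position in the reference (hypothetical helper)."""
    ref_words = reference.split()
    cand_words = candidate.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    changed: list[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" blocks hold words that appear only in the candidate.
        if tag in ("replace", "insert"):
            changed.extend(cand_words[j1:j2])
    return matcher.ratio(), changed


for name, text in CANDIDATES.items():
    ratio, changed = word_diff(REFERENCE, text)
    print(f"{name}: similarity {ratio:.2f}, differing words: {changed}")
```

The output only shows where the wording diverges from the published sentence. It says nothing about which version is better, which is exactly the point of the experiment.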

Glossaries and stop-word lists will get you part of the way. But in an enterprise context, with millions of words across 5 to 40 languages, you quickly hit a ceiling. On top of that, ChatGPT is not consistent: the next time around, the same sentence might come out as "expect a job reduction of more than 5 percent". You never really know in advance how a given sentence will be translated.
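
To see where a glossary helps and where it stops, here is a minimal sketch of a glossary pass in Python. The two rules and the function name apply_glossary are assumptions for illustration, derived only from the example sentence in this article; a real term base would be far larger and language-specific.

```python
import re

# Hypothetical rules: generic wording -> preferred corporate wording.
RULES = [
    (re.compile(r"\bmanagers\b"), "executives"),
    (re.compile(r"(\d+) percent\b"), r"\1%"),
]


def apply_glossary(text: str) -> str:
    """Apply each term rule in order and return the adjusted sentence."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


# Google Translate output (number 4) and the published version (number 3).
google = (
    "Only 16 percent of the managers surveyed as part of the study expect "
    "job cuts of more than 5 percent by the end of 2025."
)
published = (
    "Only 16% of the executives surveyed in the study plan to cut jobs "
    "by 5% or more by the end of 2025."
)

fixed = apply_glossary(google)
print(fixed)
# Only 16% of the executives surveyed as part of the study expect job cuts of more than 5% by the end of 2025.
print(fixed == published)
# False
```

The terms now match, but the phrasing ("as part of the study", "expect job cuts of more than") still differs from the published sentence. That is the ceiling word lists run into.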

Conclusion

There is no single universally correct translation. There is only the one that fits this company and this use case, and a generic translation model cannot produce that on its own.

That's why our approach is to train the AI on the company's own language. The model picks up all those specifics. After training, the company's translators review the model's output in a dedicated interface, with metrics and comment fields. The model keeps learning, consistently and in your corporate wording.

We believe that is what companies from the DACH region need for international success.

More about this on Translate.Wonk, or book a 30-minute intro call.