MT engine from Globalese gains 115% after training

March 29, 2021

Technical Ru > En

Case Study

Engines from IT giants such as Google, Microsoft, and Yandex often win in quality because search engine companies possess the whole internet as their data pool. However, with very specialized content and an excellent translation memory, this advantage is nullified.

In this case study, the engine from a smaller MT vendor, Globalese, won in all evaluations after training boosted its performance by an enormous 115%. We took advantage of the customer's monster 1-million-segment dataset, accumulated over 10 years of consistent translations.

Language combination: Russian to English
Domain: Technical-Aviation
Training dataset: 1 million segments
Highest BLEU score attained: 51 (excellent)
Quality gains over stock Google: +67%

1. Dataset Preparation for Machine Translation

In this case study, we received a huge translation memory from the client and processed it using our proprietary data pipeline. Our lexicographer kept aviation part names and nomenclature intact and removed repetitions, inconsistencies, and segments flagged by automatic QA. The resulting training dataset came out 60% smaller.

Dataset health report from Custom.MT
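The cleaning steps described in this section can be illustrated with a minimal sketch. This is not the proprietary pipeline itself: the function name `clean_segments`, the `is_flagged` callback standing in for automatic QA, and the policy of dropping every source segment that has conflicting translations are all assumptions made for illustration. In practice a lexicographer would decide which variant to keep.

```python
from collections import defaultdict


def clean_segments(pairs, is_flagged=lambda src, tgt: False):
    """Return cleaned (source, target) pairs from a translation memory.

    pairs      -- iterable of (source, target) strings
    is_flagged -- hypothetical callback standing in for an automatic QA check;
                  returns True for pairs that should be dropped
    """
    # Collapse runs of whitespace and skip pairs with an empty side.
    normalized = []
    for src, tgt in pairs:
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        if src and tgt:
            normalized.append((src, tgt))

    # Drop segments rejected by the QA callback.
    kept = [(s, t) for s, t in normalized if not is_flagged(s, t)]

    # Remove exact repetitions while preserving the original order.
    unique = list(dict.fromkeys(kept))

    # Treat a source with more than one distinct target as inconsistent.
    # Assumption: such sources are dropped entirely rather than resolved.
    targets = defaultdict(set)
    for src, tgt in unique:
        targets[src].add(tgt)
    return [(s, t) for s, t in unique if len(targets[s]) == 1]


if __name__ == "__main__":
    sample = [
        ("Шасси", "Landing gear"),
        ("Шасси", "Landing gear"),      # repetition
        ("Закрылок", "Flap"),
        ("Закрылок", "Wing flap"),      # inconsistent translation
        ("Руль  высоты", "Elevator"),   # extra whitespace
    ]
    cleaned = clean_segments(sample)
    reduction = 1 - len(cleaned) / len(sample)
    print(cleaned)
    print(f"Dataset reduced by {reduction:.0%}")
```

Preserving the order of the first occurrence keeps the cleaned data easy to compare against the original translation memory when reviewing what was removed.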

2. Training Machine Translation with TMX

Once we had a clean dataset, we trained a set of engines with it, including Globalese, Google AutoML, Yandex, Amazon ACT, Microsoft Custom Translator, IBM Watson, and ModernMT Enterprise. The training took significant time and more investment than usual because of the dataset size.

3. Machine Translation Automatic and Human Evaluation

It was worth the investment: training yielded huge improvements to BLEU scores and moderate improvements to hLEPOR scores.

Globalese BLEU improved by 115.5%, from 23.6 to almost 51, outstripping the other engines in this experiment by 10 points or more.
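The relationship between the baseline score and the relative gain can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted above (23.6 and 115.5%); the function names are illustrative, and the trained score it prints is derived from those figures rather than taken from the evaluation logs.

```python
def relative_gain(before, after):
    """Relative improvement of a score, in percent."""
    return (after - before) / before * 100


def score_after_gain(before, gain_percent):
    """Score implied by a baseline and a relative gain in percent."""
    return before * (1 + gain_percent / 100)


if __name__ == "__main__":
    baseline_bleu = 23.6   # stock Globalese BLEU quoted in the text
    gain = 115.5           # relative improvement quoted in the text, in percent

    trained_bleu = score_after_gain(baseline_bleu, gain)
    print(round(trained_bleu, 1))  # 50.9, i.e. "almost 51"
```

A relative gain above 100% means the score more than doubled, which is why a 27-point absolute increase from a baseline of 23.6 reads as 115.5%.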

Human evaluation of RU > EN MT engines for Aviation

The human evaluation was carried out as a blind test, with three specialist linguists scoring and editing six engine outputs. In this exercise, scores correlated loosely with the automated evaluations. Globalese won again, tied for first place with Amazon ACT.

The client selected Globalese for further use because it was already integrated with their preferred translation software, Memsource.

Overall human evaluation scores were moderate because Russian is an inflectional language, and many segments required suffix correction. Furthermore, the engine often omits words from a sentence, which requires linguists to stay vigilant and apply consistent cognitive effort.

We expect translators working with this engine to achieve editing speeds of 1,000-1,500 words (roughly 4-6 pages) per hour after a period of adaptation.

Image credit: Ivan Lapyrin
