Trained MT in automotive: 83% needs no editing

February 25, 2021

Guest post by Jourik Ciesielski*


Jourik Ciesielski trained a Google AutoML engine for an automotive customer in the English to Spanish (Latin America) language combination. As a result of the training, only 17% of segments needed editing.

In this case study we discuss the impact of training on the output generated by an MT engine. The technology behind this project is Google Cloud AutoML, but note that other providers also support engine training (e.g. Microsoft, SYSTRAN, Yandex).

Project details

Language combination: English to Spanish (Latin America)

Domain: Automotive

Training data: 932K unique and approved translation units

Achieved BLEU score: 54.16
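The BLEU score above was reported during training. As a rough illustration of what BLEU measures (n-gram overlap between a hypothesis and a reference, scaled by a brevity penalty), here is a simplified, unsmoothed single-sentence version in Python. This is a sketch for intuition only, not the exact corpus-level formula AutoML uses:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified, unsmoothed BLEU for a single sentence pair (0-100)."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_counts & ref_counts).values())  # clipped matches
        if overlap == 0:
            return 0.0  # no smoothing: one empty precision zeroes the score
        precisions.append(overlap / sum(hyp_counts.values()))
    # the brevity penalty punishes hypotheses shorter than the reference
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 100; a hypothesis with no overlapping 4-grams scores 0, which is why real toolkits apply smoothing for short sentences.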

The trained engine was then tested on 500 random sentences taken from owner manuals that are available online. Those sentences were processed with the engine and the output was evaluated against a “ready to publish” standard: for each translation we determined whether or not it could be published without any human intervention. Perfect translations were categorized as “No post-editing”, translations that required one correction as “Light post-editing” and translations that needed more than one correction as “Heavy post-editing”.
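The bucketing scheme just described is easy to mechanize once a reviewer has recorded how many corrections each sentence needs. A minimal sketch (the function names are ours, not part of any tool):

```python
from collections import Counter

def categorize(num_corrections: int) -> str:
    """Map a reviewer's correction count to a post-editing category."""
    if num_corrections == 0:
        return "No post-editing"
    if num_corrections == 1:
        return "Light post-editing"
    return "Heavy post-editing"

def summarize(correction_counts):
    """Return the share (in %) of evaluated sentences per category."""
    counts = Counter(categorize(n) for n in correction_counts)
    total = len(correction_counts)
    return {cat: round(100 * c / total, 1) for cat, c in counts.items()}
```

Running `summarize` over the 500 correction counts produces exactly the percentages reported in the next section.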

Results

83% of the translations were correct and didn’t need any post-editing effort. The remaining 17% required only one correction. There was not a single translation that needed heavy post-editing.

Granted, both the domain (automotive) and the language combination (EN > LatAm ES) are very suitable for MT, but even so the overall quality of the generated output must be labeled “excellent”. The trained engine produces accurate and consistent translations, while style and tone of voice correspond to the training data. The Latin American flavor is beautifully maintained as well.

Errors

Inconsistent translations of term candidates across multiple sentences (e.g. both “concesionario” and “distribuidor” for “dealer”) were the biggest problem. Those inconsistencies only occur when the term candidates appear either inconsistently or not frequently enough in the training data. The truth of the matter is that a lot of LSPs will face this problem; translation memories that are built up over several years may not always be very consistent.

Note that there’s actually only 1 term candidate (“airbag”) that the engine doesn’t handle well at all (both “airbag” and “bolsa de aire”, each 55 occurrences). This term candidate pushes the error rate up considerably.

Besides the inconsistencies, four translations had grammar issues and another four had semantic problems.

Terminology

Since the inconsistency problem is caused by deficiencies in the training data, the solution is simple: retrain the engine until it generates the desired results. We estimate that the share of correct translations will reach 98% if the engine is retrained appropriately.

Note that Google AutoML supports the use of glossaries (API only), which can help avoid certain terminology errors, as this API response shows:

{
    "translations": [
        {
            "translatedText": "El volante calefactado se apaga cada vez que arranca el motor, incluso si lo encendió la última vez que condujo el vehículo.",
            "model": "projects/XXXXXXXXXXXX/locations/us-central1/models/TRLXXXXXXXXXXXXXXXXXXX"
        }
    ],
    "glossaryTranslations": [
        {
            "translatedText": "El volante térmico de dirección se apaga cada vez que arranca el motor, incluso si lo encendió la última vez que condujo el vehículo.",
            "model": "projects/XXXXXXXXXXXX/locations/us-central1/models/TRLXXXXXXXXXXXXXXXXXXX",
            "glossaryConfig": {
                "glossary": "projects/XXXXXXXXXXXX/locations/us-central1/glossaries/Automotive_en_es"
            }
        }
    ]
}

Nevertheless, it doesn’t take casing, gender, inflections or plurals into account (it only does one-to-one replacements, similar to the Custom Terminology feature in Amazon Translate), so one should be very careful with it. Glossaries are preferably used only for non-translatables and perhaps ambiguous terms. Training must have priority over glossaries.
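To make the limitation concrete, a one-to-one replacement of this kind boils down to plain string substitution, so casing and inflected forms slip through untouched. A deliberately naive sketch:

```python
def apply_glossary(text: str, glossary: dict) -> str:
    """Naive one-to-one glossary replacement: exact string matches only,
    no awareness of casing, gender, inflection or plurals."""
    for source_term, target_term in glossary.items():
        text = text.replace(source_term, target_term)
    return text

glossary = {"airbag": "bolsa de aire"}
apply_glossary("the airbag light", glossary)  # replaced as intended
apply_glossary("Airbag warning", glossary)    # unchanged: casing differs
```

Because the capitalized “Airbag” never matches the lowercase glossary entry, the term is left untouched, which is exactly why glossaries are best reserved for non-translatables and ambiguous terms.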

Unlike Google and Amazon, DeepL’s custom vocabulary feature does take casing, gender, inflections and plurals into account. For the time being it is available for a few language combinations only, so we don’t know how it will react to more exotic (Asian) or heavily inflected (Slavic) languages. If DeepL delivers, this might be a serious breakthrough for terminology in MT.

Conclusion

Despite the requirements that MT engine training entails (collecting and preparing data, evaluating, testing, etc.), it makes sense to train. A well-trained (and frequently retrained) engine is very suitable for raw MT projects and ensures that post-editing efforts are reduced to a minimum, which enables companies (both LSPs and enterprises) to improve their gross margins.

*Biography:

Jourik Ciesielski holds a Master in Translation as well as a Postgraduate Certificate in Specialized Translation from KU Leuven, Faculty of Arts in Antwerp (Belgium). In 2013 he started as an intern at Yamagata Europe in Ghent (Belgium) as part of his studies and then stayed with the company as full-time localization engineer. In addition to his responsibilities at Yamagata, he is a frequent speaker at the universities of Antwerp and Ghent.

He launched his own company, C-Jay International, in October 2020. With C-Jay he provides consulting services and technical support to enterprises as well as LSPs. Main fields of expertise are localization strategies, translation technology and machine translation.

Process:

After having trained the engine with 932K unique and approved translation units, we used it to translate a set of 500 random sentences coming from owner manuals that are available online. For each translation we determined whether or not it could be published without any human intervention. Perfect translations were categorized as “No post-editing”, translations that required one correction as “Light post-editing” and translations that needed more than one correction as “Heavy post-editing”.

We subsequently analyzed the different errors produced by the engine and categorized them as well. We ended up with three error categories: grammar, inconsistencies and semantics.
