Translation Memory (TMX) Cleaning

Custom.MT provides a fully automated scanner and a human-in-the-loop service to clean translation memories.

Why maintain language data

As translation memories accumulate over the years, their quality deteriorates: they become clogged with human errors and mistranslations. Because translation tools reuse translation memory content, mistakes are repeated again and again. When used for training, datasets with repeated mistakes “teach” the machine translation model to use erroneous translations in place of correct ones.

Even accurate translations can become a problem when they are figurative, varied, or inconsistent. Furthermore, as language evolves, old translations become obsolete and must be purged from the database.

We recommend that translation and localization teams run automated translation memory cleaning at least once every one to two years to maintain an acceptable level of usability.

Benefits for localization teams

Increase translation memory reuse in TMS by 2-6%
By removing false positives and segments with mistakes, you boost savings from translation memory.


Improve the effect of machine translation model training
Better TMX leads to better quality of machine translation when you train your custom models.


Improve translator experience
Cleaner TMX means fewer confusing options and a faster search/concordance function.

Benefits for language services providers

Win new clients, even if they already have a translation partner
Armed with our tools, your language data services teams can impress clients, open doors to localization teams, and start developing relationships with them.


Optimize internally
Increase translation memory reuse and achieve incremental improvements on margins.


How cleaning works

Custom.MT runs an automated scanner that combines regex rules with neural network technology. The scanner flags segments that need to be removed or repaired. A linguist can then review the list and correct the flagged segments. In training operations, all suspicious material can simply be removed to save time. A typical scan-and-remove operation purges between 20% and 45% of the TMX database. Our checks are customizable and can be tailored to specific requirements.
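
To give a rough idea of what rule-based flagging can look like, here is a minimal Python sketch. It is not Custom.MT's scanner: the rules, the function names (flag_segment, scan_tmx), and the sample data are all hypothetical and chosen for illustration, and the neural network checks are not covered. It assumes each translation unit holds the source segment first and the target segment second.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical example rules; a production scanner would use many more.
DIGITS = re.compile(r"\d+")
MARKUP = re.compile(r"</?[A-Za-z][^>]*>")


def flag_segment(source, target):
    """Return the reasons why a source/target pair looks suspicious."""
    reasons = []
    if not target.strip():
        reasons.append("empty target")
    elif source.strip() == target.strip():
        reasons.append("target identical to source")
    # Crude check: the digit runs should match on both sides.
    if sorted(DIGITS.findall(source)) != sorted(DIGITS.findall(target)):
        reasons.append("number mismatch")
    # Escaped HTML left in the text should match on both sides too.
    if sorted(MARKUP.findall(source)) != sorted(MARKUP.findall(target)):
        reasons.append("markup mismatch")
    return reasons


def scan_tmx(tmx_text):
    """Return (unit index, reasons) for every flagged translation unit."""
    root = ET.fromstring(tmx_text)
    flagged = []
    for index, tu in enumerate(root.iter("tu")):
        # Assumption: the first <seg> is the source, the second the target.
        segs = ["".join(seg.itertext()) for seg in tu.iter("seg")]
        if len(segs) < 2:
            flagged.append((index, ["missing translation"]))
            continue
        reasons = flag_segment(segs[0], segs[1])
        if reasons:
            flagged.append((index, reasons))
    return flagged


# Made-up sample data with one clean unit and two suspicious ones.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Press 2 buttons.</seg></tuv>
    <tuv xml:lang="de"><seg>Klicken Sie auf 2 Tasten.</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Wait 10 seconds.</seg></tuv>
    <tuv xml:lang="de"><seg>Warten Sie 15 Sekunden.</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Cancel</seg></tuv>
    <tuv xml:lang="de"><seg>Cancel</seg></tuv></tu>
</body></tmx>"""

for index, reasons in scan_tmx(SAMPLE):
    print(index, ", ".join(reasons))
```

In this sketch the flagged list could either be handed to a linguist for review or, for training data, used to drop the listed units outright. An identical source and target is not always an error (brand names, for example), which is one reason a review step can be preferable to blind removal.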