TMX Cleaner | Custom.MT

Multilingual TMX Cleaner

Fully automated solution for cleaning translation memories and performing terminology checks to get more accurate translations.

Better TMX leads to better machine translation quality when you train your custom models

WHY CLEAN YOUR TMX?

As translation memories accumulate over the years, they lose quality and get clogged with human errors and mistranslations. When used for training, datasets with repeated mistakes “teach” the machine translation model to use wrong translations instead of correct ones. Our solution automatically flags incorrect segments and removes or repairs them to give you clean data to work with.

25-45%

of incorrect segments are usually identified during a routine scan & remove check

30+

quality check filters

including neuron-based ones

EXPLORE THE BENEFITS

Multilingual

Upload translation memories containing multiple languages, and the software will automatically break them into language pairs.

Smart presets

Save your favorite filters to a preset and significantly speed up your cleaning process.

Neuron filters

Leverage AI-powered filters for text processing in many languages.

Fully customizable

Our filters are customizable and can be tailored to specific requirements.

More happy clients

Impress your clients and win new ones with a state-of-the-art automation tool.

Fully automated

Custom.MT TMX Cleaner automatically flags segments that need to be removed or repaired, so your linguists can save hours of manual work.

Boost savings from translation memory with the power of AI

Remove false positives and segments with mistakes, and increase translation memory reuse in your TMS by up to 6%. With our TMX Cleaner, you can do that in a fully automated way, using a combination of regex rules and neural network technology.

ADVANCED FILTERS

Empty source/target

Removes translation units with empty source or target.

Too long/short

Removes translation units whose source or target contains more than 512 characters or fewer than 3 words.

Repeated words

Removes translation units containing consecutive repeated words in source or target.

Space noise

Removes translation units containing too many consecutive spaces in source or target.

Wrong language

Removes translation units whose source or target are not in the expected language.

Near-duplicates

Removes translation units whose source and target are duplicates or near-duplicates of other translation units.

Inconsistencies

Removes translation units containing translation inconsistencies in source or target.

Sensitive data

Anonymization: removes translation units that contain data resembling personal data.

Alignment check

Calculates a mutual translation probability score based on parallel corpus data to detect noisy sentence pairs in a parallel corpus.
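To make the rule-based filters more concrete, here is a minimal Python sketch of four of them: empty source/target, too long/short, repeated words, and space noise. It is an illustration only, not the TMX Cleaner implementation. The function name `flag_unit`, the flag labels, and the threshold of three consecutive spaces are assumptions; the 512-character and 3-word limits come from the filter descriptions above.

```python
import re

MAX_CHARS = 512  # from the "Too long/short" description
MIN_WORDS = 3    # from the "Too long/short" description

# A word followed by whitespace and the same word again (case-insensitive).
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

# Assumption: "too many consecutive spaces" is taken to mean three or more.
SPACE_NOISE = re.compile(r" {3,}")


def flag_unit(source: str, target: str) -> list[str]:
    """Return the names of the filters that a translation unit trips.

    Hypothetical helper: a filter fires if either side matches.
    """
    sides = (source, target)

    # Empty source/target: nothing else is worth checking.
    if any(not side.strip() for side in sides):
        return ["empty"]

    flags = []
    if any(len(s) > MAX_CHARS or len(s.split()) < MIN_WORDS for s in sides):
        flags.append("length")
    if any(REPEATED_WORD.search(s) for s in sides):
        flags.append("repeated_words")
    if any(SPACE_NOISE.search(s) for s in sides):
        flags.append("space_noise")
    return flags


if __name__ == "__main__":
    print(flag_unit("Save the the document now",
                    "Speichern Sie das Dokument jetzt"))
    # → ['repeated_words']
```

A unit that returns an empty list passes all four checks. Filters such as wrong language, near-duplicates, and the alignment check need language models or corpus statistics and are outside the scope of this sketch.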

Ready to get started?

Individual

Best for: one-off projects

$2 per one thousand segments

Small business

Best for: high volume, recurring needs

$180/month

2,000 segments included

$2 over the limit, per thousand segments

Large Volume

Best for: medium volume, occasional need

$800/month

30,000 segments included

$0.25 over the limit, per thousand segments
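To compare plans for a given monthly volume, the listed prices can be turned into a small estimator. The sketch below is an illustration under stated assumptions, not an official price calculator: the plan keys and the `monthly_cost` function are hypothetical names, and it assumes that usage over the limit is billed proportionally rather than rounded up to whole thousands.

```python
# Figures taken from the plan descriptions above; the keys are hypothetical.
PLANS = {
    "individual":     {"monthly_fee": 0.0,   "included": 0,      "per_thousand": 2.00},
    "small_business": {"monthly_fee": 180.0, "included": 2_000,  "per_thousand": 2.00},
    "large_volume":   {"monthly_fee": 800.0, "included": 30_000, "per_thousand": 0.25},
}


def monthly_cost(plan: str, segments: int) -> float:
    """Estimate the cost in USD of cleaning `segments` segments in one month.

    Assumption: segments over the included amount are billed
    proportionally at the plan's per-thousand rate.
    """
    p = PLANS[plan]
    extra = max(0, segments - p["included"])
    return p["monthly_fee"] + extra / 1000 * p["per_thousand"]


if __name__ == "__main__":
    for name in PLANS:
        print(name, monthly_cost(name, 50_000))
    # individual 100.0
    # small_business 276.0
    # large_volume 805.0
```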

IMPROVE YOUR MT QUALITY

Get the most out of your machine translation