🌐 “GenAI in Localization” by Custom.MT at TAUS Conference 2023.

Machine Translation Model Training

Customize output with your terminology and translation memory

Training and Customization Approaches

 

  1. Monobrand MT for the best cost
  2. Multibrand MT for the best quality
  3. Prompt engineering for the best flexibility
  4. Building your own models for unlimited usage

 

Customization Methods

Monobrand Training

Customize one brand of machine translation, for example, Google AutoML or Microsoft Custom Translator.

We prepare your translation memory and terminology datasets and customize one selected brand of machine translation, typically Google, Microsoft, ModernMT or Globalese. The advantages are lower cost and ease of integration.

2-3 days

Best cost 

The deliverables are a model and its score before and after training.

€1,000 per language

Multibrand Customization

Customize 5 different brands with your data and compare performance after training.

This is the most popular approach for selecting the best model per language. Engineers and lexicographers prepare training and test datasets and train a selection of machine translation brands. The client’s linguists then evaluate the results and pick the best model out of five.

 

As a result of multibrand customization, the client may end up with machine translation models of different brands, depending on the language.
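The per-language selection can be pictured with a small sketch. It assumes the linguists' evaluation is summarized as one numeric score per brand and language, which this page does not specify. The brand names, languages, and scores below are hypothetical placeholders. The sketch keeps the top two brands per language, mirroring the main and backup models listed in the deliverables.

```python
def pick_models(scores):
    """Pick a main and a backup brand per language.

    scores: {language: {brand: score}}, higher score is better (assumed).
    Returns {language: (main_brand, backup_brand_or_None)}.
    """
    selection = {}
    for language, by_brand in scores.items():
        # Rank brands from best to worst by their evaluation score.
        ranked = sorted(by_brand, key=by_brand.get, reverse=True)
        backup = ranked[1] if len(ranked) > 1 else None
        selection[language] = (ranked[0], backup)
    return selection


# Hypothetical evaluation results, for illustration only.
example_scores = {
    "de": {"brand_a": 4.1, "brand_b": 4.4, "brand_c": 3.9},
    "ja": {"brand_a": 4.0, "brand_b": 3.2, "brand_c": 4.3},
}

selection = pick_models(example_scores)
for language, (main, backup) in selection.items():
    print(f"{language}: main={main}, backup={backup}")
```

In this made-up example the winning brand differs between the two languages, which is the situation described above.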

2-5 weeks

Best linguistic quality

Comparing the best available models after training yields a high improvement in quality. Deliverables are two models per language (a main and a backup), a report, and one year of support for the model.

€3,000 – 5,000 per language

Customize with a Prompt

Translate with ChatGPT or another large language model with specific instructions

Large language models with prompts offer an easy way to customize machine translation output for terminology, tone of voice, formality and “do not translate” lists. 

 

In this service, Custom.MT develops a prompt for translation with specific instructions, and iteratively improves it to achieve higher quality.
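As a rough illustration of what such a prompt might contain, the sketch below assembles translation instructions from a glossary, a tone of voice, a formality level, and a “do not translate” list. It is a minimal sketch and not Custom.MT's actual prompt. The function name, parameter names, wording of the instructions, and sample terms are all assumptions. It only builds the prompt text and does not call any language model.

```python
def build_translation_prompt(text, source_lang, target_lang, glossary=None,
                             do_not_translate=None, tone=None, formality=None):
    """Assemble a plain-text translation prompt (hypothetical format)."""
    lines = [f"Translate the following text from {source_lang} to {target_lang}."]
    if tone:
        lines.append(f"Tone of voice: {tone}.")
    if formality:
        lines.append(f"Formality: {formality}.")
    if glossary:
        # Terminology: source term -> required target term.
        lines.append("Use this terminology:")
        for source_term, target_term in glossary.items():
            lines.append(f"- {source_term} -> {target_term}")
    if do_not_translate:
        lines.append("Do not translate these terms; keep them exactly as written:")
        for term in do_not_translate:
            lines.append(f"- {term}")
    lines.append("Return only the translation.")
    lines.append("")
    lines.append("Text:")
    lines.append(text)
    return "\n".join(lines)


# Hypothetical example values, for illustration only.
prompt = build_translation_prompt(
    text="Open the dashboard in AcmeCloud to view your invoices.",
    source_lang="English",
    target_lang="German",
    glossary={"dashboard": "Dashboard", "invoice": "Rechnung"},
    do_not_translate=["AcmeCloud"],
    tone="friendly",
    formality="formal",
)
print(prompt)
```

Because the instructions live in plain text, changing the tone or adding a glossary entry means editing a few lines and re-running the translation.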

3-5 days

Best flexibility

Unlike machine translation models trained with large datasets, LLM output may be improved by rewriting prompts in minutes. Iterations may be run daily, which makes this a very flexible approach to customizing MT.

€1,000 per language

Training Open-Source

Build models that you own

This is a service to build models from scratch using the Marian NMT machine translation framework (or Fairseq on request). Custom.MT will prepare a large-scale training dataset relevant to your domain and combine it with the translation memory you already have. The training takes place in the Lambda or NVIDIA cloud.

 

The resulting models have unlimited usage, which is important when working with user-generated content at scale. Vendor teams may brand them and sell them as a product.

 

Models produced this way are 2.4 GB in size, and they may be distilled to 80 MB to run on CPUs, mobile phones and portable devices without an internet connection. 

6-10 weeks

Unlimited usage, own models

Building your own MT models is important in unlimited usage scenarios: ecommerce, public sector and user-generated content.

 

Models will require additional engineering to support different file types, scale with demand, and cover additional domains or subject matter areas.

€20,000 per language