Custom.MT
  • Home
    • For Localization Teams
    • For LSP
    • For Product Managers
    • For Translators
  • Services
    • Machine Translation Model Fine-Tuning
    • Machine Translation Evaluation
    • On-Premise Machine Translation
    • Translation Memory (TMX) Cleaning
    • Language dataset acquisition
    • Workshops – Train Your Team in Language AI
  • Products
    • AI Translation Platform
    • Custom Translation Portals
    • For Trados
    • For Smartling
    • For memoQ
    • Shopware Translation Plugin
    • API
    • Documentation
  • Resources
    • Blog
    • Case Studies
    • Events and Webinars
      • GenAI in Localization
    • MT Leaders
  • About Us
    • About Us
    • Terms and Conditions
    • Privacy Policy
  • Book a Call
  • Sign in


Subtitling with Trained Machine Translation: the 4 Next Steps

Author: Konstantin Dranch

Subtitlers are looking to adopt customized machine translation, but video localization systems are not ready for it yet. Here is what they could do to expedite the process.

Since Google introduced AutoML Translation and Microsoft created Custom Translator, text translation management systems (TMS) have developed excellent support for trainable machine translation.

Translation tools such as Trados, memoQ, Memsource, Smartling, Crowdin, Smartcat, and Wordbee all have multiple integrations with machine translation (MT), including MT that can be trained. These TMSs have also developed tools to measure the human effort of editing machine output.

The ability to train and use the trained models spawned an ecosystem of professional MT implementation specialists, tools, datasets & model marketplaces, and best practices.

A screenshot displaying the list of MT brands available in Crowdin TMS:
    Google Translate
    Google AutoML Translation
    Microsoft Translator
    Yandex.Translate
    DeepL Translator
    Watson (IBM) Language Translator
    Amazon Translate
    Crowdin Translate
Example: Crowdin TMS offers 8 different MT brands and is working on more

In contrast, in the world of subtitling, support for customizable engines does not exist yet. Popular captioning and subtitling tools such as Ooona, Limecraft, Syncwords, Dotsub, Yella Umbrella, and VoiceQ still don’t offer the ability to use trained models.

SubtitleNext was one of the first video localization tools to introduce custom machine translation models via Microsoft Custom Translator. Most others integrate with AppTek’s MT, which can be trained but does not yet offer a self-service training console for trainers.

I hope for this to change in the next 3-6 months.

What Subtitling Tools Need to Do

  1. Integrate more MT brands, specifically customizable MT systems

Ooona, Limecraft, Syncwords, SubtitleNext, DotSub, Yella Umbrella, VoiceQ, and others should add integrations with customizable machine translation, at least with popular brands.

  • Big IT: Google, Microsoft, Amazon ACT, Yandex
  • Dedicated EU vendors: ModernMT, Globalese, Kantan, Systran, PangeaMT
  • Dedicated Asian vendors: Niutrans, Cloud Translation, Mirai and Rozetta, Tencent

There are more than 100 machine translation providers on the market; integrating with at least 7-10 of them, either directly or via middleware, would be a crucial first step for a video localization product.
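Such an integration layer can be kept provider-agnostic. Below is a minimal sketch in Python; `MTProvider`, `EchoProvider`, and `translate_with_fallback` are illustrative names for this example, not part of any vendor’s actual SDK:

```python
from abc import ABC, abstractmethod

class MTProvider(ABC):
    """Minimal provider-agnostic interface a subtitling tool could code against."""

    @abstractmethod
    def translate(self, text: str, source: str, target: str) -> str: ...

    def supports_custom_models(self) -> bool:
        # Providers with trainable models override this to return True.
        return False

class EchoProvider(MTProvider):
    """Stand-in 'provider' used here only to demonstrate the interface."""

    def translate(self, text, source, target):
        return f"[{source}->{target}] {text}"

def translate_with_fallback(providers, text, source, target):
    """Try providers in order; fall back to the next one if a call fails."""
    for provider in providers:
        try:
            return provider.translate(text, source, target)
        except Exception:
            continue
    raise RuntimeError("all MT providers failed")
```

With an interface like this, adding an eighth or ninth MT brand becomes a new adapter class rather than a change to the subtitling tool itself.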

  2. Support full-text subtitle translation instead of per-line

Translating subtitles is different from translating normal text due to line breaks. Subtitles have a limitation of two lines max and a certain number of characters, for example, 37 characters with the BBC.

Sending 37 characters to MT without context results in poor quality. It’s like translating dialogue without knowing what the dialogue is about. To fix this, professional subtitling tools instead send the full text, then re-segment the translation and place it back at the right position in the video.

  3. Effort analysis

As AI gets better, it reduces the amount of manual work. Captioning a video manually takes 5-10 hours per hour of source material, but only 2-4 hours with the help of speech recognition and automatic time coding. The same goes for translating subtitles: machine translation helps to the extent of its accuracy.

To measure the impact of speech recognition and machine translation, subtitling tools need to add effort measuring functionality with the following metrics:

  • Edit distance in characters, words, and lines
  • Time tracking functionality

This will allow subtitling specialists to decide which ASR and MT models work best for them, and to quantify the savings from using them. Once the industry can establish the correlation between AI performance and the price per minute of video, there will be a clear financial incentive to build better AI.
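Edit distance at the character and word level can be computed with a few lines of standard dynamic programming. The sketch below is illustrative, not any particular tool’s actual algorithm:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences
    (strings for character distance, token lists for word distance)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def edit_report(mt_output, post_edited):
    """Report the editing effort in characters and words, two of the
    metrics suggested above."""
    return {
        "chars": levenshtein(mt_output, post_edited),
        "words": levenshtein(mt_output.split(), post_edited.split()),
    }
```

Aggregated over a whole subtitle file and combined with time tracking, these numbers give the per-model comparison the text calls for.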

An example is the comparison table produced by the Post-Edit Compare plugin for RWS Trados. It breaks down 366 translated segments (1,549 words, 9,675 characters in total) into bands by how much post-editing each required:

| Analysis band | Segments | Share |
| --- | --- | --- |
| No changes needed | 277 | 65.33% |
| 95%–99% | 6 | 5.62% |
| 85%–94% | 24 | 11.81% |
| 75%–84% | 24 | 8.39% |
| 50%–74% | 24 | 7.04% |
| New (fully edited) | 11 | 1.81% |

While a significant majority of the segments were accurate without any further editing, the rest required varying degrees of post-edit modification.
Post-Edit Compare plugin for RWS Trados measures modifications

Subtitling software can simply copy the functionality of text translation systems, or invest resources into something more glamorous: perhaps a neural network that automatically tells professional users how much they saved with AI.

  4. MT training controls from within subtitling tools

Microsoft and some other MT brands already offer APIs for model training. Training can then happen without logging into the MT console; it could be as simple as a button in the subtitling tool.

With some coding shenanigans, subtitling software providers can allow their users to train MT without learning the consoles for training, configuring tokens, and exchanging JSON scripts. This would make training available to everyone, from engineers to linguists.

Subtitling studios and subtitlers will be able to create models for each domain in which they specialize and have enough training data. News, Football, or the Star Wars Universe could all be examples of domain models trained and managed from within a subtitling tool.
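One way such a “train” button could work under the hood is to assemble aligned subtitle segments into a training request. The schema below is purely illustrative; a real integration would follow the field names and endpoints in the chosen provider’s API reference (e.g. Microsoft Custom Translator’s):

```python
import json

def build_training_request(model_name, domain, segment_pairs,
                           source_lang, target_lang):
    """Assemble a training payload from aligned subtitle segments.
    All field names here are hypothetical, not any vendor's schema."""
    return {
        "model": model_name,
        "domain": domain,                      # e.g. "news", "football"
        "source_language": source_lang,
        "target_language": target_lang,
        "training_data": [
            {"source": src, "target": tgt} for src, tgt in segment_pairs
        ],
    }

# In the subtitling tool, the "Train" button would serialize this payload
# and POST it to the provider's training endpoint with the user's token.
payload = build_training_request(
    "starwars-subs-v1", "star-wars",
    [("Hello there.", "Bonjour.")], "en", "fr")
```

The point is that the user only picks a model name and a set of approved subtitle files; tokens, JSON, and endpoints stay hidden behind the button.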

Conclusion

There is more need for subtitling than ever, with the explosion of TV series on streaming platforms, the increase in scripted television programs, eLearning, and the general rise of video as a means of communication in business.

To get more videos subtitled professionally, the service needs to become more affordable on a per-minute basis. One way to achieve this is to add better support for AI in the tools, and that is something very practical for subtitling software developers to accomplish.

A portrait of Custom.MT Co-Founder Konstantin Dranch

Konstantin Dranch is the Co-Founder of Custom.MT. He is a localization expert with a background in journalism, market research and technology and has worked in the localization industry for several years.

