Custom.MT
  • Home
    • For Localization Teams
    • For LSP
    • For Product Managers
    • For Translators
  • Services
    • Machine Translation Model Fine-Tuning
    • Machine Translation Evaluation
    • On-Premise Machine Translation
    • Translation Memory (TMX) Cleaning
    • Language dataset acquisition
    • Workshops – Train Your Team in Language AI
  • Products
    • AI Translation Platform
    • Custom Translation Portals
    • For Trados
    • For Smartling
    • For memoQ
    • Shopware Translation Plugin
    • API
    • Documentation
  • Resources
    • Blog
    • Case Studies
    • Events and Webinars
      • GenAI in Localization
    • MT Leaders
  • About Us
    • About Us
    • Terms and Conditions
    • Privacy Policy
  • Book a Call
  • Sign in

Search

GPT-4 Turbo and Vision in Localization
  • Blog post

Table of Contents

Toggle
  • Translating Image PDFs
  • Automated Localization Testing for Screenshots
  • GPT-4 Turbo at the price of NMT 
  • Automated Audio Description 
  • Afterword – Will Anything in AI Stay Defensible?

On Monday, November the 6th, OpenAI upveiled the new improved models. Let’s explore the main features of GPT-4 Turbo and its possible implementations in the localization field.

  • GPT-4 Turbo with a 3x lower price and 16x larger context window
  • GPT-4 Vision availability in API
  • Synthetic voice, speech, and vision in prompts
  • Ability to create agents
  • Tools to get repeatable generation
  • And more

The release is a gauntlet of challenge to a broad range of AI companies. Not just LLM builders like Antropic and Cohere, but also speech recognition, synthetic voice, and machine translation companies. OpenAI positioned itself as a future one-stop shop for models. 

While getting image, voice, speech, and text AI processing from a single source and having all data handled by one organization is a worrying prospect, localization stakeholders stand to benefit from the intensifying competition in GenAI. Here is how:

Translating Image PDFs

With vision and text capabilities from a single source, it is possible to tackle a perennial problem in translation – image PDFs, including document scans, phone photos of invoices and receipts. Moreover, the model can extract information from the receipts and understand it.

For the test, our developer Martin developed a script that was able to take a receipt from lunch in a Czech restaurant, extract all information from it, and translate it into English. It took less than 20 minutes.

Yum!

A video of coding a python script that was able to take a receipt from lunch in a Czech restaurant, extract all information from it, and translate it into English.

Automated Localization Testing for Screenshots

Localization testing application of user interfaces in multiple languages is the next problem localization PMs may attempt to automate with GPT-4 Vision. Today, program managers responsible for UI translation take screenshots manually or automatically and then pass them to linguists to verify if the text fits the boxes, and if it reads right in the target language.

UI localization testing is labor-intensive, but it is paramount to getting a smooth user experience across multiple locales. With a vision/text model operated via a prompt, localization managers will be able to automate the testing and highlight potential issues.

We’ve run an experiment with the GPT-4 vision in preview by fabricating bugs in our own login screen (screenshot below). The model identified them correctly:

1. Text Overflow: The button labeled “Přihlásit se do tohoto krásného nástroje” has a text overflow issue where the text is too long for the button size and breaks out of the button boundaries. This could lead to problems with readability and may affect user interaction.

2. Text Consistency: The language used in the button mentioned above is Czech (“Přihlásit se” means “Log in” in Czech), which is inconsistent with the rest of the interface that appears to be in English (e.g., “Email”, “Password”, “Not registered? Create an account now”). This inconsistency could confuse users.

3. Incomplete Localization: If this screen is intended for Czech users, all visible text should be localized to Czech. Key phrases such as “Authorization,” “Email,” “Password,” “I forgot my password,” and “Not registered? Create an account now” should be translated to maintain consistency and ensure user comprehension.

Authorization dialogue box in Custom.MT Console

GPT-4 Turbo at the price of NMT 

GPT-4 Turbo prices are 3x lower than GPT-4. 

In the past months, many localization stakeholders in our network experimented with GPT-3.5 Tubro and GPT-4 models. While they found GPT-4 superior in quality, they couldn’t often use it with glossaries and detailed style guide instructions due to a high cost. Long prompts applied on a per-segment basis in CAT tools and TMS caused expenses to soar up to $0.15 per segment, which is close to manual translation costs. Other issues with the model were speed and reliability. Managers of large-scale localization programs reverted back to classic machine translation after experimenting with OpenAI. Only the most persistent and optimistic ones carried on despite setbacks.

With a move to reduce price, it feels like the setbacks are temporary, and that GPT may replace NMT in the observable future. The price of GPT-4 Turbo applied with a glossary and a style guide in the prompt should be on par with stock machine translation models like DeepL, Google Translate, and Microsoft Translator.

It is becoming easier and in some cases more viable to customize translation for style guide compliance with a prompt, than to fine-tune models such as Google Translate AutoML, or Microsoft Custom Translator.

ServiceUSD per million characters
GPT-4 Turbo Document translation, no CAT/TMS$3
DeepL$27
GPT-4 TurboSegment translation in a CAT/TMS$30
Google Translate AutoMLFine-tuned $80
GPT-4Document translation, CAT/TMS short prompts$60-120
GPT-4 maxed-out glossary applied to tiny segments$22,350

NB! Speed with GPT-4 remains an issue, especially at scale. Until OpenAI addresses speed, we expect the utility of the model in large-scale scenarios to remain limited.

Automated Audio Description 

Less than 24 hours after GPT-4 Vision & Voice, a developer built an automated sports commentator by passing every frame of a football video via a simple prompt to generate a narration.

A demo of AI video narration has already been showcased on the OpenAI cookbook. As a proof of concept, it clearly shows that videos can have audio descriptions fully automatically at a low cost. 

The ease of building video narration and audio description may be deceptive – it will actually take months to get the technology working reliably. We believe that audio descriptions for prime-time shows and theatrical releases of Hollywood content will remain manual, going at the average rate of $27 per minute. The rest of the market might go for an embedded AD at a fraction of the cost.

Afterword – Will Anything in AI Stay Defensible?

OpenAI cast a wide net across different niches and boxes of the growing AI industry. Previously seen as a chatbot builder, it now will be perceived as a threat to thousands of startups, including the full extent of language technology providers.

  • Automated video dubbing companies
  • Machine translation providers 
  • Speech recognition providers 
  • Text & video analytics
  • AI quality assurance

In the past, any language tech startup would hear “How are you better than Google Translate?” In 2024, the question is rapidly turning into “Wouldn’t OpenAI do all that in 1-2 minutes?” Entrepreneurs and buy-side localization program managers, prepare to show how specialized systems built for localization are the only way to completely address business needs. It’s a far cry from a minute-made prototype to a working solution.

PS: at the time of writing this, OpenAI’s API went down in a major outage. The doom of AI companies was averted – for today, at least 🙂

machine translationOpenAI
Konstantin Dranch
Konstantin Dranch
Language Industry Researcher | Founder Custom.MT learn something new every week, create transparency in specialized markets

Comments are closed.

Stay in the loop
Subscribe to receive the latest industry news, updates on MT & LLM events, and product information

Categories

  • Blog post
  • Case studies
  • Guides
  • Infographics
  • Interview
  • Press Release
  • Related Posts
  • Uncategorized
  • Webinars

Webinars

  • AI Prompt Engineering for Localization – 2024 Techniques
  • AI Prompt Engineering for Localization
  • Managing Machine translation in LSPs in 2023
  • Natural Language Processing for Business Localization (Webinar)
  • Let’s Machine Translate Our Website!
  • hello@custom.mt
  • Home
    • For Localization Teams
    • For LSP
    • For Product Managers
    • For Translators
  • Services
    • Machine Translation Model Fine-Tuning
    • Machine Translation Evaluation
    • On-Premise Machine Translation
    • Translation Memory (TMX) Cleaning
    • Language dataset acquisition
    • Workshops – Train Your Team in Language AI
  • Products
    • AI Translation Platform
    • Custom Translation Portals
    • For Trados
    • For Smartling
    • For memoQ
    • Shopware Translation Plugin
    • API
    • Documentation
  • Resources
    • Blog
    • Case Studies
    • Events and Webinars
      • GenAI in Localization
    • MT Leaders
  • About Us
    • About Us
    • Terms and Conditions
    • Privacy Policy
  • Book a Call
  • Sign in
  • Home
    • For Localization Teams
    • For LSP
    • For Product Managers
    • For Translators
  • Services
    • Machine Translation Model Fine-Tuning
    • Machine Translation Evaluation
    • On-Premise Machine Translation
    • Translation Memory (TMX) Cleaning
    • Language dataset acquisition
    • Workshops – Train Your Team in Language AI
  • Products
    • AI Translation Platform
    • Custom Translation Portals
    • For Trados
    • For Smartling
    • For memoQ
    • Shopware Translation Plugin
    • API
    • Documentation
  • Resources
    • Blog
    • Case Studies
    • Events and Webinars
      • GenAI in Localization
    • MT Leaders
  • About Us
    • About Us
    • Terms and Conditions
    • Privacy Policy
  • Book a Call
  • Sign in