The “ChatGPT in Localization” event, held in March 2023, was an unprecedented success. In just three months, ChatGPT brought awareness about large language models to over 100 million people, revolutionizing communication in the process. This event focused solely on new technology in translation and localization, and it exceeded all expectations. Over 2560 people registered for the event, and 1767 viewed it live, with an additional 9000+ watching the recording later.
The event featured panel discussions on three key topics: Vendor, Research, and Buy-side. Each discussion provided valuable insights into the current state of the industry and highlighted the ways in which new technology is shaping its future.
We summarized the main ideas in the article below.
Vendor Panel: ChatGPT in Language Services
Vendor Side
Olga Beregovaya, VP, AI and Machine Translation at Smartling
Olga has led the AI team that implemented language models in real-life scenarios for multiple Fortune 500 brands. Smartling, where Olga now works, implemented AI in authoring as early as 2017 and launched a transcreation tool in 2022.
- The opportunity for the vendor side is to become end-to-end content partners rather than just translation partners. I think that’s a huge business model changer, and it also impacts pricing: you don’t charge per word as a content partner.
- A new job type has been created – a content engineer.
- We all probably would agree that ChatGPT is not quite there yet for machine translation. Instead, focus on the use cases around it: source content pre-optimization, automated post-editing, improving the tone of voice, correcting grammar, and evaluating language quality. In the enterprise, we invest a ton of money into language quality estimation and validation. This is an actual possibility to utilize large language models.
- Localization is probably the biggest playground for implementing large language models.
- For AI implementation professionals, it’s most important to capitalize on what’s available on the market, to benchmark and choose the best, similar to implementing machine translation.
- In compliance there is always a balance between driving innovation and holding on to your data. What would you like to do?
Perspective of smaller companies
Diego Cresceri, Founder of CreativeWords
- On the panel, I share the perspective of smaller companies. We’ve been testing and using ChatGPT in different scenarios, though not translation. If you think of the name of my company, Creative Words: it was my dream from the start to compete with marketing agencies in content creation, as well as translation. We’ve been doing that for a long time already for product descriptions, marketing emails, and campaigns.
- We carried out an experiment for a customer, a shoe company, in Italian and English, combining ChatGPT with a copywriter. It went great and generated product descriptions with keywords. We haven’t told the client yet, but we will soon: the level of fluidity and creativity goes far beyond non-native English copywriting. For Italian, I don’t like the performance very much at the moment, but it is already a big time saver for us right away, even without spending a lot of effort on implementation.
- There are classic concerns around ChatGPT about privacy and putting client data in the cloud. But that’s for content that will be published the next day anyway. Are we not worrying too much about privacy?
Speaking from Experience in a TMS Company
Frederik Pedersen, Co-founder of EasyTranslate
EasyTranslate, an LSP in Denmark, has existed for 30 years, and in the past 3 years, it has changed the business model, going into a hybrid of translation management software and a freelancer marketplace. They don’t charge for translations anymore, only for the software and payments to freelancers.
EasyTranslate has already integrated GPT-3 into its translation management system, allowing customers to generate e-commerce product descriptions inside the system and then send them for translation in the same environment. The content creation interface uses a collection of predefined prompts and provides length controls, the ability to build templates, the ability to create content for products in a specific category, and the option to include keywords for SEO. This functionality is being beta-tested by a group of early adopters.
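As a rough illustration of the idea, a predefined prompt with a length control, a product category, and SEO keywords could be assembled along these lines. This is a minimal sketch, not EasyTranslate's implementation: the class `ProductPromptTemplate`, its fields, and the wording of the prompt are all hypothetical, and no language model is actually called.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProductPromptTemplate:
    """Hypothetical predefined prompt for one product category."""
    category: str
    # Reusable instruction text; {category} and {product} are filled in per product.
    instruction: str = "Write a product description for the {category} item '{product}'."
    max_words: int = 80  # length control exposed to the user
    keywords: List[str] = field(default_factory=list)  # optional SEO keywords

    def build_prompt(self, product: str) -> str:
        """Combine the template, the length limit, and any keywords into one prompt."""
        parts = [self.instruction.format(category=self.category, product=product)]
        parts.append(f"Use at most {self.max_words} words.")
        if self.keywords:
            parts.append("Include these keywords: " + ", ".join(self.keywords) + ".")
        return " ".join(parts)


# Example: one template reused for products in the same category.
template = ProductPromptTemplate(
    category="footwear", max_words=60, keywords=["leather", "handmade"]
)
prompt = template.build_prompt("Oxford shoe")
print(prompt)
```

The resulting string would then be sent to whichever text-generation model the system uses, and the generated description passed on for translation.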
- EasyTranslate is working on model fine-tuning functionality so that the generated content becomes progressively more tailored to specific audience groups.
- The big opportunity here is creating multilingual content for your local markets from scratch. This year will be a race to find the best secret ingredients to generate the most comprehensive content.
- The next step is ChatGPT in other content tools that are connected to translation. For example, if you are prototyping in Figma, you can use ChatGPT to generate UI content directly there, and then use MT for instant cost-effective translation.
Benefits for a Language Service Company
Manuel Herranz, Founder of Pangeanic
Pangeanic is a language services and AI company specializing in machine translation and data anonymization.
- I think the key takeaway is that there is not one technology that wins, but a combination of ChatGPT with existing technologies.
- The game has changed in the way we search. I like Marco Trombetti’s point about traffic being diverted from search engines into private dialogues with a large language model. You know, I don’t want OpenAI to steamroll my websites. But on second thought, maybe I do, because I want to influence the way AI is built.
- ChatGPT risks: 1) the privacy risk of uploading your content to a cloud where AI can be trained on it; 2) the risk of plagiarism and copyright infringement when you’re generating content based on somebody else’s work; 3) the risk to creativity, with different companies ending up writing identical or similar content.
- Impact on translators. I don’t think the freelance work for linguists will disappear, but it is going to change. Translators will go through an adaptation. Is there a role, for example, for a prompt engineer, someone who can ask the software the right questions? This remains to be seen.
Research Panel: Pathways to a European Answer to ChatGPT
“Foundational models are the new Autobahns. To leave that power to a few private companies? No way!”
Jochen Hummel, CEO of Coreon
CEO of Coreon, creator of Trados, former chairman of LT Innovate, EU expert on Language Technology
Current Situation
- I think what we are facing is nothing less than an industrial revolution. An industrial revolution requires very serious industrial policies. In the coal and steel revolution, governments fell and wars were waged.
- We have allowed web search to be monopolized by a few private companies. Google, for example, knows exactly what the world is looking for, which gives this company an incredible amount of power. Now imagine the same happening with ChatGPT or similar APIs. You interact with the AI, and you are not only searching for some keywords, but you are also talking to the AI. As you engage in a dialogue, you provide way more information about yourself, the topics you’re interested in, and your needs. To leave the power of conversational AI to a few private companies? I say “no way”.
- Foundational models are the new Autobahns. Public roads and infrastructure are built even in areas where private companies can’t get a quick economic benefit, to connect all communities to the national network. Today’s models are primarily trained on US content, and the responses are culturally and ethically US-centric. In reality, ethics differ from nation to nation, and smaller languages are at a disadvantage.
Perspectives
- For a European initiative, there is not much hope. I used to be the Chairman of the LT Innovate association, and I’ve lobbied a lot for language technology policies with leading commissioners, but to my shame, with little result. They always give Sunday speeches saying data is the new oil, but they don’t say that this oil is only drilled for in English. There is a Danish version of the House of Cards TV series called Borgen, in which a political opponent is humiliated by being sent to Brussels as Commissioner for Multilingualism. It’s that bad!
- But national initiatives are coming up. In January, there was a meeting in Berlin of an organization called LEAM, short for Large European AI Models. There, they launched a feasibility study of what a large AI model for Germany could look like, trying to get support from the government to invest serious money into computing infrastructure.
- Define our goal: should Europe have an OpenAI “Me-Too” product? My take: there needs to be a serious investment in infrastructure, in billions of euros, and then a fund that enables startup companies to raise money and build applications on top of this infrastructure.
“Europe needs to offer competitive alternatives in models, and provide an ecosystem – a safe harbor for data so that it does not have to go to the US Clouds”.
Ariane Nabeth-Halber, AI director at Via Dialog
European Commission expert on language technology, AI director at Via Dialog, and one of the most influential women in European speech tech.
Current Situation
- To have some part in the geopolitical AI game, one has to take a position right now. Only a few players have resources to train foundation models: Google, Meta/Facebook, Microsoft which is backing OpenAI, and now Amazon with Hugging Face.
- Another ongoing “AI war” is how to deploy those models. Startups and large industry companies that start applying foundational models want to fine-tune them, either with transfer learning, or prompt engineering. The price to pay is that you share your data with the cloud provider. Large European banks, insurance, and telecoms that have sensitive and confidential data prefer to host everything in a private environment.
- There are just a handful of platforms that are equipped with this ability. Hugging Face is now becoming very important in that area. OVH is a French cloud infrastructure provider that has also been working intensively on deployment, but a large part of their AI team has been hired by Hugging Face. And while Hugging Face has been funded by the French government for 10 million euros to train the Bloom model on the Jean Zay supercomputer, it is really a United States-based startup.
Suggestions
- We need alternatives. One kind of alternative is a research/government-funded initiative like Gema’s HPLT and Nicolas’s OpenGPT-X. Another alternative is smaller models coming from expert industry players like my company.
- The next ongoing development is shared standards. Microsoft and Nvidia are collaborating on web frameworks, on model compression, on how to run live or scheduled inference on them. If we want to keep some AI sovereignty in Europe, we need to master these frameworks and offer a compatible safe harbor, where the data can reside, and is not forced to go to a US Cloud.
- Europe needs competitive alternatives from different countries: a large language model in Germany, another one in France, etc. And then we want to have a framework, a business ecosystem, with lots of active companies that build around these alternative models and promote them.
“European high-performance computing centers must be adapted for language”
Gema Ramirez-Sanchez, CEO of Prompsit
CEO of Prompsit, EU expert, computational linguist. Prompsit is a spin-off from a research group at the University of Alicante in Spain.
- Prompsit designed the High-Performance Language Technologies project and is an active member of the consortium that runs it. HPLT has been funded by the EU to convert 12 petabytes of the Internet Archive into 100 open-source datasets useful for language model training. HPLT makes data and models available for research, and implementation by NLP teams. The EU funding for the project covers the next 3 years.
- Industry help is welcome: language industry agents can contribute to model evaluation, governance, application to client needs, and cultural adaptation. European administrations may help by unlocking local repositories: libraries, TV and radio archives, digitized museums, etc.
- Europe’s existing high-performance computing centers are not ready for natural language processing tasks and need to be adapted. Let’s make these centers hubs of foundational language models.
- Challenge: not enough quality data in European languages for R&D groups. For example, the University of Turku in Finland just released the Finnish large foundational model, but the text corpora available for Finnish were not sufficient for all their needs.
- Democratization of large models. With distillation, they can be made smaller and more sustainable, so that costs and electricity requirements do not keep exploding every week. Build models that are smaller, more manageable, transparent, and reproducible.
“Commercial computing centers should now host OpenGPT-X models”
Nicolas Flores-Herr, Project lead at OpenGPT-X
Team lead of the OpenGPT-X project and of the Conversational AI topic at the Fraunhofer Institute for Intelligent Analysis and Information Systems
- OpenGPT-X is publicly funded by the German Ministry of Economics, and the models are currently employed in 3 domains: mobility, where we collaborate with BMW, insurance, and public broadcasters. The model performs various NLP tasks, primarily conversational AI, document analytics, and controlled language.
- Challenges: as we go forward, we cannot run the models from our University service anymore, so we would like to launch partnerships with commercial computing centers to host these models and provide them to a broader audience.
- We would like to invest in use cases and applicability for business scenarios.
- Fact-checking the data is one of the most challenging things we are working on. It is also a deeply scientific problem: how to get a model that not only relies on a sequence of words but also learns facts and can separate the truth from hallucinations.
- Challenges: better filtering of crawled data. Some model qualities emerge only from a certain size onwards.
- Europe needs more end-to-end environments for language models. The LEAM organization could be one of the drivers of a European ChatGPT, and perhaps 5-7 similar associations would be good for Europe, maybe with a specific focus on domains such as medical or automotive. Medium-sized organizations bring research and application as close to each other as possible.
Buy-side Panel: Applying ChatGPT in Localization
Technical Writing using AI
Jose Palomares, Localization Director at Coupa
Use cases: Coupa has been using ChatGPT for technical writing, with mixed results. Technical writers disagree on how useful it is. Everyone agreed that it’s good for simplification, summarization, and spelling correction.
It can be used to write responses to RFPs, but there are privacy concerns and everything needs to be double-checked. Personally, I found it helpful for long-form content and ideation, as it helped me overcome writer’s block.
A localization manager can evangelize ChatGPT and write guidelines on what to expect and on where to use it and where not to. Be the first to write about privacy and accuracy concerns. Use it as a chance to raise visibility for your team and be perceived as an expert.
Using for Translation/Localization: I tried it myself with a couple of languages that I can speak and there were very basic mistakes that the English counterpart doesn’t make. So I would say, don’t use ChatGPT for Translation, it’s not ready yet. Wait for the experts to tell us that this is working great and then do it. But before that I will stick to the things that are already working.
Be careful not to demonize ChatGPT the same way we did with machine translation for so long. Someone sharing “an epic fail by ChatGPT” can set back trust in a technology that could be great for some use cases.
GPT-3 for Product Descriptions
Bea Verdasco, Head of Localization at Trendyol
eCommerce use case: Our company has a potential use case in creating product descriptions, particularly in specific language pairs that machine translation addresses poorly: English-Turkish and German-Turkish. We tested GPT-3 and found the results good, but compliance and factual accuracy remain a concern. Our company sells different products, including products for children, and what you say about the product needs to be true, precise, and exact. We hope to test the new and improved version in the future.
Legal summarization use case: I partnered with our Data Protection Officer, who found the summarization functionality useful for legal environments but was concerned about data protection with the free version of the technology.
Localization team leads the way: We’re not the only ones who are testing ChatGPT at Trendyol. The Data office uses it, the SEO team uses it. As early adopters, we are partnering with other interested teams. We compile information, look at the risks and come up with the plan to introduce some guidelines about using these tools.
Future application in source content quality: We can make source content top-notch and obtain great output from translation memory and customized machine translation with fewer resources.
Using for Translation/Localization: I agree with Jose that results haven’t been tested. But if you have a machine learning scientist on your team who is keen on fine-tuning GPT for a specific content type, you have to try it and see how it works for you. I would love to do it myself. I am just a tiny bit more adventurous than Jose, I guess.
Other Applications for ChatGPT
Anna Golubeva, Localization Manager at IKEA
Implementing process in a big enterprise: I’m thinking of areas around localization and translation rather than translation directly. For example, we have a lot of people who are subject matter experts but not writers. Using ChatGPT can help reduce the number of steps between receiving a query and responding to it. With ChatGPT we might not need that many writers, as ChatGPT will help experts compile the answers.
IKEA, being a big organization, hasn’t started to use ChatGPT institutionally yet. But our team has started to create awareness of what ChatGPT is. We explain how you should and maybe should not use it for business purposes, and we address data privacy and the ethical part. The Innovation and Development team will be key in helping us map use cases and digital ethics.