data:image/s3,"s3://crabby-images/8ff86/8ff861243e0f36b5769514ab139a7f25e8733b53" alt=""
Close to 200 people attended the Custom.MT workshop on AI prompt engineering in localization on April 18-19. The event spanned 7 hours and covered a wide array of material. Participants created 535 prompts and executed generation tasks a total of 38,947 times. The models used in the exercises included OpenAI GPT-4 Turbo, Anthropic Claude 3, and Mistral 7B, among others.
In 2024, one year into the GPT boom, language teams strive to implement generative AI better and more effectively. Prompts are becoming increasingly elaborate, with engineers and project managers employing various techniques to reduce hallucinations, enhance accuracy, and lower costs.
Here is a quick recap of the key points.
1. Live examples of GenAI industrialization by language teams
More than 40% of the participants indicated that generative AI helps their organizations’ localization workflows. Five presenters provided an overview of their work.
– Terminology extraction for translation (Marta Castello, Creative Words)
– Generating product descriptions in multiple languages for eCommerce (Lionel Rowe, Clearly Local)
– Translation + RAG glossaries (Silvio Picinini, eBay)
– Copy generation and GenAI hub in a gaming company (Bartlomiej Piatkiewicz, Ten Square Games)
– Video voiceover and summarization of descriptions in eLearning (Mirko Plitt, WHO Academy)
2. Recommended prompt structure
data:image/s3,"s3://crabby-images/3d987/3d9878bba16023285be020388b036c4f9b03155d" alt="The picture is an example of how to structure your prompt. The picture is an example of how to structure your prompt.
A good prompt consists of sections. Each section has a name surrounded by ### before and ### after the name. The name is followed by a short and clear instruction.
At the picture you can see typical sections and examples of the their content:
###Role###
Start by defining the role or identity that the model should assume. This helps set the context for the task. For example, "You're a professional translator."
###Instructions###
Clearly state what you want to be done. In this case, the task is to "analyze the Website below and then extract terminology from it."
###Output Format###
Specify how you want the information to be presented. Here, the following is requested:
' - First, categorize the results.
- Then, for each category, return a numbered list.'
###Examples###
Provide examples to illustrate what you're looking for. This prompt includes examples like "1. Greenhouse gas" and "2. Emission inventory" each in a separate line.
###Website###
Indicate the source material to be used for the task. In this prompt, a placeholder """({{website_page}})""" is used to represent where the website link or content should be placed.
In the end you should emphasize the importance of the task with an emotional marker. In this prompt, the example is:
'Now, please extract the terminology, it's very important for us:'"
The first practical task involved building a terminology extraction prompt using best practices:
– Separating prompts into ### Sections ### to make them modular
– Variables – to be able to use the prompt again and again with new content
– Minimizing the instructions
– Using examples
– Adding emotional markers to improve the output potentially
– Using temperature 0 to have reproducible results
3. Spreadsheet integration for LQA
The second task was to integrate prompts with spreadsheets to unlock working with variables across multiple lines of content. Participants created their own language quality assurance bots by asking LLMs to detect and label translation errors. We used the public DEMETR dataset from the WMT competitions as task material.
data:image/s3,"s3://crabby-images/ee767/ee7671e0ec408804468b899799bbc4e95e5d68e3" alt="The screenshot features an example of LQA done automatically in Google Spreadsheets. The screenshot features an example of LQA done automatically in Google Spreadsheets. The table's contents ae as follows:
Source Text
French: Toutes les entrées des grottes, qui ont été baptisées « Les Sept sœurs », ont un diamètre d'au moins 100 à 250 mètres.
Czech: Proti Uberu v Brně protestovali taxikáři, poukazovali na to, že Uber nesplňuje podmínky jako například označení jako taxi nebo taxametr.
Japanese: 紀元前3世紀に建てられたピラミッドによって表現されたエジプト王、死んだファラオを讃えるために建てられた多くのピラミッドの一つとして「大ピラミッド」が挙げられています。
Language
French
Czech
Japanese
Translation
French: All the entrances to the caves, which have been dubbed "The Seven Sisters" have a diameter of at least 100 to 250 meters.
Czech: Taxi drivers protested against Uber in Brno, they pointed out that Uber does not meet requirements such as taxi vehicle marking or taximeter.
Japanese: Built by the Egyptians in the third century BCE, the Great Pyramid is one of many large pyramids built to honor dead Pharaoh.
Classification
Severity
French: Major error
Czech: Minor error
Japanese: Major error
Explanation
French: The translation error in the sentence "All the entrances to the caves, which have been dubbed "The Seven Sisters" have been having a diameter of at least 100 to 250 meters." involves a fluency issue. The phrase "have been having a diameter" is not natural or standard in English. The correct expression should simply be "have a diameter." This error disrupts the natural flow and grammatical structure of the sentence, making it sound awkward and incorrect in English.
Czech: The translation inaccurately renders "označení jako taxi" as "taxi vehicle mark," which should be more accurately translated as "marking as a taxi." Additionally, the phrase "missing taximeter" in the translation might mislead by implying the taximeter is lost, rather than not being present as required. The original phrase "nesplňuje podmínky jako například" is better translated as "does not meet conditions such as."
Japanese: The translation omits the phrase "紀元前3世紀に" which specifies that the construction happened in the third century BCE. Additionally, the translation uses "the Great Pyramid" to refer to "大ピラミッド", which while technically accurate, could be misleading as it might be interpreted as referring specifically to the Great Pyramid of Giza rather than a general pyramid."
4. Chain of Thought prompts
Giving an AI model too many instructions, for example, a 20-line localization style guide leads to many instructions being skipped. Instead, in this exercise the participants split the prompt into a chain of smaller prompts or provided instructions to the model to work on step by step.
In our spreadsheet exercise, the 1st prompt evaluated the severity of the error, and the 2nd prompt classified it by type based on the results of the previous generation.
5. Vision in multimodal models
In this task, participants worked with vision model GPT-4 Turbo to complete assignments with images:
– image translation
– getting text from scanned PDFs
– screenshot localization testing
– generating multilingual product descriptions from image
– identifying fonts
etc
data:image/s3,"s3://crabby-images/779b2/779b2881df707063b401b1c3f914cd3a788d13f1" alt="The screenshot features an example of UI translation The screenshot features an example of UI elements' translation done using a prompt and a website variable that uses a URL as an input."
Example: a participant translates the menu of a sample website from an image screenshot.
6. Retrieval-augmented generation (RAG)
Using retrieval, a large language model may generate output based on facts from a linked database. Such databases include company data, glossaries, style guides, and translation memories, and the answer can be based on facts contained within, instead of relying on LLM’s general memory.
In the exercise, we used retrieval for the following use cases:
– translate with glossaries
– check existing translation for terminology compliance
– create a “chat-with-your-website” bot
data:image/s3,"s3://crabby-images/55fbd/55fbd3a2e6368425db97d2ad4ea8a0f5558da63d" alt="RAG Glossary The screenshot displays an example of the database split done with RAG. The text is divided into chunks, and the search of the relevant chunks is based on semantic similarity.
The example percentages are:
Title: Wine Glossary
Saignée 82.4%
Cépage 81.4%
Vieille Vignes 80.7%
Blanc de Blancs 80.5%
Blanc de Noirs 80.5%"
Example: for retrieval, the database splits the text into chunks, and the search of the relevant chunks is based on semantic similarity, expressed in percentage in the screenshot above.
7. Agents
This module covered the ability of LLMs to operate external apps via the API. We explored translating with DeepL and proofreading with ChatGPT in a chat interface. While not immediately applicable to workflows, the overview of Agents showcases a potential future scenario where LLMs act as a user interface to other applications.
data:image/s3,"s3://crabby-images/e22f5/e22f5cc0f99d97398f0c24d32df0e6dbc8e40720" alt="A screenshot of GPT-4 Turbo calling DeepL Agent A screenshot of GPT-4 Turbo calling DeepL to translate a paragraph of text.
The dialog is as follows:
Request: '- Translate the following with DeepL into French then make it informal and colloquial.
Taxi drivers protested against Uber in Brno, pointing out that Uber does not meet the conditions, such as being labeled as a taxi or a taximeter.'
UI element: Translate via DeepL OK
Reply: '- Les chauffeurs de taxi à Brno ont manifesté contre Uber, en faisant remarquer que Uber, ça respecte pas les règles, genre avoir l'étiquette de taxi ou un compteur.'"
Example: GPT-4 Turbo calls DeepL to translate a paragraph of text.
8. Fine-tuning GPT-3.5 Turbo
By giving translation memory and glossaries to GPT-3.5 Turbo, it is possible to improve its output closer to GPT-4, while minimizing the cost below the cost of conventional machine translation. In this module of the workshop, we covered how fine-tuning is done, and its impact on quality and cost of LLM localization.
data:image/s3,"s3://crabby-images/3064e/3064eebc50d0f88e5186d420df604d63efe83581" alt="A screenshot of how fine-tuning in an OpenAI instance. A screenshot of how fine-tuning is done in an OpenAI instance."
Presenters: Dominic Wever (Promptitude) and Konstantin Dranch (Custom.MT).
The workshop recording is available at:
https://www.youtube.com/watch?v=MJNlhyStv14 – part 1.
https://www.youtube.com/watch?v=QPPRtquyvgQ – part 2.
Next workshop
The next installment is planned for Jun 18, 2024, before the TAUS Massively Multilingual Conference in Rome.
https://www.taus.net/events/massively-multilingual-conference-rome-2024
Comments are closed.