This guide explains how to train your own Amazon translation model. For example, you can train a domain model (medical, legal, video games, financial reporting) on the translation memory you have accumulated over the years. Alternatively, you can build an organization-specific model that knows all your product and people names and follows your own style.
Unlike with brands such as Globalese, ModernMT and Systran, training an Amazon model is a technically complex task that requires some developer skills and knowledge. But with our guide, a tech-savvy project manager or a solution architect on a language team can take on the ML operations.
If you’ve used the Google Cloud Platform before, each part of the training will take:
TIP: You need to fill in the company details, such as billing address and phone number.
You might be asked to add a payment method at this point as well. Either way, we'll look at another way to add it.
TIME: 20+ min
What is it: Amazon Web Services, Inc. is a subsidiary of Amazon that provides on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered, pay-as-you-go basis.
Choose “Payment preferences” from the menu on the left and click the “Add payment method” button.
TIME: 10+ min
1. To start the training process, open the main menu or the drop-down menu and choose S3.
What is it: Amazon S3, or Amazon Simple Storage Service, is a service offered by Amazon Web Services that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its e-commerce network.
2. Go to Buckets and create a new one
3. On the creation page, fill in the name and the region. Everything else can stay at the default settings.
TIP: Choose the region carefully. It should be the same as, or close to, the region you will work in.
4. After creating the bucket, go into it and create three folders. You can name them however you like:
1st – for inputs (for files to translate)
2nd – for outputs (translated files)
3rd – for training data (TMX files, etc.)
TIP: For the training we are going to use only the 3rd folder, which contains the training data. The other two folders are needed for translation.
5. Go to the 3rd folder and upload the TMX file for the training.
TIME: Usually a couple of minutes, depending on your internet speed and the size of the file.
6. Go back to the menu and choose Amazon Translate.
7. Go to Parallel data and create a new parallel data resource.
What is it: Amazon Translate is a text translation service that uses advanced machine learning technologies to provide high-quality translation on demand. You can use Amazon Translate to translate unstructured text documents or to build applications that work in multiple languages.
TIP: If you created parallel data earlier and cannot find it, check which region is currently selected and switch to the one you need.
8. On the creation page, fill in the name, select the data set from the 3rd folder in S3, and choose the file format. In our case it is TMX. Click the Create parallel data button.
TIME: It could take up to 30 hours, depending on the size of your training data.
What is it: Parallel data consists of examples that show how you want segments of text to be translated. It includes a collection of textual examples in a source language, and for each example, it contains the desired translation output in one or more target languages.
9. That's the training process. Check the status: once it is Active, you can use the parallel data for translations.
TIP: Be aware of which region you are using.
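If you would rather script steps 1 to 9 than click through the console, they could look roughly like the sketch below, which uses the boto3 SDK for Python. It is an illustration under assumptions, not the only way to do it: the bucket name, region, folder names, the `memory.tmx` file name and the `my-parallel-data` resource name are hypothetical placeholders, and we assume the parallel data location points at the uploaded file itself, as it does when you browse to it in the console.

```python
"""Sketch of steps 1-9: S3 layout plus Amazon Translate parallel data."""

# Hypothetical placeholders: replace with your own values.
REGION = "eu-west-1"
BUCKET = "my-mt-training-bucket"
FOLDERS = ("inputs/", "outputs/", "training/")  # 1st, 2nd and 3rd folder


def bucket_params(bucket: str, region: str) -> dict:
    """Arguments for s3.create_bucket; us-east-1 must omit the location constraint."""
    params = {"Bucket": bucket}
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params


def parallel_data_params(name: str, bucket: str, key: str, fmt: str = "TMX") -> dict:
    """Arguments for translate.create_parallel_data."""
    if fmt not in ("TMX", "CSV", "TSV"):
        raise ValueError(f"unsupported parallel data format: {fmt}")
    return {
        "Name": name,
        # Assumption: the URI points at the uploaded file itself.
        "ParallelDataConfig": {"S3Uri": f"s3://{bucket}/{key}", "Format": fmt},
    }


def train(tmx_path: str, name: str = "my-parallel-data") -> str:
    """Create bucket and folders, upload the TMX, register it, wait for a final status."""
    import time

    import boto3  # imported lazily so the helpers above work without AWS access

    s3 = boto3.client("s3", region_name=REGION)
    translate = boto3.client("translate", region_name=REGION)

    s3.create_bucket(**bucket_params(BUCKET, REGION))
    for folder in FOLDERS:
        # A zero-byte object whose key ends in "/" shows up as a folder in the console.
        s3.put_object(Bucket=BUCKET, Key=folder)

    key = FOLDERS[2] + "memory.tmx"
    s3.upload_file(tmx_path, BUCKET, key)

    translate.create_parallel_data(**parallel_data_params(name, BUCKET, key))
    while True:
        props = translate.get_parallel_data(Name=name)["ParallelDataProperties"]
        if props["Status"] in ("ACTIVE", "FAILED"):
            return props["Status"]
        time.sleep(60)  # large data sets can take hours to become ACTIVE
```

Calling `train("memory.tmx")` requires AWS credentials with access to S3 and Amazon Translate. Both clients are created in the same region on purpose, since parallel data is only visible in the region where it was created.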
Create a new job
TIP: Amazon Translate takes longer to translate than other providers, so it is better to start this job first to save time.
File format, for example TXT
And lastly, Output S3 location – that's the 2nd folder, for translated files.
TIP: When you choose the locations, click on the directory where the files are located, not on the files themselves.
If you want a stock translation, leave the parallel data section empty.
Otherwise, if you need the trained model, select the parallel data that you created.
Role name – enter “Translator”.
TIP: The translation will take approx. 20+ min depending on the size of the file.
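The same job can also be started from code. Below is a minimal sketch with boto3, assuming the three-folder layout described earlier. The bucket name, job name, language codes and the role ARN are hypothetical placeholders; use the ARN of the IAM role that has access to your bucket. Leaving `parallel_data` empty gives a stock translation, and passing the name of your parallel data uses the trained model.

```python
"""Sketch: start an Amazon Translate batch job and wait for it to finish."""

FINAL_STATES = ("COMPLETED", "COMPLETED_WITH_ERROR", "FAILED", "STOPPED")


def job_params(job_name, bucket, role_arn, source_lang, target_lang,
               parallel_data=None, content_type="text/plain"):
    """Arguments for translate.start_text_translation_job."""
    params = {
        "JobName": job_name,
        # Locations are folders, not individual files.
        "InputDataConfig": {
            "S3Uri": f"s3://{bucket}/inputs/",
            "ContentType": content_type,  # text/plain corresponds to TXT files
        },
        "OutputDataConfig": {"S3Uri": f"s3://{bucket}/outputs/"},
        "DataAccessRoleArn": role_arn,
        "SourceLanguageCode": source_lang,
        "TargetLanguageCodes": [target_lang],
    }
    if parallel_data:
        # Omit this key entirely for a stock translation.
        params["ParallelDataNames"] = [parallel_data]
    return params


def run_job(params, region):
    """Start the job and poll once a minute until it reaches a final state."""
    import time

    import boto3  # imported lazily so job_params works without AWS access

    translate = boto3.client("translate", region_name=region)
    job_id = translate.start_text_translation_job(**params)["JobId"]
    while True:
        props = translate.describe_text_translation_job(JobId=job_id)
        status = props["TextTranslationJobProperties"]["JobStatus"]
        if status in FINAL_STATES:
            return status
        time.sleep(60)


# Hypothetical values for illustration only.
example = job_params(
    job_name="en-de-custom",
    bucket="my-mt-training-bucket",
    role_arn="arn:aws:iam::123456789012:role/Translator",
    source_lang="en",
    target_lang="de",
    parallel_data="my-parallel-data",
)
```

With valid credentials, `run_job(example, "eu-west-1")` submits the job and returns its final status. The region must match the one where the bucket and the parallel data live.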