Authors: Will Pace, Konstantin Dranch and Silvia Schiavoni
Globalese is an affordable machine translation system tailored for trainers. Thanks to a great sandbox and a direct connection of users with the company’s founders, Globalese has become a very popular choice with the MT trainer community. Custom.MT presents the story of Globalese and the founders’ views on the market. Our Marketing Manager, Silvia Schiavoni, sat down with Globalese CEO Gábor Bessenyei and his Co-Founder, Greg Horváth, to talk about everything MT.
In our recent case study, a trained Globalese engine outperformed the other engines by 10 BLEU points and aced human evaluation against Google, Amazon, Microsoft and Yandex for Russian to English Technical/Aviation. This test prompted us to get to know Globalese better.
Clean Dataset > Larger Dataset
What should global corporations, the likes of Nike or Volkswagen, look out for in machine translation in 2021?
Gábor: We are already in a situation where non-critical content can be translated to a very high level of quality by machine translation, basically for free. I guess the main fields companies like Nike and Volkswagen would use MT for would be e-commerce, knowledge bases and other forms of catalogs. These are some of the best applications for customized MT engines. But these engines can only be successful if you are feeding in the right training data.
The technology has matured to a state where data has the biggest impact on the output quality. For the user, the best way to leverage technology better is to invest as much as possible into data management, terminology consistency, and linguistically strong datasets.
Data must be relevant and qualified for the purpose. The biggest amount of training data in the world is not the solution. In many cases, the high volume only creates noise. Yes, you need some mass for your training, but it isn’t the most important aspect. It’s better to halve your training data and have it all clean and qualified than double it with low-quality input.
Clean data, relevant data, consistent data: this is everything.
“It’s better to halve your training data but have it all clean and qualified”.
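For readers who want to see what "clean and qualified" can look like in practice, here is a minimal Python sketch of a filter for source/target segment pairs. It is only an illustration, not Globalese's actual pipeline: the function name `clean_parallel_corpus`, the `max_length_ratio` threshold and the choice of checks (empty segments, untranslated copies, implausible length ratios, duplicates) are all assumptions made for this example.

```python
def clean_parallel_corpus(pairs, max_length_ratio=3.0):
    """Filter (source, target) segment pairs with a few simple sanity checks.

    Hypothetical helper for illustration only; the threshold is an assumed
    default and would need tuning per language pair.
    """
    seen = set()
    cleaned = []
    for source, target in pairs:
        # Collapse runs of whitespace and trim both sides.
        source = " ".join(source.split())
        target = " ".join(target.split())

        # Drop pairs where either side is empty.
        if not source or not target:
            continue

        # Drop pairs where the target is just a copy of the source.
        if source == target:
            continue

        # Drop pairs whose character lengths differ too much to be plausible.
        longer = max(len(source), len(target))
        shorter = min(len(source), len(target))
        if longer / shorter > max_length_ratio:
            continue

        # Drop duplicates, ignoring case.
        key = (source.lower(), target.lower())
        if key in seen:
            continue
        seen.add(key)

        cleaned.append((source, target))
    return cleaned
```

Running a translation memory export through a filter like this shrinks the dataset, which is the trade-off Gábor describes: fewer segments, but each one more likely to help the engine.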
David vs. Goliath: Competing vs Google Translate
Globalese is a small company approaching its first million dollars. Compared to Google and Microsoft, its ability to buy datasets or invest in feature development is more limited, but it is also more focused. Globalese is growing every year.
How can a small company like Globalese survive and compete against the IT giants?
Gábor: Yes, it’s not easy, the competition is huge. Google Translate, DeepL, Bing: they are all available either for free or for a very low price, and, to be fair, they provide excellent results.
Greg: My first idea would be customization. I’m pretty sure at this point that Google, Microsoft, or even Amazon do not double down on customizable engines. Our advantage will always be in training an engine from scratch, so it only includes your data, and nothing else. You can train your own engine with Google, for instance, but that doesn’t mean training it from scratch - you are just tuning what is already there.
Gábor: Apart from that, the price-quality ratio offered is really strong with Globalese. On small volumes the difference in cost is negligible, but with millions of words, we have a big advantage. Finally, there is the option of on-premise systems for customers with confidentiality requirements. These are the USPs that set Globalese apart from the competition. All while we maintain focus on customer service and direct engagement with users.
What inspired this approach?
Greg: My role models - and I think Gábor shares this with me - were always the team at memoQ. They are renowned for their excellent customer service. That was always the aim for me, always to be as direct and involved with users as you can, even smaller customers, whereas IT giants will never be able to do that and will probably never care enough to try.
Gábor: In our case, it is a question of mere days to implement new features on demand. This will always be one of the chief advantages for a smaller company like us.
“My role models - and I think Gábor shares this with me - were always the team at memoQ”.
How Globalese Started
Globalese began as an offshoot of the translation company MorphoLogic Localisation back in 2012. Sharing a mutual passion for languages, founders Gábor and Greg both moved from Hungary - to Germany and Iraq respectively - at an early age as their families pursued work abroad. Gábor now splits his time between Hungary and Germany, whilst Greg resides in Australia (with a brief spell in Sweden prior to settling there).
Gentlemen, how did it all begin?
Greg: I was eavesdropping. By education, I’m an American Studies major, which qualified me for absolutely nothing. After university I went on to teach English. So here I was on the suburban train going home, having just quit my first job as an English teacher, with no real idea what I should do next.
There’s this guy on the phone, possibly talking to a freelancer, asking, ‘Can you take this job? Here is the subject, here is how many characters, and here is the deadline.’ At the end of the conversation, he spelled out his email address. So I thought, ‘I’ve got to give this a go!’, so I went home and sent this complete stranger an email.
That stranger turned out to be one István Lengyel, former CEO of memoQ and longtime friend and colleague of Gábor, who connected him and Greg around 2004.
Greg was brought into the fold at MorphoLogic Localisation, a translation agency. MorphoLogic was one of the first natural language processing companies in Hungary. If you open the credits for Microsoft Office, you will still find MorphoLogic credited for the Hungarian spell checker. The LSP provided a launchpad for the software company. A few years later, in 2011, a young university graduate joined the team and put together a statistical machine translation model for English to Hungarian.
Greg: At the time, I was just a very enthusiastic amateur web programmer and I said to Gábor, ‘perhaps I can build something around this that makes it accessible in a web browser’. Just like that, Globalese was born.
“I was eavesdropping. Then I thought, ‘I’ve got to give this a go!’, so I went home and sent this complete stranger an email”.
From an LSP to a software company...
Gábor: We always had friends and colleagues in different translation companies and this is why we designed Globalese with their needs in mind at the beginning. Because we have LSP roots, we know what is required for professional translation.
After we went to a few trade shows, it became clear that there would be strong interest from the buy-side. One of the first clients was Infor, a multinational enterprise software company.
What the Future Holds
For machine translation, the important change brought about by COVID-19 is the boom of e-commerce. The share of e-commerce versus brick-and-mortar shops has risen sharply because traditional shops are closed. Add the fact that 60% of shoppers buy exclusively in their native language, and you start to get the picture. Hundreds of thousands of product descriptions need to be translated.
Alongside e-commerce, in which sectors do you see NMT having a big impact in 2021?
Greg: Before Covid started, Gábor and I were planning our travel calendar for 2020, what exhibitions and trade shows to attend. We were thinking this would be the year that we step out of the closed circle of LSPs and have a look at what is happening in game development, travel and e-commerce. The pandemic put the brakes on attending those trade shows, but those are still sectors we’d like to explore.
Gábor: As Greg says, it’s gaming, the travel industry, e-commerce, anything sold or handled on the internet.
Which geographies will see the highest MT growth in 2021?
Gábor: The Asian markets are growing, and of course so are the combinations and language pairs within them. For example, demand for translation in pairs such as Chinese to Japanese or Korean to Japanese will grow. That region will grow significantly from my point of view.
The Japanese market is one of the most interesting for MT. MT was simply not good enough for Japanese before, but with NMT that changed, and people there are getting excited about the results from NMT. It’s a double switch, as the language industry in Japan is getting more technical: they are starting to use more CAT tools, TMS, and MT, so they will very soon fill all the gaps they had from the past.
So who is going to get rich in the MT business?
Gábor: Really rich? It’s not easy to get rich with MT (laughs). I don’t believe there will be one big game-changer who will gain all of the users. I would say that the market will be the same as it is now. Big players and niche players: there is space for everyone.