The move is the beginning of a major rollout of neural machine translation, which promises smoother syntax and grammar
Google has begun to roll out its major machine learning update to Translate, the company announced late Tuesday. The update switches the service from “phrase-based machine translation” (PMT) to “neural machine translation” (NMT), which, as you might expect, is supposed to produce more accurate translations. The results should read more naturally, with syntax and grammar that better reflect the way people actually speak.
“In 10 years, Google Translate has gone from supporting just a few languages to 103, connecting strangers, reaching across language barriers and even helping people find love,” Barak Turovsky, the product lead for Google Translate, wrote in a blog post. He added that Google was replacing statistical machine translation (SMT), the broader family of techniques to which PMT belongs, with NMT (sorry for all the abbreviations).
The shift to NMT was first announced in September and will initially cover nine languages: English, French, Spanish, Chinese (Mandarin), German, Turkish, Portuguese, Japanese, and Korean. Google, however, said somewhat ambiguously that it was starting with only eight language pairs “within Google Search, the Google Translate app, and website.” A language pair is translation between two languages in both directions, such as English→French and French→English.
Some languages are easier to translate into certain languages than into others, so a machine translation system needs to build up data specific to each language pair. Google did not say which eight pairs the system would start with, or when the remaining pairs among those nine languages would be fully revamped and ready to go.
If you’re looking for a clue as to which languages will likely be next on the list, expect them to be among the world’s most widely spoken: Standard Arabic, Vietnamese, Italian, and Thai. Indian languages, including but not limited to Hindi, Bengali, Punjabi, Urdu, and Marathi, would likely be released close together so that Google’s users in India all get the upgrade at the same time. Google has so far not brought other Chinese languages, notably Cantonese, to the forefront.
It will be interesting to see whether Google’s use of AI here allows for better gathering of language samples from informal but widely spoken languages, particularly the several “dialects” of Arabic (which are sometimes mutually unintelligible and better defined as separate languages) and Chinese languages such as Wu (including Shanghainese) that Beijing has not supported in its national education policy. Translate is also missing some Indian languages that are widely spoken in under-connected areas of the country.
Few startups are capable of competing with Google Translate, especially now with this upgrade. Two rare exceptions might be Austria’s LingoHub and Portugal’s Unbabel, the latter of which raised a $5 million Series A round two weeks ago and is considered a rising star of the Portuguese startup ecosystem.
Turovsky committed not only to eventually extending the rollout to all 103 of Google Translate’s languages, but also said the work would not undercut the company’s goal of bringing more languages to the platform in the future.
“We’ll also continue to rely on Translate Community, where language loving multilingual speakers can help share their language by contributing and reviewing translations,” he wrote. “We can’t wait for you to start translating and understanding the world just a little bit better.”