Google Translate's 2016 neural machine translation upgrade shows how hard it is to launch a meaningful startup in the translation vertical (image credit, Google)

Why Google Translate’s neural machine translation is a major gambit on startups

Alphabet's upgrade to Google Translate underscores that the nascent world of machine translation startups could be knocked off course at any moment


Google made a massive announcement about its Google Translate service on Tuesday while the world waited to cover Elon Musk’s plans to blast humanity’s best and bravest to colonies on Mars. The Google Brain Team announced results of research into replacing its “phrase-based machine translation” (PMT) with “neural machine translation” (NMT). In practice, rather than translating individual words and short groups of words independently, the new algorithms consider the entire sentence as a single unit, along with the clauses and word combinations within it.

“Today we announce the Google Neural Machine Translation system (GNMT), which utilizes state-of-the-art training techniques to achieve the largest improvements to date for machine translation quality,” Quoc V. Le and Mike Schuster jointly posted on Google Translate’s official blog. The announcement also linked to their full study, titled “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation.”

Illustration of Google’s new neural machine translation in action between Mandarin and English (Google)

That is a game-changing prospect in a surprisingly uncrowded field of machine translation startups. The unfortunate reality for translation is that not only is everyone a critic of your work, but most people criticizing you actually are qualified to trash bad translations. There is very little wiggle room in the business for people with ‘so-so’ or ‘okay’ translations. You can’t get money if you don’t make sense.

“GNMT reduces translation errors by more than 55%-85% on several major language pairs measured on sampled sentences from Wikipedia and news websites with the help of bilingual human raters,” Le and Schuster claimed in their report, which they illustrate with the following example:

Sample translation of Google PMT, Google NMT and human translation (Google)

It is a strange example to choose, since the researchers do not note that the GNMT translation actually reads better than the human one. A separate graph shows survey results in which human reviewers usually ranked human translations above GNMT output. But that might be the intention here: to show that humans still make common mechanical errors when they cannot work out the phrasing, errors the new algorithm is less prone to. That is not to say it won’t have problems, but the researchers may be implying that they have solved some of the problems in producing fluent translated sentences, enough to challenge the strongest emerging technology in this space: translation memory.

Translation memory collects data from previous translations, sometimes client or industry-specific, and uses that as a bank of previous in-context translations of words, phrases and entire sentences.
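
As a rough illustration of the idea, here is a minimal Python sketch of a translation memory. It is not how Stepes, LingoHub, Unbabel or any other product actually works; the `TranslationMemory` class, its `threshold` parameter and the sample sentences are all hypothetical. It stores past source/target pairs, returns an exact match when it has one, and otherwise falls back to the closest stored segment using the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: stores past source/target pairs and reuses them."""

    def __init__(self, threshold=0.8):
        # Assumed cutoff: fuzzy matches scoring below this are rejected.
        self.threshold = threshold
        self.entries = {}  # lowercased source segment -> stored translation

    def add(self, source, target):
        """Bank a previously approved translation."""
        self.entries[source.lower()] = target

    def lookup(self, source):
        """Return (translation, score); translation is None if nothing is close enough."""
        query = source.lower()
        if query in self.entries:
            return self.entries[query], 1.0  # exact match
        best_target, best_score = None, 0.0
        for stored, target in self.entries.items():
            score = SequenceMatcher(None, query, stored).ratio()
            if score > best_score:
                best_target, best_score = target, score
        if best_score >= self.threshold:
            return best_target, best_score  # fuzzy match, to be reviewed by a human
        return None, best_score


tm = TranslationMemory()
tm.add("Joshua buys the book.", "Josué compra el libro.")

exact = tm.lookup("Joshua buys the book.")
fuzzy = tm.lookup("Joshua buys the books.")
miss = tm.lookup("Where is the train station?")
print(exact, fuzzy, miss)
```

In a real workflow a fuzzy match would typically be shown to a translator as a suggestion rather than used as-is, which is why the sketch returns the score alongside the stored translation.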

Geektime has profiled three strong startups in the field that have built their businesses on translation memory: Sino-Indian startup Stepes, LingoHub in Austria and Unbabel in Portugal. That we have profiled only three is not for lack of trying to find more. Few are doing well, and new players are having a hard time getting funding and staying in the game. Some have folded and others have reached dead ends. Unbabel has Silicon Valley support, while LingoHub is still bootstrapped.

What Google has done puts immense pressure on these three companies and on anyone clever enough in both human languages and programming languages to mount a successful, game-changing translation service. Algorithms can by their nature collect data much more quickly than memory banks.

Another simplified way to describe the approach is to compare it to the way we read words: as a unit. An old meme claims that if the letters in a word are scrambled but the first and last letters stay in place, we can still read the word because it resembles the correct spelling. Research shows that reading this way is slower, so the idea is not completely true, yet it is not pure myth either. Something similar applies here. Some languages write sentences in Subject-Verb-Object (SVO) order, as in “Joshua buys the book.” Others use different arrangements or are flexible; in Spanish you can say “compra Josué el libro” in certain contexts. Whatever the word order of the input, the algorithms will group the terms in the most logical way for the output language.

On the other hand, the algorithm is not proven yet. Its best metric is a comparison with previous Google Translate results, which can be humorously awful and have embarrassed corner-cutting 5th graders too lazy to do their Spanish homework as well as well-meaning politicians just trying to convey greetings to people. There are plenty of multilingual developers out there who might want to take a crack at this, and I’m sure plenty who work at Google are harboring good ideas and looking for an excuse to get incubated somewhere. The game is by no means over, but Google’s translation updates could prove as debilitating for translation startups as its Google Search updates can be for SEO companies.

“Machine translation is by no means solved,” Le and Schuster go on to say. “GNMT can still make significant errors that a human translator would never make, like dropping words and mistranslating proper names or rare terms, and translating sentences in isolation rather than considering the context of the paragraph or page. There is still a lot of work we can do to serve our users better.”

“However, GNMT represents a significant milestone. We would like to celebrate it with the many researchers and engineers—both within Google and the wider community—who have contributed to this direction of research in the past few years.”

Google’s full research report lists the following intimidatingly brilliant authors: Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes and Jeffrey Dean.
