
An illustrated artificial neural network (ANN) (CC BY-SA 4.0 LearnDataSci via Wikimedia Commons)

No, Google Translate did not invent its own language called ‘interlingua’

The system's 'neural network' is advanced, but its abilities are being exaggerated by observers


I have a fascination with translation, primarily because I have an interest in languages. I’m what I like to call “an aspiring polyglot,” meaning that I don’t have time to practice (and reach complete fluency in) the few foreign languages I know something of, yet I give myself plenty of time to learn about them: how they differ and, by extension, how they all work.

As a journalist focused on technology and startups, I find the ever more popular topic of machine translation (MT) and “translation memory” fascinating. It has given me the chance to cover companies like Austrian startup LingoHub (an essential service for apps) and Portuguese startup Unbabel (the next-level stuff they’re doing is very cool). I get to ask people how they communicate with lovers from other countries and to report on developments like Google Translate’s upgrade from “phrase-based machine translation” (PMT) to “neural machine translation” (NMT).

It’s the last one that has me going right now.

“Google Translate invented its own language to help it translate more effectively,” wrote UX developer Gil Fewster on Medium (emphasis his). He was reacting to a Google blog post from late November, “Zero-Shot Translation with Google’s Multilingual Neural Machine Translation System,” co-written by Mike Schuster and Nikhil Thorat of the Google Brain Team and Melvin Johnson of the Google Translate product team.

Fewster continued, “What’s more, nobody told it to. It didn’t develop a language (or interlingua, as Google call [sic] it) because it was coded to. It developed a new language because the software determined over time that this was the most efficient way to solve the problem of translation.”

That’s not true.

It would be unfair to hold Fewster solely responsible for this marathon of triumphant assumptions; he merely repeated a collection of terrible clickbait headlines from TechCrunch and New Scientist that exaggerate (if not outright lie about) what this upgrade has accomplished.

“If the computer is able to make connections between concepts and words that have not been formally linked… does that mean that the computer has formed a concept of shared meaning for those words, meaning at a deeper level than simply that one word or phrase is the equivalent of another?” asks TechCrunch contributor Devin Coldewey. “In other words, has the computer developed its own internal language to represent the concepts it uses to translate between other languages?”

No. It hasn’t.

“In a sense, that means it has created a new common language, albeit one that’s specific to the task of translation and not readable or usable for humans,” wrote New Scientist’s Sam Wong.

I’m going to scream. Let’s explain why this is an exaggeration.

Right away, two problems come to mind with this understanding of what Google is doing.

1) Referring to the process of translation as a language in and of itself is inaccurate, or at least blurs definitions.

2) It ignores the fact that a secondary, indirect translation is always flawed compared to a primary, direct translation. Calling that amazing betrays a lack of knowledge not so much of how machine learning works as of how translation works.

Those are both red flags, and indeed the system has done no such thing.

Fewster, Coldewey, and Wong refer to this language as “interlingua,” a special internal “language” that enables translation from Language A into Language C by first filtering it through Language B.
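To make the A-to-C-through-B idea concrete, here is a minimal sketch of explicit pivot translation. The two word lists are hypothetical toy dictionaries that keep only one translation per word; real systems work on whole sentences and weigh many candidates. The second call illustrates the weakness raised in point 2 above: French “rive” (a riverbank) passes through the ambiguous English “bank” and comes out as Spanish “banco,” the financial kind, instead of “orilla.”

```python
# Hypothetical toy dictionaries: each keeps only one (the most common) translation per word.
FR_TO_EN = {"arbre": "tree", "rive": "bank"}
EN_TO_ES = {"tree": "árbol", "bank": "banco"}


def pivot_translate(word, first_leg, second_leg):
    """Translate word from language A to C by going through pivot language B."""
    pivot_word = first_leg[word]   # A -> B
    return second_leg[pivot_word]  # B -> C


print(pivot_translate("arbre", FR_TO_EN, EN_TO_ES))  # árbol (correct)
print(pivot_translate("rive", FR_TO_EN, EN_TO_ES))   # banco (wrong sense; a riverbank is "orilla")
```

Nothing in the English step records which sense of “bank” was meant, so the second dictionary has no way to recover it.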

Zero-shot translation vs. direct translation

Schuster, Johnson, and Thorat give an example of four “language pairs” the program knows how to translate: Korean-English, English-Korean, Japanese-English, and English-Japanese.

“Our multilingual system, with the same size as a single GNMT system, shares its parameters to translate between these four different language pairs,” the trio explains. “This sharing enables the system to transfer the ‘translation knowledge’ from one language pair to the others. This transfer learning and the need to translate between multiple languages forces the system to better use its modeling power.”

Google Neural Machine Translation in action (image via Google Research Blog)

In this scenario, the machine uses the way Japanese and Korean are translated into English as a bridge for translating between Japanese and Korean, even though it was never trained on that pair. GNMT also draws on stored data from other language pairings involving its two target languages to better approximate the translation.

It’s the equivalent of having identical homework assignments for French and Spanish at the same time, noticing that your French translation looks weird, and double-checking it against your Spanish.

This is just an example (there is clearly a lot of translation history between Japanese and Korean). There probably isn’t nearly as much between, say, Hebrew and Marathi. How those two languages look when translated into more widely spoken tongues is valuable for building bridges between Israelis and Marathi speakers.
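The research paper behind Google’s blog post describes how a single shared model can serve every direction: an artificial token at the start of the input, such as `<2es>`, tells it which language to produce. The sketch below, whose helper names are hypothetical, lists the four trained directions from Google’s example, works out which directions between the same three languages were never trained (the zero-shot ones), and tags a sentence with a target token in that style. It only illustrates the bookkeeping, not the neural model itself.

```python
from itertools import permutations

# The four directions in Google's example; these are the pairs the model was trained on.
TRAINED_PAIRS = {("ko", "en"), ("en", "ko"), ("ja", "en"), ("en", "ja")}


def zero_shot_pairs(trained):
    """Return every direction between the known languages that never appeared in training."""
    languages = {lang for pair in trained for lang in pair}
    return sorted(pair for pair in permutations(languages, 2) if pair not in trained)


def tag_source(sentence, target_lang):
    """Prepend a target-language token, in the style of "<2es>", to a source sentence."""
    return f"<2{target_lang}> {sentence}"


print(zero_shot_pairs(TRAINED_PAIRS))  # [('ja', 'ko'), ('ko', 'ja')]
print(tag_source("안녕하세요", "ja"))    # <2ja> 안녕하세요
```

Because one set of parameters handles all directions, nothing stops a user from sending Korean input tagged for Japanese output; the zero-shot question is only how good the result turns out to be.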

The Google team poses the follow-up question itself: “The success of the zero-shot translation raises another important question: Is the system learning a common representation in which sentences with the same meaning are represented in similar ways regardless of language — i.e. an ‘interlingua’?”

In other words, is the system mapping different surface forms onto one shared representation? Is it matching words like “Baum” in German and “tree” in English to the same specific node in the neural network, one that signifies the same thing?

Even if it is, that isn’t the creation of a new language. Translate is merely checking its interpretation from one language against translations in another.
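One way to picture “represented in similar ways” is as vectors that point in similar directions. The sketch below uses made-up three-number vectors (real models learn hundreds of dimensions, and these values are invented purely for illustration) and cosine similarity, a standard closeness measure, to find which stored entry sits nearest to German “Baum.” Landing next to English “tree” shows shared geometry, which is still a long way from a language.

```python
import math

# Hypothetical 3-dimensional vectors standing in for a model's internal representations.
# The numbers are invented for illustration; real systems learn them from data.
EMBEDDINGS = {
    ("de", "Baum"): [0.90, 0.10, 0.05],
    ("en", "tree"): [0.88, 0.12, 0.07],
    ("en", "wine"): [0.05, 0.20, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def nearest(query_key, embeddings):
    """Return the key (other than the query) whose vector is most similar to the query's."""
    query = embeddings[query_key]
    others = [key for key in embeddings if key != query_key]
    return max(others, key=lambda key: cosine_similarity(query, embeddings[key]))


print(nearest(("de", "Baum"), EMBEDDINGS))  # ('en', 'tree')
```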

In humans, knowing two languages makes understanding a third easier

Why are people raised bilingual from birth better at learning new languages? It’s the same reason Google Translate’s upgrade works so well: It draws on knowledge of multiple languages to better understand new ones.

“Gaining command of a number of languages improves proficiency in native languages,” Professor Salim Abu-Rabia of the University of Haifa said in 2011 following a study of monolingual Hebrew speakers and bilingual Russian-Hebrew speakers trying to learn English. “Our study has also shown that applying language skills from one language to another is a critical cognitive function that makes it easier for an individual to go through the learning process successfully.”

Translate does indeed use knowledge from other languages to better approximate a translation into a target language.

However, it still treats those other languages as separate data sets, just like you might. Spanish and English have their own particular lists of vocabulary, rules for sentence structure, and guidelines for verb conjugation. Just like a multilingual student learning a new language, Google Translate now has multiple data sets from which to pull information and double-check its work.

Here’s a simple example.

Let’s say I’m a yeshiva student who speaks English, can speak and read a little Hebrew and Arabic, and is studying the Babylonian Talmud without the aid of a Babylonian Aramaic dictionary (please, somebody, digitize the Jastrow dictionary already).

With no formal Aramaic training, I come across a word like חמרא (pronounced khamra) in a passage about wine. Where are the Hebrew words for wine, יין or תירוש? Ah, but I know خمر (pronounced khamr) is the word for wine in Arabic, so I use that knowledge to fill in my missing Aramaic. Now the Hebrew-like Aramaic is complete and I can translate it all into English, the language I am using to learn and discuss the ancient text.

The one question to ask before you accept a headline about “AI”

When reading technology news, always operate on the assumption that nothing close to Skynet, or the machines that created the Matrix, exists. Building something like that is still years, if not decades, away. In the interim, treat any story claiming a computer program did something without being asked with a healthy dose of skepticism. In all likelihood, the program simply did something its operators did not realize they had programmed it to do, or the explanation is far simpler than it first appears.
