Unbabel’s machine learning algorithm learns from its translation mistakes and promises a formidable new standard for digital translation
Translation is a growing industry. It has never been out of fashion, but globalization has opened a new opportunity to build networks of translators who work more frequently than ever before. While technology is producing some fairly sophisticated, freely available tools like Google Translate, you can’t trust the machine with an extensive project.
Unbabel wants to change that. Co-founders CEO Vasco Pedro, CTO João Graça, CMO Sofia Pessanha, Bruno Prezado, and Hugo Silva have founded a startup that applies machine learning to the translation process. Taking a page out of Fiverr’s book, they have an army of 42,000 translators around the world who correct their solution’s mistakes. But Unbabel raises the bar on refinement: as editors smooth over translators’ work, its AI solution learns from its own initial mistakes.
“Machine learning sucks and the translation is not there. Human translation is expensive and not scalable,” CEO Vasco Pedro tells Geektime. “[Our tech] very thoroughly automates the process of language but uses humans in the parts you need to use, like the creative parts. Machine translation is the initial basis.”
Unbabel has raised $3 million in venture funding from Google Ventures and Matrix Partners alongside investors in China and Japan, and has 26 full-time employees across its two offices in Lisbon and San Francisco.
But how do you select the ideal translator for a given project, and how do you augment that translator’s abilities? Pedro says another key component of their AI is something they call Smartcheck, i.e., “spellcheck on steroids.” Pedro compares Unbabel’s strategy to what he says made the Industrial Revolution successful: breaking down the process of manufacturing something into multiple stages and letting each stage in that process scale. Pedro thinks translation hasn’t been brought to that level of sophistication yet, but a multi-layered approach mixing AI and human translators could make the product more sophisticated and easier to scale.
The rates they pay their massive network of translators sound lucrative compared to what Fiverr pays its lowest-paid content writers. Translators start at $8/hour but could make as much as $20/hour, depending on their experience, speed and, of course, accuracy. The initial machine translation is looked over by several translators working in their free time on desktop or mobile, then sent to an editor, whose work is in turn checked by another editor. Depending on the language, there might be more steps in the process. Interestingly, Pedro says the best way to start the process is with a non-native speaker of the target language, who might take a more mechanical approach similar to that of machines, and then allow a more senior expert to catch the common mistakes the AI and lower-level translators might have made along the way.
While Unbabel is bringing in the dough, it hasn’t turned a profit yet. The company launched a subscription-based service five months ago, but it anticipates that further investment will be needed to reach the level of growth it wants.
“We’re starting to see the internet actually diverge; English is now only 35% of the web. As communities online grow, more and more communities speaking a particular language are contributing content in their own language,” Pedro notes.
Stacked against Google Translate, but we need to go deeper
When asked about the utility of Google Translate, Pedro was complimentary but quick to point out that even though Google has been in the game for a while and has calibrated its product fairly well in widely spoken tongues, Unbabel has made major strides quickly.
Pedro explains, “We have a fundamentally different approach from Google. They’re doing an amazing job creating a general purpose system. We started [with] a general system but then adapt it to [a] particular domain and [a] particular customer, [which] resolves a lot of the ambiguities in general systems. We outperform in certain language pairs that we’re developing, like English to Spanish.”
“When you initiate the action as the customer, if you want to translate a message, it has a big error rate. If you want to control what you’re saying, it’s very risky.”
Neither Google Translate nor its competitors (Microsoft Translator and Yandex Translate) can compete with a system that evolves to read into the nuance or emotional undertones behind certain word choices and phrasing. Also, Unbabel aims to customize its language profiles according to the customer.
“It adapts to the particular vocabulary of the customer and every domain has specific words in specific meanings. Pinterest wants to use pins everywhere that they say ‘pin.’ Google Translate’s sentences would lose their meanings. What we’re seeing with these machine learning models is that the learning is fundamentally different than Google Translate. It’s a more generative model that they’re trying to figure out; what’s the closest and most similar?” he asks.
While the goal is to deepen the learning capabilities of their technology, Pedro says they are still getting there. The platform still needs more input in terms of metaphor interpretation, allegory and other aspects of everyday sophisticated language that would make things sound and be more human.
Of interest to linguists, teachers and psychologists may be the similarities Pedro and his team have noticed between the way the machine makes mistakes and the way a young child might. Referencing my own experience as an aspiring-yet-unfulfilled polyglot father, I brought up my son’s abilities and mistakes shifting between English and Hebrew, noting that he might make errors in English word order based on how a phrase is structured in Hebrew. With that in mind, I asked Pedro if he had noticed a similar pattern of errors in Unbabel’s AI that might indicate it’s on course to learn in a similar manner to humans. His response: Well, kinda.
“The errors it makes sound more human, like a 4-year-old learning how to speak,” he reflects. Still, it is tough to tell whether its experience with one language affects how it translates another. “We haven’t tried doing it bidirectionally, so it’s hard. We haven’t seen that phenomenon yet. Typically when it gets it right, it gets it ‘righter,’ but when it gets it wrong, it gets it very wrong. Sometimes [there are] errors of overgeneralization that are creepy, like the way we learn.”
Asked what his timeline is for developing a much more sophisticated language engineering machine that will overtake human translation, Pedro offered an insight that is just as exciting as it is mysterious.
“We don’t fully understand how we learn languages. The amount of sentences someone young learning 8-12 hours a day are far less than our system, but we are far more efficient in learning. Something is going on there that we just don’t yet understand.”