Microsoft yesterday announced it would translate live group conversations through the Microsoft Translator brand, its answer to Google Translate. The announcement comes on the heels of a similar declaration by its subsidiary Skype that it would start testing real-time translation of Skype calls to mobiles and landlines.
“This new feature allows people to communicate in different languages face to face using their own language on their own device,” read Microsoft’s announcement on their product page. “This feature opens the door to a whole new world of communication among people, regardless of their language.”
Some outlets misreported this as a new app. It’s simply an update to the existing Translator app, which was already more stylish-looking than Google Translate.
Google Translate easily outpaces Microsoft by offering about twice as many languages (103 to 52) in its written form online. But Microsoft is making a statement with its AI by processing speech in nine of the world’s major vernaculars, claiming its technology can keep up with the interpretation of a single sentence in eight languages in addition to the language of origin.
Of course, Google already has this capability worked into Translate. A side-by-side comparison is likely the best way to understand whose app has the advantage on the translations themselves, but Google has yet to release a feature that could keep up with the breakneck pace of a group conversation.
Machine voice translation needs to incorporate emotional and visual data
There is also the issue, visible in the video, of needing to look down for the interpretation after every sentence, so it remains to be seen how quick Microsoft can make its new technology. But Microsoft shouldn’t sit too high and mighty upon the hill, because Google can easily make adjustments to catch up. More importantly, Google is not the only game in town when it comes to real-time conversational translation.
Waverly Labs released an earpiece device earlier this year on Kickstarter that hears and transmits translations automatically, mimicking the universal translators of Star Trek that Microsoft themselves seem to reference in their announcement by saying, “The personal universal translator has long been a dream of science fiction, but today that dream becomes a reality.”
There is also still a ways to go toward making translation perfectly reliable.
Machine translation, and applying natural language processing to the translation process in general, is extraordinarily difficult. It will be a while before machines are truly able to process what someone from another culture is saying, simply because the tech we’re playing with right now doesn’t grasp nuance, emotion, tone, or mood, among other things.
Take for example emotional analytics company Beyond Verbal, whose technology picks up data about customers for marketers during phone calls and sales and is now being used to find bio-indicators of disease pathology in people’s stutters and pauses.
Also consider the role of facial and hand gestures in conversation. Developers might be keener to work with an obvious technology conduit like a phone to process linguistic input, but a phone can’t factor in things like someone winking when they speak or pointing in a certain direction.
SignAloud, a pair of motion-detecting gloves developed by two students at the University of Washington, seeks to finally process sign language for translation purposes. Of course, not everyone is Michael Jackson and it’s not always cold outside; people won’t wear gloves to communicate. But it’s a step worth taking on the path to something more encompassing.
People do not speak like they write
I don’t have a natural language processing (NLP) background to go with my linguistics minor, so it could very well be that I’m disparaging an amazing accomplishment by Microsoft (and Google, for that matter). I’m not trying to. It’s an achievement. However, Microsoft’s selection of languages itself suggests the company might be missing something.
NLP developers constantly refine the ability to recognize dialectal differences when people speak, which can sometimes include an entirely different definition of a certain word (like “tabling” in British and American English or “torta” in Mexican Spanish and other dialects).
Microsoft is aware of the dialect issue. The company says only the Brazilian dialect of Portuguese is available so far, and it differs vastly from the form spoken in Portugal.
I personally would like to see how well Microsoft Translator can handle someone from rural West Virginia speaking to someone from rural Scotland, two very deep accents whose pronunciation can be difficult to understand in certain circumstances for people used to the English often broadcast in mass media.
But in the case of Arabic, Microsoft might have exposed itself, because no one actually speaks Standard Arabic in everyday life. It’s only heard in scripted news and is often dropped when anchors have conversations with guests and pundits. Across the Arab world, so-called “dialects” are virtually different languages. In some cases, speakers cannot understand each other. And don’t even think about speaking to a Moroccan; the differences are overwhelming.
Linguists could use machine translation to resolve academic arguments about the similarities and differences between the languages of different cultures, assuming enough samples can be collected. It could also help compile a standard written form for the “dialects” of Arabic that I just referenced: Palestinian Arabic, Egyptian Arabic, etc.
We’re not there yet, because we have only just begun to bridge the river that separates written translation and spoken translation. Hell, written machine translation is still only focused on modern standardized languages. You can’t use Microsoft, Google, or startups like Unbabel and LingoHub to translate online shorthand or any ancient classical tongues.
I’m confident, however, that as these projects advance, innovators, enthusiasts, future 30-year NLP veterans, and insightful speakers themselves will help us mechanize cross-border conversation.