Facebook alert: Translation of languages like Urdu to become more accurate on social media platform

Written By: PTI, San Francisco. Updated on: September 02, 2018, 06.26 PM IST

Researchers at Facebook have developed a quicker and more accurate way of translating low-resource languages like Urdu and Burmese using artificial intelligence, said a media report.

The breakthrough, which will be presented at the Conference on Empirical Methods in Natural Language Processing (EMNLP), could prove to be important for Facebook, as the social media giant uses automatic language translation to help its users around the world read posts in their preferred language, Forbes reported.

Existing machine translation systems can achieve near human-level performance on some languages, but in order to learn they require access to a parallel corpus, that is, vast quantities of the same sentences in different languages, it said.

The team from the Facebook AI Research (FAIR) division was able to train a machine translation (MT) system by feeding it large pieces of text in different languages from publicly available websites like Wikipedia.

The key thing to note is that these pieces of text were independent of one another. Independent pieces of text in different languages, which are not translations of one another, are referred to as monolingual corpora, it said.

"Building a parallel corpus is complicated because you need to find people fluent in two languages to create it. For instance, if you wanted to build a parallel corpus of Portuguese/Nepali, you would need to find people fluent in these two languages, which would be very difficult," Antoine Bordes, a research scientist and the head of FAIR's Paris research lab, was quoted as saying in the report.

He said: "On the other side, building monolingual corpora Portuguese/Nepali is very easy: you just need to download webpages from Portuguese and from Nepali websites, it doesn't matter if they are not parallel sentences or if they talk about different things".

Most computer translation systems use both monolingual corpora and parallel corpora to learn.

"The novelty in our approach is that we can train MT systems from monolingual corpora only, we don't need any parallel corpus. Potentially, given a book written in an alien language, we could use our model to translate it into English," Bordes said.
