Machines may one day replace human translators, but there is still a long way to go. (Image via Adweek)

Facebook uses mathematics to enhance translations

Facebook, the American social media platform, is working to improve its translation system so it can reach people all around the world. It is not alone: tech giants across the globe want to deliver their content in their customers’ native languages.

While machine translation was originally built on dictionaries, researchers at Facebook have turned to mathematics to make Facebook content comprehensible on a universal scale. By turning words into figures and statistics, the researchers look for mathematical similarities between languages.

Until now, Facebook’s automatic translation has compared parallel texts in two languages, looking for correspondences between them to work out the meaning. However, for some language pairs, parallel text alone is insufficient.

Antoine Bordes, European co-director of AI research for Facebook, who is based at one of its research labs in Paris, said that Facebook already works with 200 languages. Currently, Facebook is developing a system that renders words into a mathematical representation.

Mathematically speaking, each word is rendered as a vector in a multi-dimensional space. Words that are closely related to one another end up close together in that space, explained one of the system designers, Guillaume Lample.
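To make the idea of words "sticking close together" concrete, here is a minimal sketch in Python. It is not Facebook's system: the three-dimensional vectors are invented for illustration, and the helper names `cosine_similarity` and `nearest` are hypothetical. Cosine similarity is one common way to measure how close two word vectors are, and the article does not say which measure Facebook uses.

```python
import math

# Toy 3-dimensional word vectors. The numbers are invented for illustration;
# real systems learn vectors with hundreds of dimensions from large text corpora.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word, vectors):
    """Return the other word whose vector is most similar to `word`."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )


if __name__ == "__main__":
    print(nearest("cat", toy_vectors))
```

With these made-up numbers, "cat" sits much closer to "dog" than to "car", which is the behaviour Lample describes.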

For example, “cat” and “dog” will sit close together along an “animal” dimension. The algorithm then links these dimensions to one another across languages.
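The article does not describe how the linking algorithm works. As a rough illustration only, the sketch below assumes two toy two-dimensional vector spaces that differ by a rotation, fits that rotation from two known word pairs, and then translates a third word by finding its nearest neighbour in the other space. The vectors, the Spanish word list, and the helper names (`fit_rotation`, `rotate`, `translate`) are all invented for this example and are not taken from Facebook's research.

```python
import math


def rotate(v, theta):
    """Rotate a 2D vector v counter-clockwise by theta radians."""
    x, y = v
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


def fit_rotation(src, tgt):
    """Least-squares rotation angle that maps the 2D points in src onto tgt."""
    num = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(src, tgt))
    den = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(src, tgt))
    return math.atan2(num, den)


# Invented English vectors: animals point one way, vehicles another.
english = {"cat": (1.0, 0.2), "dog": (0.9, 0.3), "car": (0.1, 1.0), "bus": (0.2, 0.9)}

# Invented "Spanish" space: same shape as the English one, rotated by 40 degrees.
true_angle = math.radians(40)
pairs = {"cat": "gato", "dog": "perro", "car": "coche", "bus": "autobus"}
spanish = {es: rotate(english[en], true_angle) for en, es in pairs.items()}

# Fit the rotation from two seed pairs only; "dog" and "bus" are held out.
seeds = ["cat", "car"]
theta = fit_rotation([english[w] for w in seeds], [spanish[pairs[w]] for w in seeds])


def translate(word):
    """Map an English word into the Spanish space and return the nearest word."""
    mapped = rotate(english[word], theta)
    return min(spanish, key=lambda es: math.dist(mapped, spanish[es]))


if __name__ == "__main__":
    print(translate("dog"), translate("bus"))
```

The design choice here is deliberate simplicity: a single rotation angle has a closed-form fit in two dimensions, so the example runs with the standard library alone. Real vector spaces have far more dimensions and never match this neatly.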

Lample and Bordes, of course, did not promise that it would be perfect at the first attempt. However, as Facebook takes in more input, the algorithm and its dimensions will keep improving until the error is marginal or eliminated altogether.

Ranked from worst to best, the word vector system performs worst on English-Romanian translation and best on English-Urdu.

With these findings, Facebook hopes to revive some dead languages. The problem is that reviving a dead language with the word vector system requires a large amount of reference text. We are not talking about hundreds of texts, but hundreds of thousands, said Lample.

Source: https://bit.ly/2MAUBOe