AlphaGo: Implications for Machine Translation
Machine defeats man at the game of Go.
The entire world was stunned by the 4–1 win of AlphaGo, the program built by Google’s DeepMind, over Lee Se-Dol, one of the world’s best Go players. Some say that AlphaGo is a highly specialized intelligence that only knows how to play Go. But the principles, techniques, and algorithms underlying AlphaGo do in fact have wider application to so-called AI-complete problems. What do they mean for Machine Translation (MT)?
The development of Go programs and machine translation programs has followed parallel paths.
The initial generation of solutions to both problems was based on the “classical” AI technique of encoding human knowledge. The Go programs used rules of “good shape” and human-style “reading”. The MT programs used grammars and rulesets built by human linguists. The results, in the case of Go, were programs that played at the amateur sub-dan level (meaning roughly the top 10% of all players). The results, in the case of MT, were programs that could at best produce vaguely understandable translations.
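To make the flavor of that first generation concrete, here is a deliberately crude sketch: a hand-written lexicon and a single hand-written reordering rule. Every entry and rule below is invented for the example; real systems carried thousands of both, built and maintained by linguists.

```python
# A toy first-generation, rule-based translator. The five-entry
# LEXICON and the single reordering rule are invented for
# illustration only.

LEXICON = {"watashi": "I", "wa": "", "hon": "book", "o": "", "yomu": "read"}

def rule_based_translate(source_words):
    """Look each word up in the lexicon, drop particles, then apply
    one syntax rule: Japanese is subject-object-verb, English is
    subject-verb-object, so swap the final verb and its object."""
    words = [LEXICON.get(w, w) for w in source_words]
    words = [w for w in words if w]                   # particles map to ""
    if len(words) >= 3:
        words[-1], words[-2] = words[-2], words[-1]   # SOV -> SVO
    return " ".join(words)

# "watashi wa hon o yomu" = "I read a book"
print(rule_based_translate(["watashi", "wa", "hon", "o", "yomu"]))
# prints "I read book": vaguely understandable, just as described above
```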
The second generation of solutions to both problems was based on statistics. The Go programs used so-called “Monte Carlo” techniques to sample large numbers of possible games, with many interesting optimizations. The MT programs used tables of phrase translations and their probabilities, derived statistically from huge bilingual corpora. This allowed the Go programs to play at the amateur high-dan level (meaning the top 1% of all players). It allowed the MT programs to produce quite acceptable output for simple inputs between languages that are linguistically close. This is the current state of the art for Google Translate and most other commercial machine translation systems.
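The Monte Carlo idea itself is simple enough to sketch: to judge a move, play many random games from the position it produces and average the results. A full Go engine is far beyond an example, so the toy below substitutes the much simpler game of Nim (take one to three stones; whoever takes the last stone wins); the game is a stand-in, and the sampling principle is the point.

```python
import random

# Second-generation "Monte Carlo" move selection, sketched on Nim
# as a stand-in game: players alternate taking 1-3 stones, and
# whoever takes the last stone wins.

def random_playout(stones, my_turn):
    """Finish the game with uniformly random moves; True if we win."""
    while stones > 0:
        stones -= random.randint(1, min(3, stones))
        my_turn = not my_turn
    return not my_turn  # the player who moved last took the last stone

def monte_carlo_move(stones, samples=2000):
    """Estimate each legal move's win rate by sampling random games,
    and return the move with the best estimate."""
    best_move, best_rate = None, -1.0
    for take in range(1, min(3, stones) + 1):
        wins = sum(random_playout(stones - take, my_turn=False)
                   for _ in range(samples))
        rate = wins / samples
        if rate > best_rate:
            best_move, best_rate = take, rate
    return best_move, best_rate

move, rate = monte_carlo_move(10)
print(f"take {move} (estimated win rate {rate:.0%})")
```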
Now the third generation of Go programs has arrived in the form of AlphaGo. It combines deep learning with neural networks and the statistical Monte Carlo tree search of the second generation. As is now well known, AlphaGo actually uses two neural networks: a policy network for identifying moves to examine further, and a value network for evaluating board positions. It is the combination of these neural networks and tree search, in conjunction with massive amounts of computing power, that allowed AlphaGo to defeat one of the world’s best players.
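The division of labour between the two networks can be sketched schematically. Both “networks” below are toy stand-in functions over an abstract position (an integer, with a made-up objective), not AlphaGo’s real models: the point is the control flow, in which the policy function prunes the move list and the value function scores what the candidate moves lead to.

```python
import math

# Schematic policy/value split. TARGET, legal_moves and both
# "networks" are invented stand-ins for illustration.

TARGET = 7  # toy objective: move the position toward this value

def legal_moves(position):
    return [-2, -1, +1, +2]

def policy_network(position):
    """Stand-in policy net: a probability for each legal move,
    higher for moves that head toward the target."""
    moves = legal_moves(position)
    scores = [math.exp(-abs((position + m) - TARGET)) for m in moves]
    total = sum(scores)
    return {m: s / total for m, s in zip(moves, scores)}

def value_network(position):
    """Stand-in value net: estimated winning chances of a position."""
    return 1.0 / (1.0 + abs(position - TARGET))

def select_move(position, top_k=2):
    """One-ply search: the policy net prunes to top_k candidate moves,
    and the value net scores the position each candidate produces.
    AlphaGo does the same thing recursively, inside Monte Carlo tree
    search and with vastly more computation."""
    priors = policy_network(position)
    candidates = sorted(priors, key=priors.get, reverse=True)[:top_k]
    return max(candidates, key=lambda m: value_network(position + m))

print(select_move(0))  # prints 2, the move that heads toward TARGET
```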
So where is the third generation of MT systems? It is under active development. This problem is actually much harder than Go, which after all is just black and white stones on a 19×19 grid. And for translation there is no simple win/loss metric, as there is in Go. However, we can imagine an equally compelling application of deep-learning neural networks to the MT problem. Just as the conceptualization of the policy network and the value network was key to the strength of AlphaGo, it will be crucial in MT to design the right set of networks.
Just as the third-generation Go solution did, the MT solution will need to solve the “long-distance” problem. In Go, the problem is that one stone far away can have a decisive impact on another part of the board. In MT, the problem is that one word far away can have an equally decisive impact on the overall meaning. This is the issue that has bedeviled MT from and into non-Western languages such as Japanese, where critical elements can occur at the far end of a sentence.
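Research systems are already pointing at one answer. The “attention” mechanism of recent neural MT work lets the decoder compute a weight over every source word each time it produces a target word, so distance within the sentence stops mattering: a sentence-final Japanese verb can dominate the translation of the words at the front. A minimal sketch, with random vectors standing in for learned encoder and decoder states:

```python
import numpy as np

# Minimal dot-product attention. The numbers are random placeholders:
# encoder_states stands in for a learned vector per source word,
# decoder_state for the decoder's current state.

rng = np.random.default_rng(0)
src_len, dim = 8, 16
encoder_states = rng.normal(size=(src_len, dim))   # one vector per source word
decoder_state = rng.normal(size=dim)               # current target-side state

# Score every source position against the decoder state, then softmax.
scores = encoder_states @ decoder_state            # shape (src_len,)
weights = np.exp(scores - scores.max())
weights /= weights.sum()                           # attention distribution

# The context vector blends all source words by these weights;
# position 0 and position 7 are treated exactly alike, so a
# "far away" word is no harder to use than a nearby one.
context = weights @ encoder_states                 # shape (dim,)
print(np.round(weights, 3))
```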
For years, experts have debated whether and when MT will “take over” from human translators. Some said in a few years, some said in a few decades, some said never. Of course, being able to “take over” is completely conditional on the complexity of the input and the required quality of the output. As a translator from Japanese to English myself, I deal with topics ranging from a board presentation about some company’s new retail strategy, to a discussion of Japan’s response to the tragic earthquake of 2011, to academic papers on econometric models of the rice trade in medieval Japan. Certainly we would like MT to handle the first two easily.
I believe that the introduction of neural networks and deep learning into MT could result in astonishingly good MT systems within the next 12–24 months. Yes, many have predicted such rapid progress in the past. Yet we are now at a qualitatively different point in time, with a much more sophisticated understanding of machine learning, not to mention massive distributed computing power. One result will be that the lower tier of human translation work will fall away. This will be the third generation of MT.
So what is the fourth generation, of Go programs or of MT? The fourth generation of Go programs will play at the level of God. God’s skill at Go has long been discussed; many professionals think God would be approximately four stones stronger than they are. A four-stone difference would imply a winning percentage of about 95%. AlphaGo’s winning percentage against Lee Se-Dol was 80%, so simplistically we can say it seems to be two stones stronger than Lee Se-Dol. In other words, to reach the level of God, the next generation needs to take it up another two stones, or another fifteen points in terms of winning percentage. My personal opinion is that the fourth generation will involve bringing back explicit intuition. In generation two, we discarded intuition for brute force. In generation three, we added learning. In generation four, we will add back intuition and thought. The fourth generation will also be able to tell us why it played a certain move, in terms we can understand.
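That arithmetic can be sanity-checked under one loud assumption of my own: that each handicap stone adds a fixed amount to the log-odds of winning (a simple Elo-style model, not anything published about AlphaGo). Calibrating the model on the 80% result, read as a two-stone gap:

```python
import math

# Back-of-the-envelope check of the stone arithmetic, assuming each
# handicap stone adds a constant amount k to the log-odds of winning.
# This model is an assumption for illustration, nothing more.

def win_prob(stones, k):
    """Winning probability for a player `stones` stones stronger."""
    return 1.0 / (1.0 + math.exp(-k * stones))

# Calibrate k from the match: an 80% winning rate read as two stones.
# Solve 0.80 = 1 / (1 + exp(-2k)) for k.
k = math.log(0.80 / 0.20) / 2   # about 0.69 per stone

print(f"two stones: {win_prob(2, k):.0%}")   # 80%, by construction
print(f"four stones: {win_prob(4, k):.0%}")  # about 94%, near the 95% quoted
```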
What is the analog for MT? After second-generation statistics and third-generation deep learning, the fourth generation will be a reintroduction of linguistic intuition. With the right sort of human guidance, the machine will learn to enrich its statistical analysis with something closer to genuine linguistic understanding. At that point, the human translator will gradually be relegated to specialized work such as translating novels. Given the current pace of development, we are no more than ten years away from that point.