Archive for the ‘language’ Category

AlphaGo: Implications for Machine Translation

Thursday, April 7th, 2016

In this March 15, 2016 photo, South Korean professional Go player Lee Sedol reviews the match after finishing the final match of the Google DeepMind Challenge Match against Google's artificial intelligence program, AlphaGo, in Seoul, South Korea. Google's Go-playing computer program again defeated its human opponent in a final match on Tuesday that sealed its 4:1 victory. (AP Photo/Lee Jin-man, File)

Machine defeats man at the game of Go.

The entire world was stunned at the 4–1 win by Google DeepMind’s AlphaGo over Lee Se-Dol, one of the world’s best Go players. Some say that AlphaGo is a highly specialized intelligence that only knows how to play Go. But the principles, techniques, and algorithms underlying AlphaGo do in fact have wider application to so-called AI-complete problems. What do they mean for Machine Translation (MT)?

The development of go programs and that of machine translation programs have followed parallel paths.

The initial generation of solutions to both problems was based on “classical” AI techniques of encoding human knowledge. The go programs used rules of “good shape” and human-style “reading”. The MT programs used grammars and rulesets built by human linguists. The results, in the case of go, were programs which played at the amateur sub-dan level (meaning the top 10% of all players). The results, in the case of MT, were programs which could at best produce vaguely understandable translations.


Mobile Phone Localization in India

Sunday, March 20th, 2016

The Telecom Regulatory Authority of India (TRAI) is proposing, as part of the government’s Digital India push, to make regional language interface capability mandatory on feature phones. These standards are expected to go into effect in six months.

One of the world’s leading localization companies, Moravia, has weighed in with a blog post entitled “India Gets Serious about Mobile Phone Localization”. Unfortunately, this post reveals a fundamental confusion about almost every aspect of this situation: the difference between feature phones and smartphones, between language support on the phones themselves and in applications, and between localization of the interface and of content. The post says that India is:

already one of the fastest-growing smartphone markets in the world, and the second largest overall. India’s vast hinterland is smartphone-equipped and hungry for local language content. The communities are cash-rich, and India’s e-commerce companies will have to reach out to them in earnest if they want to stay in the game.

This single paragraph neatly encapsulates all the confusion.


The Elusive Uberization of Translation

Friday, January 1st, 2016

The title of this post is borrowed shamelessly from the excellent article by Florian Faes found here.

Over the last decade, I’ve lost track of the number of times that the translation industry was going to be “reinvented” by this, that, or the other new model, usually one based on some sort of new technology. And each “reinvention” is accompanied by breathless hype and wonder in more blogs and posts by industry “experts” than you can count, many of whom should know better. Each “reinvention” is based on some plausible-sounding theory about the nature of the translation market that turns out to be just wrong.

These false theories and assumptions have no end. There would be an infinite demand for translation if only it were cheaper. There would be an infinite supply of translators if only we could tap the millions of bilingual people out there. Or no, the real problem is the efficiency of translation tools. Machine translation (MT) will eventually take over the lion’s share of translation work. Or it won’t. What we really need is to make it easier to order translation; where is my one-click translation button?

Problem is, none of these assumptions have proven to be true. The largest translation companies in the world continue to derive most of their revenue from large contracts with enterprises that have ongoing needs for translated documentation and software, carried out via well-defined processes and requiring well-defined quality levels. Startups based on the latest idea for a shiny new translation gadget flounder, ending up doing down-rounds to keep the lights on. Mid-tier translation companies continue to dominate the business in revenue terms, most doing semi-specialized medium-sized projects for medium-sized clients.

So how do we uberize this? The fundamental problem is to what extent translation is a commodity. Ride-hailing is the ultimate commodity: a car comes and takes you from point A to point B. There may be different sizes of cars, or special services like handicapped accessibility, but these amount to nothing more than different grades of coal. Ride-hailing is commoditizable because it is a commodity. We can aggregate demand, aggregate supply, and then tweak the economics of the business to death. To speak of uberizing translation implies that translation is a commodity.


James Austin's new book: Zen-Brain Reflections

Tuesday, January 3rd, 2006

Zen-Brain Reflections is James Austin’s follow-up to “Zen and the Brain.”

Austin’s work and study have led him to a deep understanding of what it means to translate ancient philosophical texts. Below I quote at length from his discussion of translating the Sandokai by Sekito (Shih T’ou) (p. 330; my emphasis, slightly edited for clarity):

Can any translation today have the same meaning as did the original, a work composed of only 220 Chinese characters? Suppose you were to insist on having only a direct, literal translation of each original Sino-Japanese ideogram. It would be a crude version in broken pidgin English. Professional translators can only be humbled by all the major compromises they have had to make. Beyond the basic problem, the casual Western reader may not suspect how many other major semantic compromises can enter in.

Begin with the title itself. One soon discovers that this same Sino-Japanese title has been translated into English in different ways. Some options from our own era are

  • coincidence of difference and sameness
  • merging of difference and unity [Loori]
  • inquiry into matching halves
  • realizing unity [Cleary]
  • the coincidence of opposites
  • the harmony of difference and equality [Shunryu Suzuki]
  • the identity of relative and absolute [Glassman]

and so on.

The above examples suggest that different translators…might have chosen to insert aspects of either their own private experience, or earlier personal opinions, or even some doctrinal belief system into a given phrase. Moreover, each translator can have several other subjective needs.

Let us be more specific, citing only a few potential conflicts that a contemporary translator might need to resolve. Must I adhere rigidly to literal interpretations, to traditional doctrinal formulas (and often multiple footnotes) to remain within acceptable scholarly traditions? Or can I remain true to what experience tells me is the direct, immediate flash of Zen insight itself? Because surely this deepest experiential truth entails letting go of my own tendencies…to attach arcane, dated references that overburden a line and blur the central message.

Nor do the translator’s conflicts and compromises end there. Can I still be true to those few old original ideograms, yet express their flowing spirit and intent in a readable contemporary literary style? Furthermore, must I conspire with the original author in old mystifications, thereby perpetuating the notion that everything about Zen is forever mysterious, if not unknowable?

Austin then proceeds to give his own translation of the Sandokai, which, although I know little Chinese and have never studied this poem in detail, appears to be a major improvement over existing translations in terms of both fidelity and readability.

Here is an aspect of translation that often goes unnoticed, whether the document in question is a philosophical tract or a computer manual: the fluent translation is often actually more accurate. In other words, sloppiness on the part of the translator in understanding the original text tends to be correlated with sloppiness in rendering that understanding into the target language.

I’ll have more on Austin’s new book in the coming weeks.

All the World’s a Stage, in Japanese

Wednesday, March 23rd, 2005

We recently went to see As You Like It at the Ahmanson Theater. I’m not a theater critic, so I’ll limit my comments to noting that Rebecca Hall, who played Rosalind, should get out of Shakespeare’s way. We don’t really need every single phrase to be accompanied by giggles, sighs, extraneous eye movements, pauses, hand motions, and pseudo-dramatic twirls.

What I want to write about is the Japanese translation of Jaques’ famous “All the World’s a Stage” soliloquy.

All the world’s a stage,
And all the men and women merely players;
They have their exits and their entrances,
And one man in his time plays many parts,
His acts being seven ages.

(By the way, this speech later contains the first recorded usage of the word “puke” in the meaning of “vomit”.)

The Japanese translation we got our hands on, by Fukuda Tsuneari, goes like this in romaji:

Zen-sekai ga hitotsu no butai, soko-dewa danjo wo towanu, ningen wa subete yakusha ni suginai.

It’s amazing, although somehow not surprising, that a famous Shakespeare scholar could do such a bad job translating this passage. Given its visibility, it seems he could have spent at least a little more time on it. Here’s how I translate his Japanese back into English (a dangerous endeavor, as I am well aware, but sometimes inevitable):

The world in its entirety is one stage.
There, whether man or woman, all humans are nothing more than actors.

Our professor has managed to pack an astonishing number of bad translation decisions into such a short sentence. Here are just a few:

  • “world” should not be “sekai”, which is a Sino-Japanese compound with nuances of “world of nations”; much better is the native Japanese word “yo”, a common word indicating the world around us
  • “all” of “all the world” is translated by placing the Sino-Japanese prefix “zen” in front of “sekai”, again yielding a non-colloquial, stiff result, but more importantly, the implication is of complete geographical coverage, rather than “all aspects” as Shakespeare presumably intended. The Japanese “issai” captures the correct meaning of “all” perfectly
  • whereas Shakespeare uses “men and women” just to indicate all the people in the world, perhaps liking the phrase’s meter, Fukuda reads too much into this and inserts the unwieldy “whether man or woman” into his translation
  • Fukuda translates the article “a” in “a stage” as “one, single”, although Shakespeare is certainly not emphasizing the singleness of the stage
  • after having gummed up his translation with “whether man or woman”, Fukuda ends up needing another word to serve as the subject of the next phrase, and goes with “ningen” (“human”), again too stiff, compared to the colloquial “hitobito” (“people”)

Here is Bob’s translation:

Butai da yo, kono yo wa issai. Hitobito mo mina, tan-naru yakusha.

A quantitative metric we can apply to comparing my translation with Fukuda’s is Bob’s Rule of Comparative Length, which states that bad translations are longer. Good editing, then, will tend to reduce the length of the translated text. In this case, the original English is 51 Roman characters; Fukuda’s translation 77; and mine a close match at 50.
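The length comparison can be roughly checked with a few lines of Python. This is only a minimal sketch: it assumes the tally counts letters alone (no spaces, punctuation, or hyphens), which is a guess at how the figures were reached, and the names `letter_count` and `length_ratio` are made up for illustration. Counting hyphens or punctuation as well shifts the totals by a character or two.

```python
def letter_count(text: str) -> int:
    """Count alphabetic characters only (assumption: spaces,
    punctuation, and hyphens are not part of the tally)."""
    return sum(1 for ch in text if ch.isalpha())


def length_ratio(translation: str, source: str) -> float:
    """Length of a translation relative to its source; under the
    rule above, a smaller ratio suggests a tighter translation."""
    return letter_count(translation) / letter_count(source)


# The two lines of the original that the Japanese versions render.
original = "All the world's a stage, And all the men and women merely players;"
fukuda = ("Zen-sekai ga hitotsu no butai, soko-dewa danjo wo towanu, "
          "ningen wa subete yakusha ni suginai.")
bob = "Butai da yo, kono yo wa issai. Hitobito mo mina, tan-naru yakusha."

for name, text in [("original", original), ("Fukuda", fukuda), ("Bob", bob)]:
    print(name, letter_count(text), round(length_ratio(text, original), 2))
```

A ratio near 1.0 means the translation is about as long as the source; the rule predicts that editing pushes the ratio down toward that mark.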

Neuroconservatism, the latest neuroword

Wednesday, March 23rd, 2005

In its most recent issue, Fortune magazine coined the word “neuroconservatism”. The image is of conservative policies backed up by, or possibly tweaked to take into account, neuroscientific insights.

Example: A “pure”, libertarian-oriented conservative might like to offer dozens or hundreds of private plans to replace Social Security, but neuroscience tells us that people’s brains aren’t “wired” to deal with having so much choice, so they may end up choosing poorly or not at all. Neuroconservative solution: Give them fewer choices, or at least give them an intelligent default in line with good public policy.

At the moment, this word gets zero Google hits.

Foundations of Language: Brain, Meaning, Grammar, Evolution

Saturday, February 12th, 2005

Foundations of Language: Brain, Meaning, Grammar, Evolution is Ray Jackendoff’s new book which tries to build a bridge between traditional linguistics, neuroscience, and evolution.

But after slogging through more than 400 pages, I was dismayed to find in his Concluding Remarks that all he himself claims to have accomplished in the book was to “sharpen some questions.” I read the book to get answers to the questions—about, for example, how syntactic categories are instantiated in the nervous system—not to get them “sharpened.”

One particularly annoying thing about the book is Jackendoff’s use of the prefix “f-”, as in f-knowledge or f-mind, to refer to some magic stratum between body and the regular non-f-mind. He integrates the body and mind, in other words, by inventing an imaginary layer where they are integrated.

There are gems of insight in this book. The overall insistence that language is not purely syntax-driven is extremely welcome; Jackendoff calls this the “parallel architecture”, where the parallel components in question are phonology, syntax, and semantics. This makes a great deal of sense. There are also some tantalizing hints of coming closer to how evolution could have built up our language facility—but unfortunately, they remain mere hints.

The book has other problems. It spends too much time on the academic politics of linguistics. Sorry, but if you have real insights you don’t need to spend all your time talking about fights you and other people had. He fails the self-citation criterion, referring to his own works (including future ones) hundreds of times. His prose desperately needed an editor. And he can’t escape the linguists’ disease of trotting out example after example, without ever really figuring out what they mean.

The question of how evolution could have resulted in brain structures that support our linguistic ability is an absolutely critical one. It’s just too bad that this book doesn’t answer it.

A New Kana

Wednesday, February 2nd, 2005

I’m extremely pleased to announce the on-line availability of my important proposal for a major reform of Japanese orthography: A New Kana (PDF, 646K).

Based on a sophisticated statistical analysis of the pronunciation profile of Sino-Japanese compounds, this innovative proposal promises to dramatically simplify the Japanese writing system while preserving its spirit and uniqueness.

New Kanjis for the Rest of Us

Monday, January 24th, 2005

I’ve often thought over the years of coming up with a new ideographic written language. Now I find a man named Charles K. Bliss has already done this, creating something called Blissymbols (or “Semantography”).

One useful-looking book is Helfman’s Blissymbolics: Speaking without Speech, which talks about using Blissymbols to help handicapped children to communicate.

For more information, visit Douglas Crockford’s site (Blissymbolics link is on the left). (You may also want to check out his amazing materials on JavaScript, of which he is doubtless the most advanced practitioner in the world.)

TODO: Check out languages mentioned by Umberto Eco in his book The Search for the Perfect Language.

There have been any number of proposals for visual alphabets, some quite recent. We might cite Bliss’s Semantography, Eckhaardt’s Safo, Janson’s Picto and Ota’s Locos. Yet, as Noth has observed, these are all cases of pasigraphy (which we will discuss in a later chapter) rather than true languages. Besides, they are based on natural languages. Many, moreover, are mere lexical codes without any grammatical component (p. 175).

Crockford comments that semantography (Blissymbolics) does not belong to the class of visual alphabets that Eco is dismissing.

Statistical machine translation in New Scientist

Sunday, January 23rd, 2005

New Scientist reports on statistical machine translation and the commercialization being done by Language Weaver.