Archive for the ‘language’ Category

William Safire’s muddled orthographical Esperanto

Wednesday, April 6th, 2005

Transliteration refers to conversion between phonetic alphabets. In Japanese I write はな, and I can transliterate this into the Roman alphabet as hana. In English I write cherry, and I can transliterate this into the Japanese katakana syllabary as チェリー.

Transliteration tries to accomplish two quite different things. The first is to write a word in another alphabet so that when it is pronounced according to the rules of the language using that alphabet, it sounds as much as possible “like” the original word in its original language. The second is to provide a unique, bidirectional orthographical mapping from one alphabet (that I don’t know) to another (that I do). Among other benefits, this lets me use a keyboard mapping I am more familiar with when entering text into a computer. Even many Japanese prefer to input Japanese content using the Roman alphabet keyboard mapping.

(There is a third, less important goal in some transliteration systems: to reproduce structural aspects of the original alphabet. The example I’m familiar with is Japanese. The syllabary is organized into rows (vowels) and columns (consonants). The “ha” column contains ha, hi, hu, he, and ho. The “hu” sound is perceived by most English speakers as being closer to “fu”. Thus, a transliteration system which emphasized phonological fidelity would represent this syllable as “fu”, whereas one emphasizing source-alphabet structural integrity would represent it as “hu”. Does this problem exist in other transliteration systems?)
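The tension between these goals can be made concrete with a small sketch. The two tables below are a hand-picked subset of the hiragana, not a complete system, and the function names are illustrative. The "fu" spelling follows the Hepburn convention and the "hu" spelling the Kunrei-shiki convention. Both tables are one-to-one, so both round-trip cleanly; they differ only in which goal they favor.

```python
# Illustrative subset of hiragana only; a real table covers the whole syllabary.
# "fu" follows the Hepburn convention (closer to what English speakers hear).
HEPBURN = {
    "は": "ha", "ひ": "hi", "ふ": "fu", "へ": "he", "ほ": "ho",
    "な": "na", "ね": "ne",
}
# "hu" follows the Kunrei-shiki convention (keeps the h- series regular).
KUNREI = {**HEPBURN, "ふ": "hu"}


def to_romaji(kana: str, table: dict[str, str]) -> str:
    """Transliterate kana into Roman letters, one syllable at a time."""
    return "".join(table[ch] for ch in kana)


def to_kana(romaji: str, table: dict[str, str]) -> str:
    """Invert the mapping. Every syllable in this subset is two letters long."""
    reverse = {roman: kana for kana, roman in table.items()}
    return "".join(reverse[romaji[i:i + 2]] for i in range(0, len(romaji), 2))


for name, table in (("Hepburn", HEPBURN), ("Kunrei", KUNREI)):
    roman = to_romaji("ふね", table)  # "boat"
    print(name, roman, to_kana(roman, table))
```

Anything outside the subset raises a KeyError, which is deliberate: a table with gaps is not a bidirectional mapping.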

The above is just a basic introduction to transliteration; another is at Wikipedia. What motivated me to post about the topic is the horribly garbled discussion that recently appeared in William Safire’s column in the New York Times, our national newspaper of record.

Safire starts off on the wrong foot, revealing a weak understanding of the distinction between orthography and phonology, making absurd statements such as “The closest I can get in Roman spelling [he means English spelling] to the sound of [Putin’s] name is…”. He then lapses into bemusement at the fact that if for some unknown reason the French were to use the English-style transliteration of Russian President Putin’s name, it would come out sounding like the French word for “prostitute”, and so gee, that must be why they adopted their own weird transliteration. Just how confused all of this is has been analyzed in detail by our friends at Blogos.

In a follow-up article, Blogos expresses shock that “there are still people out there, writing columns in some of the most influential newspapers in the world, who think that computers and the Internet can only work with roman alphabets”.

But that’s not exactly where our famed pundit is confused, if you read his closing paragraph closely:

Here’s the problem for globocrats: most computer operating systems are based on the Roman alphabet. Maybe the United Nations will find a new raison d’etre (that’s ray-ZON DET-ra) in standardizing a system to encode Roman and Cyrillic letters and Chinese and Japanese characters to make them computer-friendly on all the world’s screens.

Now he’s started talking about “encodings”, something else he plainly does not understand. It turns out there is a widely-implemented encoding making all the world’s characters “computer-friendly”, called Unicode. Clearly Safire has no idea what is going on in multilingual computing, and one must certainly question his judgment in writing such nonsense in a national newspaper without a minute’s worth of checking. The problem is not that these characters cannot be displayed on “all the world’s screens”, since they can; it’s that, once displayed, they still cannot be read by people who don’t know the alphabets. He continues:

…For users of tomorrow’s Internet to accurately cross cultures, experts in phonetics and transliteration will first have to create and agree on a standard system.

Ignoring the fact that Safire now is confusing “cultures” with languages and writing systems, he’s apparently saying that there could be, or should be, some type of orthographical Esperanto that would magically meet the two conflicting objectives of transliteration systems: to be faithful to the original orthography while also being pronounced by native speakers of any world language, according to their language’s phonological rules, in a way which is close to the phonology of the original word. Sorry, all the “transliterati” in the world won’t be able to pull off that trick.

Only then will President Poutine get his real name back.

Bill, he doesn’t need his name back, he never lost it. It’s a Russian name written in Cyrillic. The French didn’t “take it away”, they just tried to write it in their alphabet so people can read it.
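The encoding half of this is easy to check. Here is a minimal sketch using only Python's built-in string handling; the Cyrillic spelling Путин and the helper name code_points are supplied for illustration. Every letter of the name already has its own Unicode code point, sitting in the same standard as the Roman letters of the French spelling and as a Japanese character.

```python
def code_points(text: str) -> list[str]:
    """Return the Unicode code point of each character, in U+XXXX form."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The same name in Cyrillic and in the French spelling, plus one kanji.
for sample in ("Путин", "Poutine", "花"):
    utf8_length = len(sample.encode("utf-8"))
    print(sample, " ".join(code_points(sample)), f"({utf8_length} bytes in UTF-8)")
```

None of this makes the Cyrillic any easier to read for someone who does not know the alphabet, which is the job transliteration does.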

So much for our reigning language maven, of whom, I should add, I am a great fan, and whose column I read faithfully.

All the World’s a Stage, in Japanese

Wednesday, March 23rd, 2005

We recently went to see As You Like It at the Ahmanson Theater. I’m not a theater critic, so I’ll limit my comments to noting that Rebecca Hall, who played Rosalind, should get out of Shakespeare’s way. We don’t really need every single phrase to be accompanied by giggles, sighs, extraneous eye movements, pauses, hand motions, and pseudo-dramatic twirls.

What I want to write about is the Japanese translation of Jaques’ famous “All the World’s a Stage” soliloquy.

All the world’s a stage,
And all the men and women merely players;
They have their exits and their entrances,
And one man in his time plays many parts,
His acts being seven ages.

(By the way, this speech later contains the first recorded usage of the word “puke” in the meaning of “vomit”.)

The Japanese translation we got our hands on, by Fukuda Tsuneari, goes like this in romaji:

Zen-sekai ga hitotsu no butai, soko-dewa danjo wo towanu, ningen wa subete yakusha ni suginai.
全世界がひとつの舞台、そこでは男女を問わぬ、人間はすべて役者に過ぎない。

It’s amazing, although somehow not surprising, that a famous Shakespeare scholar could do such a bad job translating this passage. Given its visibility, it seems he could have spent at least a little more time on it. Here’s how I translate his Japanese back into English (a dangerous endeavor, as I am well aware, but sometimes inevitable):

The world in its entirety is one stage.
There, whether man or woman, all humans are nothing more than actors.

Our professor has managed to pack an astonishing number of bad translation decisions into such a short sentence. Here are just a few:

  • “world” should not be “sekai”, which is a Sino-Japanese compound with nuances of “world of nations”; much better is the native Japanese word “yo”, a common word indicating the world around us
  • “all” of “all the world” is translated by placing the Sino-Japanese prefix “zen” in front of “sekai”, again yielding a non-colloquial, stiff result, but more importantly, the implication is of complete geographical coverage, rather than “all aspects” as Shakespeare presumably intended. The Japanese “issai” captures the correct meaning of “all” perfectly
  • whereas Shakespeare uses “men and women” just to indicate all the people in the world, perhaps liking the phrase’s meter, Fukuda reads too much into this and inserts the unwieldy “whether man or woman” into his translation
  • Fukuda translates the article “a” in “a stage” as “one, single”, although Shakespeare is certainly not emphasizing the singleness of the stage
  • after having gummed up his translation with “whether man or woman”, Fukuda ends up needing another word to serve as the subject of the next phrase, and goes with “ningen” (“human”), again too stiff, compared to the colloquial “hitobito” (“people”)

Here is Bob’s translation:

Butai da yo, kono yo wa issai. Hitobito mo mina, tan-naru yakusha.
舞台だよ、この世は一切。人々も皆、単なる役者。

A quantitative metric we can apply to comparing my translation with Fukuda’s is Bob’s Rule of Comparative Length, which states that bad translations are longer. Good editing, then, will tend to reduce the length of the translated text. In this case, the original English is 51 Roman characters; Fukuda’s translation 77; and mine a close match at 50.
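The rule is easy to apply mechanically. The sketch below assumes that "characters" means letters only (no spaces, punctuation, or hyphens) and that the English being measured is the first two lines of the speech, the part the Japanese versions cover. Neither assumption is stated above, and the helper name is illustrative.

```python
def letter_count(text: str) -> int:
    """Count alphabetic characters, ignoring spaces, punctuation and hyphens."""
    return sum(ch.isalpha() for ch in text)


ENGLISH = "All the world's a stage, And all the men and women merely players;"
FUKUDA = ("Zen-sekai ga hitotsu no butai, soko-dewa danjo wo towanu, "
          "ningen wa subete yakusha ni suginai.")
BOB = "Butai da yo, kono yo wa issai. Hitobito mo mina, tan-naru yakusha."

for label, text in (("English", ENGLISH), ("Fukuda", FUKUDA), ("Bob", BOB)):
    print(f"{label}: {letter_count(text)} letters")
```

Under this rule the three texts come to 51, 75 and 50 letters. The figure of 77 for Fukuda's version is reproduced if its two hyphens are counted as well; either way, the ordering is the same.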

Neuroconservatism, the latest neuroword

Wednesday, March 23rd, 2005

In its most recent issue, Fortune magazine coined the word “neuroconservatism”. The image is of conservative policies backed up by, or possibly tweaked to take into account, neuroscientific insights.

Example: A “pure”, libertarian-oriented conservative might like to offer dozens or hundreds of private plans to replace Social Security, but neuroscience tells us that people’s brains aren’t “wired” to deal with having so much choice, so they may end up choosing poorly or not at all. Neuroconservative solution: Give them fewer choices, or at least give them an intelligent default in line with good public policy.

At the moment, this word gets zero Google hits.

Statistical machine translation in New Scientist

Wednesday, February 23rd, 2005

New Scientist reports on statistical machine translation and the commercialization being done by Language Weaver.

Foundations of Language: Brain, Meaning, Grammar, Evolution

Saturday, February 12th, 2005

Foundations of Language: Brain, Meaning, Grammar, Evolution is Ray Jackendoff’s new book which tries to build a bridge between traditional linguistics, neuroscience, and evolution.

But after slogging through more than 400 pages, I was dismayed to find in his Concluding Remarks that all he himself claims to have accomplished in the book was to “sharpen some questions.” I read the book to get answers to the questions—about, for example, how syntactic categories are instantiated in the nervous system—not to get them “sharpened.”

One particularly annoying thing about the book is Jackendoff’s use of the prefix “f-”, as in f-knowledge or f-mind, to refer to some magic stratum between body and the regular non-f-mind. He integrates the body and mind, in other words, by inventing an imaginary layer where they are integrated.

There are gems of insight in this book. The overall insistence that language is not purely syntax-driven is extremely welcome; Jackendoff calls this the “parallel architecture”, where the parallel components in question are phonology, syntax, and semantics. This makes a great deal of sense. There are also some tantalizing hints of coming closer to how evolution could have built up our language facility—but unfortunately, they remain mere hints.

Other problems with this book include that it spends too much time on the academic politics of linguistics. Sorry, if you have real insights you don’t need to spend all your time talking about fights you and other people had. He fails the self-citation criterion, referring to his own works (including future ones) hundreds of times. His prose desperately needed an editor. And he can’t escape the linguists’ disease of trotting out example after example, without ever really figuring out what they mean.

The question of how evolution could have resulted in brain structures that support our linguistic ability is an absolutely critical one. It’s just too bad that this book doesn’t answer it.

Kanji Topology

Wednesday, February 2nd, 2005

Every Westerner exposed to Kanjis immediately senses their topological nature. But this inherent aspect of Kanjis is still not reflected in any fontographical computing model. Bob has now put on-line his unique, if dated, survey of research into models of Kanji topology (PDF, 612K).

A New Kana

Wednesday, February 2nd, 2005

I’m extremely pleased to announce the on-line availability of my important proposal for a major reform of Japanese orthography: A New Kana (PDF, 646K).

Based on a sophisticated statistical analysis of the pronunciation profile of Sino-Japanese compounds, this innovative proposal promises to dramatically simplify the Japanese writing system while preserving its spirit and uniqueness.

New Kanjis for the Rest of Us

Monday, January 24th, 2005

I’ve often thought over the years of coming up with a new ideographic written language. Now I find a man named Charles K. Bliss has already done this, creating something called Blissymbols (or “Semantography”).

One useful-looking book is Helfman’s Blissymbolics: Speaking without Speech, which talks about using Blissymbols to help handicapped children to communicate.

For more information, visit Douglas Crockford’s site (the Blissymbolics link is on the left). (You may also want to check out his amazing materials on JavaScript, of which he is doubtless the most advanced practitioner in the world.)

TODO: Check out languages mentioned by Umberto Eco in his book The Search for the Perfect Language.

There have been any number of proposals for visual alphabets, some quite recent. We might cite Bliss’s Semantography, Eckhaardt’s Safo, Janson’s Picto and Ota’s Locos. Yet, as Noth has observed, these are all cases of pasigraphy (which we will discuss in a later chapter) rather than true languages. Besides, they are based on natural languages. Many, moreover, are mere lexical codes without any grammatical component (p. 175).

Crockford comments that semantography (Blissymbolics) does not belong to the class of visual alphabets that Eco is dismissing.

Performant

Saturday, October 9th, 2004

Is “performant” a word? I came across it in an article about Microsoft Visual Studio .NET 2005:

C++ is the easiest language to use for native interop and is often the most performant.

Stamping out the loan-word disease in Japanese

Tuesday, June 29th, 2004

The “Foreign Loan-word Committee” has issued recommendations for replacing 33 common katakana-isms with “native” Japanese.

Thank God they backed off on some of their worst proposals, like replacing “online” with “kaisen-setsuzoku”.

Of their new proposals, I especially like “setsumei sekinin” for “accountability”. In other words, the Japanese view accountability as the question of who has to explain something.

A lot of the proposed replacements are to just use the obvious Japanese, such as “dougu” for “tool”. Ditto for replacing “stance” with “tachiba”, or “conference” with “kaigi”.

But that begs the question: why did people start using “tool” in the first place, when they already had “dougu”? That’s a critical question of linguistic philosophy which the grayhairs on the committee didn’t even try to answer. I know the answer. The centuries-old Chinese compounds have been rounded and smoothed like rocks in a river-bed by the forces of linguistic nature over time. The English words are young, agile, opinionated, angular, with a personality (make that PA-SONARITI). In that sense, they have a different semantic profile. Simply put, they mean something different. That’s why people started to use them and will continue to use them.

But what’s really weird is that what they’re proposing to replace the 30-year-old borrowings with are themselves borrowings into Japanese, just much older ones!