Archive for the ‘computing’ Category

Navigating Philip Glass’ music

Saturday, July 12th, 2003

IBM has come out with a fabulous interface for navigating the works of Philip Glass. All his works are organized along a number of dimensions, including date, “joy”, and “intensity”. You slide the mouse along any of these to view the works meeting those criteria, and when you stop, the music in question plays for you.

Computer Go Overview

Tuesday, June 10th, 2003

Bob has written a good overview of the computer go problem.

Launching Igowalker

Thursday, May 29th, 2003

Today’s the big day—launching Igowalker. We’ll see how it goes.

Reality-based computer go

Tuesday, May 27th, 2003

From my position in the armchair (meaning you should take my comments with several grains of salt), I still think that for computer go there is useful work to be done in the area of goal structures, rich intermediate abstractions, and reasoning based on these. Perhaps this is considered old-fashioned today, but I’m biased, I suppose, by my experience as a mid-level human go player; I just don’t see how else programs will get to 1-dan and beyond.

Lately, though, I’ve been thinking about a different (complementary?) approach, which I call “reality-based computer go”. This is based on a couple of research directions in other fields. One is CG. For instance, to make the latest “Matrix” sequel, they developed a new CG approach which involves “painting” or “molding” actual photographic content onto computer-generated models (see related article). (This is not really new per se—people have been “painting” clothes on models for a long time now, for instance.) The point is that compared to previous approaches, where they tried to model everything down to the hairs on somebody’s chin, now they get the hairs “for free” just by distorting a picture of a real actor’s face (with real hair) to map onto the mathematical model of the face. Voilà—much more realistic-looking results at less cost (and modeling the hairs is expensive).

A similar direction can be seen in music synthesizers (of which I know virtually nothing). It seems that the latest approach is to take actual recorded sounds and transform and blend them, instead of trying to create sounds totally from scratch mathematically. Same idea.

I’ve got a passing interest in computational linguistics, and it seems to me that the same model should be applicable there as well. Of course, people have been doing corpus-based CL for years. Statistical and corpus-based approaches somewhat presage the “paint reality onto the model” idea, but in practice they are still basically limited to post-processing (in the CG model, “smoothing”) model-based output, to creating word- or phrase-level dictionaries, or to dealing with local problems such as disambiguation. We have “example-based MT”, but this has not yet reached the stage of being generally applicable. It seems attractive to me to consider “painting” linguistic content onto mathematically-generated language models.

In the go area, and I realize this is abstract in the extreme, we should consider “painting” low-level go content (individual moves and sequences) onto a higher-level model-based framework. (I suppose you could make the case that this even mimics a possible human mental structure involved in playing go—a higher-level “thought”-based process and a lower-level “pattern”-based process.) Leaving aside long-term research topics like what the higher-level framework is (well, obviously it’s the goal structures and rich intermediate abstractions I mentioned above), the low-level go content to be painted onto that framework, just as in the CG case, is derived from “reality”—in this case, game collections. In the CG case, in order to be able to morph and stretch and snap the content onto the model, the photographic/reality images need to be “marked”—for example, with points giving the location of Keanu Reeves’s chin. So in the go case, we also need to develop libraries of reality-derived content with the appropriate mapping indicators that show how that content is fitted onto the model.

I don’t claim to be fully up on current research based on professional game collections (in CL terms, “corpora”), but I’d like to do a research project, or work with someone on one, which attempts a broad-based analysis of professional games in terms of low-level move sequences. To do that, we need a “vocabulary” for types of moves. Then the “grammar” (allusion to CL intended) is a series of rules or empirical patterns tying those vocabulary items together. Now, instead of arbitrarily imposing our own vocabulary (“hane”, “tobi”), the initial phase of the analysis should be based on well-known cluster analysis techniques, which would identify the vocabulary from co-occurrence patterns. (A fascinating by-product would be if this process actually identified new groupings or types of moves not yet recognized as such by humans.) One type of grammar that could then be developed from this vocabulary is an n-gram grammar; this type of approach has already found wide application in computational linguistics. A computer go engine based on this type of thinking would be more focused on sequences of moves which make sense together. At a minimum, such a low-level vocabulary and grammar could be effective in move generation, or in choosing or optimizing possible moves found by “traditional” techniques.

A trivial example of this is where Black pushes and White extends. A more sophisticated example might be the case where Black commonly makes a peep on one side of a one-point jump before jumping himself on the other side.
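
To make this concrete, here is a toy sketch (in Python) of the kind of sequence analysis I have in mind. Everything in it is hypothetical: real input would be parsed from SGF game records, and the hand-coded classify() function is only a stand-in for the vocabulary that the cluster analysis is supposed to discover.

    from collections import Counter
    from typing import Iterable, List, Tuple

    # A game is a list of (color, (x, y)) moves; in practice these would
    # be parsed from SGF files in a professional game collection.
    Move = Tuple[str, Tuple[int, int]]

    def classify(prev: Move, cur: Move) -> str:
        """Map a move to a coarse 'vocabulary item' based on its geometric
        relation to the previous move. This hand-made vocabulary is just a
        placeholder for one derived by cluster analysis."""
        (_, (px, py)), (_, (cx, cy)) = prev, cur
        dx, dy = abs(cx - px), abs(cy - py)
        if dx + dy == 1:
            return "contact"      # e.g. a push, or an answering extension
        if max(dx, dy) == 2 and min(dx, dy) == 0:
            return "one-point"    # e.g. a one-point jump nearby
        if max(dx, dy) <= 3:
            return "near"         # loose local continuation
        return "tenuki"           # play elsewhere on the board

    def ngrams(games: Iterable[List[Move]], n: int = 2) -> Counter:
        """Count n-grams over the classified move sequence of each game."""
        counts: Counter = Counter()
        for game in games:
            labels = [classify(a, b) for a, b in zip(game, game[1:])]
            for i in range(len(labels) - n + 1):
                counts[tuple(labels[i : i + n])] += 1
        return counts

    # Toy data standing in for a game record: a push-and-extend exchange.
    toy_game = [("B", (3, 3)), ("W", (3, 4)), ("B", (4, 4)), ("W", (3, 5))]
    print(ngrams([toy_game]).most_common())

Run over thousands of professional games, the high-frequency n-grams from this kind of count would be exactly the “sequences of moves which make sense together” mentioned above.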

XSLT as a full functional language

Thursday, May 8th, 2003

XSLT obviously has much in common with functional programming languages. But it’s not really functional, because you can’t pass around functions (which in the XSLT world are the things called “templates”) as first-class objects, right?

Wrong. Dimitre Novatchev has come up with the amazing hack of using namespaces as a way to identify functions. To pass a function, he passes a placeholder element from that namespace; the function can then be invoked using XSLT’s basic template matching mechanisms. In his article “The Functional Programming Language XSLT—A Proof Through Examples”, he goes on to implement major parts of an FP library using his technique. A must-read for the XSLT geek.
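
For the curious, here is a minimal sketch of the trick. This is my own reconstruction, not Novatchev’s code; the urn:my-functions namespace and the f:double “function” are invented for illustration. The stylesheet selects its own f:double placeholder element via document(''), passes it around as an ordinary node, and lets template matching perform the “function call”:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:f="urn:my-functions">

      <!-- The first-class "reference" to the function: a placeholder
           element in the function's own namespace -->
      <f:double/>

      <!-- The "function" itself: a template matching the placeholder -->
      <xsl:template match="f:double">
        <xsl:param name="arg"/>
        <xsl:value-of select="$arg * 2"/>
      </xsl:template>

      <!-- A higher-order template: "calls" whatever function it is handed -->
      <xsl:template name="apply">
        <xsl:param name="fn"/>
        <xsl:param name="arg"/>
        <xsl:apply-templates select="$fn">
          <xsl:with-param name="arg" select="$arg"/>
        </xsl:apply-templates>
      </xsl:template>

      <xsl:template match="/">
        <!-- document('') re-reads this very stylesheet, so the
             placeholder element can be selected and passed as data -->
        <xsl:call-template name="apply">
          <xsl:with-param name="fn" select="document('')/*/f:double"/>
          <xsl:with-param name="arg" select="21"/>
        </xsl:call-template>
      </xsl:template>

    </xsl:stylesheet>

Applied to any input, this outputs 42. Note that “apply” never names the function it calls; it just dispatches on whatever placeholder node it was handed.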

Reality-based MT?

Friday, April 18th, 2003

Here is an interesting article on the making of the “Matrix” sequel, focusing on the CG techniques used.

In one big fight scene, Keanu Reeves (who I occasionally see working out at the gym I go to, looking a bit smaller than he seems on the big screen) has to battle 100 clones of the bad guy. The approach they used, essentially, was to generate the basic mechanics and kinetics of the scene using traditional CG modeling algorithms (based on motion capture data), but then to actually generate the finished scene by “painting” or “molding” actual pictures of the actors’ faces onto the CG surfaces. This creates a much more realistic result, with much less effort, than trying to mathematically model the faces in detail. We’ll call this “reality-based CG”.

This kind of thinking has been around for a while. Back in the mid-’90s, I was involved in a project to “paint” or “wrap” photographic images of actual kimonos onto models in different poses to create catalog images. Today, many on-line clothes stores use a simplified version of this technique to show potential buyers what a piece of clothing would look like when worn by someone of a particular body type.

As an example of a similar technique, the article mentions music synthesizers. Originally, the attempt was again to model the sound entirely and create it artificially—resulting, unsurprisingly, in something that sounded, well, computer-generated. The alternative approach, now in widespread use, is to record actual instrument sounds and “paint” them onto computer-generated rhythm sequences. And I think we are all familiar with the analog in the voice synthesis area, where approaches that weave together recorded snippets of actual human voices are supplanting the original approach of completely computer-generated speech, which ends up sounding like a robot.

Existing machine translation approaches still largely try to mathematically model and generate everything. Statistical and corpus-based approaches somewhat presage the “paint reality onto the model” idea, but in practice they are still basically limited to post-processing (in the CG model, “smoothing”) model-based output, to creating word- or phrase-level dictionaries, or to dealing with local problems such as disambiguation. We have “example-based MT”, but this has not yet reached the stage of being generally applicable.

I was thinking about the implications of this sort of idea for natural language processing and machine translation—“reality-based MT”. I think you can map the CG approach used in the Matrix sequel onto an MT approach quite easily. A mathematical-type grammar corresponds to the CG model. Corpora correspond to photographic images taken of actual reality. Treebanks and other tagged databases correspond to the step where photographic images are mapped to models through the use of “anchor” points designating, for example, 10 key points around Keanu Reeves’s lips. The above elements are all well-known components of present-day MT solutions. In reality-based MT, though, the transformations linking one grammar (for instance, that of the source language) to another (that of the target language) need to be recast as a particular type of motion capture—remember that motion capture can identify not only simple transformations, such as a walking leg, but also more structural transformations, such as someone doing a somersault. The totally new element in reality-based MT will be the “painting” or “wrapping” process, where “reality fragments” (of the target language) are “pasted over and around” the transformed model.
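
As a toy illustration of just that last “painting” step, here is a sketch (in Python) in which everything is invented: the slot names, the fragments, and the selection policy. A real system would choose fragments by context and fluency scores rather than naively taking the first candidate.

    from typing import Dict, List

    # The "model": a transfer grammar has produced a target-language
    # skeleton of abstract slots (the CG mesh, in the Matrix analogy).
    skeleton = ["NP:agent", "VP:motion", "PP:destination"]

    # Hypothetical "reality fragments": corpus snippets indexed by the
    # slot they were tagged with (the anchor points on the photographs).
    fragments: Dict[str, List[str]] = {
        "NP:agent": ["the old man", "a delegation from Kyoto"],
        "VP:motion": ["set off", "made his way"],
        "PP:destination": ["toward the harbor", "to the conference hall"],
    }

    def paint(skeleton: List[str], fragments: Dict[str, List[str]]) -> str:
        """'Paint' a corpus fragment over each slot of the model output."""
        return " ".join(fragments[slot][0] for slot in skeleton)

    print(paint(skeleton, fragments))
    # -> "the old man set off toward the harbor"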

Who will be the first to create a simple proof-of-concept prototype of reality-based MT?