Representing branching sequences in XML

Branching sequences are common in real life. For instance, a recipe can be represented as a sequence of steps, with branches corresponding to variations. Games of go or chess , of course, are the classic example of branching sequences, where branches handle the “could have/should have played there” comments. Branching sequences can even be used to handle linear text , with branches used for optional or alternate material.

If we have complete control over the programming environment we can implement branching sequences in any way we want, most of them quite obvious. But in today’s web-based world, there are good reasons to represent such structures using XML (for transformations, interoperability, or even storage in XML databases) and HTML (for display). What is the best way to do so?

It seems obvious that the sequential aspect of branching sequences should be represented by a sequence of XML siblings:

<elt>step 1-1</elt>
<elt>step 1-2</elt>
<elt>step 1-3</elt>

So how do we handle the branching aspect? We have only one more dimension to work with, going down in the child dimension. So we might try:

<elt>step 1-1</elt>
<elt>step 1-2
<elt>step 2-2</elt>
</elt>
<elt>step 1-3</elt>

The problem with this is that node 2-2 comes after 1-2 in traversal order. Put another way, an element that is not in branch 2 comes in between step 1-1 and its follower in branch 2, which is 2-2. We could try to solve this problem by placing the branch as the child of the preceding node:

<elt>step 1-1</elt>
<elt>step 2-2</elt>
</elt>
<elt>step 1-2
<elt>step 1-3</elt>

But now we have the odd asymmetry that the successor of step 1-1 in branch 1 is found as its sibling, whereas its successor in branch 2 is found as its child. This is going to result in more complicated logic than necessary to walk, subset, or transform the branching sequence.

Another alternative is to explicitly represent branches and/or sequences with special tags such as <branch> or <seq>. But this will have the effect of muddying our data representation by mixing structural and content tags.

The solution is surprising and elegant: use the parent-child dimension of XML data structure to represent sequences, and the sibling dimension to represent branches. The “next” element in the sequence is always the child of the preceding one. The “next” alternative is always the following sibling of the preceding one. All alternative steps are cleanly represented as a group of children of the node corresponding to the preceding step:

<elt>step 1-1
<elt>step 1-2
<elt>step 1-3</elt>
</elt>
<elt>step 2-2</elt>
</elt>

This structure has a number of satisfying properties. First, any sequence on any branch is represented by one unique path of parent-child relationships; in XPath terms, the sequence ending at element END is precisely $END/ancestor::elt. Second, transformations using systems such as XSLT are extremely clean, as can be seen in the following program to extract only the main branch:


<xsl:template mode="copy" match="elt[position() > 0]"/>

Third, the structure works well with CSS, where properties are inherited along the parent-child axis. For instance, to terminate the display of the sequence at elt X (display only up to element X) requires nothing more than setting the class of that element to LEAF:

.LEAF elt { display: none; }

This approach might strike some as counter-intuitive, since a sequence of any meaningful length will result in a hierarchy as many steps deep as the sequence is long. But that should not pose a problem for any but the most poorly constructed processing agents.

We’ll call this design pattern SCAS, for “successor as child, alternative as sibling.”

Leave a Reply