Lots of Theoretical Stuff

I’ve spent the last couple of days digging myself through a pile of (both print and online) literature. While originally I had intended to start out with a „simple“ kind of node/tree implementation, gradually enhancing and abstracting it while putting the focus on cost efficiency, I became aware of previous discussions on boostification of tree libraries, and now, this seems only secondary. People seem to yearn much more for a design of a degree of abstraction comparably high to the (often-quoted) BGL. Hervé Brönnimann points out a couple of aspects a unified concept should account for.

The following is supposed to distill these past days‘ reflections, and to provide a short introduction to the subject.

Grand Unified Treeory

Talking about the STL makes many people (including me) think of containers and iterators and functors and templatized algorithms and… Luckily, an important number of operations dealing with trees require actually traversal of the tree in a specified way (like preorder, postorder, inorder etc.), which goes over nodes in a well-defined, linear way. This has lead developers of existing tree implementations to provide iterators for the mentioned kinds of traversal, which I consider a reasonable approach to this problem.


This pushes abstraction quite far for tree structures as used for mapping things like filesystems, xml files etc. (sometimes called forest). These kinds of tree should offer easy ways of appending child nodes (up to a number initially unknown) to a given parent nodes, deletion etc. This suggests implementations similiar to those of STL containers like lists or vectors, where we also need to account for an initially unknown number of objects.
Technically speaking, there is a correspondence between forests and binary trees (whose nodes have two child nodes at most) by viewing the „left“ of the binary tree’s children as the first child of the forest’s node and the binary tree node’s right child as the forest node’s next sibling. This is an important property of trees; however, we should not overestimate its importance as the proven equivalence of binary trees and forests does not give us any information about its applicability in terms of efficiency depending on different implementations. For a more practical point of view, we should rather consider the range of applications for binary trees apart from their use as forest implementation; among them is the vast field of binary search trees (BSTs) used for (sortable) data allowing storage and retrieval of information (which in this case is not necessarily of a tree-like structure, of course).
For their practical importance, BSTs have been the subject of extensive research focussing on improving efficiency, yielding a number of specialisations that are mainly trying to keep the maximum distance of the leaf nodes (the ones without any further children) from the root node low, as this distance roughly translates to the maximum lookup (and insertion) time. (Called Self-balancing BSTs; among them AVL trees, B-trees, Red-Black trees and many more.)

Divide and Rule

SBBSTs like the latter are, regardless of their equivalence to forests, significantly different in their application. Importantly, while in the case of forests, we might want a new node to be created as the jth child of a given existing node, we would want our SBBST to decide for us where exactly to put a value passed to it (and therefore creating a node in the suitable position).
I believe we can therefore view SBBSTs as adapters to „simple“ binary trees just like stacks and queues are sequence adapters to containers like vectors, deques or lists in case of the STL. If we want our adapters to work as generically as the STL’s adapters do, our trees have to provide a well-defined set of types and functions that the adapters will use.
If we take a closer look at SBBSTs, we will see that most obviously functions for the insertion of a new node both as left and as right child of a given node are required; furthermore, the SBBST decides where to put a new node depending on other nodes‘ „values“ (i.e. the user-provided data that is stored in them), so we will also need functions for access of a node’s data as well as for „navigating“ to its left and right child. This is again reminiscent of iterators; but this time, iterators are not suitable, as we have two possible directions that cannot be viewed as „anti-parallel“, i.e. if we call our „navigation type“ position, then (position.go_left()).go_right()
(or even (position--)++, to illustrate this more drastically) will not be equal to position, as opposed to an (bidirectional) iterator’s desired behavior, ((iterator--)++) == iterator.

Bottom Line

These considerations are far from complete; there are for example still exotic things like ternary trees with „middle“ children and, even more importantly, a wide range of implementation possibilities for binary trees (and, consequently, forests). Relationship between classes and algorithms is still vague, but I hope and intend to clarify these things in the near future a little, with some illustrative code starting to appear within the next couple of days…

To sum up for the moment what I believe to have found:

  1. We want to re-use the STL’s principles for linear structures, read: use iterators (and related concepts) for access to forests, viewing nodes as containers of their children and iterators‘ operator* providing direct access tothe node’s value; Consequently, the objects that are to be our iterators have to provide a container’s functions (insertion, deletion) at the same time. (Cf. containerator posting)
  2. Depending on its implementation, not every forest will be able to provide relevant traversal methods of equally good time complexity. While for some important ways of traversal we will perhaps have to put up with e.g. operator++ complexity > O(1) (unlike STL container iterators), for some cases it will be better to remove the costly feature entirely, particularly if another implementation can provide it more efficiently (though possibly not offering another feature provided by the first implementation); this could lead to a set of characterisic standard implementations, much like vector and list are for STL containers.
    Note that providing binary tree access methods might be especially hard to achieve without adding otherwise unnecessary information; so not every forest will be equally easy to use as a binary tree.
  3. Binary tree nodes should be accessed via a position object providing methods as mentioned above, making their role roughly equivalent to an iterator’s.
  4. We might extend the binary concept to n-ary by introducing template goto_nth() members (and specialisations).
    Note that this is not superseded by forests as they serve another purpose, namely insertion of (almost) arbitrarily many child nodes to a node; but for e.g. ternary search trees (TSTs), it would be desirable to provide links to three child nodes serving different roles (essentially as in the case of BSTs, where the fact that we have two children relates to the fact a less-than comparison can have two different results, true or false).

3 Gedanken zu „Lots of Theoretical Stuff“

  1. Hi there!
    So, at least there is another one playing around…
    I am developping Boost.Misc for SoC, so this make as colleagues.
    Boost.Tree is a very amazing project, you are lucky to have the challenge in your hands!
    I want us (the ten boost SoC students) to be close together, we can make the proffit bigger and improve the coding experience if we know we are not working alone… Please help me to encourage the other 8 ones by joining in

    OK, once upon a time there was a tree…
    I have read your post and want to give you my personal opinion about the interface of my desired tree.
    (*) iterators are a must.
    As you state, a positional object will be the key point in defining a tree concept. This will allow us to write generic algorithms, that can be apply to very different tree classes.
    (*) I have spent some time thinking about what operators can be used to represent the moving through the tree. You said in your post that:
    position.go_left() == position–
    Think about this other approach for binary trees

    typedef binary_search_tree Tree;

    Tree tree;

    Tree::tree_iterator iter = tree.root();
    iter ++; // go right
    ++ iter; // go left
    iter– // back to the parent

    I know this can be quite strange at first but it is very clear once you get used to it, just see this:

    assert( iter == (( iter++) — ) ); // right and back
    assert( iter == ((++iter ) — ) ); // left and back

    And to support ternaries trees, what about defining something like:

    namespace child {

    struct left {};
    struct center {};
    struct right {};


    typedef ternary_tree Tree;

    Tree tree;

    Tree::tree_iterator iter = tree.root();

    assert( iter == ( ( iter += child::left ) — ) );
    assert( iter == ( ( iter += child::center ) — ) );
    assert( iter == ( ( iter += child::right ) — ) );

    Note that this has zero overhead.
    If you want the notation += child::[left/right] can be supported for binary trees too…

    More generic trees, like for example property_tree ( Have you look at this library? ) that are more like a patricie trie can be supported in the same consistent way…

    ptree tree;

    ptree::tree_iterator iter = tree.root();

    iter += „data“;
    iter += „debug“;

    OK… that is all for now…
    Good luck!

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert.