School’s out for Summer (of Code)…

… so I guess I’ll have much more time for the project than during those past exhaustive weeks. Judging from the mailing list, there’s interest in a tree component uttered in every other post, but people haven’t commented on anything I’ve posted to the wiki yet. Okay, I’ve been promising code for quite some time now, so maybe it’s my fault and people just want to see something concrete. But alas, I hope I can realize my promise quite soon… Though making people a little more aware of my design thoughts might nourish fruitful discussion which wouldn’t be a bad thing to have…
René has set up a mailing list for related discussion and pointed me to another past mailing list thread on a proposed tree component, with even more technical remarks than the ones I’ve been aware of already plus one rank tree design posted by him. I’m going to read through his and the other apparently well received proposals tomorrow (one’s apparently offline, already e-mailed the author), and complete the section I started yesterday in the design part of the wiki…

Stress peak

So while the past weeks have already been exhausting, my stress level is about to reach its peak about now, with two exams tomorrow and on tuesday, mid-term evaluation pretty close and still no code uploaded, though I’m spending practically all the time I’m not using on university stuff on reviewing literature and online resources — and finally setting up bjam and boostbook successfully.

For now, I’ve just updated the Boost SoC wiki and the Google SoC application information page with a little abstract and a couple of links. I hope that after suriving the next two days, I’ll be finally able to commit a first stub of what is to become the tree framework, probably defining some iterator types via iterator_adaptor (with some generic traversal algorithms) around the yet-to-be-specified binary position object (should be able to propagate traits to the said iterators, work somewhat like an iterator — maybe I’ll find something in the BGL, but not enough time for a review of that atm, that’s for later) which in turn has to be implemented to cater for corresponding node implementations, still allowing also the iterator types to be specialized for more efficient traversal. This strategy is implying some kind of hierarchy which I hope is powerful enough for the kinds of trees I want this framework to account for (with some not-to-complex extension work to go, that is). It does still not account for the „container part“ of our iterators/“containerators“ which at first glance looks a little more sophisticated to me, but I hope to find a way profound enough for a first commit (and extension later on, again).
Everything should lighten a bit when some code to study is up, and I hope it’s going to work out as I imagine it to — there’d be especially some partial template specialization parts I’m a little anxious about…

Enough for now, I’d better get back to learning, still a lot of stuff I don’t have a clue about…

Lots of Theoretical Stuff

I’ve spent the last couple of days digging myself through a pile of (both print and online) literature. While originally I had intended to start out with a „simple“ kind of node/tree implementation, gradually enhancing and abstracting it while putting the focus on cost efficiency, I became aware of previous discussions on boostification of tree libraries, and now, this seems only secondary. People seem to yearn much more for a design of a degree of abstraction comparably high to the (often-quoted) BGL. Hervé Brönnimann points out a couple of aspects a unified concept should account for.

The following is supposed to distill these past days‘ reflections, and to provide a short introduction to the subject.

Grand Unified Treeory

Talking about the STL makes many people (including me) think of containers and iterators and functors and templatized algorithms and… Luckily, an important number of operations dealing with trees require actually traversal of the tree in a specified way (like preorder, postorder, inorder etc.), which goes over nodes in a well-defined, linear way. This has lead developers of existing tree implementations to provide iterators for the mentioned kinds of traversal, which I consider a reasonable approach to this problem.

Arboretum

This pushes abstraction quite far for tree structures as used for mapping things like filesystems, xml files etc. (sometimes called forest). These kinds of tree should offer easy ways of appending child nodes (up to a number initially unknown) to a given parent nodes, deletion etc. This suggests implementations similiar to those of STL containers like lists or vectors, where we also need to account for an initially unknown number of objects.
Technically speaking, there is a correspondence between forests and binary trees (whose nodes have two child nodes at most) by viewing the „left“ of the binary tree’s children as the first child of the forest’s node and the binary tree node’s right child as the forest node’s next sibling. This is an important property of trees; however, we should not overestimate its importance as the proven equivalence of binary trees and forests does not give us any information about its applicability in terms of efficiency depending on different implementations. For a more practical point of view, we should rather consider the range of applications for binary trees apart from their use as forest implementation; among them is the vast field of binary search trees (BSTs) used for (sortable) data allowing storage and retrieval of information (which in this case is not necessarily of a tree-like structure, of course).
For their practical importance, BSTs have been the subject of extensive research focussing on improving efficiency, yielding a number of specialisations that are mainly trying to keep the maximum distance of the leaf nodes (the ones without any further children) from the root node low, as this distance roughly translates to the maximum lookup (and insertion) time. (Called Self-balancing BSTs; among them AVL trees, B-trees, Red-Black trees and many more.)

Divide and Rule

SBBSTs like the latter are, regardless of their equivalence to forests, significantly different in their application. Importantly, while in the case of forests, we might want a new node to be created as the jth child of a given existing node, we would want our SBBST to decide for us where exactly to put a value passed to it (and therefore creating a node in the suitable position).
I believe we can therefore view SBBSTs as adapters to „simple“ binary trees just like stacks and queues are sequence adapters to containers like vectors, deques or lists in case of the STL. If we want our adapters to work as generically as the STL’s adapters do, our trees have to provide a well-defined set of types and functions that the adapters will use.
If we take a closer look at SBBSTs, we will see that most obviously functions for the insertion of a new node both as left and as right child of a given node are required; furthermore, the SBBST decides where to put a new node depending on other nodes‘ „values“ (i.e. the user-provided data that is stored in them), so we will also need functions for access of a node’s data as well as for „navigating“ to its left and right child. This is again reminiscent of iterators; but this time, iterators are not suitable, as we have two possible directions that cannot be viewed as „anti-parallel“, i.e. if we call our „navigation type“ position, then (position.go_left()).go_right()
(or even (position--)++, to illustrate this more drastically) will not be equal to position, as opposed to an (bidirectional) iterator’s desired behavior, ((iterator--)++) == iterator.

Bottom Line

These considerations are far from complete; there are for example still exotic things like ternary trees with „middle“ children and, even more importantly, a wide range of implementation possibilities for binary trees (and, consequently, forests). Relationship between classes and algorithms is still vague, but I hope and intend to clarify these things in the near future a little, with some illustrative code starting to appear within the next couple of days…

To sum up for the moment what I believe to have found:

  1. We want to re-use the STL’s principles for linear structures, read: use iterators (and related concepts) for access to forests, viewing nodes as containers of their children and iterators‘ operator* providing direct access tothe node’s value; Consequently, the objects that are to be our iterators have to provide a container’s functions (insertion, deletion) at the same time. (Cf. containerator posting)
  2. Depending on its implementation, not every forest will be able to provide relevant traversal methods of equally good time complexity. While for some important ways of traversal we will perhaps have to put up with e.g. operator++ complexity > O(1) (unlike STL container iterators), for some cases it will be better to remove the costly feature entirely, particularly if another implementation can provide it more efficiently (though possibly not offering another feature provided by the first implementation); this could lead to a set of characterisic standard implementations, much like vector and list are for STL containers.
    Note that providing binary tree access methods might be especially hard to achieve without adding otherwise unnecessary information; so not every forest will be equally easy to use as a binary tree.
  3. Binary tree nodes should be accessed via a position object providing methods as mentioned above, making their role roughly equivalent to an iterator’s.
  4. We might extend the binary concept to n-ary by introducing template goto_nth() members (and specialisations).
    Note that this is not superseded by forests as they serve another purpose, namely insertion of (almost) arbitrarily many child nodes to a node; but for e.g. ternary search trees (TSTs), it would be desirable to provide links to three child nodes serving different roles (essentially as in the case of BSTs, where the fact that we have two children relates to the fact a less-than comparison can have two different results, true or false).