Trees

Both queues and trees are examples of recursive structures. Queues contain only one link between individual structures, whereas trees contain at least two. Trees are another kind of linked-list and are interesting because they give more examples of how recursive procedures are used to manipulate recursively-defined data structures.

There are two principal kinds of trees in common use: B-trees and binary trees. B-trees (sometimes called balanced trees) are too advanced to be described here.

A binary tree consists of a number of forks, usually called nodes, which are linked with two links per node.

Here is an example of a small tree:

[Figure ch11-10.png: a small binary tree]

The topmost node is called the root (trees are usually depicted upside-down [12.2]). Each node consists of three parts: the data which each node bears and left and right references which can refer to other nodes. In the small tree shown above, there are seven nodes on five levels. There are 4 nodes on the left branch of the root and 2 on the right, so that the tree is unbalanced.

A binary tree is particularly suitable for the ordering of data: that is, for arranging data in a predefined order [12.3]. In the previous section, in procedure insert fan, we considered inserting a fan into a queue in ascending order of ticket number. This is an inefficient way of ordering data. For example, suppose there are 100 fans in the queue. Then, on average, we can expect to insert a fan halfway down the queue, which means 50 comparisons of ticket numbers. If the fans were stored as a balanced binary tree, the maximum number of comparisons would be only 7 (because 2^6 < 100 < 2^7). For larger numbers, the difference between the two kinds of linked-list is even more marked. For 1000 fans, a queue would need 500 comparisons on average, whereas a balanced binary tree would need 10 at most (because 2^9 < 1000 < 2^10). While it is true that the figures for the tree are best-case figures (they assume that the tree is balanced, that is, that there are as many nodes to the left of the root as to the right), nevertheless, on average, a binary tree is much more efficient than a queue for ordering data.
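The comparison counts quoted above can be checked with a short calculation. This is a sketch under two assumptions which are not stated in the text: that visiting a node counts as one comparison, and that a balanced tree of n nodes has the smallest possible number of levels. The symbols C_queue and C_tree are introduced here only for illustration.

```latex
% Average cost of insertion into an ordered queue of n items,
% and worst-case cost of insertion into a balanced binary tree of n nodes
C_{\mathrm{queue}}(n) \approx \frac{n}{2},
\qquad
C_{\mathrm{tree}}(n) \le \lceil \log_2 (n+1) \rceil

% n = 100:  2^6 = 64 < 101 \le 128 = 2^7
C_{\mathrm{queue}}(100) \approx 50,
\qquad
C_{\mathrm{tree}}(100) \le \lceil \log_2 101 \rceil = 7

% n = 1000:  2^9 = 512 < 1001 \le 1024 = 2^{10}
C_{\mathrm{queue}}(1000) \approx 500,
\qquad
C_{\mathrm{tree}}(1000) \le \lceil \log_2 1001 \rceil = 10
```

Multiplying the number of items by ten adds only about three levels to the tree, whereas the average cost for the queue also grows tenfold.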

Here is a typical mode declaration for a binary tree:

   MODE WORD = STRUCT(STRING wd,
                      INT ct,
                      REAL fq),
        TREE = STRUCT(REF WORD w,
                      REF TREE left,right);

The mode of the data in the declaration of TREE is REF WORD so that if an item of data is moved around, it is only the reference which is moved. This is more efficient than moving the data item itself.
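As a small illustration of these modes, the following sketch creates one data item on the heap and a one-node tree which refers to it. The identifiers item and node are hypothetical names chosen for this example, and the program wrapper required by a particular compilation system has been left out, so you may need to adapt it.

```algol68
BEGIN
   MODE WORD = STRUCT(STRING wd,
                      INT ct,
                      REAL fq),
        TREE = STRUCT(REF WORD w,
                      REF TREE left,right);

   # item and node are illustrative names, not part of the text #
   REF WORD item = HEAP WORD := ("tree", 1, 0.0);
   REF TREE node = HEAP TREE := (item, NIL, NIL);

   # The node holds a reference to the item, not a copy of it, #
   # so an update made through the node is seen through item   #
   ct OF w OF node +:= 1;

   print((wd OF item, ct OF item, newline))
END
```

The count printed should be 2, because w OF node and item refer to the same WORD.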

We shall give two example tree procedures: adding an item of data to the tree and printing the tree. We need to check whether the tree at some node is empty. For this, we use the declaration

   REF TREE leaf = NIL

Here is the procedure add word:

   PROC add word = (REF REF TREE root,
                    REF WORD w)VOID:
   IF   root IS leaf
   THEN root:=HEAP TREE:=(w,leaf,leaf)
   ELIF wd OF w < wd OF w OF root
   THEN add word(left OF root,w)
   ELIF wd OF w > wd OF w OF root
   THEN add word(right OF root,w)
   ELSE ct OF w OF root+:=1
   FI

The ordering relation in add word is the alphabetical ordering of the string in each data item. If the string in the data item to be added is already in the tree, no new node is created: instead, the occurrence count (ct) of the item already in the tree is incremented by 1 (see the ELSE clause above). Note the use of recursion.
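Here is a short sketch of how add word might be called. It repeats the declarations given above so that it stands alone. The row of strings text, the starting count of 1 and the frequency of 0.0 are assumptions made for this example, and the program wrapper required by a particular compilation system is omitted.

```algol68
BEGIN
   MODE WORD = STRUCT(STRING wd,
                      INT ct,
                      REAL fq),
        TREE = STRUCT(REF WORD w,
                      REF TREE left,right);

   REF TREE leaf = NIL;

   # add word exactly as given in the text #
   PROC add word = (REF REF TREE root,
                    REF WORD w)VOID:
   IF   root IS leaf
   THEN root:=HEAP TREE:=(w,leaf,leaf)
   ELIF wd OF w < wd OF w OF root
   THEN add word(left OF root,w)
   ELIF wd OF w > wd OF w OF root
   THEN add word(right OF root,w)
   ELSE ct OF w OF root+:=1
   FI;

   REF TREE root := leaf;     # an empty tree #

   # Illustrative data: "apple" occurs twice #
   []STRING text = ("pear", "apple", "quince", "apple");

   FOR i TO UPB text
   DO add word(root, HEAP WORD := (text[i], 1, 0.0)) OD;

   # "pear" is at the root, "apple" is to its left with a  #
   # count of 2 and "quince" is to its right with a count 1 #
   print((wd OF w OF root, newline,
          wd OF w OF left OF root,
          ct OF w OF left OF root, newline,
          wd OF w OF right OF root,
          ct OF w OF right OF root, newline))
END
```

Notice that root is declared as a variable of mode REF TREE, so the identifier root has mode REF REF TREE, which is what add word needs in order to replace a leaf by a new node.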

Printing the tree follows a similar pattern, but when the “root” under consideration is a leaf, nothing happens:

   PROC print tree=(REF FILE f,
                    REF REF TREE root)VOID:
   IF   root ISNT leaf
   THEN print tree(f,left OF root);
        put(f,(wd OF w OF root,
               ct OF w OF root,
               newline));
        print tree(f,right OF root)
   FI

As you can see, recursion is central to both procedures. Although it is true that recursion can be avoided by using a loop, recursion is better because it clarifies the logic.
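For comparison, here is a sketch of how add word could be written with a loop instead of recursion. It is not from the text: the procedure name add word loop, the cursor p, the flag placed and the test data are all hypothetical. The cursor p refers to the link currently under consideration, so it has one more REF than root, and casts are used to make the intended dereferencing explicit. A loop version of print tree would be harder to write, since it would typically need to remember the nodes still to be visited.

```algol68
BEGIN
   MODE WORD = STRUCT(STRING wd,
                      INT ct,
                      REAL fq),
        TREE = STRUCT(REF WORD w,
                      REF TREE left,right);

   REF TREE leaf = NIL;

   # A loop version of add word (hypothetical) #
   PROC add word loop = (REF REF TREE root,
                         REF WORD w)VOID:
   BEGIN
      REF REF TREE p := root;   # p has mode REF REF REF TREE #
      BOOL placed := FALSE;
      WHILE NOT placed
      DO IF   REF TREE(p) IS leaf
         THEN # assign to the link which p refers to #
              REF REF TREE(p) := HEAP TREE := (w,leaf,leaf);
              placed := TRUE
         ELIF wd OF w < wd OF w OF p
         THEN p := left OF p    # move the cursor down to the left #
         ELIF wd OF w > wd OF w OF p
         THEN p := right OF p   # move the cursor down to the right #
         ELSE ct OF w OF p +:= 1;
              placed := TRUE
         FI
      OD
   END;

   # print tree exactly as given in the text #
   PROC print tree=(REF FILE f,
                    REF REF TREE root)VOID:
   IF   root ISNT leaf
   THEN print tree(f,left OF root);
        put(f,(wd OF w OF root,
               ct OF w OF root,
               newline));
        print tree(f,right OF root)
   FI;

   REF TREE root := leaf;
   []STRING text = ("pear", "apple", "quince", "apple");

   FOR i TO UPB text
   DO add word loop(root, HEAP WORD := (text[i], 1, 0.0)) OD;

   # The words should appear in alphabetical order: #
   # apple (count 2), pear, quince                  #
   print tree(stand out, root)
END
```

The loop version needs an extra variable, a flag and two casts to do what the recursive version expresses in a single conditional clause, which illustrates the point made above about clarity.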

The allocation and release of memory for linked-lists (including trees) are quite transparent to the program. When a tree is read, and nodes possibly deleted, all the lost memory is collected every so often by a garbage collector. You do not have to worry about the details of memory maintenance: it is all done for you by the compiler and the run-time system. If you write a program which relies heavily on global generators (that is, HEAP generators), then you should allocate extra memory to the heap (see the on-line information for details of how to use the Algol 68 compilation system).


Exercises

11.24
Write a program which reads a text book and creates a binary tree containing the number of occurrences of each of the letters A-Z and a-z (that is, case is significant). Print a report with the frequency of occurrence represented by a percentage of the total number of letters in the book to 2 decimal places. You should print the letters going downwards with 13 lines for each column: first the upper case letters, then the lower case. Only print lines for those letters which occur in the book (use mem channel to build the complete table in memory before printing).


Sian Mountbatten 2012-01-19