Courses/CS 2124/Lab Manual/The Binary Search Tree and Recursion

Introduction

This lab uses dynamic memory to implement the binary search tree data structure, which has the defining property that every CSZNode to the left of a given CSZNode is “smaller” and everything to the right of that same CSZNode is “larger”.

Characteristics of the Binary Search Tree

Under ideal circumstances, a binary search tree can provide rapid ( ${\textstyle \log_{2} n}$) storage and retrieval of information because every comparison can eliminate half of the elements remaining to be searched; in the worst case, this process continues until only one item is left to examine. Let ${\textstyle n}$ be the size of the original data set and let ${\textstyle m}$ be the number of times half are eliminated. ${\textstyle n / \underbrace{2 / 2 / 2 ... / 2}_{m}}$ ${\textstyle =}$ ${\textstyle 1}$ ${\textstyle \frac{n}{2^{m}}}$ ${\textstyle =}$ ${\textstyle 1}$ ${\textstyle 2^{m}}$ ${\textstyle =}$ ${\textstyle n}$ ${\textstyle \log_{2} 2^{m}}$ ${\textstyle =}$ ${\textstyle \log_{2} n}$ ${\textstyle m \log_{2} 2}$ ${\textstyle =}$ ${\textstyle \log_{2} n}$ ${\textstyle m}$ ${\textstyle =}$ ${\textstyle \log_{2} n}$

Iterative and recursive techniques for creation and traversal of the tree will be examined. As in the list and stack labs, the definition of the building block (CSZNode) used in the larger data structure is the starting point.

The Binary Search Tree CSZNode Class

A binary search tree is to be constructed for the storage of city names, states, and ZIP code numbers; see Figure 1 for the arrangement of seven CSZNodes (with state and ZIP codes omitted).

The objects themselves will be represented by the CityStateZip class defined in the header file “CityStateZip.h” provided as a resource for this assignment. Place a copy of the header file in your project directory and take a moment to look over the functionality of CityStateZip.

Jonesboro
/         \
/                 \
/          \              /       \
/            \            /         \
Charlotte    Jacksonville   Lubbock   San Diego

Fig.1 Example of a Binary Search Tree

A CSZNode in a tree differs from a CSZNode in a singly-linked list or stack in that there are two outgoing links. Unlike a doubly-linked list, these links don’t move “forward” and “backward”, but instead move “down” the hierarchy in the “left” and “right” directions. Therefore two pointers will be necessary; these pointers are typically referred to as left and right, as these labels reflect the depiction of the pointers in a diagram of the tree. Consider the following class definition for a CSZNode in a binary search tree.

class CSZNode {
public:
CSZNode( const CityStateZip& data ) : payload{data} {};
void         set_left( CSZNode* new_left );
void         set_right( CSZNode* new_right );
CityStateZip get_data( ) const;
CSZNode*     get_left( ) const;
CSZNode*     get_right( ) const;

void write( std::ostream& fout ) const;

private:
CSZNode*     left  = nullptr;
CSZNode*     right = nullptr;
};

std::ostream& operator<<( std::ostream& fout, const CSZNode& n );

Create a header file and implementation file pair "CSZNode.h and CSZNode.cpp" now, using the interface shown above.

The constructor accepts a CityStateZip and sinks it into the payload attribute. The inline implementation for the constructor is shown above.

As before, the set and get methods are access functions for the private data stored in the pointers. Create an implementation (“.cpp”) file and write these methods now.

The function for operator << calls method write, which should write the city, state, and ZIP using the functionality already defined in CityStateZip. Write this implementation now.

Create the file “city_list.txt”" so that it contains the following data.

Jonesboro,Arkansas,72401
Fort Wayne,Indiana,46774
Charlotte,North Carolina,28078
Jacksonville,Florida,32073
Lubbock,Texas,79382
San Diego,California,91902

Be sure that the cursor will not move past the last character on the line; otherwise, the end-of-file function eof may not behave as desired.

Start function main with the following lines:

std::ifstream fin{"city_list.txt"};
if ( !fin ) {
cout << "Error opening city_list.txt!\n";
exit( 1 );
}

cout << "test CSZNodes:\n";
cout << n1 << endl;
cout << n2 << endl;
cout << endl;

CityStateZip read_CityStateZip( std::istream& fin ) {
std::string  city, state;
unsigned int zip = 0;
getline( fin, city, ',' );
getline( fin, state, ',' );
fin >> zip;
fin.ignore( std::numeric_limits<std::streamsize>::max( ), '\n' );
return CityStateZip{city, state, zip}; // creates a temporary CityStateZip object and returns it
}

NOTE: You need to include <limits> for the numeric_limits object to be defined.

Execute the program to see that all CSZNode methods and function are working properly. FOR IN-LAB CREDIT: Show your progress here before moving on.

The Binary Search Tree Class

For data, the binary search tree class contains a root pointer, which serves a purpose similar to that of the head pointer of the singly-linked list or the top pointer of the stack. Consider the following class definition for a binary search tree, and begin your implementation of a library "BinarySearchTree.h" / "BinarySearchTree.cpp" using it as a guide:

class BinarySearchTree {
public:
BinarySearchTree( ) = default;
// ~BinarySearchTree( );  // TODO: un-comment and implement when instructed.
void add_recursively( const CityStateZip& new_city );
void add_iteratively( const CityStateZip& new_city );
void write_recursively( std::ostream& strm ) const;
void write_iteratively( std::ostream& strm ) const;

private:
// methods
void add_recursively     // called by public version
( CSZNode* new_node, CSZNode* current_root );
void write_recursively   // called by public version
( std::ostream& strm, CSZNode* current_root ) const;

// attributes
CSZNode* root = nullptr; // initially empty tree (null root)

// disallow copy-ctor and assignment:
BinarySearchTree( const BinarySearchTree& ) = delete;
BinarySearchTree& operator=( const BinarySearchTree& rhs ) = delete;
};

Note that there are three methods each associated with adding and writing. As discussed in class, trees can be manipulated either iteratively or recursively. Methods for each technique will be developed here. The public methods add_recursively and write_recursively are one-line functions which call the corresponding private methods to start their recursive solutions. More will be said about this shortly.

The destructor will be required to deallocate the CSZNodes that have been added to the tree — we will delay writing the destructor until later (it requires knowledge of tree traversals that we will develop along the way). For now this means that your tree will leak memory. You will define the destructor in the homework assignment.

Adding to the Binary Search Tree Iteratively

As each CSZNode in a binary search tree has two pointers, there must be a means of determining which of the two pointers to follow when storing and retrieving information from the data structure. One piece of data in each CSZNode must be unique; this piece is the key for the data. Keys are compared to decide whether to follow the left pointer or right pointer. Typically, the left pointer is followed when the key being compared against a tree CSZNode is less than the CSZNode’s key, the right otherwise. Consider the addition of a new CSZNode to the tree in Figure 2.

root
\
new_node--->Hays          Livingston
/          \
Green          Smith
/   \          /    \
Brown   Ivy    Owens    Wills

Fig. 2 Addition to a Binary Search Tree

The keys used in this example are names; in practice, names are often duplicated but that is ignored for the sake of the example. Additions to the tree can be accomplished iteratively or recursively. The method for adding iteratively is perhaps more intuitive and so is considered first here.

The first action that add_iteratively should take is to (dynamically) construct a new CSZNode containing the data new_city. This new CSZNode will be referred to as new_node below.

In order to compare the name stored in the new addition to names in CSZNodes already in the tree more conveniently, the < operator should be overloaded for the CSZNode class to perform a string compare on the city names of both object operands; write this method now.

bool CSZNode::operator < (const CSZNode& op2) const;

Returning to the example with names above, the iterative addition of Hays to the binary search tree must begin at the root, as there is no other entry into the tree. Comparison of Hays with Livingston indicates Hays should be stored somewhere to the left of Livingston. Moving down the tree, Hays is compared with Green; in this case, Hays should be stored somewhere to the right of Green. Moving down the tree once more, Hays is compared with Ivy and should be stored to the left this time. Since there is nothing to the left of Ivy, Hays can be stored here. A series of nested if statements can store any name in the tree of Figure 2.

if (*new_node < // "Livingston"
if (*new_node < // "Green"
if (*new_node < // "Brown"
// store *new_node to the left of "Brown"
else // *new_node > "Brown"
// store *new_node to the right of "Brown"
else // *new_node > "Green"
if (*new_node < // "Ivy"
// store *new_node to the left of "Ivy"
else // *new_node > "Ivy"
// store *new_node to the right of "Ivy"
else // *new_node > "Livingston"
if (*new_node < // "Smith"
if (*new_node < // "owens"
// store *new_node to the left of "Owens"
else // *new_node > "Owens"
// store *new_node to the right of "Owens"
else // *new_node > "Smith"
if (*new_node < // "Wills"
// store *new_node to the left of "Wills"
else // *new_node > "Wills"
// store *new_node to the right of "Wills"

Of course, this algorithm can only be guaranteed to work for the addition of just one more name to a tree of the exact shape of that in Figure 2. The algorithm can be generalized somewhat by introducing the pointer current to keep track of which node the new addition is being compared against.

NameExampleNode* current = root;
if (*new_node < *current) {             // "Livingston"
current = current->get_left();
if (*new_node < *current) {         // "Green"
current = current->get_left();
if (*new_node < *current)       // "Brown"
// store *new_node to the left of "Brown"
else // *new_node > *current    // "Brown"
// store *new_node to the right of "Brown"
}
else // *new_node > "Green"  {
current = current->get_right();
if (*new_node < *current)       // "Ivy"
// store *new_node to the left of "Ivy"
else // *new_node > *current    // "Ivy"
// store *new_node to the right of "Ivy"
}  // end else *new_node > "Green"
}
else ...

The next value assigned to current is based on the comparison of the new addition with current’s node. This can be expressed in a loop to carry out the comparison and reassignment to an arbitrary depth in a tree.

NameExampleNode* current = root;
if (*new_node < *current)
current = current->get_left();
else // *new_node > * current
current = current->get_right();

Observe that current takes on a nullptr pointer value from the the node which it should either precede or follow; this is a problem in that current is of no use in performing the actual addition to the tree. The addition must take place before the nullptr reassignment of current.

NameExampleNode* current  = root;
if (*new_node < *current)
if (current->get_left() == nullptr) {
current->set_left(new_node);
}
else  // left branch is not nullptr
current = current->get_left();
else // *new_node > *current
if (current->get_right() == nullptr) {
current->set_right(new_node);
}
else  // right branch is not nullptr
current = current->get_right();

For the sake of efficiency, the comparison for a nullptr value can be avoided each time through the loop by saving current’s value before it becomes nullptr and then re-testing the last meaningful value of current after the location has been found.

NameExampleNode* current  = root;
NameExampleNode* previous = nullptr;
while (current != nullptr) {
previous = current;
if (*new_node < *current)
current = current->get_left();
else  // *new_node > *current
current = current->get_right();
}
if (*new_node < *previous)
previous->set_left (new_node);
else  // *new_node > *previous
previous->set_right (new_node);

While both of these algorithms will work in the case of a tree with at least one node, each must be adjusted to add a node to an empty tree. In the former, the stated algorithm is bypassed entirely for an else clause which will store the new addition in the root. In the latter, the loop will never be entered so previous will still be nullptr; checking for this condition immediately after the loop will suffice.

Using the “names” tree example shown above as a guide, implement the add_iteratively() method for your BinarySearchTree so that you can add a new CityStateZip object in the correct location.

For testing, add the following code to your main program, following the testing code you added previously.

cout << "Press <enter> to continue...\n";
cin.get();
BinarySearchTree city_tree;

fin.clear();   // restore stream state so I/O may proceed
fin.seekg(0);  // seek "get" to file start (byte #0)

while (fin.good())

Compile and test to make sure your code is able to run correctly (you won't really know for sure if it is correct until you can see the values in the tree, but you will do that in the next section). If you get any obvious errors, fix those before moving on. FOR IN-LAB CREDIT: Show your progress here before moving on.

Displaying the Binary Search Tree Recursively

In order to see the results of additions to the tree, there must be a method to display the contents of the tree. This is somewhat complicated to do iteratively, so a simpler recursive approach is examined first. Consider again the “names” tree example Figure 2. The observation can be made that the root node (Livingston) will not to be displayed until all nodes connected to its left pointer (Brown, Green and Ivy) have been written; further, the root node must be displayed before any of the nodes connected to its right pointer (Owens, Smith and Wills) are written.

The essence of recursion lies in the realization that this same observation can in fact be made about every node in the tree. For example, Green is written after Brown and before Ivy. This also includes the leaves of the tree; their situation is trivial in that there are no nodes connected to either of their pointers, so they can be written when the time comes without regard for anything else in the tree.

These observations can be implemented in program code with two write_recursively methods, one public and one private. The public method invokes the private method, passing it the tree’s root pointer. The private method of course has access to the root, but it views its parameter as a current root (or, the root of the current sub-tree).

In the example, the private method is initially invoked by the public one with Livingston as its current root. The body of the private method essentially consists of three steps: a recursive call to itself with the current root’s left pointer (Green), a display of the current pointer’s own data node (Livingston), and a second recursive call to itself with the current root’s right pointer (Smith). The first step’s recursive call immediately sets about repeating the three steps with Brown, Green and Ivy, before the second step’s display of Livingston.

The three steps must be protected by an if control structure which insures that the current root is not nullptr; this nullptr value will be encountered as recursive calls for leaves of the tree pass null pointers to the next recursion of the method. This occurs when the three steps are again started for Brown, as both the left and right pointers of this node are null. (Encountering nullptr is the base condition for the recursive function.)

Implement your write_recursively() methods for displaying the binary search tree now.

Note that as the binary search tree class definition includes two write methods, operator << will not be overloaded; to do so would be confusing during the testing phase of the program, as there would be no indication in function main as to which write algorithm was being executed.

Add the following lines to function main() following the while loop you added above; execute the program to see that the binary search tree’s methods are working properly.

cout << "Recursive Tree Listing of Iterative Additions\n";
city_tree.write_recursively(cout);

Modify the sample data to test the program thoroughly. FOR IN-LAB CREDIT: Show your progress here before moving on.