When considering data structures and algorithms one inevitably comes across binary search trees. These are a particular type of tree consisting of nodes of which each can have at most two child nodes—a right and left node reference. Binary trees are efficient in a number of use cases but offer a particular advantage for searching and sorting.
Even though Python is one of the most popular programming languages it does not provide a Binary Search Tree (BST) class. One could arguably use the lxml
modules etree
class for such functionality—but that’s not its intended purpose. Fortunately, part of Python’s popularity has stemmed from the ease by which developers can implement custom classes.
Highlights
In this article, we will build a Binary Search Tree class in Python and the code required to traverse a tree in an in-order, pre-order, and post-order manner. We will cover the following topics in sequence:
- Develop a
Node
class to hold data and references - Develop an
Insert
method to add data - Develop the three traversal functions;
in-order
,pre-order
, andpost-order
- Creating and validating a Tree created using the
Node
class
For our basic Binary Search Tree tutorial, we will rely solely on the Node class to create our BST. More advanced implementations stand to benefit from a formal Tree class though we will not be diving that deeply into the possibilities of a BST to warrant such complexity. Before we start coding, let’s quickly consider the essential aspects of our BST.
Introduction: The Moving Pieces
Implementing a binary tree in Python requires some forethought into the different operations and data handling that will need to be done. At the most basic level, we need the following components:
- Node – A structure to encapsulate our data
- Insert – An operation to add Nodes to the Tree
That’s it! It might seem odd that there is no Tree structure mentioned here. Having a Node object which can reference a leftNode
and rightNode
gives us equivalent functionality.
Implementing a Tree object could be useful for retaining certain statistics such as overall Node count, tree height, or even tree depth. We’ll be keeping things simple for now. First, let’s create a Node class with some essential fields.
Note: Check out Visualgo.net’s Binary Tree Visualizer for an interactive intro to B trees
Step 1: The Node Class
The word Node is a convention used when referring to a unique item in a tree. In binary trees, each Node
object can have at most two references to other Nodes—a leftNode
and a rightNode
object. These are considered the child Nodes. Also, each Node
needs a way to store data. Below is an implementation of a Node class for binary trees in Python:
class Node: def __init__(self, data): self.data = data self.left = None self.right = None # Create a new Node node_one = Node(1)
In the example code above we’ve defined a Node
class that encapsulates an argument for data (which can be anything) and created two fields for referencing connected Nodes
. Note the references to left and right are None
by default and must be set explicitly after the creation of other Node
objects. Speaking of other Node
classes—let’s consider how to start building a tree of Nodes!
Step 2: The Insert Method
Our tree is going to be a nimble one—only referenced via the root Node
. There will be no Tree
class object—a decision taken here for simplicity. The insert
method will ensure added nodes are positioned in our resulting tree structure appropriately. We will forego any consideration for balancing our tree for now. We will approach inserting new Nodes as follows:
- Check there is data in the new Node;
- If that data is less than the current node’s data
- If no existing reference to a left Node, add as a new reference
- If an existing left Node reference exists, insert into that Node
- If that data is greater than the current Node’s data
- If no existing reference to a right Node, add as a new reference
- If an existing right Node reference exists, insert into that Node
It’s clear we will be comparing our Node’s data often. This highlights the importance that Node objects must be comparable to one another. This simple algorithm ensures that our Nodes form a Binary tree data structure where larger values trend towards the right and smaller values trend towards the left. Below is example code we can add to our Node class as a method to handle insertions.
def insert(self, data): # Do nothing if no data if self.data: # Where data value is less than current, branch left if data < self.data: if self.left is None: self.left = Node(data) else: self.left.insert(data) # Where data is greater than current, branch right elif data > self.data: if self.right is None: self.right = Node(data) else: self.right.insert(data)
Notice we aren’t handling the case where data from one Node is equal to that of the one into which it is being inserted. There are several approaches to handle duplicate values:
- Use a <= operator for either the right or left Node
- Use a list (or running count) for duplicates per Node
Binary search trees don’t typically accommodate duplicate values as unique Nodes. Keeping a count field per Node to account for duplicates is one approach and another more performant approach would simply be to ignore duplicates altogether. How duplicates are handled comes down to one’s intended use case.
Also, note our use of the insert()
method of existing Node
objects (e.g. self.right.insert). This illustrates the recursive nature of the insert method and also demonstrates the syntactic elegance by which a BST can be organized appropriately during the initial creation and subsequent insertion of data.
Creating the Binary Search Tree
Now that we have the core features of our Binary Search Tree complete we can begin working on an implementation for a set of test data. We’ll consider a very basic dataset from which we will generate our tree. Let us consider the following data:
# Node Values data = [13, 5, 23, 4, 6, 22, 28, 3, 3, 27, 32]
Here we have 11 values that contain two instances of 3 (a duplicate). Our final output will reflect the handling of the duplicate and the correct ordering of the data. To create a Binary Search Tree from our data we will first create a Node
with the value we choose for root. Then we will make use of the new Node
instance’s insert method to build our tree. This is done in the following example code;
# Create the root node from first data item root = Node(data.pop(0)) # Create and insert Node objects from remaining data for d in data: # Insert a new Node instance root.insert(Node(d))
This creates a root Node
from which our Binary Search Tree will originate and inserts new data accordingly to the tree structure. Running this will create our binary search tree—but we don’t have a way to visualize it or confirm the output yet. For that, we’re going to have to discuss several ways in which a Binary Tree can be traversed.
Note: The placement of particular Node values depends on the order in which data is added to the tree and which Node is selected as the root. To start a tree from a specific data item—one need only specify that item as the first Node to which all successive data items are inserted.
Binary Tree Traversals
Traversing Binary Trees relies on comparisons between left and right Node values, much like the insert method. Depending on the type of traversal we specify, the order in which we consider left, right, and root differs. Below are three common types of depth-first binary tree traversal algorithms and the order in which they check for data at each Node.
- In Order: Left, Root, Right
- Pre-Order: Root, Left, Right
- Post-Order: Left, Right, Root
The order in which values are checked greatly influences the order in which output is produced. The image below illustrates the process of navigation for each of these three traversal types. breadth-first and level-order traversals
Now that we have a good idea of what’s going on, we can begin implementing each traversal type. We will approach this by adding methods to our existing Node class. We’ll start with the In Order traversal type:
In-Order Traversal
This traversal type outputs the Left Node, the Root Node, and then the Right Node. As illustrated above, if the Left Node contains data, the algorithm will recursively output data until a branch has no further depth. This can be implemented as follows:
def in_order(self, root: "Node") -> List[Any]: """ Traverse the tree in an 'in order' fashion """ # Create and output container output = [] # Starting at the specified Node, # traverse in Left, Root, Right fashion if root: # Check Left output = self.in_order(root.left) # Check Root output.append(root.data) # Check Right, add output to current output output = output + self.in_order(root.right) # Returns the list of Nodes in-order return output
If we call the in_order method on our tree using root.in_order(root)
then we get the following output:
[3, 4, 5, 6, 13, 22, 23, 27, 28, 32]
Pre-Order Traversal
This traversal type outputs the Root Node, Left Node, then Right Node. As illustrated above, the Root Node output is collected first, then the algorithm will recurse down the left then right Nodes if present. The following code implements the pre_order method:
def pre_order(self, root: "Node") -> List[Any]: """ Traverse the tree in a 'pre-order' fashion """ # Create and output container output = [] # Starting at the specified Node, # traverse in Root, Left, Right fashion if root: # Get Root Node Data output.append(root.data) # Get the Left Node Data and append to current output = output + self.pre_order(root.left) # Get the Right node data and append to current output = output + self.pre_order(root.right) # Returns the list of Nodes in pre-order return output
If we call the pre_order
method on our tree using root.pre_order(root)
then we get the following output:
[13, 5, 4, 3, 6, 23, 22, 28, 27, 32]
Post-Order Traversal
This traversal type outputs the Left Node, Right Node, then Root Node. As illustrated above, the Left Node output is collected first, then the algorithm will recurse down the left then right Nodes if present, finally collecting the Root node data. The following code implements the post_order
method:
def post_order(self, root: "Node") -> List[Any]: """ Traverse the tree in a post-order fashion """ # Create an output container output = [] # Starting at the specified node, # Traverse in a left, right, root fashion if root: # Get left node data output = self.post_order(root.left) # Get right node data output = output + self.post_order(root.right) # Get root data output.append(root.data) # Returns a list of node data in pre-order return output
If we call the post_order method on our tree using the root.post_order(root) syntax we will get the following output:
[3, 4, 6, 5, 22, 27, 32, 28, 23, 13]
Comparing Traversal Types
Having noted the different traversal types it’s clear the output of a Tree is dependent on how one chooses to navigate through the data structure. For a clearer comparison, let’s consider each of our resulting output collections together:
# In Order [3, 4, 5, 6, 13, 22, 23, 27, 28, 32] # Pre Order [13, 5, 4, 3, 6, 23, 22, 28, 27, 32] # Post Order [3, 4, 6, 5, 22, 27, 32, 28, 23, 13]
The in_order and post_order seem to produce the most similar output while the pre_order most clearly reflects the root of our tree—with 13 being the first element of both input and output.
These methods can also be used to print trees as well as search them—though without much style. So which way is the best? That depends on the use case and expected outcome. Generally speaking, the traversal types listed here can be beneficial when applied in the following cases:
- In-Order: When one wants the data of a tree listed in an orderly manner
- Pre-Order: Used for prefix expressions (see Polish Notation)
- Post-Order: Used for deletions and freeing of trees
Tree structures are used for such a wide range of applications within computer science it’s hard to provide a concise summary. They are used for searching, ordering, and even the fundamental construction of languages. The three mentioned here are the most common depth-first search algorithms used in binary search tree traversals.
Note: The Node class along with its methods is available as a single class from Github.
Review
Implementing a binary search tree in Python is relatively straightforward. We’ve seen how a simple Node
class can be used to provide a robust framework for organizing and accessing data. The three traversal types implemented here are certainly not the only ones available but certainly the most common.
It’s important to note that Binary Search Trees are a single family of Tree data structures that offer certain performance advantages in specific scenarios. Trees are common structures for searching, partitioning, heaping, and syntactic parsing.
Binary Search Trees are particularly useful for sorting and searching—strongly hinted by the name. Even though Python’s standard library doesn’t contain a premade solution, we’ve seen here that to create a binary tree with Python one needs little more than a single custom class with a few helpers methods!
References
- Cormen, Thomas. Introduction to Algorithms, 3rd Edition (The MIT Press). 3rd ed., MIT Press, 2009.