Binary Trees in Python: Powerful Data Structures for Sorting & Searching

Binary search trees are powerful data structures that can make searching, sorting, and maintaining data a breeze. Python doesn't include a BST class by default—but allows developers to implement a custom one with ease!
python binary search tree alpharithms

When considering data structures and algorithms one inevitably comes across binary search trees. These are a particular type of tree consisting of nodes of which each can have at most two child nodes—a right and left node reference. Binary trees are efficient in a number of use cases but offer a particular advantage for searching and sorting.

Even though Python is one of the most popular programming languages it does not provide a Binary Search Tree (BST) class. One could arguably use the lxml modules etree class for such functionality—but that’s not its intended purpose. Fortunately, part of Python’s popularity has stemmed from the ease by which developers can implement custom classes.

Highlights

In this article, we will build a Binary Search Tree class in Python and the code required to traverse a tree in an in-order, pre-order, and post-order manner. We will cover the following topics in sequence:

  1. Develop a Node class to hold data and references
  2. Develop an Insert method to add data
  3. Develop the three traversal functions; in-order, pre-order, and post-order
  4. Creating and validating a Tree created using the Node class

For our basic Binary Search Tree tutorial, we will rely solely on the Node class to create our BST. More advanced implementations stand to benefit from a formal Tree class though we will not be diving that deeply into the possibilities of a BST to warrant such complexity. Before we start coding, let’s quickly consider the essential aspects of our BST.

Introduction: The Moving Pieces

Implementing a binary tree in Python requires some forethought into the different operations and data handling that will need to be done. At the most basic level, we need the following components:

  1. Node – A structure to encapsulate our data
  2. Insert – An operation to add Nodes to the Tree

That’s it! It might seem odd that there is no Tree structure mentioned here. Having a Node object which can reference a leftNode and rightNode gives us equivalent functionality.

Implementing a Tree object could be useful for retaining certain statistics such as overall Node count, tree height, or even tree depth. We’ll be keeping things simple for now. First, let’s create a Node class with some essential fields.

Note: Check out Visualgo.net’s Binary Tree Visualizer for an interactive intro to B trees

Step 1: The Node Class

node class basics
This illustration depicts three Node objects arranged into the most basic form of a BST—a root (parent) node with references to two child nodes (a left and right)

The word Node is a convention used when referring to a unique item in a tree. In binary trees, each Node object can have at most two references to other Nodes—a leftNode and a rightNode object.  These are considered the child Nodes. Also, each Node needs a way to store data. Below is an implementation of a Node class for binary trees in Python:

class Node:
    
    def __init__(self, data):
        
        self.data = data
        self.left = None
        self.right = None
        

# Create a new Node
node_one = Node(1)

In the example code above we’ve defined a Node class that encapsulates an argument for data (which can be anything) and created two fields for referencing connected Nodes. Note the references to left and right are None by default and must be set explicitly after the creation of other Node objects. Speaking of other Node classes—let’s consider how to start building a tree of Nodes!

Step 2: The Insert Method

bst insert node algorithm illustration alpharithms
This image illustrates the process by which a Node with a value of 4 is inserted into an example Tree.

Our tree is going to be a nimble one—only referenced via the root Node. There will be no Tree class object—a decision taken here for simplicity. The insert method will ensure added nodes are positioned in our resulting tree structure appropriately. We will forego any consideration for balancing our tree for now. We will approach inserting new Nodes as follows:

  1. Check there is data in the new Node;
  2. If that data is less than the current node’s data
    1. If no existing reference to a left Node, add as a new reference
    2. If an existing left Node reference exists, insert into that Node
  3. If that data is greater than the current Node’s data
    1. If no existing reference to a right Node, add as a new reference
    2. If an existing right Node reference exists, insert into that Node

It’s clear we will be comparing our Node’s data often. This highlights the importance that Node objects must be comparable to one another. This simple algorithm ensures that our Nodes form a Binary tree data structure where larger values trend towards the right and smaller values trend towards the left. Below is example code we can add to our Node class as a method to handle insertions.

def insert(self, data):

    # Do nothing if no data
    if self.data:

        # Where data value is less than current, branch left
        if data < self.data:
            if self.left is None:
                self.left = Node(data)
            else:
                self.left.insert(data)

        # Where data is greater than current, branch right
        elif data > self.data:
            if self.right is None:
                self.right = Node(data)
            else:
                self.right.insert(data)

Notice we aren’t handling the case where data from one Node is equal to that of the one into which it is being inserted.  There are several approaches to handle duplicate values:

  1. Use a <= operator for either the right or left Node
  2. Use a list (or running count) for duplicates per Node

Binary search trees don’t typically accommodate duplicate values as unique Nodes. Keeping a count field per Node to account for duplicates is one approach and another more performant approach would simply be to ignore duplicates altogether. How duplicates are handled comes down to one’s intended use case.

Also, note our use of the insert() method of existing Node objects (e.g. self.right.insert). This illustrates the recursive nature of the insert method and also demonstrates the syntactic elegance by which a BST can be organized appropriately during the initial creation and subsequent insertion of data.

Creating the Binary Search Tree

binary search tree diagram example alpharithms
This image depicts a BST created starting from a root Node with a value of 13 which dictates the order in which successive data is arranged.

Now that we have the core features of our Binary Search Tree complete we can begin working on an implementation for a set of test data. We’ll consider a very basic dataset from which we will generate our tree. Let us consider the following data:

# Node Values
data = [13, 5, 23, 4, 6, 22, 28, 3, 3, 27, 32]

Here we have 11 values that contain two instances of 3 (a duplicate). Our final output will reflect the handling of the duplicate and the correct ordering of the data. To create a Binary Search Tree from our data we will first create a Node with the value we choose for root. Then we will make use of the new Node instance’s insert method to build our tree. This is done in the following example code;

# Create the root node from first data item
root = Node(data.pop(0))

# Create and insert Node objects from remaining data
for d in data:

    # Insert a new Node instance
    root.insert(Node(d))

This creates a root Node from which our Binary Search Tree will originate and inserts new data accordingly to the tree structure. Running this will create our binary search tree—but we don’t have a way to visualize it or confirm the output yet. For that, we’re going to have to discuss several ways in which a Binary Tree can be traversed.

Note: The placement of particular Node values depends on the order in which data is added to the tree and which Node is selected as the root. To start a tree from a specific data item—one need only specify that item as the first Node to which all successive data items are inserted.

Binary Tree Traversals

Traversing Binary Trees relies on comparisons between left and right Node values, much like the insert method. Depending on the type of traversal we specify, the order in which we consider left, right, and root differs. Below are three common types of depth-first binary tree traversal algorithms and the order in which they check for data at each Node.

  1. In Order: Left, Root, Right
  2. Pre-Order: Root, Left, Right
  3. Post-Order: Left, Right, Root

The order in which values are checked greatly influences the order in which output is produced. The image below illustrates the process of navigation for each of these three traversal types. breadth-first and level-order traversals

binary search tree traversal types alpharithms
BST data can be accessed in several orderings determined by how the nodes are traversed. The in-order, pre-order, and post-order approaches depicted here are the three most common.

Now that we have a good idea of what’s going on, we can begin implementing each traversal type. We will approach this by adding methods to our existing Node class. We’ll start with the In Order traversal type:

In-Order Traversal

This traversal type outputs the Left Node, the Root Node, and then the Right Node. As illustrated above, if the Left Node contains data, the algorithm will recursively output data until a branch has no further depth. This can be implemented as follows:

def in_order(self, root: "Node") -> List[Any]:
    """
    Traverse the tree in an 'in order' fashion
    """
    # Create and output container
    output = []

    # Starting at the specified Node, 
    # traverse in Left, Root, Right fashion
    if root:

        # Check Left
        output = self.in_order(root.left)

        # Check Root
        output.append(root.data)

        # Check Right, add output to current output
        output = output + self.in_order(root.right)

    # Returns the list of Nodes in-order
    return output

If we call the in_order method on our tree using root.in_order(root) then we get the following output:

[3, 4, 5, 6, 13, 22, 23, 27, 28, 32]

Pre-Order Traversal

This traversal type outputs the Root Node, Left Node, then Right Node. As illustrated above, the Root Node output is collected first, then the algorithm will recurse down the left then right Nodes if present. The following code implements the pre_order method:

def pre_order(self, root: "Node") -> List[Any]:
    """
    Traverse the tree in a 'pre-order' fashion
    """
    # Create and output container
    output = []

    # Starting at the specified Node, 
    # traverse in Root, Left, Right fashion 
    if root:

        # Get Root Node Data
        output.append(root.data)
        
        # Get the Left Node Data and append to current
        output = output + self.pre_order(root.left)

        # Get the Right node data and append to current
        output = output + self.pre_order(root.right)

    # Returns the list of Nodes in pre-order
    return output

If we call the pre_order method on our tree using root.pre_order(root) then we get the following output:

[13, 5, 4, 3, 6, 23, 22, 28, 27, 32]

Post-Order Traversal

This traversal type outputs the Left Node, Right Node, then Root Node. As illustrated above, the Left Node output is collected first, then the algorithm will recurse down the left then right Nodes if present, finally collecting the Root node data. The following code implements the post_order method:

def post_order(self, root: "Node") -> List[Any]:
    """
    Traverse the tree in a post-order fashion
    """
    # Create an output container
    output = []

    # Starting at the specified node,
    # Traverse in a left, right, root fashion
    if root:

        # Get left node data
        output = self.post_order(root.left)

        # Get right node data
        output = output + self.post_order(root.right)

        # Get root data
        output.append(root.data)

    # Returns a list of node data in pre-order
    return output

If we call the post_order method on our tree using the root.post_order(root) syntax we will get the following output:

[3, 4, 6, 5, 22, 27, 32, 28, 23, 13]

Comparing Traversal Types

Having noted the different traversal types it’s clear the output of a Tree is dependent on how one chooses to navigate through the data structure. For a clearer comparison, let’s consider each of our resulting output collections together:

# In Order
[3, 4, 5, 6, 13, 22, 23, 27, 28, 32]

# Pre Order
[13, 5, 4, 3, 6, 23, 22, 28, 27, 32]

# Post Order
[3, 4, 6, 5, 22, 27, 32, 28, 23, 13]

The in_order and post_order seem to produce the most similar output while the pre_order most clearly reflects the root of our tree—with 13 being the first element of both input and output.

These methods can also be used to print trees as well as search them—though without much style. So which way is the best? That depends on the use case and expected outcome. Generally speaking, the traversal types listed here can be beneficial when applied in the following cases:

  • In-Order: When one wants the data of a tree listed in an orderly manner
  • Pre-Order: Used for prefix expressions (see Polish Notation)
  • Post-Order: Used for deletions and freeing of trees

Tree structures are used for such a wide range of applications within computer science it’s hard to provide a concise summary. They are used for searching, ordering, and even the fundamental construction of languages. The three mentioned here are the most common depth-first search algorithms used in binary search tree traversals.

Note: The Node class along with its methods is available as a single class from Github.

Review

Implementing a binary search tree in Python is relatively straightforward. We’ve seen how a simple Node class can be used to provide a robust framework for organizing and accessing data. The three traversal types implemented here are certainly not the only ones available but certainly the most common.

It’s important to note that Binary Search Trees are a single family of Tree data structures that offer certain performance advantages in specific scenarios. Trees are common structures for searching, partitioning, heaping, and syntactic parsing.

Binary Search Trees are particularly useful for sorting and searching—strongly hinted by the name. Even though Python’s standard library doesn’t contain a premade solution, we’ve seen here that to create a binary tree with Python one needs little more than a single custom class with a few helpers methods!

References

  1. Cormen, Thomas. Introduction to Algorithms, 3rd Edition (The MIT Press). 3rd ed., MIT Press, 2009.
alpharithms discord banner 1
Zαck West
Entrepreneur, programmer, designer, and lifelong learner. Can be found taking notes from Mother Nature when not hammering away at the keyboard.