Python’s DefaultDict: Hash Tables Made Easy

Hash Tables do wonders for search operations but can quickly cause syntactic clutter when used often. Python's DefaultDict class provide a simplified API to help create clearer code and reduce syntactic clutter.
alpharithms hash mapping defaultdict python

The DefaultDict class in Python operates much like the regular Dictionary class. However, it extends functionality to reduce the syntactic burden of several common dictionary-based operations. Namely, it allows for lighter syntax when adding potentially new key-value pairs to the dictionary object.

Quick Intro: Dictionaries & Common Operations

The Python Dictionary data structure is akin to the Java HashMap, a C++ Unordered Map, a C# Dictionary, a Rust HashMap, and a JavaScript Map. While implementations vary, the essentials of these structures are based on the abstract dictionary data type where “Keys” are connected to “Values” such that the following operational guarantees are provided:

Operation Average Case Worst Case
Search O(1) O(n)
Insert O(1) O(n)
Delete O(1) O(n)

Essentially, a Dictionary connects pairs of values, called “Key-Value Pairs” such that, when given a key as a reference, the above operations are possible in the specified times. This is possible by the following (simplified) process:

  1. A Hash function transforms the Key into an integer value (Hash-Key).
  2. The Hash-Key serves as an index into an Array where Values are stored.
  3. Searching is done by using the Hash-Key as an index into the underlying Array.

In the illustration below, we can see a string value Hello, Hash Map being transformed by the hash function, into an integer value (hexadecimal in this case) which then serves as the index into the underlying Array data structure into which values are stored.

hash mapping
Figure 1: Hash Maps use hashing functions to get an index value used to store/access data into an underlying array.

Hash Maps like the one illustrated here allow object access like one was indexing into an array rather than searching for an object in a list. The same goes for insertions and deletions: there is no searching for an item, or array sorting, or tree re-balancing for the average case. The hash function essentially provides a way to determine the index.

There are a lot more details to understand about hash mappings than what we just described. However, the concept of a Key being mapped to a Value is enough for our conversation here. For a more complete discussion of Hash Maps, including concerns such as the underlying implementation details, handling collisions, and applications, check out this video by Gayle Laakman — author of the infamous Cracking the Coding Interview book.

Default Dicts: Sprinkling on Some Syntactic Sugar

Python’s DefaultDict class is part of the broader Collections framework. This module provides extended versions of standard data structures to help provide simpler workflows for common use cases. The DefaultDict is designed such that one doesn’t need to check if a value exists before altering it. Let’s see an example of when that might be useful.

Let’s say we have a list of orders placed at the deli. The deli offers several different types of meats and customers can order different sizes or each. At the end of the day, the deli manager would like to know several essential things:

  1. How many total orders?
  2. Average Order Size.
  3. Total amount of each product sold.

If we used a list, we might have to iterate over every item for each unique product. However, a hash mapping would help isolate just the products we wanted like this:

order mapping alpharithms
Figure 2: The order data is structured into a Mapping of Order types and lists of order amounts.

This transformation creates a Hash Mapping (Dictionary) of the orders and their amounts such that, for any single order type, the amounts are easily accessed as Mapping[order_type] such that the search, insert, and delete operations are possible in constant time for the average case. This could be implemented in vanilla Python using the standard dict data structure as follows:

# list of deli orders
orders = [
    ("fish", 1),
    ("beef", 2),
    ("chicken", 1),
    ("fish", 5),
    ("beef", 4),
    ("beef", 2),
    ("chicken", 7)
]

# create a dictionary object
mapping = {}

# loop through all orders
for item, amount in orders:
    
    # add to existing list
    if item in mapping:
        mapping[item].append(amount)
        
    # start a new list
    else:
        mapping[item] = [amount]

This works fine — there’s no doubt about it. However, for those of us who use Python dict objects often, this syntax becomes cumbersome quickly. Python’s DefaultDict can help achieve the same end result without the conditional statements as such:

from collections import defaultdict

# list of deli orders
orders = [
    ("fish", 1),
    ("beef", 2),
    ("chicken", 1),
    ("fish", 5),
    ("beef", 4),
    ("beef", 2),
    ("chicken", 7)
]

# create a dictionary object
mapping = defaultdict(list)

# loop through all orders
for item, amount in orders:

    # default dict magic
    mapping[item].append(amount)

Here we see the line mapping[item].append(amount) without any conditional statements to handle the case where that item might not already be in the dict object. Normally, this would raise an KeyError exception. However, Python’s DefaultDict will do the equivalent  conditional check for us and, in the case where there isn’t a key, will create a new key entry that maps to the data type provided in the instantiation statement — a list in this case.

Final Thoughts

The DefaultDict class in Python exemplifies how the language cuts down on development time. Its easy yet powerful interface provides a simple, no-nonsense extension of the basic dict class to help optimize and reduce the syntactic burden of common use cases. Is it entirely necessary for the example here? No. A very similar workflow could be achieved by using the following assignment statement:

mapping[item] = mapping.get(item, []) + [amount]

However, this would have to change slightly based on the data type, isn’t completely clear to the untrained eye, and leaves more room for potential errors. At the very least, it’s still more characters than the use of defaultdict.

Zαck West
Full-Stack Software Engineer with 10+ years of experience. Expertise in developing distributed systems, implementing object-oriented models with a focus on semantic clarity, driving development with TDD, enhancing interfaces through thoughtful visual design, and developing deep learning agents.