The DefaultDict class in Python operates much like the regular Dictionary class. However, it extends functionality to reduce the syntactic burden of several common dictionary-based operations. Namely, it allows for lighter syntax when adding potentially new key-value pairs to the dictionary object.
Quick Intro: Dictionaries & Common Operations
The Python Dictionary data structure is akin to the Java HashMap, a C++ Unordered Map, a C# Dictionary, a Rust HashMap, and a JavaScript Map. While implementations vary, the essentials of these structures are based on the abstract dictionary data type where “Keys” are connected to “Values” such that the following operational guarantees are provided:
Operation | Average Case | Worst Case |
---|---|---|
Search | O(1) | O(n) |
Insert | O(1) | O(n) |
Delete | O(1) | O(n) |
Essentially, a Dictionary connects pairs of values, called “Key-Value Pairs” such that, when given a key as a reference, the above operations are possible in the specified times. This is possible by the following (simplified) process:
- A Hash function transforms the Key into an integer value (Hash-Key).
- The Hash-Key serves as an index into an Array where Values are stored.
- Searching is done by using the Hash-Key as an index into the underlying Array.
In the illustration below, we can see a string value Hello, Hash Map
being transformed by the hash function, into an integer value (hexadecimal in this case) which then serves as the index into the underlying Array data structure into which values are stored.
Hash Maps like the one illustrated here allow object access like one was indexing into an array rather than searching for an object in a list. The same goes for insertions and deletions: there is no searching for an item, or array sorting, or tree re-balancing for the average case. The hash function essentially provides a way to determine the index.
There are a lot more details to understand about hash mappings than what we just described. However, the concept of a Key being mapped to a Value is enough for our conversation here. For a more complete discussion of Hash Maps, including concerns such as the underlying implementation details, handling collisions, and applications, check out this video by Gayle Laakman — author of the infamous Cracking the Coding Interview book.
Default Dicts: Sprinkling on Some Syntactic Sugar
Python’s DefaultDict class is part of the broader Collections framework. This module provides extended versions of standard data structures to help provide simpler workflows for common use cases. The DefaultDict is designed such that one doesn’t need to check if a value exists before altering it. Let’s see an example of when that might be useful.
Let’s say we have a list of orders placed at the deli. The deli offers several different types of meats and customers can order different sizes or each. At the end of the day, the deli manager would like to know several essential things:
- How many total orders?
- Average Order Size.
- Total amount of each product sold.
If we used a list, we might have to iterate over every item for each unique product. However, a hash mapping would help isolate just the products we wanted like this:
This transformation creates a Hash Mapping (Dictionary) of the orders and their amounts such that, for any single order type, the amounts are easily accessed as Mapping[order_type]
such that the search, insert, and delete operations are possible in constant time for the average case. This could be implemented in vanilla Python using the standard dict
data structure as follows:
# list of deli orders orders = [ ("fish", 1), ("beef", 2), ("chicken", 1), ("fish", 5), ("beef", 4), ("beef", 2), ("chicken", 7) ] # create a dictionary object mapping = {} # loop through all orders for item, amount in orders: # add to existing list if item in mapping: mapping[item].append(amount) # start a new list else: mapping[item] = [amount]
This works fine — there’s no doubt about it. However, for those of us who use Python dict
objects often, this syntax becomes cumbersome quickly. Python’s DefaultDict can help achieve the same end result without the conditional statements as such:
from collections import defaultdict # list of deli orders orders = [ ("fish", 1), ("beef", 2), ("chicken", 1), ("fish", 5), ("beef", 4), ("beef", 2), ("chicken", 7) ] # create a dictionary object mapping = defaultdict(list) # loop through all orders for item, amount in orders: # default dict magic mapping[item].append(amount)
Here we see the line mapping[item].append(amount)
without any conditional statements to handle the case where that item might not already be in the dict
object. Normally, this would raise an KeyError
exception. However, Python’s DefaultDict will do the equivalent conditional check for us and, in the case where there isn’t a key, will create a new key entry that maps to the data type provided in the instantiation statement — a list
in this case.
Final Thoughts
The DefaultDict class in Python exemplifies how the language cuts down on development time. Its easy yet powerful interface provides a simple, no-nonsense extension of the basic dict
class to help optimize and reduce the syntactic burden of common use cases. Is it entirely necessary for the example here? No. A very similar workflow could be achieved by using the following assignment statement:
mapping[item] = mapping.get(item, []) + [amount]
However, this would have to change slightly based on the data type, isn’t completely clear to the untrained eye, and leaves more room for potential errors. At the very least, it’s still more characters than the use of defaultdict
.