Django comes with a powerful ORM framework by which one can easily design, implement, and interact with databases. The ManyToMany relationship field provides intuitive access to semi-complex data relationships. Knowing how to leverage it within Django filters can make your queries more efficient.
Example Database Models
To illustrate the ManyToMany field in action, let’s create some simple database models representing publicly-traded securities and their historic market prices. This could be used to model price changes over a period of time for a stock.
from django.db.models import Model, CharField, ManyToManyField, FloatField, DateField class Stock(Model): """ Model for single security """ symbol = CharField(max_length=8, unique=True) prices = ManyToManyField('Price', related_name='price_stock') class Price(Model): """ Model for historic Stock quotes """ price = FloatField() date = DateField()
With this design in place, one can query Stock objects, Price objects, or functional relationships between the two—Stocks with Prices, for example. The use-case here is to generate a query that returns all Stock objects with at least one Price relationship. Simply put: that’s retrieving any stock that has pricing data.
First, let’s generate some sample data for Stocks, Prices, and relationships between the two. There’s a lot of shortcuts in the following code, but it’s effectively creating a number of Stock entries then adding historic pricing data for a sample of those entries.
# Generate random securities random_symbols = set([ # Generate 50 random series of characters between 1 and 7 characters in length "".join(random.choice(string.ascii_letters).upper() for x in range(random.choice(range(1, 8)))) for y in range(50) ]) # Add all symbol data as Stock objects for random_symbol in random_symbols: # Create a new stock object # Note: the use of get_or_create here negates the need for unique symbol values stock, created = Stock.objects.get_or_create(symbol=random_symbol) # Add pricing data for 10 randomly-selected symbols. # Note: Symbols omitted to demonstrate query filtering later. for random_stock in random.sample([x for x in Stock.objects.all()], 10): # Create a new pandas dataframe with randomized pricing data for 1 years price_data = pd.DataFrame(columns=['date', 'price']) price_data['date'] = pd.date_range(start='1/1/2020', end='1/1/2021', freq='D') price_data['price'] = [random.uniform(1.5, 5.0) for date in range(price_data.shape)] # Create a Price entry and add as relationship to Stock object for each entry for index, row in price_data.iterrows(): # Create new price entry price_entry = Price.objects.create(price=row['price'], date=row['date']) # Add relationship to stock random_stock.prices.add(price_entry)
The above code has added a year’s worth of pricing data (1/1/2020-1/1/2021) for 10 randomly-selected Stock objects from the 50 added to the database. To get an idea of what things look like:
# import models from stocks.models import Stock, Price # Get count of total DB Stock entries Stock.objects.count() >>> 50 # Get count of total DB Price entries Price.objects.count() >>> 3670
Querying for Models with ManyToMany Entries
Now that our test data is created, we can begin to approach the problem: querying for all Stock entries that have price data. In other words, we don’t want to return any Stock object without pricing data. While Django’s ORM is incredible, this use-case can be approached via several ways—each with different performances.
# import models from stocks.models import Stock, Price import time # First Approach — doesn't return expected data st = time.time() Stock.objects.filter(prices=True).count() >>> 1 time.time() - st >>> 0.003003358840942383 # Second Approach - doesn't filter data st = time.time() Stock.objects.exclude(prices=False).count() >>> 50 time.time() - st >>> 0.0010004043579101562 # Third Approach - returns expected data super fast st = time.time() Stock.objects.exclude(prices=None).count() >>> 10 time.time() - st >>> 0.0 # Fourth Approach - returns expected data st = time.time() Stock.objects.exclude(prices__isnull=True).count() >>> 10 time.time() - st >>> 0.0010006427764892578 # Fifth Approach - returns expected data, but much more slowly st = time.time() len([stock for stock in Stock.objects.all() if stock.prices.count() > 0]) >>> 10 time.time() - st >>> 0.019017934799194336
To be honest, I’m not entirely sure why the first two examples don’t return the expected data. The syntax, while valid, doesn’t seem to produce the query I’d expect. The remaining examples, however, each return the 10 Stock objects for which pricing data was added.
Stock.objects.exclude(prices=None).count() produces the fastest query, though admittedly there could be much more test data to magnify the differences. Aside from the illustration of effective syntax, I’d like to emphasize that remaining within the syntactic realm of Django’s Queryset is crucial to optimize query speed. In the last example using Python’s list comprehension—the query is nearly 10-times as slow!
- The generation of random symbols is willy-nilly and doesn’t always produce the same number of unique values.
- The time-series data doesn’t account for days that actual security wouldn’t be trading—weekends, holidays, etc.
- Pricing data is completely random in range and doesn’t attempt to model any type of trend (price increase, splits, etc.)
- Adding this much sample data to a standard SQLite database is slow. Using another database type (like PostgreSQL) that accommodates parallel writes, in combination with Python’s
multiprocessinglibrary, is recommended.
I find myself using Django quite often simply for its ORM system. When I need to iterate quickly with some data that necessitates a relational design I often use while validating an idea/notion/curiosity. The Django ORM makes data modeling and basic querying a breeze.
For complex and—in the case here—the syntax of even semi-complex lookups can get a little quirky. Django provides the Q object to simplify more complex lookups. I would recommend anyone that finds themselves without a clue as to the proper query syntax to check that class out. If for no other reason, it’s insightful as to the possibilities of DB queries Django supports natively.