Django is an incredible Python-based web application with a powerful ORM model that makes rapid-prototyping and data-wrangling a breeze. By default, queries to Django-accessed databases are returned as Django Queryset objects.
Converting such a Queryset
object to a Pandas DataFrame
can offer several performance benefits as well as keep data in a familiar format when working with common data analysis pipelines. Fortunately, converting a Django Queryset
into a Pandas DataFrame
is a breeze!
The Scenario
Let’s say you’re doing financial analysis on stock market data—you’ve gathered historic stock prices over a long period of time. These data have been put into a database via Django’s super friendly ORM framework. Such an assortment of data might be represented by a Stock
and Price
model resembling the following:
class Stock(Model): """ Object model for Stock """ symbol = CharField(max_length=8, unique=True) class Price(Model): """ Object model for historic stock prices """ # Define core reqs of a price date = DateTimeField() opening = TextField() closing = TextField() high = TextField() low = TextField() adj_close = TextField() volume = TextField() # Relate to stock stock = ForeignKey('Stock', on_delete=CASCADE)
The Django Queryset
With the above two models, one would be able to query all historic pricing for a given Stock
symbol by doing the following:
# Get the Stock entry stock = Stock.objects.filter(symbol="ABCD").first() # Get the associated historic pricing data pricing = Quote.objects.filter(stock=stock) >>> <QuerySet [<Price: Price object (1)>, <Price: Price object (2)>, ... , <Price: Price object (2500)>
This is an example of Django database querying 101. The Queryset
object API provides several easy-to-access methods for doing things like getting the total number of elements, getting the first element, or even filtering based on given criteria.
The Pandas DataFrame
Django is designed to be a web application framework. As such, it’s APIs are designed, largely, with smaller queryset displays, manipulations, and storages in mind. 2,500 stock quotes are still well within Django’s wheelhouse but what if there were 500,000+ entries? Millions? More!?
Pandas is a Python library developed by a financial management group with the expressed intent of providing fast and efficient data analysis. It’s used by data scientists around the world and is one of the most popular Python libraries out there. Among its many data-oriented tools: the DataFrame.
One can think of the DataFrame
as a highly-optimized, in-memory spreadsheet with intuitive APIs. That’s a broad simplification but good-enough for our current context. To get a better idea of the Pandas DataFrame, let’s look at how it can be integrated into Django’s Object models:
Explicit Queryset Conversion
Let’s say you’ve queried the database to get all the historic Price
data for a given stock as illustrated in the example above—leaving you with a Django Queryset
object. The following line of code will convert that into a Pandas DataFrame
:
# Convert Django's Queryset into a Pandas DataFrame pricing_dataframe = pd.DataFrame.from_records(prices.values()) >>> date opening closing high low adj_close volume 0 1340337600 0.002200 0.002200 0.002200 0.002200 0.002200 0 1 1340596800 0.002200 0.002200 0.002200 0.002200 0.002200 0 2 1341460800 0.002200 0.002200 0.002200 0.002200 0.002200 0 3 1341547200 0.002200 0.002200 0.002200 0.002200 0.002200 11000 4 1341806400 0.002200 0.002200 0.002200 0.002200 0.002200 0 ... ... ... ... ... ... ... ... 2140 1608181200 0.005500 0.005000 0.005500 0.004800 0.005000 14304663 2141 1608267600 0.004950 0.004500 0.005300 0.004100 0.004500 15350831 2142 1608526800 0.004950 0.005600 0.005800 0.004200 0.005600 19676276 2143 1608613200 0.007000 0.006000 0.007000 0.005300 0.006000 33831552 2144 1608699600 0.006750 0.006450 0.007000 0.005500 0.006450 21728887 [2145 rows x 7 columns]
Immediately, it’s evident that the DataFrame
object provides a better visual representation. This is 100% achievable with Django models, however, provided one takes the time to write some custom __str__
or __repr__
methods.
Custom Model Method Conversion
In my uses of DataFrame
objects within a Django setting, I find that when I need to convert one Queryset
into a DataFrame
—I end up needing to convert many. As such, it would be very repetitive to continually convert Querysets
to DataFrames
explicitly. Adding a custom method on an object class—even better on a base class—provides an easier approach. Consider a revised version of the Stock
class from above:
class Stock(Model): """ Object model for Stock """ symbol = CharField(max_length=8, unique=True) def pricing_dataframe(self) -> DataFrame: """ Get all pricing data, as a Pandas DataFrame object, for a given Stock. Returns: Pandas DataFrame """ return DataFrame.from_records(Quote.objects.filter(stock=self).values())
This method provides two notable extensions from earlier:
- Returns all pricing data for a given stock (could be done with a ManyToMany relationship);
- Converts that pricing data into a Pandas DataFrame object.
This code would be put into action with the following approach:
# Get the Stock model stock = Stock.objects.filter(symbol="ABCD").first() # Get pricing data prices = stock.pricing_dataframe() >>> date opening closing high low adj_close volume 0 1340337600 0.002200 0.002200 0.002200 0.002200 0.002200 0 1 1340596800 0.002200 0.002200 0.002200 0.002200 0.002200 0 2 1341460800 0.002200 0.002200 0.002200 0.002200 0.002200 0 3 1341547200 0.002200 0.002200 0.002200 0.002200 0.002200 11000 4 1341806400 0.002200 0.002200 0.002200 0.002200 0.002200 0 ... ... ... ... ... ... ... ... 2140 1608181200 0.005500 0.005000 0.005500 0.004800 0.005000 14304663 2141 1608267600 0.004950 0.004500 0.005300 0.004100 0.004500 15350831 2142 1608526800 0.004950 0.005600 0.005800 0.004200 0.005600 19676276 2143 1608613200 0.007000 0.006000 0.007000 0.005300 0.006000 33831552 2144 1608699600 0.006750 0.006450 0.007000 0.005500 0.006450 21728887 [2145 rows x 7 columns]
There’s no real performance gain here, but the convenience is nice. Also, by specifying model values in the value()
method, one can provide filtered results. For example, let’s say only the date
, opening
, and adj_close
values were needed for most cases. That could be achieved by updating the method on the Stock
model as such:
class Stock(Model): """ Object model for Stock """ symbol = CharField(max_length=8, unique=True) def pricing_dataframe(self) -> DataFrame: """ Get all pricing data, as a Pandas DataFrame object, for a given Stock. Returns: Pandas DataFrame """ return DataFrame.from_records(Quote.objects.filter(stock=self).values( "date", "opening", "adj_close" ))
Now, if we run the same query as before, a filtered set of data will be returned:
# Get the Stock model stock = Stock.objects.filter(symbol="ABCD").first() # Get pricing data prices = stock.pricing_dataframe() >>> date opening adj_close 0 1340337600 0.002200 0.002200 1 1340596800 0.002200 0.002200 2 1341460800 0.002200 0.002200 3 1341547200 0.002200 0.002200 4 1341806400 0.002200 0.002200 ... ... ... ... 2140 1608181200 0.005500 0.005000 2141 1608267600 0.004950 0.004500 2142 1608526800 0.004950 0.005600 2143 1608613200 0.007000 0.006000 2144 1608699600 0.006750 0.006450 [2145 rows x 3 columns]
Final Thoughts
Django is an incredibly powerful web application framework that drives some seriously enterprise-grade projects. For data-centric projects, some of the default approaches may not be optimal for certain workflows. For example, many popular machine learning frameworks like scikit-learn
rely on Pandas DataFrame objects.
Being able to easily convert Django querysets into DataFrames
can help facilitate such workflows. Personally, I love Django’s ORM so much that I use it for web scraping and data modeling all the time with zero intention of it ever being an application—let alone a web application.