Converting Django Querysets to Pandas Dataframes

Django is an incredible Python-based web application with a powerful ORM model that makes rapid-prototyping and data-wrangling a breeze. By default, queries to Django-accessed databases are returned as Django Queryset objects.

Table of Contents show

Converting such a Queryset object to a Pandas DataFrame can offer several performance benefits as well as keep data in a familiar format when working with common data analysis pipelines. Fortunately, converting a Django Queryset into a Pandas DataFrame is a breeze!

The Scenario

Let’s say you’re doing financial analysis on stock market data—you’ve gathered historic stock prices over a long period of time. These data have been put into a database via Django’s super friendly ORM framework. Such an assortment of data might be represented by a Stock and Price model resembling the following:

class Stock(Model):
    """
    Object model for Stock 
    """
    symbol      = CharField(max_length=8, unique=True)
    
    
class Price(Model):
    """
    Object model for historic stock prices
    """
    # Define core reqs of a price
    date        = DateTimeField()
    opening     = TextField()
    closing     = TextField()
    high        = TextField()
    low         = TextField()
    adj_close   = TextField()
    volume      = TextField()

    # Relate to stock
    stock       = ForeignKey('Stock', on_delete=CASCADE)

The Django Queryset

With the above two models, one would be able to query all historic pricing for a given Stock symbol by doing the following:

# Get the Stock entry
stock       = Stock.objects.filter(symbol="ABCD").first()

# Get the associated historic pricing data
pricing     = Quote.objects.filter(stock=stock)

>>> <QuerySet [<Price: Price object (1)>, <Price: Price object (2)>, ... , <Price: Price object (2500)>

This is an example of Django database querying 101. The Queryset object API provides several easy-to-access methods for doing things like getting the total number of elements, getting the first element, or even filtering based on given criteria.

The Pandas DataFrame

Django is designed to be a web application framework. As such, it’s APIs are designed, largely, with smaller queryset displays, manipulations, and storages in mind. 2,500 stock quotes are still well within Django’s wheelhouse but what if there were 500,000+ entries? Millions? More!?

Pandas is a Python library developed by a financial management group with the expressed intent of providing fast and efficient data analysis. It’s used by data scientists around the world and is one of the most popular Python libraries out there. Among its many data-oriented tools: the DataFrame.

One can think of the DataFrame as a highly-optimized, in-memory spreadsheet with intuitive APIs. That’s a broad simplification but good-enough for our current context. To get a better idea of the Pandas DataFrame, let’s look at how it can be integrated into Django’s Object models:

Explicit Queryset Conversion

Let’s say you’ve queried the database to get all the historic Price data for a given stock as illustrated in the example above—leaving you with a Django Queryset object. The following line of code will convert that into a Pandas DataFrame:

# Convert Django's Queryset into a Pandas DataFrame
pricing_dataframe = pd.DataFrame.from_records(prices.values())

>>>             date   opening   closing      high       low adj_close    volume
0     1340337600  0.002200  0.002200  0.002200  0.002200  0.002200         0
1     1340596800  0.002200  0.002200  0.002200  0.002200  0.002200         0
2     1341460800  0.002200  0.002200  0.002200  0.002200  0.002200         0
3     1341547200  0.002200  0.002200  0.002200  0.002200  0.002200     11000
4     1341806400  0.002200  0.002200  0.002200  0.002200  0.002200         0
...          ...       ...       ...       ...       ...       ...       ...
2140  1608181200  0.005500  0.005000  0.005500  0.004800  0.005000  14304663
2141  1608267600  0.004950  0.004500  0.005300  0.004100  0.004500  15350831
2142  1608526800  0.004950  0.005600  0.005800  0.004200  0.005600  19676276
2143  1608613200  0.007000  0.006000  0.007000  0.005300  0.006000  33831552
2144  1608699600  0.006750  0.006450  0.007000  0.005500  0.006450  21728887

[2145 rows x 7 columns]

Immediately, it’s evident that the DataFrame object provides a better visual representation. This is 100% achievable with Django models, however, provided one takes the time to write some custom __str__ or __repr__ methods.

Custom Model Method Conversion

In my uses of DataFrame objects within a Django setting, I find that when I need to convert one Queryset into a DataFrame—I end up needing to convert many. As such, it would be very repetitive to continually convert Querysets to DataFrames explicitly. Adding a custom method on an object class—even better on a base class—provides an easier approach. Consider a revised version of the Stock class from above:

class Stock(Model):
    """
    Object model for Stock
    """
    symbol      = CharField(max_length=8, unique=True)

    def pricing_dataframe(self) -> DataFrame:
        """
        Get all pricing data, as a Pandas DataFrame object, for a given Stock.
        Returns:
            Pandas DataFrame
        """
        return DataFrame.from_records(Quote.objects.filter(stock=self).values())

This method provides two notable extensions from earlier:

Returns all pricing data for a given stock (could be done with a ManyToMany relationship);
Converts that pricing data into a Pandas DataFrame object.

This code would be put into action with the following approach:

# Get the Stock model
stock = Stock.objects.filter(symbol="ABCD").first()

# Get pricing data
prices = stock.pricing_dataframe()

>>>             date   opening   closing      high       low adj_close    volume
0     1340337600  0.002200  0.002200  0.002200  0.002200  0.002200         0
1     1340596800  0.002200  0.002200  0.002200  0.002200  0.002200         0
2     1341460800  0.002200  0.002200  0.002200  0.002200  0.002200         0
3     1341547200  0.002200  0.002200  0.002200  0.002200  0.002200     11000
4     1341806400  0.002200  0.002200  0.002200  0.002200  0.002200         0
...          ...       ...       ...       ...       ...       ...       ...
2140  1608181200  0.005500  0.005000  0.005500  0.004800  0.005000  14304663
2141  1608267600  0.004950  0.004500  0.005300  0.004100  0.004500  15350831
2142  1608526800  0.004950  0.005600  0.005800  0.004200  0.005600  19676276
2143  1608613200  0.007000  0.006000  0.007000  0.005300  0.006000  33831552
2144  1608699600  0.006750  0.006450  0.007000  0.005500  0.006450  21728887

[2145 rows x 7 columns]

There’s no real performance gain here, but the convenience is nice. Also, by specifying model values in the value() method, one can provide filtered results. For example, let’s say only the date, opening, and adj_close values were needed for most cases. That could be achieved by updating the method on the Stock model as such:

class Stock(Model):
    """
    Object model for Stock
    """
    symbol      = CharField(max_length=8, unique=True)

    def pricing_dataframe(self) -> DataFrame:
        """
        Get all pricing data, as a Pandas DataFrame object, for a given Stock.
        Returns:
            Pandas DataFrame
        """
        return DataFrame.from_records(Quote.objects.filter(stock=self).values(
            "date", "opening", "adj_close"
        ))

Now, if we run the same query as before, a filtered set of data will be returned:

# Get the Stock model
stock = Stock.objects.filter(symbol="ABCD").first()

# Get pricing data
prices = stock.pricing_dataframe()

>>>            date   opening adj_close
0     1340337600  0.002200  0.002200
1     1340596800  0.002200  0.002200
2     1341460800  0.002200  0.002200
3     1341547200  0.002200  0.002200
4     1341806400  0.002200  0.002200
...          ...       ...       ...
2140  1608181200  0.005500  0.005000
2141  1608267600  0.004950  0.004500
2142  1608526800  0.004950  0.005600
2143  1608613200  0.007000  0.006000
2144  1608699600  0.006750  0.006450

[2145 rows x 3 columns]

Final Thoughts

Django is an incredibly powerful web application framework that drives some seriously enterprise-grade projects. For data-centric projects, some of the default approaches may not be optimal for certain workflows. For example, many popular machine learning frameworks like scikit-learn rely on Pandas DataFrame objects.

Being able to easily convert Django querysets into DataFrames can help facilitate such workflows. Personally, I love Django’s ORM so much that I use it for web scraping and data modeling all the time with zero intention of it ever being an application—let alone a web application.