How To: Progress Bars for Python Downloads

Downloading files with Python is easy, and robust libraries like requests keep the syntax simple. However, the standard library offers no built-in way to show a progress bar during a download. In this quick tutorial, we’ll show how to implement such a feature to help avoid guesswork during long downloads.


Introduction: Understanding the Problem

Before we dive in we need to understand all the moving pieces of this problem. This will provide the framework by which we can concoct our approach of showing a progress bar while downloading a file via Python. An outline is as follows:

1. Making an HTTP request to download a file
2. Metering the transfer of data from the remote server
3. Displaying the metering process on the console

The good news is that we don’t have to implement any of this from scratch—or even on a low-enough level that we would need to touch Python’s urllib. Step 1 will be handled via the Python requests library and both steps 2 and 3 will be handled via the tqdm library. Let’s get started by considering downloading our file.

Step 1: Preparing an HTTP Request

To integrate a progress bar into Python while downloading a file, we need to modify the approach that we might take otherwise. Let’s start by considering how one might download a file in Python without using a progress bar. The following code will download a file and save it to the local file system:

import requests

# download the whole response body, then write it to disk in one step
with requests.get('https://www.example.com/file.txt') as r:
    with open('download.txt', 'wb') as file:
        file.write(r.content)

This code uses the requests library to send an HTTP GET request to the https://www.example.com/file.txt URL. The response body is downloaded in full and saved as a file named download.txt on the local disk; in our case, that is 46 lines of HTML code.

Note: Check out this article on other simple ways to download a file with Python.

Steps 2 & 3: Metering & Displaying the Transfer of Data

In the example above, we download the entirety of a remote file and save its contents to a newly-created local file. This doesn’t allow for any incremental measuring—the file is downloaded entirely and then saved entirely. Let’s first look at how we might implement this manually, then we’ll look at a more streamlined approach.

Manual Implementation

To incrementalize the download process so that we can register the progress, we need to set the stream parameter of our requests.get call to True. With streaming enabled, requests returns as soon as the response headers arrive and defers downloading the body, which lets us iterate over the bytes as they are transferred. This is implemented as such:

import sys
import time
import requests

# use context managers for the HTTP request and the output file
with requests.get("https://www.example.com/file.txt", stream=True) as r:
    with open('download.txt', 'wb') as file:

        # Get the total size, in bytes, from the response header
        # (assumes the server sends a Content-Length header)
        total_size = int(r.headers.get('Content-Length'))

        # Define the size of the chunk to iterate over (bytes)
        chunk_size = 1

        # iterate over every chunk, save it, and calculate % of total
        downloaded = 0
        for chunk in r.iter_content(chunk_size=chunk_size):
            file.write(chunk)
            downloaded += len(chunk)

            # calculate current percentage
            c = downloaded / total_size * 100

            # write current % to console, pause for .1s, then flush console
            sys.stdout.write(f"\r{round(c, 4)}%")
            time.sleep(.1)
            sys.stdout.flush()

When this code executes, the same data as before is downloaded, but now we are iterating over the byte stream from the response. While doing this, we calculate the percentage of bytes downloaded relative to the total conveyed by the Content-Length header of the response. The console shows a continuously updated value similar to 5.2342%, refreshed roughly every 0.1 seconds.
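The manual example assumes the server always sends a Content-Length header. That is not guaranteed (responses using chunked transfer encoding, for instance, omit it), and in that case int(None) raises a TypeError. A minimal sketch of a more defensive lookup might look like the following; parse_content_length and percent_done are hypothetical helper names introduced here for illustration, not part of requests:

```python
from typing import Mapping, Optional


def parse_content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if it is absent or invalid.

    Note: r.headers from requests is case-insensitive; a plain dict is not,
    so the exact key "Content-Length" is assumed here.
    """
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        size = int(value)
    except ValueError:
        return None
    return size if size >= 0 else None


def percent_done(downloaded: int, total: Optional[int]) -> Optional[float]:
    """Return the percentage downloaded, or None when the total is unknown."""
    if not total:
        return None
    # Cap at 100 in case more bytes arrive than the header announced.
    return min(downloaded / total * 100, 100.0)
```

With these helpers, the loop can fall back to printing a plain byte count whenever percent_done returns None.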

The only real magic here is the carriage return (\r) at the start of the string passed to sys.stdout.write, which moves the cursor back to the first character of the line so that each value overwrites the last. The time.sleep call pauses for 0.1 seconds, and sys.stdout.flush forces the buffered text onto the console. This is sloppy, could be much improved even in its current approach, and adds a 0.1-second delay for each chunk, which is a single byte in the above example. Let’s look at a more streamlined approach.
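As one way the manual approach could be tidied up, here is a rough sketch that drops the artificial delay, uses a larger chunk size, and writes each chunk to disk as it arrives. The function name save_with_progress and the 8192-byte default are assumptions made for illustration; the function only expects an object that behaves like a streamed requests response (a headers mapping and an iter_content method).

```python
import sys


def save_with_progress(response, output, chunk_size=8192, console=sys.stdout):
    """Write a streamed response body to `output`, reporting progress.

    `response` is assumed to behave like a requests.Response opened with
    stream=True: it exposes `headers` and `iter_content(chunk_size=...)`.
    Returns the number of bytes written.
    """
    # A missing Content-Length is treated as "total unknown" (0).
    total = int(response.headers.get("Content-Length", 0))
    downloaded = 0
    for chunk in response.iter_content(chunk_size=chunk_size):
        output.write(chunk)
        downloaded += len(chunk)
        if total:
            # "\r" returns the cursor to the start of the line so the
            # percentage overwrites itself instead of scrolling.
            console.write(f"\r{downloaded / total:6.1%}")
        else:
            # Unknown total: fall back to a running byte count.
            console.write(f"\r{downloaded} bytes")
        console.flush()
    console.write("\n")
    return downloaded
```

In the manual example, this would be called inside the two context managers as save_with_progress(r, file).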

tqdm Implementation

The tqdm library for Python provides a much more syntactically concise means of displaying download progress in the console. This library can be used to monitor and display the progress of any iterable process. There are several approaches this library provides, including command line usage, but here we’ll implement via the wrapattr convenience function as such:

import os
import shutil
import requests
from tqdm.auto import tqdm

# make an HTTP request within a context manager
with requests.get("https://www.example.com/file.txt", stream=True) as r:

    # check header to get content length, in bytes
    total_length = int(r.headers.get("Content-Length"))

    # implement progress bar via tqdm
    with tqdm.wrapattr(r.raw, "read", total=total_length, desc="") as raw:

        # save the output to a file named after the last part of the URL
        with open(os.path.basename(r.url), 'wb') as output:
            shutil.copyfileobj(raw, output)

The use of context managers here is discretionary; the requests.get call works without one, so call it a force of habit. This code downloads the same data as before, this time saving it under the file name taken from the URL (file.txt), and we’ll now be shown the following on the console:

100%|██████████| 648/648 [00:00<00:00, 324kB/s]

The use of the tqdm module here provides us with an aesthetically-pleasing progress bar with the following information:

Overall progress as a percentage
Dynamically-updated progress bar
Total bytes downloaded/total bytes available
Elapsed time and estimated time remaining
Data transfer speed (kB/s)

The size of the file being downloaded here (648 bytes) hardly allows a full demonstration of the tqdm progress bar experience. After re-running the above code with the URL of a podcast episode specified, the download bar is better illustrated as such:

6%|▌         | 13.0M/212M [00:04<01:04, 3.21MB/s]

Here we see that the download is 6% complete, with 13.0M of a total 212M downloaded after 4 seconds, an estimated 1:04 remaining, and a current rate of 3.21MB/s. This type of feedback is invaluable during the initial development and logging of complex applications.
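As a sanity check, the figures in that bar can be roughly reproduced by hand. The sketch below assumes the remaining-time estimate is approximately the bytes left divided by the current rate; because the values displayed in the bar are rounded, the result lands close to, but not exactly on, the 01:04 shown.

```python
# Values read off the example bar (rounded as displayed).
downloaded_mb = 13.0
total_mb = 212.0
elapsed_s = 4.0
rate_mb_per_s = 3.21

# Share of the file downloaded so far.
percent = downloaded_mb / total_mb * 100

# Average rate since the start, for comparison with the displayed rate.
average_rate = downloaded_mb / elapsed_s

# Assumed estimate: data still to fetch divided by the current rate.
remaining_s = (total_mb - downloaded_mb) / rate_mb_per_s
minutes, seconds = divmod(round(remaining_s), 60)

print(f"{percent:.1f}% complete")
print(f"average rate {average_rate:.2f} MB/s")
print(f"about {minutes:02d}:{seconds:02d} remaining")
```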

Final Thoughts

Downloading files in Python is a simple task, and a robust library like requests makes it even easier. However, the standard library offers no concise way to display the progress of a file download on the console.

Our initial manual implementation gets the job done but could use an industrial-strength dose of consideration for formatting. The tqdm module has a relatively light footprint; its only dependency is the colorama library, and that only on Windows. tqdm also allows one to customize the color of the display, which is something we didn’t touch on here.