Downloading files with Python is easy, and robust libraries like requests keep the code short. However, the standard library has no built-in way to show a progress bar during a download. In this quick tutorial, we’ll show how to implement such a feature to take the guesswork out of long downloads.
Introduction: Understanding the Problem
Before we dive in we need to understand all the moving pieces of this problem. This will provide the framework by which we can concoct our approach of showing a progress bar while downloading a file via Python. An outline is as follows:
- Make an HTTP request to download a file
- Meter the transfer of data from the remote server
- Display the measured progress on the console
The good news is that we don’t have to implement any of this from scratch, or even work at a level low enough that we would need to touch Python’s urllib. Step 1 will be handled via the Python requests library, and both steps 2 and 3 will be handled via the tqdm library. Let’s get started by considering downloading our file.
Step 1: Preparing an HTTP Request
To integrate a progress bar into Python while downloading a file, we need to modify the approach that we might take otherwise. Let’s start by considering how one might download a file in Python without using a progress bar. The following code will download a file and save it to the local file system:
```python
import requests

# Download the whole response, then write its bytes to a local file
with requests.get('https://www.example.com/file.txt') as r:
    with open('download.txt', 'wb') as file:
        file.write(r.content)
```
This code uses the requests library to send an HTTP GET request to the https://www.example.com/file.txt URL. It downloads the content served at that URL and saves it as a file named download.txt on the local disk (46 lines of HTML code).
Note: Check out this article on other simple ways to download a file with Python.
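One such alternative, shown for comparison, uses only the standard library. This is a minimal sketch and not code from the linked article: download is a hypothetical helper that pairs urllib.request.urlopen with shutil.copyfileobj, which copies the response to disk in blocks rather than loading it all into memory first.

```python
import shutil
import urllib.request


def download(url, dest_path):
    """Copy the body served at `url` to `dest_path` (hypothetical helper)."""
    # urlopen returns a file-like response object; copyfileobj reads it
    # block by block, so the whole body is never held in memory at once.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

Calling download('https://www.example.com/file.txt', 'download.txt') should produce the same file as the requests version, just without the conveniences that requests layers on top.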
Steps 2 & 3: Metering & Displaying the Transfer of Data
In the example above, we download the entirety of a remote file and save its contents to a newly-created local file. This doesn’t allow for any incremental measuring—the file is downloaded entirely and then saved entirely. Let’s first look at how we might implement this manually, then we’ll look at a more streamlined approach.
Manual Implementation
To make the download incremental, so that we can track its progress, we need to set the stream parameter of our requests.get call to True. With streaming enabled, requests reads only the response headers up front and defers the body until we ask for it, which lets us iterate over the bytes as they are transferred. This is implemented as such:
```python
import sys
import time

import requests

# use context managers for the HTTP response and the output file
with requests.get("https://www.example.com/file.txt", stream=True) as r:
    with open('download.txt', 'wb') as file:
        # Get the total size, in bytes, from the response header
        total_size = int(r.headers.get('Content-Length'))
        # Define the size of the chunk to iterate over (in bytes)
        chunk_size = 1
        # iterate over every chunk and calculate % of total
        for i, chunk in enumerate(r.iter_content(chunk_size=chunk_size), start=1):
            # save the chunk to the local file
            file.write(chunk)
            # calculate current percentage
            c = i * chunk_size / total_size * 100
            # write current % to console, pause for 0.1 s, then flush console
            sys.stdout.write(f"\r{round(c, 4)}%")
            time.sleep(.1)
            sys.stdout.flush()
```
When this code executes, the same data as before is downloaded, but now we are iterating over the byte stream from the response. While doing this, we calculate the percentage of bytes downloaded relative to the total reported by the Content-Length header of the response. The console shows a value similar to 5.2342%, refreshed after each chunk (every 0.1 seconds here).
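One caveat with the percentage calculation: a server isn’t obliged to send a Content-Length header, and when it is missing, r.headers.get returns None and the int() call raises a TypeError. The sketch below shows one way to guard against that. Both parse_content_length and percent_done are hypothetical helper names, and the clamp to 100 is an assumption that covers compressed responses, where the header counts compressed bytes but iter_content yields decoded ones.

```python
def parse_content_length(headers):
    """Return Content-Length as an int, or None if it is absent or invalid."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        size = int(value)
    except ValueError:
        return None
    return size if size >= 0 else None


def percent_done(bytes_done, total_size):
    """Return progress as a percentage, or None when the total is unknown."""
    if not total_size:
        return None
    # Clamp so the display never runs past 100%
    return min(bytes_done / total_size * 100, 100.0)
```

With requests, r.headers can be passed straight to parse_content_length; when the result is None, a plain running byte count is a reasonable fallback for the display.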
The only real magic in the download loop is the "\r" carriage return passed to sys.stdout.write, which moves the cursor back to the first character so that each value overwrites the last; the time.sleep call, which pauses for 0.1 seconds; and the sys.stdout.flush call, which forces the buffered text onto the console. This is sloppy, could be much improved even in its current approach, and adds a 0.1-second delay for every chunk, which is a single byte in the above example. Let’s look at a more streamlined approach.
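Before moving on, here is a rough sketch of how the manual loop could be tightened: a larger chunk size, no artificial sleep, and progress based on the bytes actually written. The copy_with_progress name and its parameters are assumptions for illustration. It works on any file-like source, so the demo uses an in-memory io.BytesIO stand-in rather than a real download, and it assumes total_size is a positive number.

```python
import io
import sys


def copy_with_progress(src, dst, total_size, chunk_size=8192, out=sys.stdout):
    """Copy src to dst in chunks, rewriting a percentage on one console line."""
    done = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        done += len(chunk)
        # "\r" returns the cursor to the start of the line, so the next
        # write overwrites the previous percentage.
        out.write(f"\r{done / total_size * 100:.1f}%")
        out.flush()
    # Finish the line once the copy is complete
    out.write("\n")
    return done


# Demo: an in-memory "download" of 20,000 bytes
source = io.BytesIO(b"x" * 20_000)
target = io.BytesIO()
copy_with_progress(source, target, total_size=20_000)
```

In the download script, r.raw and the open file object could take the places of source and target.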
tqdm Implementation
The tqdm library for Python provides a much more syntactically concise means of displaying download progress in the console. This library can be used to monitor and display the progress of any iterable process. There are several approaches this library provides, including command line usage, but here we’ll implement it via the wrapattr convenience function as such:
```python
import os
import shutil

import requests
from tqdm.auto import tqdm

# make an HTTP request within a context manager
with requests.get("https://www.example.com/file.txt", stream=True) as r:
    # check header to get content length, in bytes
    total_length = int(r.headers.get("Content-Length"))
    # implement progress bar via tqdm
    with tqdm.wrapattr(r.raw, "read", total=total_length, desc="") as raw:
        # save the output to a file named after the last URL path segment
        with open(os.path.basename(r.url), 'wb') as output:
            shutil.copyfileobj(raw, output)
```
The use of context managers here is discretionary, and the requests.get call can acceptably be made without one (call it a force of habit). This code will download the same data as before, this time into a file named after the last segment of the URL (file.txt), and we’ll be shown the following on the console:
100%|██████████| 648/648 [00:00<00:00, 324kB/s]
The use of the tqdm module here provides us with an aesthetically pleasing progress bar with the following information:
- Overall progress as a percentage
- Dynamically updated progress bar
- Total bytes downloaded/total bytes available
- Total elapsed time
- Data transfer speed (kB/s)
The size of the file being downloaded here (648 bytes) hardly allows a full demonstration of the tqdm progress bar experience. After re-running the above code with the URL of a podcast episode specified, the download bar is better illustrated as such:
6%|▌ | 13.0M/212M [00:04<01:04, 3.21MB/s]
Here we see that the download is 6% complete, with 13.0M of a total 212M transferred after 4 seconds, an estimated 1:04 remaining, and an average rate of 3.21MB/s. This type of feedback is invaluable during the initial development and logging of complex applications.
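The wrapattr approach isn’t the only option. A tqdm bar can also be advanced by hand, which is convenient when we are already iterating over chunks. The sketch below is one way that might look. write_chunks and its bar_file parameter are hypothetical names, and the function accepts any iterable of byte chunks, so it can be tried without a network connection.

```python
from tqdm import tqdm


def write_chunks(chunks, dst, total_size=None, bar_file=None):
    """Write an iterable of byte chunks to dst, advancing a tqdm bar by bytes."""
    written = 0
    # unit="B" with unit_scale=True prints human-readable sizes; total=None
    # is allowed and shows a running count without a percentage.
    with tqdm(total=total_size, unit="B", unit_scale=True,
              unit_divisor=1024, file=bar_file) as bar:
        for chunk in chunks:
            dst.write(chunk)
            written += len(chunk)
            # Advance the bar by the number of bytes just written
            bar.update(len(chunk))
    return written
```

With requests, the chunks argument could be r.iter_content(chunk_size=8192) from a streamed response, and dst the open output file. Leaving bar_file as None uses tqdm’s default output stream.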
Final Thoughts
Downloading files in Python is a simple task, and a robust library like requests makes it even easier. What Python lacks is a built-in way to display the progress of a file download on the console.
Our initial manual implementation gets the job done but could use an industrial-strength dose of consideration for formatting. The tqdm module has a relatively light footprint; its only dependency is the colorama library, which is needed only on Windows. tqdm also allows one to customize the color of the progress bar, something we didn’t touch on here.