Using Python to download files from the Internet is super easy, and it can be done using only standard library functions if desired. To download a file with Python, one needs little more than a URL. Seriously. Of course, this comes as little surprise given that Python is one of the most popular programming languages.
Other libraries, most notably the Python requests library, can provide a clearer API for those more concerned with higher-level operations. This article outlines three ways to download a file using Python, with a short discussion of each.
Python’s urllib library offers a range of functions designed to handle common URL-related tasks. This includes parsing, requesting, and (you guessed it) downloading files. Let’s consider a basic example of downloading the robots.txt file from Google.com:
    from urllib import request

    # Define the remote file to retrieve
    remote_url = 'https://www.google.com/robots.txt'
    # Define the local filename to save data
    local_file = 'local_copy.txt'
    # Download remote and save locally
    request.urlretrieve(remote_url, local_file)
The urlretrieve function is part of what the Python documentation calls the “legacy interface” of urllib.request, ported from Python 2. In the words of the documentation, these functions “might become deprecated at some point in the future.” In my opinion, there’s a big divide between “might” become deprecated and “will” become deprecated. In other words, this is probably a safe approach for the foreseeable future.
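For readers who would rather not rely on an interface that might be deprecated, here is a minimal sketch of the same download built on urllib.request.urlopen and shutil.copyfileobj. The helper name download_with_urlopen is hypothetical and used for illustration only.

```python
import shutil
from urllib import request


def download_with_urlopen(remote_url, local_file):
    """Copy the body of remote_url into local_file (hypothetical helper)."""
    # urlopen returns a file-like response object
    with request.urlopen(remote_url) as response, open(local_file, 'wb') as out_file:
        # Copy in chunks so the whole body is never held in memory at once
        shutil.copyfileobj(response, out_file)
    return local_file
```

Calling download_with_urlopen('https://www.google.com/robots.txt', 'local_copy.txt') should produce the same local file as the urlretrieve example.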
requests.get + manual save
The Python requests module is a super friendly library billed as “HTTP for humans.” With its simplified APIs, requests lives up to its motto even for demanding HTTP-related work. However, it doesn’t feature a one-liner for downloading files. Instead, one must manually save the response data as follows:
    import requests

    # Define the remote file to retrieve
    remote_url = 'https://www.google.com/robots.txt'
    # Define the local filename to save data
    local_file = 'local_copy.txt'
    # Make http request for remote file data
    data = requests.get(remote_url)
    # Save file data to local copy
    with open(local_file, 'wb') as file:
        file.write(data.content)
There are some important aspects of this approach to keep in mind, most notably that the data is transferred and written in binary form. When a web browser loads a page (or file), it decodes the received bytes using the encoding specified by the host.
Common encodings include UTF-8 and Latin-1. That encoding is a directive aimed at clients, such as web browsers, that receive and display the data as text. It isn’t immediately applicable to downloading files, because writing data.content in 'wb' mode saves the bytes exactly as they were received.
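To make the distinction concrete, here is a small sketch using made-up sample bytes rather than a real response. It shows that a file written in binary mode keeps the bytes untouched, and that the encoding only matters once those bytes are turned into text.

```python
import os
import tempfile

# Sample payload standing in for downloaded data (an assumption for illustration)
payload = 'caf\u00e9'.encode('utf-8')

# Binary mode writes the bytes as-is; no encoding is involved
path = os.path.join(tempfile.mkdtemp(), 'local_copy.txt')
with open(path, 'wb') as file:
    file.write(payload)

with open(path, 'rb') as file:
    saved = file.read()

# Decoding is where the choice of encoding shows up
as_utf8 = saved.decode('utf-8')      # the intended text
as_latin1 = saved.decode('latin-1')  # same bytes, misread as Latin-1

print(saved == payload)      # True: the bytes on disk match what was written
print(as_utf8 == as_latin1)  # False: the two decodings produce different text
```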
Note: downloaded files may need to be decoded with the correct encoding in order to display properly. That’s beyond the scope of this tutorial.
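For larger files, holding the entire response in memory through data.content may be undesirable. A hedged sketch of a streamed variant follows. The helper name download_streamed, the 8192-byte chunk size, and the 30-second timeout are illustrative choices of mine, not something requests prescribes.

```python
import requests


def download_streamed(remote_url, local_file, chunk_size=8192):
    """Stream remote_url to local_file in chunks (hypothetical helper)."""
    # stream=True defers fetching the body until it is iterated
    with requests.get(remote_url, stream=True, timeout=30) as response:
        # Raise an exception for 4xx/5xx responses instead of saving an error page
        response.raise_for_status()
        with open(local_file, 'wb') as file:
            for chunk in response.iter_content(chunk_size=chunk_size):
                file.write(chunk)
    return local_file
```

The file is still opened in 'wb' mode, so the bytes are saved exactly as received, just one chunk at a time.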
The wget Python library offers a method similar to urllib and attracts a lot of attention because its name is identical to that of the Linux wget command. This module was last updated in 2015.
    import wget

    # Define the remote file to retrieve
    remote_url = 'https://www.google.com/robots.txt'
    # Define the local filename to save data
    local_file = 'local_copy.txt'
    # Make http request for remote file data
    wget.download(remote_url, local_file)
The wget.download function uses a combination of modules, including shutil, to retrieve the downloaded data, save it to a temporary file, and then move that file (and rename it) to the specified location.
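The retrieve, stage, and move sequence described above can be sketched with standard library pieces. This illustrates the pattern only and is not the wget package’s actual source. The helper name download_via_tempfile is hypothetical, and the use of urllib.request.urlretrieve and tempfile.mkstemp here is an assumption on my part.

```python
import os
import shutil
import tempfile
from urllib import request


def download_via_tempfile(remote_url, local_file):
    """Download to a temporary file, then move it into place (hypothetical helper)."""
    # Create the temporary file in the same directory as the destination
    dest_dir = os.path.dirname(os.path.abspath(local_file))
    fd, tmp_path = tempfile.mkstemp(suffix='.tmp', dir=dest_dir)
    os.close(fd)  # urlretrieve reopens the path itself
    try:
        request.urlretrieve(remote_url, tmp_path)
        # Move (and rename) the finished download to its final location
        shutil.move(tmp_path, local_file)
    finally:
        # Remove the temporary file if the download or move failed
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return local_file
```

One benefit of staging the data this way is that a partially downloaded file never appears under the final filename.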
Downloading files with Python is super simple and can be accomplished using the standard urllib functions. I’ve found the requests library to offer the easiest and most versatile APIs for common HTTP-related tasks. One notable exception is the URL parsing features of urllib. Those aren’t strictly HTTP related though, so I don’t take points away from