Parse CSV Data Directly from Download Using Python

Getting tired of creating temporary directories, searching for unused files, and all the housekeeping that ensues from downloading many files? Python makes it easy to parse data directly from memory to help avoid all the fuss!

Python makes downloading files super easy. Parsing data directly from memory is sometimes more convenient than fussing around with temporary files, directories, and the ensuing housekeeping. Fortunately, Python makes parsing data in memory just as easy as (if not easier than) saving the file to local storage first.

Parsing a CSV File from Memory

In this article, we’ll cover how to use the requests library to download a remote CSV file directly into memory and parse it using the csv module. This doesn’t involve much code but it will be helpful to outline the steps first. Here is what will happen:

  1. Define a URL
  2. Send HTTP GET request via the requests library
  3. Convert response data into iterator of lines
  4. Parse response lines as CSV reader object
  5. Iterate, save, or manipulate as desired

Listing these steps takes almost as many characters as the actual code. The implementation below iterates over a csv.reader object and prints each row to standard output:

import requests
import csv

# Define the remote URL
url = "https://query1.finance.yahoo.com/v7/finance/download/SPY"

# Send HTTP GET request via requests and fail early on an HTTP error status
data = requests.get(url)
data.raise_for_status()

# Convert to a list of lines by splitting on line boundaries
lines = data.text.splitlines()

# Parse as CSV object
reader = csv.reader(lines)

# View Result
for row in reader:
    print(row)

['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
['2021-08-27', '447.119995', '450.649994', '447.070007', '450.100006', '450.100006', '58642636']

Here we see two rows of data printed: the header, followed by the most recent OHLC quote for $SPY. Converting the response text into an iterable of lines is as simple as calling the splitlines() method. It behaves much like split('\n'), but it also recognizes other line endings such as '\r\n' and does not leave a trailing empty string when the text ends with a newline. In either case the result is a list of strings, which satisfies the iterable of lines that the csv.reader function requires as its argument.
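To see how splitlines() and split('\n') compare in practice, here is a minimal offline sketch. The sample text reuses the two rows printed above; the Windows-style '\r\n' line endings and the trailing newline are assumptions made for illustration, not something confirmed about the actual download.

```python
import csv

# Sample text reusing the rows shown above. The '\r\n' endings and the
# trailing newline are assumptions chosen to highlight the difference.
text = (
    "Date,Open,High,Low,Close,Adj Close,Volume\r\n"
    "2021-08-27,447.119995,450.649994,447.070007,450.100006,450.100006,58642636\r\n"
)

by_splitlines = text.splitlines()
by_split = text.split("\n")

# splitlines() removes the whole '\r\n' terminator and yields two lines.
print(len(by_splitlines))  # 2

# split('\n') leaves '\r' on each line and adds a trailing empty string.
print(len(by_split))  # 3

# Either list can be handed to csv.reader; splitlines() gives the cleaner input.
rows = list(csv.reader(by_splitlines))
print(rows[0])
```

For text that uses plain '\n' endings and has no trailing newline, the two calls return the same list, which is why they are often treated as interchangeable.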

Final Thoughts

Python offers several ways to download a file. In any case, the resulting textual data can be parsed directly from memory rather than first being saved to local storage. This is often the more efficient approach when temporary files would otherwise be needed and the remote files are not excessively large. Readers interested in retrieving financial data using Python should check out the article 3 Easy Ways to Get Financial Data Using Python.
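As one illustration of another way to download a file, the same in-memory parsing can be sketched with only the standard library, using urllib.request in place of requests. The helper names parse_csv_stream and read_remote_csv are hypothetical, the UTF-8 default is an assumption, and the demonstration at the bottom uses an in-memory byte stream as a stand-in for a real HTTP response so that it runs offline.

```python
import csv
import io
import urllib.request


def parse_csv_stream(binary_stream, encoding="utf-8"):
    """Decode a binary file-like object on the fly and return its CSV rows.

    Hypothetical helper; the UTF-8 default is an assumption.
    """
    # newline="" leaves line-ending handling to the csv module.
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return list(csv.reader(text_stream))


def read_remote_csv(url):
    """Fetch a CSV over HTTP and parse it without writing to disk.

    Hypothetical helper; not called below so the example stays offline.
    """
    with urllib.request.urlopen(url) as response:
        return parse_csv_stream(response)


# Offline demonstration: an in-memory stand-in for the HTTP response body.
sample = io.BytesIO(b"Date,Close\r\n2021-08-27,450.100006\r\n")
rows = parse_csv_stream(sample)
print(rows)  # [['Date', 'Close'], ['2021-08-27', '450.100006']]
```

Wrapping the response in io.TextIOWrapper decodes it incrementally, so the full body never has to be held as one string before parsing begins. The rows are still collected into a list here for simplicity.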

Zαck West
Full-Stack Software Engineer with 10+ years of experience. Expertise in developing distributed systems, implementing object-oriented models with a focus on semantic clarity, driving development with TDD, enhancing interfaces through thoughtful visual design, and developing deep learning agents.