Convert JSON to CSV in Python

Python is a powerful toolset for data scientists and developers alike. Often, these two disciplines meet at the intersection that is converting data. One common conversion requirement is that of CSV data to JavaScript Object Notation (JSON) data. Fortunately, the Pandas library makes this conversion fast and efficient.

Table of Contents show

Pandas is a powerful data science library whereby developers and data scientists can access, analyze, and manipulate data efficiently and conveniently. Among the many convenient methods and functions found in the Pandas library is the to_json method. This method, found in the DataFrame class, is a powerful tool for converting data from CSV to JSON.

Project Setup

For this project, we’ll create some sample data of random people with information. We’ll use this handy Python script for generating random personal information, producing the following JSON-format data, saved as a local file named people.json.

[{"id": 1, "first_name": "Alice", "last_name": "Smith", "location": {"City": "Denver", "State": "Colorado"}, "age": 18}, {"id": 2, "first_name": "Bob", "last_name": "Anderson", "location": {"City": "Dallas", "State": "Texas"}, "age": 24}, {"id": 3, "first_name": "Alice", "last_name": "Jacobs", "location": {"City": "Los Angeles", "State": "California"}, "age": 18}, {"id": 4, "first_name": "Bob", "last_name": "Williams", "location": {"City": "Dallas", "State": "Texas"}, "age": 24}, {"id": 5, "first_name": "Pat", "last_name": "Anderson", "location": {"City": "Miami", "State": "Florida"}, "age": 60}]

With this data in hand, we can leverage the power of Pandas to conveniently load our data in CSV format as a DataFrame object, from which we can then convert to CSV as desired.

Step 1: Load JSON into DataFrame

DataFrames are powerful tools for managing 2-dimensional data (or higher-dimensional data reduced as such) with much attention to low-level optimizations. A DataFrame can also quickly and efficiently export data as CSV files — which is what we’re after. We can load the people.json file data as a DataFrame as such:

# Load via context manager and read_json() method
with open('people.json', 'r')as file:
    df = pd.read_json(file.read())

# view result
print(df)

   id first_name last_name                                        location  age
0   1      Alice     Smith         {'City': 'Denver', 'State': 'Colorado'}   18
1   2        Bob  Anderson            {'City': 'Dallas', 'State': 'Texas'}   24
2   3      Alice    Jacobs  {'City': 'Los Angeles', 'State': 'California'}   18
3   4        Bob  Williams            {'City': 'Dallas', 'State': 'Texas'}   24
4   5        Pat  Anderson           {'City': 'Miami', 'State': 'Florida'}   60

This gets us pretty close but there are two noticeable issues, one being of grave importance:

The location column contains nested JSON data that didn’t import properly.

The id column should could be used to index our data (optional).

To approach the first issue, we’ll have to modify the approach by which we loaded our data.

Step 2.a: Deal with Nested JSON Data

In the first step, we loaded our data directly via the read_json function in the Pandas library. This takes the raw JSON data and loads it directly into a DataFrame. To load nested JSON as a DataFrame we need to take advantage of the json_normalize function. However, this function takes a dict object as an argument. As such, we need to first load the JSON data as a dict as such:

import json

# Load via context manager and read_json() method
with open('people.json', 'r')as file:

    # load JSON data and parse into Dictionary object
    data = json.load(file)

# Load JSON as DataFrame 
df = pd.json_normalize(data)

# Print Result
print(df)

   id first_name last_name  age location.City location.State
0   1      Alice     Smith   18        Denver       Colorado
1   2        Bob  Anderson   24        Dallas          Texas
2   3      Alice    Jacobs   18   Los Angeles     California
3   4        Bob  Williams   24        Dallas          Texas
4   5        Pat  Anderson   60         Miami        Florida

Here we see the keys of the location field automatically being converted into dot-delimited column names.

Step 2.b: Reindex

This data will produce an extra column if exported directly to CSV (the index column.) There are several ways in which one could approach handling this, but a convenient means is the setindex method — it will create a visual representation of the change in the DataFrame as well. This is achieved as follows:

# set index
df.set_index('id', inplace=True)

# view result
print(df)

   first_name last_name  age location.City location.State
id                                                       
1       Alice     Smith   18        Denver       Colorado
2         Bob  Anderson   24        Dallas          Texas
3       Alice    Jacobs   18   Los Angeles     California
4         Bob  Williams   24        Dallas          Texas
5         Pat  Anderson   60         Miami        Florida

Step 3: Export to CSV File

At this point, our data is ready to be exported to a CSV file. Fortunately, Pandas has methods to both load and save CSV data. We can output our DataFrame to a CSV format file using the to_csv method as follows:

import os

# output DataFrame to CSV file
df.to_csv('people.csv')

# view file metedata
print(os.stat('people.csv'))

os.stat_result(st_mode=33206, st_ino=56294995342297639, st_dev=2997963831, st_nlink=1, st_uid=0, st_gid=0, st_size=231, st_atime=1657209167, st_mtime=1657209167, st_ctime=1657209167)

Here we save the data to a local file named people.csv and also confirm some file metadata after the process including created time and filesize. Let’s take a look at the data loaded in Microsoft Excel to see what our conversion process has produced:

json to csv data export pandas alpharithms — Figure 1: Exported CSV data loaded into Microsoft Excel displaying the result of converting JSON to CSV.

Here we see the Microsoft Excel version of our DataFrame represented as expected. This confirms that we have, in fact, converted JSON to CSV data using Pandas’ DataFrame as well as the normalize_json and to_csv methods.

Common Issues

One often encounters issues when working with JSON formatted files in Python. The json library — part of the Python standard library — provides useful tools for addressing such issues. However, sometimes the syntax and logical flow of these tools can be unclear which leads to exceptions of which can cause confusion. Below are some common issues.

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In most cases, this error results from attempting to use the json.loads() method on the filepath itself, rather than the contents of the file. For example, the following approach is incorrect:

# Trying to load the filepath
df = pd.read_json(json.loads('people.json'))

# Resulting Exception
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The correct approach would be to first create a file pointer to reference the file object as such:

# use context manager and get reference to file object
with open('people.json', 'r')as file:

    # use the loads() method on actual file data
    data = json.loads(file.read())

Note that using the read_json method is used in a similar method.

Final Thoughts

Pandas is an incredible tool for data scientists and hobbyists alike. Its powerful low-level optimizations make crunching numbers fast and its well-conceived functions make retain a syntactic simplicity of which one can stand in awe. There are plenty of considerations to make when dealing with data such as replacing missing entries, tabulating JSON data when necessary (similar to the CSV conversion process), and even aggregating time series data. Pandas provides tools for handling all these concerns and more!