3 Easy Ways to Get Financial Data in Python for Stock Analysis

Python is the most popular language for financial analysis and comes with many libraries to make things super convenient, efficient, and fast! The one common prerequisite of all is financial data. Luckily, getting financial data in Python is a breeze with the help of several public APIs, free data sources, and well-developed libraries!
python financial data alpharithms

Python is often used for algorithmic trading, backtesting, and stock market analysis. In fact, it seems almost the canonical use-case for many tutorials I’ve seen over the years. Getting financial data in Python is the prerequisite skill for any such analysis.

In this article, you’ll learn how to easily get, read, and interpret financial data using Python. We’ll be using the Pandas library, the yfinance library, and a handful of useful helper methods. Readers should be familiar with basic Python syntax but needn’t have obtained a level of skill mistakable as guru.

Highlights

  • Understanding structured vs. unstructured financial data
  • What is OHLC data
  • Popular Python financial libraries
  • Getting data from various sources via Python including Yahoo Finance, Quandl, and Alpha Vantage
  • Deprecated APIs such as Google Finance

Financial Data 101

Financial data comes in many forms. The canonical format is tabular data (think spreadsheets) which can be formatted as rows and columns. This type of data is available from many sources such as finance.yahoo.com, Quandl, Alpha Vantage, and many brokerages.

Financial data can be bought, manually scraped from the web, or obtained from public APIs. Generally, financial data comes in one of 2 primary types:

  • Structured Data: Closing prices, financials, market performance, etc.
  • Unstructured Data: News articles, Social Media, Sentiment Analysis, etc.

Additionally, financial data can be further categorized as either Historical or Real-Time. In most cases, Real-Time data isn’t available from public APIs and must be purchased. We’ll be using mostly structured historical data for our examples here. These financial data are generally provided in a format that includes the following information:

  • Date
  • Open Price
  • High Price
  • Low Price
  • Closing Price
  • Volume

These data—often referred to as OHLC Chart Data—can be interpreted as Time Series data and are perfect for performing technical analysis. We’ll dive into this format in just a moment but, for now, just realize this is a standard format for historical pricing data within financial markets.

Pandas

Pandas is a powerful data science library that stores tabular data into memory in a very efficient manner. It makes the opening, processing, and subsequent saving of data fast and effective. It comes with a range of helper methods, data classes, and in the case of financial data—web APIs!

This article will glaze over much of the intricacies of the Pandas library—just know that it is complex! We will be mostly using the data_reader function, the DataFrame class, and miscellaneous statistic-generating functions like head(), info(), summary() and etc.

Required Libraries

Now that we know what to expect from our data, let’s consider how to get some financial data using Python! Before we get started, make sure the following packages are installed as they will be relevant for each data source. We’ll cover specific packages as we move along.

# Install the pandas library
pip install pandas

# Install the pandas-datareader library
# Note: Will also install pandas if not already installed.
pip install pandas-datareader

Yahoo Finance

yahoo finance python api alpharithms

Pros:

  • Free
  • Huge amount of data
  • Well-supported Python libraries
  • Integrated with many backtesting libraries

Cons:

  • No official API is available
  • Basic data only
  • Can get IP rate-limited or banned

Yahoo Finance provides historical data for a massive number of securities. You’ll find data for securities, currencies, and even cryptocurrencies like Bitcoin ($BTC-USD). We can use the pandas-datareader library as well as the yfinance library to get financial data from Yahoo Finance. Let’s consider both approaches:

Note: I would suggest using a proxy when accessing yahoo financial data. Both the yfinance library and pandas-datareader libraries accommodate this.

Using Pandas-Datareader

import pandas_datareader as pdr

# Request data via Yahoo public API
data = pdr.get_data_yahoo('NVDA')

# Display Info
print(data.info())

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1258 entries, 2016-08-08 to 2021-08-05
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   High       1258 non-null   float64
 1   Low        1258 non-null   float64
 2   Open       1258 non-null   float64
 3   Close      1258 non-null   float64
 4   Volume     1258 non-null   float64
 5   Adj Close  1258 non-null   float64
dtypes: float64(6)
memory usage: 68.8 KB

Here we see a 5-year historical period of OHLC data for $NVDA (NVidia Corporation) provided as a pandas’ DataFrame object with 1258 rows of data. Excluding imports and summaries—that took a single line of code.

Using yfinance

For this approach, we need to install the yfinance library as pip install yfinance. This library provides ample tools for working with financial data requests to the Yahoo Finance website. Keep in mind, however, this is not an official API and is subject to rate limiting, periodic breakage, and general quirkiness. Nonetheless, its the defacto Python library for OHLC data and can be used as follows:

import yfinance as yf

# Request historical data for past 5 years
data = yf.Ticker("NVDA").history(period='5y')

# Show info
print(data.info())

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1258 entries, 2016-08-08 to 2021-08-05
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          1258 non-null   float64
 1   High          1258 non-null   float64
 2   Low           1258 non-null   float64
 3   Close         1258 non-null   float64
 4   Volume        1258 non-null   int64  
 5   Dividends     1258 non-null   float64
 6   Stock Splits  1258 non-null   float64
dtypes: float64(6), int64(1)
memory usage: 78.6 KB

Here we see the same 5-year historical data for $NVDA returned as the familiar pandas’ DataFrame object. We’ve used a little more complex syntax but achieved the same basic result. The yfinance library’s Ticker class doesn’t actually retrieve data. The history method helps us with that and takes a number of optional parameters.

By default, yfinance returns a previous months’ data as a daily time series. Note this method adds two additional columns: Dividends and Stock Splits; and also omits the adjusted close column. The Adjusted Close data takes into account such actions a dividends payouts and stock splits. The inclusion here assumes we’re comfortable calculating our own adjusted close.

Quandl

quandl python financial data alpharithms

Pros:

  • Free to use (rate limited)
  • Datasets can be downloaded
  • Official Python Library
  • Good API documentation

Cons:

  • Limited OHLC data
  • No real-time or delayed data for stocks
  • A limited number of free datasets
  • Free API access for non-rate limited use (or freer access at least)

Quandl is one of the largest data providers in the world. Their self-confessed mission is the “inspire customers to make new discoveries and incorporate them into trading strategies.” They’ve been around since 2013 and offer millions of free datasets. Yes, millions.

In 2018 they were acquired by NASDAQ and have continued to remain an authority on financial data ranging from equities and futures to options, currencies, and other non-financial market data such as housing, energy, and agriculture.

Quandl offers official APIs to access any public dataset for free. Here we’ll see how to get OHLC data via the official Quandl python library and also via the pandas-datareader. One important note is that the free Quandl OHLC data only goes up to 2018 at the time of this article’s writing. If you need more recent data and don’t want to pay this source isn’t for you.

Quandl Python Library

To get started with Quadl’s official API we need to install the python library as such: pip install quandl. This will install the official quandl python library and let us make up to 50 daily API requests without registering an account. Let’s get our financial data:

import quandl

# Get data via Quandl API
data = quandl.get('WIKI/NVDA')

# Summarize
print(data.info())

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4825 entries, 1999-01-22 to 2018-03-27
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Open         4825 non-null   float64
 1   High         4825 non-null   float64
 2   Low          4825 non-null   float64
 3   Close        4825 non-null   float64
 4   Volume       4825 non-null   float64
 5   Ex-Dividend  4825 non-null   float64
 6   Split Ratio  4825 non-null   float64
 7   Adj. Open    4825 non-null   float64
 8   Adj. High    4825 non-null   float64
 9   Adj. Low     4825 non-null   float64
 10  Adj. Close   4825 non-null   float64
 11  Adj. Volume  4825 non-null   float64
dtypes: float64(12)
memory usage: 490.0 KB

Our data is similar to before; it’s still a pandas’ DataFrame object, it still contains rows and columns, but we’ve got a lot more of it. By default, the quandl API returns all available data for the requested asset. Date ranges can be specified using a start_date="YYYY-MM-DD" and end_date="YYYY-MM-DD" pair of keyword arguments to the get method.

Note that our ticker syntax has changed a bit from the yfinance example. Instead of requesting “NVDA” we’re now requesting “WIKI/NVDA.” This syntax instructs the Quandl API to query the WIKI dataset for an entry labeled NVDA. Read the documentation for more on that.

pandas-datareader

As with the Yahoo Finance data, the pandas-datareader library also accommodates requests to the Quandl API. However, this approach requires that one have an API key to provide as an argument. API keys are free, don’t require any payment methods to be stored, and can be obtained via the Quandl signup page.

Note: you will have to confirm your email address before the key becomes active. With our API key in hand, we can get data via the pandas-datareader library as such:

# Necessary imports
import pandas_datareader as pdr

# Request Data
data = pdr.get_data_quandl("NVDA", api_key="YoUrApIkEyGoEsHeRe")

# Summarize
print(data.info())

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 411 entries, 2018-03-27 to 2016-08-08
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Open        411 non-null    float64
 1   High        411 non-null    float64
 2   Low         411 non-null    float64
 3   Close       411 non-null    float64
 4   Volume      411 non-null    float64
 5   ExDividend  411 non-null    float64
 6   SplitRatio  411 non-null    float64
 7   AdjOpen     411 non-null    float64
 8   AdjHigh     411 non-null    float64
 9   AdjLow      411 non-null    float64
 10  AdjClose    411 non-null    float64
 11  AdjVolume   411 non-null    float64
dtypes: float64(12)
memory usage: 41.7 KB

Here we see another familiar image: historic OHLC data provided as a pandas’ DataFrame class object. The QuandlReader class in pandas-datareader will default to the WIKI dataset if only a ticker is provided. This makes for convenient OHLV requests but may cause some confusion when trying to retrieve data from other datasets. Just stick to the Quandl-recommended syntax of DATASET/QUERY for those.

Alpha Vantage

alpha vantage api pyhton financial data

Pros:

  • Free to use
  • Large Amounts of Datasets
  • Offers Technical Indicators
  • Good API documentation
  • Intraday Data

Cons:

  • Rate limiting of API access
  • Real-Time Data is delayed

Alpha Vantage supplies a myriad of free data via API access. These data are free but not public meaning you need an API key. Such keys can be obtained by registering an account with Alpha Vantage. Entering your name, email, and status (educator, student, investor, etc.) will earn you an API key in a matter of seconds—you don’t even have to confirm your email! Let’s take a look at getting data from this API.

Unofficial Python alpha_vantage API

There is not official Python library for the Alpha Vantage API and their official documentation only details common HTTP requests via the requests module. This approach is 100% valid and will provide the OHLC data without issue. Syntactically, it’s a bit more cumbersome.

In pursuit of some sugar for our syntactic sweet tooth, we’ll use the well-developed alpha_vantage library. This is an unofficial API but, at least in my experience, the defacto Python Alpha Vantage API library. Let’s see how this library can retrieve our OHLC data:

from alpha_vantage.timeseries import TimeSeries

# Create an API object
ts = TimeSeries(key='UNO4CZQHSBZSN71N')
print(type(ts))

# Get daily OHLC data for NVDA
data, meta_data = ts.get_daily(symbol="NVDA")
print(data)

{
    '2021-08-05': {
        '1. open': '205.0000',
        '2. high': '207.3300',
        '3. low': '203.4200',
        '4. close': '206.3700',
        '5. volume': '21143537'
    },
    '2021-08-04': {
        '1. open': '199.9000',
        '2. high': '203.1800',
        '3. low': '198.2800',
        '4. close': '202.7400',
        '5. volume': '23130940'
    },

    ...

    '2021-03-16': {
        '1. open': '534.2600',
        '2. high': '540.5000',
        '3. low': '524.6700',
        '4. close': '531.6500',
        '5. volume': '6803240'
    }
}

We can see here the last several month’s worths of OHLC data from the Alpha Vantage database. However, we’ve got our first curveball: our data is returned as a dictionary object rather than the pandas DataFrame class object we’ve come to know and love. To get back to the basics, we need only supply the following argument to the TimeSeries object instantiation: output_format='pandas'. With that, our data looks like this:

            1. open  2. high    3. low  4. close   5. volume
date                                                        
2021-08-05   205.00   207.33  203.4200    206.37  21143537.0
2021-08-04   199.90   203.18  198.2800    202.74  23130940.0
2021-08-03   197.40   202.22  192.2000    198.15  30181074.0
2021-08-02   197.00   199.61  193.6100    197.50  21744397.0
2021-07-30   194.18   196.30  192.6300    194.99  18349746.0
...             ...      ...       ...       ...         ...
2021-03-22   516.51   535.78  516.2700    527.45   7445077.0
2021-03-19   510.00   516.86  504.5000    513.83   7480174.0
2021-03-18   525.46   527.36  508.6817    508.90   7354702.0
2021-03-17   521.59   538.13  519.5800    533.65   6096605.0
2021-03-16   534.26   540.50  524.6700    531.65   6803240.0

[100 rows x 5 columns]

Ahh, that’s much better. Now we can see that we have a 5-column DataFrame with 100 rows. To get more than the previous 100 periods’ worth of data, you can use the outputsize='full' argument in the ts.get_daily() method. This will return all available data.

pandas-datareader Alpha Vantage API

Again, the pandas-datareader library offers easy access to OHLC data via Alpha Vantage integration. The following code will retrieve historical data for $NVDA once again:

import pandas_datareader as pdr

# Get Alpha Vantage Data
data = pdr.get_data_alphavantage("NVDA", api_key='EnTeRYoUrApIKeYhErE')

# Summarize
print(data.info())

Index: 5027 entries, 2001-08-13 to 2021-08-05
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    5027 non-null   float64
 1   high    5027 non-null   float64
 2   low     5027 non-null   float64
 3   close   5027 non-null   float64
 4   volume  5027 non-null   int64  
dtypes: float64(4), int64(1)
memory usage: 235.6+ KB

Here we see historic OHLC data for $NVDA all the way back to 2001. This is much more data than the default method of other approaches so be prepared to filter as necessary via the start or end functions. Note: the pandas-datareader get_alphavantage method uses the TIME_SERIES_DAILY argument by default. Consult the Alpha Vantage API documentation for more information on alternatives.

Google

google finance api alpharithms

Pros:

  • Integrates with Google Sheets

Cons:

  • Not available via API
  • No Python library
  • Officially shut down in 2012

As of October 2012 Google no longer offers a financial API service. This news came as a shock to many but was ultimately reflective of many policy changes to public APIs. Google also does not provide financial data via metered APIs, as evidenced by a search on their APIs explorer. The Google Finance API is still available however but only as an Excel-style formula in Google Sheets:

Google Finance API Sheets Formula
Google Finance API data is still available, though only through formulaic requests in Google Sheets documents. (click to enlarge)

This isn’t a Python-centric way of getting financial data and is included here only because of historical relevancy. I suppose one could hack together an HTTP request method in Python for this—but that’s beyond the scope of this article. Check out the official Google Documentation for more information and syntax related to the GOOGLEFINANCE function in sheets.

Using the Data

Getting historical stock prices in Python is all well and good but what is one to do with such data? There are tones of approaches for analyzing OHLC data—allowing one to distill many numbers of useful insights based on expected outcomes and use-case. Below are some projects that can get you started:

These are only a few common applications of OHLC financial data in Python. These tutorials details how stock data can be used to identify patterns, correlations, and even predict future prices—all in the comfort of Python! Ultimately the only limitation to use of these data is the analyst’s imagination!

Review

We’ve seen here that getting financial data in Python can be approached in many ways. Whether via official APIs, well-supported third-party libraries, or even hacked-together approaches there seems no shortage of OHLC data to be had. These examples showcase why Python has emerged as the defacto programming language for data science—financial data included.

The yahoo finance API, particularly via the yfinance library, is deeply integrated within many backtesting frameworks. As such, it’s been my experience that this library is the defacto source for daily OHLC historic data. It’s not suited for intraday analysis or real-time but proves invaluable for basic analysis.

The data sources here are not meant to be an exhaustive list and are only reflective of common sources available with easy access (minus Google of course.) Some sources provide downloads such that local data can be retained for more efficient loading when entire universes of Stocks are being analyzed. The web-based access APIs discussed here are great for casual testing and on-the-go development.

Zαck West
Full-Stack Software Engineer with 10+ years of experience. Expertise in developing distributed systems, implementing object-oriented models with a focus on semantic clarity, driving development with TDD, enhancing interfaces through thoughtful visual design, and developing deep learning agents.