Generating Artificial Time Series Data with Pandas in Python

Generating Time Series data with Pandas is a useful skill to have for a number of reasons. It helps set up a data environment for quick testing of new approaches, allows for flexibility away from one’s main computing environment, and is a breeze to do.

Pandas is among the de facto Python libraries used in machine learning, statistics, and computation-heavy workflows. Time series data is of such deep interest in statistical analysis that it’s no wonder pandas makes ample accommodation for it.

Pandas Time-Series Generation

In this quick example, you’ll learn how to generate a sample set of Time Series data to load as a pandas DataFrame for whatever purpose you see fit.

import pandas as pd

# Generate series from start of 2016 to end of 2020
series = pd.date_range(start='2016-01-01', end='2020-12-31', freq='D')

>>> series

DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
               '2016-01-09', '2016-01-10',
               ...
               '2020-12-22', '2020-12-23', '2020-12-24', '2020-12-25',
               '2020-12-26', '2020-12-27', '2020-12-28', '2020-12-29',
               '2020-12-30', '2020-12-31'],
              dtype='datetime64[ns]', length=1827, freq='D')

That’s it. We’re done.
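As a variation on the call above, `pd.date_range` also accepts a `periods` count in place of `end`, along with other `freq` aliases. The sketch below is illustrative only: the variable names and the particular dates are assumptions, not part of the original example.

```python
import pandas as pd

# Ten consecutive business days (Mon-Fri), starting on 2016-01-01 (a Friday)
business_days = pd.date_range(start='2016-01-01', periods=10, freq='B')

# Weekly dates within January 2016; the plain 'W' alias anchors on Sundays
weekly = pd.date_range(start='2016-01-01', end='2016-01-31', freq='W')

print(len(business_days))
print(weekly[0].day_name())
```

Passing `periods` is handy when you care about how many rows you get rather than where the range ends.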

Creating Sample Data

The example above creates a dummy Time Series sequence, but without data it isn’t very useful. Fortunately, pandas is deeply integrated with NumPy and can use that module to create random data to associate with the Time Series with relative ease. This is done as follows:

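import numpy as np

# A DatetimeIndex cannot hold columns, so wrap it in a DataFrame first
df = pd.DataFrame(index=series)

# Add a column of random integers (0 to 41 inclusive) to each date entry
df['nums'] = np.random.randint(0, 42, size=len(series))

>>> df.head()  # values are random, so yours will differ

            nums
2016-01-01     7
2016-01-02    35
2016-01-03    20
2016-01-04    40
2016-01-05     5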

This adds a new column of randomly generated integer values to our data. In practice, the values would be something relevant to the domain being tested. There’s a time and place for generating random numbers, such as assignments during random sampling, but they are rarely the subject of analysis themselves. With that in mind, let’s take a look at the random Time Series data we’ve just generated:

Figure: A plot of the random integers paired with Month-based matching to our random Time Series
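The code behind the figure isn’t shown, so the following is only one possible way to get a month-based view of daily random data: group the values by calendar month and take the mean. The `dates`, `df`, and `monthly_mean` names and the choice of the mean as the aggregate are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the daily frame so this snippet runs on its own
dates = pd.date_range(start='2016-01-01', end='2020-12-31', freq='D')
df = pd.DataFrame({'nums': np.random.randint(0, 42, size=len(dates))}, index=dates)

# Group the daily values by calendar month and average them
monthly_mean = df['nums'].groupby(df.index.to_period('M')).mean()

print(monthly_mean.head())
# monthly_mean.plot() would draw the result, provided matplotlib is installed
```

Averaging over each month smooths the day-to-day noise, which tends to make a plot of five years of random values easier to read.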

Final Thoughts

Random data is very convenient for testing new algorithms, getting up and running quickly in an unfamiliar programming environment, and making quick visualizations. Random Time Series data can help sidestep the hurdle of having to sanitize and clean data before trying out new methods of analysis.
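When random data feeds a test, it usually helps if the same values come back on every run. Here is a minimal sketch of that idea, assuming NumPy’s `default_rng` generator with a fixed seed; the `make_sample` helper and the seed value are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

def make_sample(seed=42):
    """Return a daily DataFrame of random integers that is identical for a given seed."""
    dates = pd.date_range(start='2016-01-01', end='2020-12-31', freq='D')
    rng = np.random.default_rng(seed)
    # integers() excludes the upper bound, so values fall in 0..41
    return pd.DataFrame({'nums': rng.integers(0, 42, size=len(dates))}, index=dates)

first = make_sample()
second = make_sample()
print(first.equals(second))  # True: same seed, same data
```

Changing the seed gives a different but equally repeatable data set.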

After all, the quickest way to kill your buzz over that new machine learning library is to have to pause for an hour while you prepare your data for a test drive! Fortunately, pandas makes short work of such data needs. This library, along with NumPy and many other data-oriented libraries, illustrates the strong support that continues to keep Python among the most popular programming languages out there.