Generating Time Series data with pandas is a useful skill for several reasons. It lets you set up data for quickly testing new approaches, gives you flexibility when working away from your main computing environment, and takes only a few lines of code.
Pandas is among the de facto Python libraries for machine learning, statistics, and other computation-heavy workflows. Time series analysis is so central to statistics that it's no surprise pandas supports it so thoroughly.
Pandas Time-Series Generation
In this quick example, you'll learn how to generate a sample set of Time Series data and load it into a pandas DataFrame for whatever purpose you see fit.
import pandas as pd

# Generate daily dates from the start of 2016 to the end of 2020
series = pd.date_range(start='2016-01-01', end='2020-12-31', freq='D')

>>> series
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
               '2016-01-09', '2016-01-10',
               ...
               '2020-12-22', '2020-12-23', '2020-12-24', '2020-12-25',
               '2020-12-26', '2020-12-27', '2020-12-28', '2020-12-29',
               '2020-12-30', '2020-12-31'],
              dtype='datetime64[ns]', length=1827, freq='D')
That’s it. We’re done.
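The same function is more flexible than a fixed start and end. As a short sketch, `pd.date_range` also accepts a `periods` count in place of `end`, and the `freq` alias controls spacing. The variable names below are illustrative, and `'MS'` (month start) is just one of the offset aliases pandas supports.

```python
import pandas as pd

# Five consecutive days, specified by a start date and a count instead of an end date
five_days = pd.date_range(start='2016-01-01', periods=5, freq='D')

# One timestamp per month, labeled by the first day of each month
monthly = pd.date_range(start='2016-01-01', end='2020-12-31', freq='MS')

print(len(five_days))  # 5
print(len(monthly))    # 60 (twelve months per year for five years)
```

Month-start (`'MS'`) is used here rather than month-end because its alias has stayed the same across pandas versions.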
Creating Sample Data
The example above creates a placeholder Time Series sequence, but without any data attached it isn't very useful. Fortunately, pandas is built on NumPy and can use it to generate random values to pair with each date. Since `date_range` returns a DatetimeIndex rather than a table, the dates first need to be wrapped in a DataFrame:
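import numpy as np

# Wrap the dates in a DataFrame so that columns can be added
df = pd.DataFrame(index=series)
df.index.name = 'date'

# Add a column of random integers (0 to 41) to each date entry
df['nums'] = np.random.randint(0, 42, size=len(series))

# Inspect the first five rows (2016-01-01 to 2016-01-05); the values differ on every run
df.head()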
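Because the values change on every run, test results built on them can be hard to compare. A minimal sketch, assuming NumPy 1.17 or later, uses a seeded `np.random.default_rng` Generator so the same "random" column comes back each time. The seed value 42 and the names `dates`, `rng`, and `same` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Daily dates covering 2016 through 2020 (1,827 days)
dates = pd.date_range(start='2016-01-01', end='2020-12-31', freq='D')

# A seeded Generator produces identical values on every run; 42 is arbitrary
rng = np.random.default_rng(seed=42)

# One random integer in [0, 42) per date; the upper bound is exclusive
df = pd.DataFrame({'nums': rng.integers(0, 42, size=len(dates))}, index=dates)
df.index.name = 'date'

# Re-seeding with the same value reproduces exactly the same column
same = np.random.default_rng(seed=42).integers(0, 42, size=len(dates))
print((df['nums'].to_numpy() == same).all())  # True
```

Building the DataFrame from a dict and passing `index=dates` creates the column and index in one step, which is equivalent to adding the column afterwards.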
This adds a new column of randomly generated integers to our data. In a real test, the values would be something relevant to the domain being studied. Random numbers have their place, for example in assignments during random sampling, but they are rarely the subject of analysis themselves. With that in mind, it's still worth taking a quick look at the random Time Series data we've just generated.
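One way to take that look, sketched here under the assumption that summary statistics and a monthly roll-up are enough, is `Series.describe()` together with `resample()`. Since the values are uniform integers from 0 to 41, the monthly means should hover around 20.5. The frame is rebuilt with an arbitrary seed so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame so this snippet is self-contained (seed 0 is arbitrary)
dates = pd.date_range(start='2016-01-01', end='2020-12-31', freq='D')
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({'nums': rng.integers(0, 42, size=len(dates))}, index=dates)

# Count, mean, standard deviation, min, quartiles, and max of the daily values
summary = df['nums'].describe()

# Collapse daily values into one mean per month, labeled by each month's first day
monthly_mean = df['nums'].resample('MS').mean()

print(summary)
print(monthly_mean.head())
```

With matplotlib installed, calling `monthly_mean.plot()` or `df.plot()` draws the same data as a line chart.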
Final Thoughts
Random data is very convenient for testing new algorithms, getting up and running fast in an unfamiliar programming environment, and for making quick visualizations. Random Time Series data can help sidestep the hurdle of having to sanitize and clean data before trying out new methods of analysis.
After all, the quickest way to kill your buzz over that new machine learning library is to pause for an hour while you prepare your data for a test drive! Fortunately, pandas makes short work of such data needs. This library, along with NumPy and many other data-oriented libraries, illustrates the strong support that keeps Python among the most popular programming languages out there.