Using Python to Get the Mean (Average) of Numbers

Python is arguably the most popular programming language for data science. As one might expect, it comes with a slew of built-in libraries that can handle statistical analysis such as mean, median, and mode calculations.

Depending on your use case, there are several ways to calculate the average value of a set of numbers in Python. Whether you’re in need of a weighted average, the harmonic mean, or something more exotic, Python has several average functions that are anything but!

Obligatory Clarification

The terms “average” and even “mean” are ambiguous and may refer to one (or several) different types of mathematical calculation. The arithmetic mean is the most common method, and the one I’d wager most people mean when they broach the subject.

Other types include the geometric mean, harmonic mean, and more than a dozen others such as the moving average. For a deeper dive into exactly how some of these are calculated, check out this article. For our discussion here, I’m going to gloss over the finer details and focus mostly on implementation in Python.
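
For quick reference, the three Pythagorean means mentioned above are conventionally defined as follows for n positive values x_1, ..., x_n. These are the standard textbook definitions, not anything specific to Python:

```latex
\begin{align*}
\text{Arithmetic mean: } A &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\text{Geometric mean: } G &= \left(\prod_{i=1}^{n} x_i\right)^{1/n} \\
\text{Harmonic mean: } H &= \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}
\end{align*}
```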

Using Python to Get the Average

Below are several approaches to getting the mean in Python, ranging from the simple `mean` function built into the `statistics` module to third-party libraries like `numpy`. For the examples, I’ll be using the following randomly generated set of numbers:

```python
import random

# Generate a list of 10 random integers from 0 to 99
numbers = [random.randrange(100) for _ in range(10)]

print(numbers)
# Example output (yours will differ on each run):
# [59, 97, 94, 98, 54, 40, 96, 37, 11, 17]
```

These are the numbers that will be used for each example to follow.

Statistics Library

The standard `statistics` library, available since Python 3.4, provides several functions for calculating the mean of a set of numbers. Among them are functions for the arithmetic, geometric, and harmonic means (`harmonic_mean` requires Python 3.6 or later, and `geometric_mean` requires Python 3.8 or later), as shown below:

```python
import statistics

# The list of random numbers generated above
numbers = [59, 97, 94, 98, 54, 40, 96, 37, 11, 17]

arithmetic_mean = statistics.mean(numbers)
print(arithmetic_mean)  # 60.3

geometric_mean = statistics.geometric_mean(numbers)
print(geometric_mean)  # 48.73877382924253

harmonic_mean = statistics.harmonic_mean(numbers)
print(harmonic_mean)  # 35.868566290602814
```

Note: Before 3.4, Python’s standard library had no statistics module; similar functionality was available from third-party packages such as `stats`.

NumPy

NumPy is one of the most-used numerical processing libraries among data scientists, along with other staples such as `pandas`, `matplotlib`, and `scikit-learn`. All of these are overkill for a simple mean calculation but, if they’re already dependencies, they can be convenient.

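```python
import numpy

# The list of random numbers generated above
numbers = [59, 97, 94, 98, 54, 40, 96, 37, 11, 17]

# The arithmetic mean
numpy_mean = numpy.mean(numbers)
print(numpy_mean)  # 60.3

# The weighted average (no weights given, so every value counts equally)
numpy_average = numpy.average(numbers)
print(numpy_average)  # 60.3
```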

Note: `numpy.average` calculates a weighted average but, in the example above, it isn’t given any weights. As such, every value is weighted equally and the result matches the arithmetic mean.
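
To make the role of weights concrete, here is a minimal sketch of what a weighted average computes, written in plain Python. The `weights` list and the `weighted_average` helper are made up for illustration and are not part of the original example; with NumPy, passing the same list as `numpy.average(numbers, weights=weights)` should give the same result.

```python
numbers = [59, 97, 94, 98, 54, 40, 96, 37, 11, 17]

# Hypothetical weights: the last five values count twice as much as the first five
weights = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]


def weighted_average(values, weights):
    """Return sum(value * weight) / sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ZeroDivisionError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


result = weighted_average(numbers, weights)
print(result)  # 53.6
```

Because the heavier weights sit on the smaller values in this list, the weighted result lands below the unweighted mean of 60.3.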

SciPy.stats

The `scipy.stats` module is focused mostly on probability distributions but provides some functions for mean calculation. The arithmetic mean, along with the median and mode, is mostly exposed as part of other, larger functions; for example, `stats.describe` includes the arithmetic mean in its summary. Overall, the `scipy.stats` module isn’t typically used for simple mean calculation. If you’re hell-bent on doing so, the geometric and harmonic means are available as such:

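```python
from scipy import stats

# `numbers` is the same list used in the examples above

# The geometric mean
geometric_mean = stats.gmean(numbers)
print(geometric_mean)  # 48.738773829242575

# The harmonic mean
harmonic_mean = stats.hmean(numbers)
print(harmonic_mean)  # 35.868566290602814
```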

Manual Calculation

For those who prefer vanilla Python code, the mean/average isn’t exactly a hat trick. The following illustrates an approach for calculating the arithmetic mean using nothing but standard Python syntax:

```python
# Arithmetic mean: the sum of the values divided by how many there are
arithmetic_mean = sum(numbers) / len(numbers)
print(arithmetic_mean)  # 60.3
```

The geometric mean and harmonic mean can be done in vanilla Python but they’re not nearly as straightforward. At the very least, one would want to make use of the `math.log` and `math.pow` functions.
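
As a rough sketch of what that might look like, the geometric mean can be computed as the exponential of the average logarithm, and the harmonic mean as the count divided by the sum of reciprocals. The function names below are illustrative, `math.exp` is used in place of `math.pow` to undo the logarithm, and the inputs are assumed to be positive numbers.

```python
import math

numbers = [59, 97, 94, 98, 54, 40, 96, 37, 11, 17]


def geometric_mean(values):
    """Geometric mean via logarithms; assumes every value is > 0."""
    # Averaging logs avoids overflowing a running product on long lists
    log_sum = sum(math.log(v) for v in values)
    return math.exp(log_sum / len(values))


def harmonic_mean(values):
    """Harmonic mean: count divided by the sum of reciprocals; assumes values > 0."""
    return len(values) / sum(1 / v for v in values)


print(geometric_mean(numbers))  # roughly 48.74
print(harmonic_mean(numbers))   # roughly 35.87
```

The results should agree with the `statistics` values shown earlier, up to floating-point rounding in the last few digits.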

Final Thoughts

Python makes it super easy to calculate the mean; the toughest part is deciding which method or library to use! The three Pythagorean means are available as functions in the standard `statistics` library. Third-party libraries like `numpy` can offer the same features but often add unwanted project overhead.

Full-Stack Software Engineer with 10+ years of experience. Expertise in developing distributed systems, implementing object-oriented models with a focus on semantic clarity, driving development with TDD, enhancing interfaces through thoughtful visual design, and developing deep learning agents.