Estimators: Sampling Measures to Estimate Population Values


Estimators are measures computed from samples to estimate parameter values for entire populations. An example is the sample mean being used to represent the population mean. The quantity an estimator targets is known formally as the estimand, and the specific value an estimator produces is called an estimate. Estimators can be categorized into several types, reflect several characteristics, and go by several names.

The field of statistics is built on making estimates about a population based on a smaller group (a sample). The values of measures taken from samples can serve as estimators for parameter values of the entire population. The distinction is basic but carries many subtle differences in characteristics that may affect measurable outcomes if applied improperly or haphazardly.

TL;DR – Estimators are measured values from sample populations that represent parameter values for the entire population. For example, the sample mean is an estimator for the population mean.

Population vs. Sample

population vs sample statistics
Sample groups contain only a portion of the members of a total population and are used to generate statistics to estimate values of the total population.

Estimators can be understood by considering the relationship between populations and sample populations. A population represents the entirety of a group, whereas a sample population represents a smaller portion of it. Descriptive measures such as the mean, standard deviation, and variance are values (a.k.a. parameters) when they are obtained from a population. These same measures are called statistics when obtained from a sample population.

Sample populations are used when information about an entire population is not available. For example, the average salary of fifteen Fortune 500 CEOs (a statistic) could be used to estimate the average salary of all Fortune 500 CEOs (a parameter value). The average salary of the sample population serves as an estimator for the population value. The vocabulary can get a tad confusing here, however; the estimator for the population is still considered a statistic of the sample population!
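
The CEO example can be sketched in a few lines of Python. The salary figures below are randomly generated stand-ins, not real data:

```python
import random
import statistics

random.seed(42)  # fixed seed so the draw is reproducible

# Hypothetical salaries (in millions) for all 500 CEOs -- the population.
population = [round(random.uniform(5.0, 40.0), 1) for _ in range(500)]

# In practice we only observe a sample, say 15 CEOs.
sample = random.sample(population, 15)

sample_mean = statistics.mean(sample)          # a statistic, used here as an estimator
population_mean = statistics.mean(population)  # the parameter it estimates

print(f"sample mean (estimator):     {sample_mean:.2f}")
print(f"population mean (parameter): {population_mean:.2f}")
```

With only fifteen observations, the statistic will usually land near, but not exactly on, the parameter it estimates.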

Estimator vs. Statistic

statistic vs estimator
A sample statistic is referred to as an estimator when it is used to estimate a value within the total population. (click to enlarge)

Sample statistics are used to estimate values (a.k.a. parameters) in a population. This is the fundamental basis of the field of statistics—estimating values of large groups based on a smaller number of observations. Similar measures are referred to by different names in many cases. One such case is when discussing both the population and sample population.

The different observational grouping types (sample vs. population) determine when a statistic might be referred to as an estimator. Consider the illustration above: the mean value of the yellow sample group is a statistic, commonly named x-bar.

When the sample mean is used to estimate the mean of the orange population group, it is referred to as an estimator. If the entire population were measured, the result would simply be a parameter value: the population mean, commonly represented by the Greek letter mu.

It's kind of like calling eggs, peppers, onions, and mushrooms "ingredients" once they have been mixed into a bowl prior to making an omelet. Same stuff, named differently based on its intended use.

Types of Estimators

point estimates vs interval estimates
Point estimators measure a single value from a sample population, whereas interval estimators estimate a range of values in which a measure is likely to fall. (click to enlarge)

Estimators come in two main types: point estimators and interval estimators. Their names are assigned appropriately in that point estimators represent discrete values (a single point) and interval estimators represent a range of values (an interval of values).

Point Estimators

Point estimators produce a single value to estimate a population parameter that has not been (or cannot be) measured directly. Such a parameter may belong to a partial or complete set of unknown population parameters; in other words, some population parameters may be known while others are not. The use of an estimator for one measure does not guarantee the necessity or use of an estimator for another.
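
A minimal sketch of point estimation in Python, using a made-up sample of ten measurements; each computed value is a single-number stand-in for an unknown population parameter:

```python
import statistics

# A hypothetical sample of 10 measurements (made-up values).
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.6, 5.2, 4.8, 5.3]

# Each of these is a point estimate: a single number that stands in
# for an unknown population parameter.
point_estimates = {
    "mean": statistics.mean(sample),          # estimates the population mean
    "variance": statistics.variance(sample),  # estimates the population variance
    "stdev": statistics.stdev(sample),        # estimates the population std. dev.
}
print(point_estimates)
```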

Interval Estimators

Interval estimators are related to confidence intervals in that they represent a range of values likely to contain the parameter being estimated. These intervals are typically constructed at a 95% or 99% confidence level, meaning that if the sampling process were repeated many times, roughly 95% or 99% of the resulting intervals would contain the true parameter value. Higher levels of confidence generally produce wider intervals of possible values.
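
A rough sketch of an interval estimate for a mean, using the same kind of made-up sample. The normal critical value 1.96 is used here for simplicity; for a sample this small, a t critical value (about 2.262 for 9 degrees of freedom) would be more appropriate:

```python
import math
import statistics

sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.6, 5.2, 4.8, 5.3]

n = len(sample)
xbar = statistics.mean(sample)  # point estimate of the mean
s = statistics.stdev(sample)    # sample standard deviation
se = s / math.sqrt(n)           # standard error of the mean

z = 1.96  # normal critical value for a 95% confidence level
lower, upper = xbar - z * se, xbar + z * se
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

Raising the confidence level to 99% would swap in a larger critical value (about 2.576), widening the interval.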

Important Estimator Characteristics

Estimators have several characteristics of which one needs to be aware, and these characterize the differences between population measures and sample statistics. These characteristics describe the quality of an estimator in addition to offering perspective on how and when it might best be used to represent population parameters.

Bias: The difference between the expected value of the estimator and the true value of the parameter being estimated. If there is no difference, the estimator is unbiased.
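
Bias can be demonstrated by simulation. The classic example is the sample variance: dividing the sum of squared deviations by n gives a biased estimator of the population variance, while dividing by n − 1 corrects the bias. The sketch below uses simulated standard-normal data (true variance 1.0):

```python
import random

random.seed(0)  # reproducible simulation

# Average each variance estimator over many repeated small samples
# drawn from a standard normal population (true variance = 1.0).
n, trials = 5, 20_000
biased_avg = 0.0
unbiased_avg = 0.0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    biased_avg += ss / n          # biased: divides by n
    unbiased_avg += ss / (n - 1)  # unbiased: divides by n - 1
biased_avg /= trials
unbiased_avg /= trials

print(f"biased (divide by n):     {biased_avg:.3f}")   # systematically below 1.0
print(f"unbiased (divide by n-1): {unbiased_avg:.3f}") # close to 1.0
```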

Consistency: A measure of how closely an estimated value stays to the parameter being estimated as sample size increases. Consistency can be checked by examining the estimator’s corresponding expected value and variance.
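
Consistency can likewise be illustrated by simulation: as the sample grows, a consistent estimator such as the sample mean drifts toward the parameter it estimates. The sketch below samples from a Uniform(0, 10) population, whose true mean is 5.0:

```python
import random

random.seed(1)  # reproducible simulation

# Track how far the sample mean lands from the true mean (5.0)
# as the sample size grows.
errors = {}
for n in (10, 1_000, 100_000):
    sample_mean = sum(random.uniform(0, 10) for _ in range(n)) / n
    errors[n] = abs(sample_mean - 5.0)
    print(f"n = {n:>7}: sample mean = {sample_mean:.3f}")
```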

Efficiency: A comparison of the variance among unbiased, consistent estimators; which estimator is most efficient depends on the distribution. For example, sometimes the mean is a more efficient measure than the mode.

Invariance: The trait of an estimator not changing as the data, interval, or another measure is scaled. The term is sometimes applied loosely to statistics that remain invariant under most, but not all, transformations.

Shrinkage: The reduction of extreme values towards a central value such as the median. Shrinkage can help provide more stable estimates of population parameters but introduces bias, especially with asymmetric distributions. Shrinkage estimators are obtained by shrinking raw estimators; examples include the lasso and ridge estimators.
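
As a small illustration of the shrinkage idea, the closed-form ridge estimator for a one-feature, no-intercept regression shrinks the raw least-squares slope toward zero as the penalty grows. The data points below are made up:

```python
# One-dimensional ridge regression (no intercept): minimizing
# sum((y - b*x)^2) + lam * b^2 gives b = sum(x*y) / (sum(x^2) + lam),
# so larger penalties shrink the slope toward zero.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]  # roughly y = 2x

sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)

betas = [sxy / (sxx + lam) for lam in (0.0, 1.0, 10.0)]
for lam, beta in zip((0.0, 1.0, 10.0), betas):
    print(f"lambda = {lam:>4}: slope = {beta:.3f}")
```

At lambda = 0 the estimator reduces to the raw least-squares slope; each increase in the penalty pulls the estimate further toward zero.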

Sufficiency: An estimator is deemed “sufficient” when no other statistic computed from the same sample provides additional information about the parameter being estimated. The sample mean is an example of a sufficient estimator (for the mean of a normally distributed population).

Final Thoughts

Estimators are powerful tools used to derive parameter values during the development of statistical models. These parameters appear in applications ranging from linear regression to more advanced machine learning models. Estimators are fundamental statistical tools, but they come with nuances worth considering: different distributions are better served by different estimators. The key to deriving effective estimators is to be aware of the characteristics of one’s data and to always consider multiple options.

Zαck West
Full-Stack Software Engineer with 10+ years of experience. Expertise in developing distributed systems, implementing object-oriented models with a focus on semantic clarity, driving development with TDD, enhancing interfaces through thoughtful visual design, and developing deep learning agents.