Uncertainty Module 1.4

Statistics Terminology part 1: Mean, Standard Deviation, and Standard Deviation of the Mean

The researchers at Taconite Lake need to report a single value of the concentration of lead in the lake waters. Let's imagine that they really want to report a value that closely represents the population (true) value, so they've now made 500 measurements. They'll report the sample mean, or average measured value, for their n = 500 samples ('replicates').

Reporting the average result alone gives an incomplete picture. Since each measurement made is an approximation of the true value, each measurement has uncertainty. The average result will also have its own uncertainty that must be reported. Remember that uncertainty is a range of values in which a measurement or result lies. To calculate the uncertainty associated with the result, the standard deviation is used.

In the previous section, we saw how the standard deviation is related to the normal distribution. By definition, the sample standard deviation is a measure of the precision of a data set. In other words, it shows how closely data are clustered about the mean value. Standard deviation is given by the following formula:

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2}$$

in which $x_i$ is the value of a single measurement, $\bar{x}$ is the mean value and n is the number of samples. This formula is for the sample standard deviation. This standard deviation belongs to the individual measured values, and is not the standard deviation that applies for the mean value.
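As a minimal sketch of the formula above, the following Python snippet computes the mean and the sample standard deviation for a small set of replicates. The five concentration values are hypothetical and were invented only for illustration; they are not Taconite Lake data. The hand-written function is compared against `statistics.stdev` from the standard library, which also uses the n - 1 denominator.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

# Hypothetical replicate measurements in ug/L (illustrative only)
replicates = [0.21, 0.19, 0.20, 0.22, 0.18]

mean_value = statistics.mean(replicates)
s = sample_std(replicates)

print(f"mean = {mean_value:.3f} ug/L")
print(f"s    = {s:.4f} ug/L")
# The library routine should agree with the hand-written formula
print(f"statistics.stdev = {statistics.stdev(replicates):.4f} ug/L")
```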

When experiments do not have a large number of samples, there is a distinct possibility that the calculated mean differs from the mean that would be obtained if we measured an infinite number of samples (e.g. the whole lake). In this case, we also need a statistical value to characterize the uncertainty in the value of the mean. This value is the standard deviation of the mean. The standard deviation of the mean is given by the following formula:

$$ s_n = {s \over \sqrt{n}} $$

The standard deviation of the mean describes the uncertainty of the calculated mean for an experiment. Because the reported result is a mean, the standard deviation of the mean characterizes the random uncertainty in that result better than the standard deviation of the individual measurements does. This value is one step away from being the value that is reported in a final result, but more on that later.
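To make the formula concrete, here is a short sketch that applies it to the unrounded standard deviation and sample size given for Taconite Lake in Table 3 later in this module. The function name `std_of_mean` is an illustrative choice, not part of any library.

```python
import math

def std_of_mean(s, n):
    """Standard deviation of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

s = 0.012529  # sample standard deviation in ug/L (Table 3)
n = 500       # number of replicates

s_n = std_of_mean(s, n)
print(f"s_n = {s_n:.5e} ug/L")
```

The result should agree with the standard deviation of the mean listed in Table 3.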

Table 2 below summarizes the differences in the two concepts of standard deviation and standard deviation of the mean.

*Table 2: Differences between standard deviation and standard deviation of the mean*

| | Standard deviation | Standard deviation of the mean |
|---|---|---|
| Formula | $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2}$ | $s_n = \frac{s}{\sqrt{n}}$ |
| Definition | The variability expected for a single measured value | The variability expected for a calculated mean from a set of measured values |
| Use | Report a single value with its uncertainty, $x_i \pm s$ | Report the mean value with its uncertainty, $\bar{x} \pm s_n$ |

The standard deviation and the standard deviation of the mean depend on sample size in different ways. The standard deviation of a sample is determined by the measured values and reflects the inherent variation in the sample, so it does not shrink systematically as more samples are taken. Instead, as the sample size (n) increases, the calculated standard deviation becomes a better estimate of that inherent variation.

The standard deviation of the mean changes directly with sample size because of the $\sqrt{n}$ term in its equation. As more samples are taken, the standard deviation of the mean decreases; we are more confident in the position of the sample mean, i.e. the sample mean is expected to lie closer to the true population mean.
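The effect of sample size can be sketched numerically. The snippet below holds the standard deviation fixed at the Taconite Lake value (an assumption made for illustration, since in a real experiment the calculated s would fluctuate slightly from one sample set to the next) and evaluates the standard deviation of the mean for several hypothetical sample sizes.

```python
import math

s = 0.012529  # sample standard deviation in ug/L, assumed constant here

# Hypothetical sample sizes, each ten times larger than the last
sample_sizes = (5, 50, 500, 5000)

std_of_mean_by_n = {n: s / math.sqrt(n) for n in sample_sizes}

for n, s_n in std_of_mean_by_n.items():
    print(f"n = {n:>4}: s_n = {s_n:.6f} ug/L")
```

Because of the square root, a hundredfold increase in the number of samples reduces the standard deviation of the mean only tenfold.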

Statistics Terminology part 2: Confidence Level and Confidence Interval

The researchers at Taconite Lake have calculated the mean, standard deviation, and standard deviation of the mean for the ensemble of samples that were measured. Recall that samples are used to infer information about the population. We now need to relate these calculated values back to the population. This is done through the use of the confidence level and confidence interval.

The confidence interval is the mathematical way to connect the sample to the population. To begin our discussion, we will start with the formula for a confidence interval:

$$ Confidence\, Interval\, (at\, 95\%\, confidence\, level) = \bar{x} \pm \frac{ts}{\sqrt{n}} = \bar{x} \pm t{s_n}$$

The value "t" is known as Student's t-value, which can be found in a table. This value can be thought of as a "fudge factor". The value of t depends on the confidence level and the number of replicates. While you will normally use a t-table to determine the value of t, for the purposes of this module we will use t = 2.0, which technically corresponds to the 95% confidence level if n = 60. In practice, t = 2 is a rule of thumb used for the 95% confidence level regardless of n.

Since we have seen that all measurements contain a degree of uncertainty, it is important to be able to report uncertainty in a standardized way. In chemical analysis, values are reported with an uncertainty at a confidence level. You may have seen values reported "at the 95% confidence level" before. This is a common confidence level that is used in science. But what does it mean?

The confidence level expresses how sure we want to be that the true population mean will lie in the range we've found. A confidence level is a chosen value, with the 95% confidence level being the unofficial standard. Some fields report this as "p $\leq$ 0.05", which says the chance that the true value is out of the range is less than or equal to 5%. You should define your confidence level in advance of the experiment. How sure do you want to be of your result? What defines a suitable level of evidence in your study?

A confidence interval at the 95% confidence level is the range of values within which there is a 95% probability that the true population value lies. That is, the confidence interval is the span:

$$\bar{x} - t{s_n} < \mu < \bar{x} + t{s_n} $$

in which $\mu$ is the true population mean.
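As a rough illustration of this span, the sketch below evaluates its two end points for the unrounded Taconite Lake values from Table 3, using the module's t = 2.0. The function name `confidence_limits` is hypothetical, and `mu` in the printed line stands for the true population mean.

```python
import math

def confidence_limits(mean, s, n, t=2.0):
    """Return (lower, upper) limits: mean -/+ t * s / sqrt(n)."""
    half_width = t * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Unrounded Taconite Lake values from Table 3 (ug/L)
lower, upper = confidence_limits(mean=0.19952, s=0.012529, n=500)

print(f"{lower:.5f} ug/L < mu < {upper:.5f} ug/L")
```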

The magnitude of a confidence interval depends on the number of replicates as well as the standard deviation. As n increases, the standard deviation of the mean decreases and the confidence interval narrows, so the true mean is located within a smaller range; in other words, our precision has increased.
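The dependence on n can be sketched with a few entries from a standard t-table. The t-values below are two-sided 95% values keyed by degrees of freedom (n - 1), and the standard deviation is held at the Taconite Lake value purely for illustration. Both the smaller t and the larger $\sqrt{n}$ narrow the interval as n grows.

```python
import math

# Two-sided 95% Student's t-values from a standard t-table,
# keyed by degrees of freedom (n - 1)
T_95 = {2: 4.303, 5: 2.571, 10: 2.228, 60: 2.000}

s = 0.012529  # standard deviation in ug/L, assumed constant here

half_widths = {}
for dof, t in T_95.items():
    n = dof + 1
    # Half-width of the confidence interval: t * s / sqrt(n)
    half_widths[n] = t * s / math.sqrt(n)
    print(f"n = {n:>2}, t = {t:.3f}: half-width = {half_widths[n]:.4f} ug/L")
```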

The magnitude of a confidence interval also depends on the confidence level chosen. It is counter-intuitive, but as the confidence level increases, the magnitude of the confidence interval also increases. If we want to be more sure that the true value falls into some range, we must make the range larger. Tables of Student's t-value show the value increases as the desired confidence level increases, and since t is larger, the confidence interval is expanded.
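A brief sketch of this effect, using two-sided t-values for 60 degrees of freedom (n = 61) taken from a standard t-table. The standard deviation is the Taconite Lake value and the sample size is chosen only to match the table row; neither is a new measurement.

```python
import math

# Two-sided Student's t-values for 60 degrees of freedom (n = 61),
# taken from a standard t-table and keyed by confidence level in percent
T_BY_LEVEL = {90: 1.671, 95: 2.000, 99: 2.660}

s = 0.012529  # standard deviation in ug/L, illustrative
n = 61

half_width_by_level = {
    level: t * s / math.sqrt(n) for level, t in T_BY_LEVEL.items()
}

for level, half_width in half_width_by_level.items():
    print(f"{level}% confidence level: +/- {half_width:.4f} ug/L")
```

For the same data, asking for 99% confidence gives a wider interval than asking for 90%.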

Another term that is often used is the confidence limit. The confidence limits are the end points of the confidence interval range. Figure 9 below summarizes these concepts.

Schematic showing confidence limit and interval on Gaussian curve — Figure 9: Summary of confidence level, interval and limits for t=2.0 at the 95% confidence level

Statistics Terminology part 3: Reporting Results

Results are reported using confidence levels and confidence intervals. It is important that all components are included when reporting a result. Standard practice is to report the uncertainty (via the confidence interval) with only one significant digit. If you use the value in subsequent calculations, carry more significant digits to avoid rounding error.

A general equation for reporting results is given below. Note that the value after the $\pm$ sign is half the width of the confidence interval.

$$result = \bar{x} \pm t{s_n} \, units\, (at\, confidence\, level) $$

Table 3 provides example results that might have been calculated for the determination of lead in Taconite Lake based on 500 samples of lake water.

*Table 3: Example summary of results for Taconite Lake lead concentration measurements*

| | Calculated value | Value rounded to appropriate sig. figs. |
|---|---|---|
| Mean concentration of lead | 0.19952 $\mu$g/L | 0.200 $\mu$g/L |
| Standard deviation | 0.012529 $\mu$g/L | 0.013 $\mu$g/L |
| Standard deviation of the mean | $0.56031 \times 10^{-3}$ $\mu$g/L | 0.0006 $\mu$g/L (or $6 \times 10^{-4}$ $\mu$g/L) |
| Confidence interval (at 95% confidence level and t = 2.0) | $1.12063 \times 10^{-3}$ $\mu$g/L | 0.001 $\mu$g/L |

With these calculated values, the researchers would report their final result of the concentration of lead in the lake water as:

$$[Lead] = 0.200 ± 0.001 \, {\mu}g/L \, (95\% \, confidence \, level)$$

To simplify this result, we can use scientific notation and combine the result and uncertainty together:

$$[Lead] = (20.0 \pm 0.1)× {10}^{-2} \, {\mu}g/L \, (95\% \, confidence \, level)$$
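The rounding convention used here (uncertainty to one significant digit, mean to the same decimal place) can be sketched in Python. This is one possible implementation under that assumption, not a prescribed procedure; the function name `format_result` and its parameters are hypothetical.

```python
import math

def format_result(mean, half_width, units, level=95):
    """Round the uncertainty to one significant digit and the mean
    to the same decimal place, then build the report string."""
    # Position of the first significant digit of the uncertainty
    exponent = math.floor(math.log10(half_width))
    rounded = round(half_width, -exponent)
    # Rounding up can add a digit (e.g. 0.00096 -> 0.001), so recompute
    exponent = math.floor(math.log10(rounded))
    decimals = max(-exponent, 0)
    return (
        f"{round(mean, -exponent):.{decimals}f} ± "
        f"{round(half_width, -exponent):.{decimals}f} {units} "
        f"({level}% confidence level)"
    )

# Unrounded Taconite Lake values from Table 3
report = format_result(0.19952, 1.12063e-3, "ug/L")
print(report)
```

Only the final reported string is rounded; the unrounded mean and uncertainty should still be carried into any later calculation.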

The researchers' final result for the concentration of lead in Taconite Lake is below the maximum acceptable concentration. If lake-wide sampling for lead contamination is repeated in the future, statistical tests could be used to compare the concentrations found at different times. These statistical tests will be described in a later module. Next up, we look at how to combine different uncertainties together.

This website layout and associated stylesheets based on Skidoo Redux

Copyright © 2017 Robin Stoodley and Amber Richardson, Department of Chemistry, University of British Columbia
