2.2 Making Inferences
Learning Guide
The purpose of sampling is to make inferences about the population. We often use these inferences to inform management decisions and actions in natural resources, so we want to make inferences with precision and confidence.
In order to make inferences about a population:
- We sample from the population (repeatedly)
- We estimate central tendency and variability.
The sample mean (x̄) is the most common expression of central tendency, while the standard deviation (s or SD) is the most common expression of variability.
Point Estimates and Interval Estimates
Consider a situation where we conduct sampling in an area that has been reseeded with perennial grasses. We count the number of grass seedlings in many plots and report the mean seedling density, x̄ = 15 seedlings/m². By doing this, we have reported a point estimate, or a single value. A point estimate gives no information about how variable the measurements were, and that variability could be very important from a management perspective.
For example, if the seedling density measurements were highly variable, it could indicate that seedling emergence and establishment were spatially variable or patchy. Patchiness could be due to localized differences in soil texture and soil moisture conditions, or problems during the reseeding process. Regardless of the reasons, relatively large patches with very few seedlings may need to be reseeded.
Reporting the estimate as an interval estimate provides essential information about precision and variability. If seedling density was reported as 15 ± 12 seedlings/m² (x̄ ± SD), this might trigger a management action to inspect the area and decide if additional seeding was needed. If seedling density was reported as 15 ± 1 seedlings/m², this would indicate that the estimated mean density was representative of the site as a whole. Therefore, reporting the interval estimate is usually a far more informative and powerful way to convey sampling results.
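The contrast between a point estimate and an interval estimate can be sketched in a few lines of Python. The plot counts below are hypothetical values invented for illustration (they are not field data), and the function name summarize is likewise just an illustrative choice.

```python
from statistics import mean, stdev

# Hypothetical seedling counts per 1 m² plot (illustrative values, not field data).
patchy_site = [0, 2, 30, 28, 1, 33, 3, 29, 4, 20]
uniform_site = [14, 15, 16, 15, 14, 16, 15, 15, 16, 14]

def summarize(counts):
    """Return the point estimate (sample mean) and the sample standard deviation."""
    return mean(counts), stdev(counts)

for name, counts in [("patchy", patchy_site), ("uniform", uniform_site)]:
    x_bar, sd = summarize(counts)
    # Reporting mean ± SD turns the point estimate into an interval estimate.
    print(f"{name}: {x_bar:.0f} ± {sd:.0f} seedlings/m² (mean ± SD)")
```

Both hypothetical sites share the same point estimate of 15 seedlings/m², yet the standard deviations (about 14 versus about 1) tell very different management stories.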
Precision of estimates may vary for several reasons:
- Inherent variability of the population: Some variables may assume a wide range of values, depending on the variable being measured and the population or area of interest. For example, in a heterogeneous environment, vegetation may be scattered or distributed in discrete patches. In this case, the inherent variability of cover or density measurements would be high. In other situations, measurements of some variable may be quite consistent. For example, in areas dominated by sod-forming grasses, we might expect to see very little spatial variability in production or cover.
In addition, some characteristics of organisms are highly consistent. For example, most flowers in the genus Astragalus produce a single fruit. Other characteristics are highly variable, such as the height of blue grama (Bouteloua gracilis) culms, which can range between 20 and 75 cm, depending on environmental conditions.
- Sampling Error: Because a sample consists of a subset of sampling units from the population, some samples may include a group of relatively uniform sampling units, whereas others may include a collection of sampling units that are quite dissimilar. Thus, the precision of an estimate includes some variation that is simply due to the chance selection of the sampling units that are measured.
- Sampling Design: Each of the components of a sampling design or procedure may influence the precision of the estimate. We make deliberate decisions about sampling design with the intention to reduce variability and increase precision. Based on the results of pilot sampling, we may make adjustments to the sampling design in order to improve precision. The elements of sampling design are the topic of a subsequent lesson.
Inferring with Confidence
When making inferences about a population, it is good to know the precision of our estimate and be able to describe how confident we are in this estimate. A confidence interval is an interval around the sample mean. We use the confidence interval to express the amount of confidence we have that the true population mean is included within the interval limits. Put another way, if we repeated the sampling many times and calculated an interval from each sample, the stated percentage of those intervals would be expected to include the true population mean. Therefore, confidence intervals include both a stated confidence level and interval width.
First, let’s examine the information that we use to construct confidence intervals, and then we will consider how to interpret them.
Confidence intervals are calculated from the following formula:
C.I. = x̄ ± SE × t(α, df)
Where: C.I. = confidence interval
x̄ = sample mean
t = critical t-value derived from the t-distribution (found in a t-table)
α = false-change (Type I) error rate = 100% – confidence level
SE = standard error (standard error is sometimes written s_x̄ or stderr)
df = degrees of freedom; df = n – 1
n = sample size, or number of sampling units in the sample
For example, we may calculate a 90% confidence interval for plant density estimated from a sample of 25 quadrats (n = 25), with x̄ = 15.5 plants/m² and SE = 3.2 plants/m². The calculation for the 90% confidence interval follows:
- 90% C.I. = 15.5 ± 3.2 × 1.711, where 1.711 is the critical t-value when α = 0.10 and df = 24
- 90% C.I. = 15.5 ± 5.5 plants/m²
Another way to express a confidence interval is as [Lower limit ≤ µ ≤ Upper limit], or in our example:
- 90% C.I. = 10.0 plants/m² ≤ µ ≤ 21.0 plants/m²
Note: the units are written here as plants/m², but may also be written as plants m⁻².
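The arithmetic of the worked example can be checked with a short Python sketch. The function name confidence_interval is an illustrative choice, and the critical t-value is entered by hand from the t-table rather than computed.

```python
def confidence_interval(sample_mean, standard_error, t_critical):
    """Return (half_width, lower, upper) for sample_mean ± standard_error × t_critical."""
    half_width = standard_error * t_critical
    return half_width, sample_mean - half_width, sample_mean + half_width

# Values from the worked example: n = 25 quadrats, so df = 24 and t(0.10, 24) = 1.711.
half_width, lower, upper = confidence_interval(15.5, 3.2, 1.711)

print(f"90% C.I. = 15.5 ± {half_width:.1f} plants/m²")
print(f"90% C.I. = {lower:.1f} plants/m² ≤ µ ≤ {upper:.1f} plants/m²")
```

The lower and upper limits are simply the sample mean minus and plus the half-width, so both forms of the interval carry the same information.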
The confidence level is stated as a percent, such as 95% confidence, describing the likelihood that the interval includes the true population mean. The confidence level reflects the acceptable Type I error rate, α; a 95% CI implies an acceptable error rate equal to 5%, or α = 0.05. The confidence level is represented in the equation through the t-value. As we increase the confidence level, α becomes smaller, and t-values increase. For example, if we hold the degrees of freedom (df) constant, the t-value increases as α decreases (see highlighted rows in Figure 1). Therefore, assuming the other components of the equation are held constant, a confidence interval will be wider if the confidence level is increased.
Figure 1. A partial table showing the critical values of the t-distribution. Critical t-values are determined by the Type I error rate (α) and degrees of freedom (ν). Rows highlighted in yellow illustrate the increase in critical t-values as the error rate, α, decreases (left to right). The column highlighted in blue illustrates the decrease in critical t-values as degrees of freedom increase (top to bottom). (Adapted from Appendix 5 in Elzinga et al. 1998).
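To see how the confidence level alone changes interval width, the following Python sketch holds the standard error and degrees of freedom from the plant density example constant and varies only α. The critical values are two-tailed entries for df = 24 copied by hand from a standard t-table (an assumption of this sketch; they are not read from Figure 1), and the names are illustrative.

```python
# Two-tailed critical t-values for df = 24, copied from a standard t-table.
T_CRITICAL_DF24 = {0.20: 1.318, 0.10: 1.711, 0.05: 2.064, 0.01: 2.797}

SE = 3.2  # standard error from the plant density example (plants/m²)

def half_widths(standard_error, t_by_alpha):
    """Map each error rate α to the C.I. half-width, SE × t(α, df)."""
    return {alpha: standard_error * t for alpha, t in t_by_alpha.items()}

widths = half_widths(SE, T_CRITICAL_DF24)

# Print from the lowest confidence level (largest α) to the highest.
for alpha in sorted(widths, reverse=True):
    confidence = round((1 - alpha) * 100)
    print(f"{confidence}% C.I.: ± {widths[alpha]:.1f} plants/m²")
```

With everything else held constant, each step up in confidence level produces a wider interval.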
The width of confidence intervals is influenced by the sample size, n, in three ways.
- Critical t-values decrease as degrees of freedom (df) increase (see highlighted column in Figure 1). Since df = n – 1, sample size has a direct effect on CI width through its influence on the critical t-value.
- Standard error (SE) is calculated by dividing the standard deviation by the square root of the sample size, n (SE = s/√n). By increasing the sample size, the denominator in this equation also increases, and the resultant SE will be smaller in value.
- In general, precision tends to increase with an increasing number of sampling units, because the influence of outlying values is diminished when there are more measurements. So increasing the sample size, n, can also increase precision indirectly through the standard deviation, s, which enters the CI formula through the standard error (SE = s/√n).
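The first two effects of sample size can be illustrated with a small Python sketch. It assumes a standard deviation of s = 16 plants/m², which is the value implied by SE = 3.2 with n = 25 in the earlier example, and it holds s fixed, so the third (indirect) effect is deliberately not shown. The critical values are two-tailed entries for α = 0.10 copied by hand from a standard t-table, and the names are illustrative.

```python
from math import sqrt

# Two-tailed critical t-values for α = 0.10 from a standard t-table, keyed by df.
T_CRITICAL_ALPHA_010 = {4: 2.132, 9: 1.833, 24: 1.711, 60: 1.671}

S = 16.0  # assumed standard deviation (plants/m²), held constant across sample sizes

def half_width(s, n, t_by_df):
    """Return the 90% C.I. half-width: (s / sqrt(n)) × t(α, n - 1)."""
    standard_error = s / sqrt(n)
    return standard_error * t_by_df[n - 1]

for n in (5, 10, 25, 61):
    se = S / sqrt(n)
    print(f"n = {n:2d}: SE = {se:.2f}, t = {T_CRITICAL_ALPHA_010[n - 1]:.3f}, "
          f"half-width = ± {half_width(S, n, T_CRITICAL_ALPHA_010):.1f} plants/m²")
```

Both the standard error and the critical t-value shrink as n grows, so the interval narrows quickly at first and more slowly at larger sample sizes.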
Confidence levels of 80%, 90% and 95% are commonly used in natural resource applications. The confidence level is set by the investigator before data are analyzed. Selecting the appropriate confidence level involves a trade-off between how confident we want to be, or what we feel is an acceptable error rate, and the intensity of sampling that we are able to conduct. If we choose an 80% confidence level, this means that we are willing to accept a 20% error rate and willing to make the wrong inference 20% of the time. On the other hand, if we choose a 95% confidence level, we have a lower error rate, but the interval will be quite a bit wider.
Our willingness to make the wrong inference depends on the consequences of making a wrong decision. For example, if we were estimating the population size of a rare plant that is being considered for listing as a threatened or endangered species, we would most likely select a higher confidence level than we would if we were monitoring the residual biomass of herbage to guide short-term stocking decisions.
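What an error rate means in practice can be explored with a small simulation. The sketch below assumes a hypothetical, normally distributed population with a known mean of 15 and standard deviation of 5 (values chosen only for illustration), draws many samples of n = 25, and counts how often the interval built from each sample includes the true mean. The critical t-values for df = 24 are entered by hand from a standard t-table, and the function name coverage is illustrative.

```python
import random
from math import sqrt
from statistics import mean, stdev

def coverage(true_mean, true_sd, n, t_critical, trials, seed=1):
    """Return the fraction of simulated samples whose C.I. includes true_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = mean(sample)
        half_width = (stdev(sample) / sqrt(n)) * t_critical
        if x_bar - half_width <= true_mean <= x_bar + half_width:
            hits += 1
    return hits / trials

# Two-tailed critical t-values for df = 24: 1.318 (α = 0.20) and 2.064 (α = 0.05).
rate_80 = coverage(15.0, 5.0, 25, 1.318, trials=2000)
rate_95 = coverage(15.0, 5.0, 25, 2.064, trials=2000)

print(f"80% intervals that included the true mean: {rate_80:.1%}")
print(f"95% intervals that included the true mean: {rate_95:.1%}")
```

Under these assumptions, roughly one in five of the 80% intervals should miss the true mean, compared with roughly one in twenty of the 95% intervals.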
There are a couple of options to decrease the interval width without changing the confidence level.
- We could increase the sample size to decrease the width of the interval. However, the decision to increase sample size must be weighed against the added time, effort and cost associated with more sampling!
- We could possibly make adjustments to other components of the sampling design in order to increase the precision of our sampling. For example, it may be possible to increase precision by adjusting the size or shape of quadrats, or by reorienting the direction of transects relative to an environmental gradient.
The influence of sampling design is always case-specific, and the need to evaluate the efficiency of sampling design reinforces the value of conducting pilot sampling before devoting resources to a full-scale sampling effort. The components of sampling design and their influence on precision are discussed in subsequent lessons.