3.2 Positioning Sampling Units Across Landscapes
Learning Guide
Decisions about how to position sampling units must balance the need for good interspersion throughout the population or area of interest against the practical constraints of time and expense.
Remember that we are making inferences about an entire population based on the samples collected. Without good interspersion throughout the area of interest, we have a reduced likelihood that our sample adequately represents the population of interest. Careful consideration of both interspersion and available resources at the beginning of a study can greatly improve sampling efficiency and our ability to make valid inferences.
Most statistical analyses assume that each sampling unit had an equal probability of being selected, and we generally accomplish this through random selection of units. Many approaches exist for selecting sampling units (i.e., locating sampling units in the area of interest). Nearly all of them include some aspect of randomization, and each has advantages and limitations that should be considered when designing a sampling protocol. The strategy should be selected based on case-specific monitoring and sampling objectives, site characteristics, and practical constraints of time and available resources.
Simple Random Sampling
In simple random sampling:
- each sampling unit has an equal probability of selection
- the selection of each sampling unit is independent of the others
Simply put, if one sampling unit is selected, it does not affect the chance of any other sampling unit being selected.
There are two approaches to locating sampling units with simple random sampling: 1) the coordinate method, and 2) the grid-cell method.
Coordinate Method
This approach uses an X,Y coordinate system to locate sampling units in the study area. An X,Y coordinate system is superimposed over a map or aerial image of the study area, and points are randomly selected for sampling using coordinate pairs. Reject any points that are not included in the study area (Figure 1a).
Figure 1. Approaches to simple random sampling: a) the coordinate method, in which 2 points (yellow stars) are accepted for sampling and 1 point (red “x”) is rejected; and b) the grid-cell method, in which 10 quadrats (green squares) are selected from the entire set of possible quadrat locations in the study area.
The coordinate method works well when sampling units are points or small quadrats, but has limitations for transects or large plots:
- Points near the study area boundary could be discarded if the transect were to extend outside of the study area; by repeatedly discarding points near the boundary, we could introduce a bias towards sampling near the center of the area.
- Sampling units may overlap if points are relatively close to each other.
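The coordinate method can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the L-shaped study area, its coordinates, and the function names are all hypothetical. Random X,Y pairs are drawn from the rectangle that encloses the study area, and any pair that falls outside the boundary is rejected.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def coordinate_sample(polygon, n, rng=random):
    """Draw n random points inside the study area, rejecting outside points."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < n:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):   # keep only points in the study area
            points.append((x, y))
    return points

# Hypothetical L-shaped study area; coordinates in metres.
study_area = [(0, 0), (100, 0), (100, 40), (40, 40), (40, 100), (0, 100)]
points = coordinate_sample(study_area, n=10, rng=random.Random(1))
```

The sketch treats sampling units as points only. It does not address the boundary or overlap problems listed above for transects and large plots.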
Grid-Cell Method
With the grid-cell method, the area of interest is overlaid with a conceptual grid, in which the size and shape of each cell in the grid matches the size and shape of the sampling units. Overlaying the grid identifies the location of every possible, mutually exclusive sampling unit. Each grid cell is given a unique identifying number. The sampling unit locations are then randomly selected, either by putting all of the numbers in a “hat” and drawing them without replacement, or by using a random number table or a random number generator, such as those found in most spreadsheet programs. The grid-cell method is one of the most efficient and convenient methods of randomly positioning quadrats.
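As a rough sketch of the grid-cell method, the Python below numbers every cell in a grid and draws cell numbers without replacement. It assumes a simple rectangular grid of 8 rows by 12 columns, which is a hypothetical layout; for an irregular study area, only the cells that fall inside the boundary would be numbered.

```python
import random

def grid_cell_sample(n_rows, n_cols, n_units, rng=random):
    """Select n_units grid cells at random, without replacement."""
    cell_ids = range(1, n_rows * n_cols + 1)   # one unique number per cell
    chosen = rng.sample(cell_ids, n_units)     # draw numbers from the "hat"
    # Convert each cell number back to a (row, column) position in the grid.
    return [((cell - 1) // n_cols, (cell - 1) % n_cols) for cell in sorted(chosen)]

# Hypothetical grid: 8 rows x 12 columns, 10 quadrats to be sampled.
cells = grid_cell_sample(n_rows=8, n_cols=12, n_units=10, rng=random.Random(7))
```

Because the cells are mutually exclusive and drawn without replacement, the selected quadrats cannot overlap.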
Advantages of Simple Random Sampling:
- Best applied in small, relatively homogeneous areas that do not require a large sample size
- Easy to analyze statistically
- Meets assumptions of independence
Disadvantages of Simple Random Sampling:
- Potential for poor dispersion of sampling units in the study area (Figure 2)
- Potential for reduced sampling efficiency in large areas if significant travel time is required between sampling locations
Figure 2. Due to random selection of sampling locations, relatively large areas may not be included in the sampling effort. The dotted tan lines delineate large areas with few sampling points.
Restricted Random Sampling
Restricted random sampling is used to achieve good interspersion of sampling units when the sample size is relatively small (e.g., n < 25 or 30). Once the sample size (n) has been determined, the study area is divided into n segments, without regard to vegetation type, topography or soils. A single sampling unit is then randomly located within each of the segments (Figure 3). In this way, sampling units are still randomly located, and interspersion is improved within the area of interest.
Figure 3. Diagram illustrating location of transects in an area divided into equal sized segments using restricted random sampling.
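The restricted random approach can be illustrated with a short Python sketch. It assumes, purely for illustration, that the study area is divided into equal segments along a 200 m baseline and that one transect start point is needed per segment; the baseline length and function name are hypothetical.

```python
import random

def restricted_random_positions(baseline_length, n, rng=random):
    """Return one random position inside each of n equal baseline segments."""
    segment = baseline_length / n
    return [rng.uniform(i * segment, (i + 1) * segment) for i in range(n)]

# Hypothetical 200 m baseline divided into 8 segments, one transect per segment.
positions = restricted_random_positions(baseline_length=200.0, n=8,
                                        rng=random.Random(3))
```

Each position is still random, but no segment can be left without a sampling unit.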
Stratified Random Sampling
Stratified random sampling is commonly used when portions of the study area can be classified into 2 or more sub-units, or strata, that share common characteristics that are unlikely to change over time, such as vegetation types, soil types, ecological sites, or landforms. Stratification should not be based on the attribute of interest because that represents a potential source of bias. Once the area is stratified, the number of sampling units assigned, or allocated, to each of the strata is calculated using an allocation strategy.
- Proportional allocation: the number of sampling units assigned to each stratum is proportional to the size of that stratum relative to the size of the study area. For example, consider a 30-hectare (ha) study area that includes sagebrush grassland (15 ha), ponderosa pine savanna (9 ha), and riparian meadow (6 ha). If we plan to sample the entire area using n = 50 sampling units, we would allocate 50% (25 units) to the grassland, 30% (15 units) to the savanna, and 20% (10 units) to the riparian meadow (Figure 4a). Once the allocation has been determined, units are randomly located within their assigned strata.
Figure 4. Allocation of sampling units to strata using 2 allocation strategies: a) proportional allocation, and b) optimum allocation.
- Optimum allocation: the number of sampling units assigned to each stratum is weighted by both the relative area and an estimate of the variability observed in each stratum. This allocation strategy is based on the concept that areas (strata) with higher variability will require more intensive sampling to achieve desired precision. This approach aims to provide the highest quality information for a given level of sampling effort.
Expanding on the previous example, if the variability of the vegetation had been estimated through pilot sampling, we may find that the vegetation in the sagebrush grassland is far more homogeneous (lower standard deviation) than in either the savanna or meadow areas (Table 1). A new weighting factor (Column D in Table 1) is used to calculate the allocation of sampling units to each stratum. With optimum allocation, the number of sampling units in the grassland decreased from 25 to 14, while the number of units in the savanna increased from 15 to 27 (Figure 4b).
Table 1. Calculation of sampling unit allocation in a grassland-savanna-riparian site using an optimum allocation strategy. Allocation to each stratum is based on the relative proportion of the area (Column A) weighted by the variability of the vegetation (Column B).
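Both allocation strategies come down to the same arithmetic: each stratum receives a share of the n sampling units in proportion to a weight. For proportional allocation the weight is the stratum area; for optimum allocation it is the area multiplied by the standard deviation. The Python sketch below works through the 30 ha example. The areas come from the text, but the standard deviations are hypothetical placeholders, not the values in Table 1; they were chosen so that the result matches the 14 and 27 units reported above, with the meadow receiving the remaining 9. The rounding rule (largest remainder) is also an assumption.

```python
def allocate(weights, n):
    """Split n sampling units among strata in proportion to weights.

    Uses largest-remainder rounding so the whole-number allocations sum to n.
    """
    total = sum(weights)
    quotas = [n * w / total for w in weights]
    counts = [int(q) for q in quotas]              # round each quota down
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[:n - sum(counts)]:       # hand out any leftover units
        counts[i] += 1
    return counts

strata = ["sagebrush grassland", "ponderosa pine savanna", "riparian meadow"]
area_ha = [15, 9, 6]     # stratum areas from the example in the text
sd = [2.8, 9.0, 4.5]     # HYPOTHETICAL standard deviations, not Table 1 values

proportional = allocate(area_ha, n=50)
optimum = allocate([a * s for a, s in zip(area_ha, sd)], n=50)

for name, p, o in zip(strata, proportional, optimum):
    print(f"{name}: proportional = {p}, optimum = {o}")
```

In practice the standard deviations would come from pilot sampling in each stratum, as described above.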
Advantages of Stratified Random Sampling:
- More efficient sampling
- Improved quality of information in a variable environment
- Data from each stratum may be analyzed separately or in combination
Systematic Sampling
Systematic sampling involves the systematic placement of units at equal distances from an original random point. This sampling design results in better dispersion of sampling units than simple random sampling and can greatly reduce search and travel time. For example, transects may be regularly positioned throughout the study area (Figure 5). The first transect is randomly located, and subsequent transects are spaced at regular intervals from the first transect. If each transect includes sub-samples such as quadrats (or points), the first quadrat is randomly located along the transect, and subsequent quadrats are regularly spaced along the transect. The positions of the first quadrats on subsequent transects are also randomly located.
Figure 5. Illustration of systematically located transects. Transect locations are relative to the random location of the first transect (blue arrow). Within each transect, the first quadrat is randomly located (blue circle), and other quadrats are positioned at regular intervals relative to the first quadrat.
In this example (Figure 5), the first transect was positioned along a baseline at the 7 m position, and each subsequent transect was positioned at 10 m intervals (17 m, 27 m, 37 m, etc.). The first quadrat on the first transect was randomly positioned at the 4 m mark, and subsequent quadrats were placed at 5 m increments from then on (9 m, 14 m, 19 m, etc.). On the second transect, the first quadrat was randomly positioned at the 3 m mark, and all other quadrats on that transect were positioned relative to it.
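The layout in this example can be generated with a short Python sketch. The 10 m transect spacing and 5 m quadrat spacing come from the example; the 80 m baseline and 45 m transect length are assumptions, chosen so that the layout yields 8 transects of 9 quadrats each.

```python
import random

def systematic_positions(start, interval, length):
    """Positions at regular intervals from a starting point, up to length."""
    return list(range(start, length, interval))

rng = random.Random(5)
baseline_length, transect_spacing = 80, 10   # metres; baseline length is assumed
transect_length, quadrat_spacing = 45, 5     # metres; transect length is assumed

# Random start for the first transect, then regular spacing along the baseline.
first_transect = rng.randrange(transect_spacing)
layout = {}
for t in systematic_positions(first_transect, transect_spacing, baseline_length):
    # Each transect gets its own random start for the first quadrat.
    first_quadrat = rng.randrange(quadrat_spacing)
    layout[t] = systematic_positions(first_quadrat, quadrat_spacing, transect_length)
```

Calling `systematic_positions(7, 10, 80)` reproduces the transect positions in the example (7, 17, 27, and so on), and `systematic_positions(4, 5, 45)` reproduces the quadrat positions on the first transect.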
Assumptions for systematic sampling differ from those of simple random sampling because unit placement is not independent, and this can seriously reduce the effective sample size. In the above example of 8 transects with 9 quadrats each, is the sample size 72 quadrats or 8 transects? Clearly, we would prefer the larger sample size provided by treating quadrats as the sampling unit. To consider the individual quadrats as sampling units, however, they must be spaced far enough apart that the measured values in adjacent quadrats are not correlated. Since most vegetation is clumped, values in adjacent quadrats will almost certainly be correlated when the quadrats are located relatively close to each other on transects.
How far apart do quadrats need to be to be considered independent? General guidelines suggest that quadrat spacing should be far enough apart that multiple quadrats do not fall within the same vegetation gap, on the same plant or clone, or in the same microsite. When quadrats or points are not sufficiently separated to be considered independent or when transects are permanent, the entire transect should be treated as the sampling unit.
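When the transect is the sampling unit, the quadrat values on each transect are first reduced to a single transect mean, and the sample size is the number of transects. The Python sketch below shows that calculation using hypothetical percent cover values for three short transects; the data and function name are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

def transect_summary(quadrats_by_transect):
    """Mean, standard error, and n, treating each transect as one sampling unit."""
    transect_means = [mean(q) for q in quadrats_by_transect]
    n = len(transect_means)                     # n = transects, not quadrats
    return mean(transect_means), stdev(transect_means) / sqrt(n), n

# Hypothetical percent cover values: 3 transects with 4 quadrats each.
cover = [
    [22, 25, 19, 24],
    [35, 31, 38, 33],
    [15, 18, 14, 17],
]
overall_mean, standard_error, n = transect_summary(cover)
```

Here n is 3 rather than 12, so the standard error reflects variation among transects rather than among neighboring quadrats.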
Subjective Sampling
Subjective sampling, also known as selective sampling, involves deliberate placement of plots in specific areas that meet the investigator’s interests or management objectives. The term “subjective” applies to this type of sampling because the selection of sampling locations is based on subjective decisions about what areas we consider to be “representative” in terms of how the study area responds to change or management. However, the degree to which these sites actually reflect the larger area is questionable and depends on the judgment of the person selecting the sites.
Advantages:
- can be a very effective approach to determine whether management strategies are working to meet management goals
Disadvantages:
- difficult to make inferences about the entire management area due to small sample sizes
- subjective selection of sampling locations violates assumptions of independence
- increases potential for bias due to lack of randomization
There are two types of locations commonly identified for sampling in natural resources monitoring:
Key Areas and Critical Areas
Key areas are selected because they are representative of the management unit and will respond to management changes in a way that is “typical” for the area. Key areas serve as indicators of land condition, trend, or seasonal use by grazing animals, and are often selected based on vegetation type, physical features, or location relative to structural improvements such as fences or watering points.
Critical areas are selected because they contain unique or special values such as riparian areas, rare plants, or habitat for threatened or endangered species. An example of a critical area is a sage grouse lek or high-quality nesting habitat. Monitoring intended to determine the effects of management aimed at enhancing sage grouse habitat and populations should focus on these critical areas.
Cluster Sampling
Cluster sampling is used to measure small-scale variation within a cluster of individuals, and large-scale variation among clusters. Quadrats or belt transects are randomly located in the study area, and measurements are made on the “cluster” of individual plants within the larger unit. Cluster sampling is best used when it is difficult to take a random sample of the element of interest, and is one of the most efficient ways to measure individual plant characteristics.
Figure 6. Illustration of cluster sampling, in which 5 belt transects are positioned in the study area, and fruit production on each plant located within each transect is measured.
For example, researchers in Montana determined the effect of targeted sheep grazing on spotted knapweed flower production by counting flowers on each individual plant rooted within 10 quadrats. The quadrats served as the clusters in which individual plants were located, and flower production of each individual plant was measured.
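A small Python sketch can show how cluster data separate the two scales of variation. The flower counts below are hypothetical and are not data from the Montana study; each quadrat is a cluster, and each number is the flower count for one plant rooted in that quadrat.

```python
from statistics import mean, stdev

def cluster_variation(clusters):
    """Summarize variation within each cluster and among cluster means."""
    cluster_means = {name: mean(values) for name, values in clusters.items()}
    # Within-cluster SD needs at least 2 plants in the cluster.
    within_sd = {name: stdev(values)
                 for name, values in clusters.items() if len(values) > 1}
    among_sd = stdev(list(cluster_means.values()))
    return cluster_means, within_sd, among_sd

# Hypothetical flower counts per plant in 3 quadrats (clusters).
flowers = {
    "Q1": [12, 15, 9],
    "Q2": [30, 28, 35, 31],
    "Q3": [5, 7],
}
cluster_means, within_sd, among_sd = cluster_variation(flowers)
```

The within-cluster standard deviations describe small-scale variation among plants in the same quadrat, while the standard deviation of the cluster means describes large-scale variation among quadrats.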
Double Sampling
Double sampling involves frequent estimation and infrequent measurement of an attribute. Double sampling is an efficient way to increase the amount of information gathered for attributes that are difficult, expensive, or time-consuming to measure. If there is a strong relationship between estimated and measured values, we can use linear regression to predict the actual value of the attribute from the estimated values.
To illustrate, herbaceous biomass is very labor-intensive to directly measure by harvesting:
- clipping, drying, and weighing forage is time-consuming
- time limitations tend to reduce the number of samples that can realistically be collected
- smaller sample sizes tend to result in reduced precision
Fortunately, biomass lends itself to estimation:
- estimation can be done more quickly than actual measurement
- when done by trained observers, estimated values may be highly correlated with measured values
Double sampling is a powerful approach to enhance the quality of information that we can gather for biomass monitoring.
For example, we could estimate biomass in 36 quadrats (frequent estimation), and harvest biomass in every 3rd of those quadrats (infrequent measurement) (Figure 7). Using linear regression analysis, we develop a regression equation to describe the relationship between estimated and measured values.
Figure 7. Illustration of double sampling, in which biomass is estimated in the solid quadrats, and both estimated and measured in the striped quadrats.
We then use that equation to calculate biomass from the estimated quadrats (Figure 8). In this example, the regression equation was used to adjust the estimated mean value (31.7 g/quadrat) to a corrected estimate (29.0 g/quadrat).
Figure 8. Scatter plot of estimated and measured biomass values. The linear regression equation can be used to adjust biomass estimates to more closely approximate actual biomass.
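The double sampling calculation can be sketched in Python as follows. The biomass values are hypothetical and will not reproduce the 31.7 and 29.0 g/quadrat figures above; the sketch only shows the steps. A least-squares line is fitted to the quadrats that were both estimated and harvested, and that line is then applied to the mean of all estimates.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def corrected_mean(all_estimates, paired_estimates, paired_measured):
    """Adjust the mean of all estimates using the estimate-to-measurement line."""
    slope, intercept = fit_line(paired_estimates, paired_measured)
    return intercept + slope * mean(all_estimates)

# Hypothetical data (g/quadrat): 12 estimated quadrats, every 3rd one harvested.
all_estimates = [30, 34, 20, 41, 28, 35, 25, 38, 28, 33, 29, 40]
paired_estimates = [20, 35, 28, 40]    # estimates for the harvested quadrats
paired_measured = [18, 31, 26, 37]     # clipped, dried, and weighed values

corrected = corrected_mean(all_estimates, paired_estimates, paired_measured)
```

The correction is only as good as the relationship between estimated and measured values, so the fit should be checked (for example, with a scatter plot) before the adjusted mean is used.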