4.1 Monitoring Implementation, Data Quality, and Best Practices
Learning Guide
Why consider best management practices for data management?
We collect data to provide information about how the resource has responded to management, and to inform future management decisions. This information needs to be correct and reliable!
There are many stages along the path from data collection to analysis and interpretation:
- Field data collection
- Data entry
- Data analysis
- Interpretation
Each of these stages has multiple steps. In any multi-step process, there are many potential weak points along the path where non-sampling errors can occur. Recall that non-sampling errors affect accuracy, and are not detectable statistically. But they are avoidable, and it is essential that we do all that we can to eliminate them.
When you report the findings of a sampling effort, you put your name on that report, and people are relying on the information that you provide to them. We build our professional reputation through the work that we produce, and we generally try hard to protect our reputations and credibility. We are all familiar with the saying “Garbage In, Garbage Out”, and we certainly don’t want people thinking of the work that we do as unreliable or suspect.
The following recommendations include ways that we can avoid making non-sampling errors that tend to occur as we collect and analyze data. Certainly it is not an exhaustive list, but a good start!
Field Data Collection
- Ensure that all individuals collecting data are trained and calibrated.
- Establish clearly defined ground rules for data collection
- Review methods for taking measurements
- Review how to record data
- Abbreviations
- Number formats
- How to indicate “repeated information”
- Practice data collection and data recording
- Training and calibrating ensures consistency
- Train observers and recorders together – helps to identify problems
- Enables you to identify and address problems EARLY
- Document all data collection methods and ground rules. This information should be included in the monitoring plan or sampling protocol, but it’s important to document any decisions that were implemented in the field. This step is important because you can’t count on remembering details, and even if you do, you may not be the person who needs this information in the future!
- Use well-designed data sheets. Although data may be collected using electronic recording devices, paper data sheets are probably the most common way to record data in the field. Data sheets need to facilitate and streamline data recording. A well-designed data sheet should be organized to efficiently collect information specific to the attribute or attributes being measured, and always includes fields for “metadata” – the where, when, who, and what describing the information that was collected. Characteristics of data sheets are covered in Chapter 9 in Measuring and Monitoring Plant Populations by Elzinga et al. (1998).
Fortunately, data sheets have been developed and tested for most types of data collection, and they are readily available from different sources. Some agencies may require that you use specific data sheets. Otherwise, you can find templates of data sheets in Measuring and Monitoring Plant Populations, the Monitoring Manual, and Sampling Vegetation Attributes. Depending on your sampling project, you may need to adapt existing data sheets to meet your specific purpose. In that case, be sure to include the essential fields for metadata, ask colleagues to review the new format, and test it out to ensure that it serves your purpose efficiently.
- Always record metadata on the data sheets. This is essential: without sufficient information that links the data on the sheet to the location, sampling unit, date, and project, the data sheet is unidentifiable and the effort to collect the data was wasted. Filling out metadata can be repetitive, and it may be tempting to take shortcuts by recording just one piece of identifying information, such as transect number. Avoid this temptation! You can save time by pre-printing certain fields before photocopying, or filling in fields during “downtimes” such as while traveling between sampling locations (assuming you are not the driver!).
- Take photos to document actual conditions and tie those conditions to the recorded data. This is an easy step, and provides very helpful information to individuals who were not present during data collection.
- Adherence to the documented methods is essential to avoid potential errors that occur if someone uses “creative” methods that are not aligned with the pre-determined methodology.
- Identify and correct data collection errors. This includes errors by both the observer and the recorder. For example, if one set of observers identifies and records plant species using common names while others use species codes based on scientific names, this creates a data entry nightmare that could easily be avoided. Similarly, if an observer consistently guides a pin flag to the ground instead of dropping it, that is a potential source of bias that needs to be addressed.
- Ensure that recorders write legibly! No, we don’t need to have the perfect handwriting that our first grade teachers used, but we do need to write to communicate clearly. The person recording data may not ultimately be responsible for entering the data, and if their handwriting is illegible or difficult to decipher, it can easily be misread.
- Avoid worker fatigue, boredom, and physical stress – Taking care of workers is both ethical and good data management practice because it helps you avoid both data collection errors and lost time for medical reasons!
- When working in pairs, trade jobs periodically. Observing often involves repetitive motions such as bending over to identify species, determine the ground cover reading, etc. Recording can be monotonous and boring. Stay aware of your physical well-being and keep your mind engaged.
- Make sure crew members are prepared to work in the “elements”. Be prepared for sun, wind, rain, cold, insects, and other potential hazards. As always, practice “Safety First!”
- Crews need to take breaks to rest, eat, and stay hydrated.
Data Entry
After all the work in the field, you want to be sure that the recorded information is accurately transferred into electronic format.
- Structure of electronic data sheet: Set up the electronic data sheet so that it is easy to transfer data from the written sheet into the computer. Electronic data sheets are usually created in a spreadsheet program, such as Microsoft Excel, to facilitate data manipulation after the data have been entered.
- Sometimes the electronic data sheet may exactly resemble the written sheet. This makes crosschecking the data easier, and it ensures that the metadata stay with the recorded information. The data can be re-arranged to facilitate analysis after it has been entered.
- If you want to organize the electronic data sheet to include information from multiple field data sheets, be sure to organize the electronic sheet so that every observation can easily be crosschecked, or tied back to the original paper copy. Usually this involves adding columns for site ID, sampling unit ID, observation date, and observer ID. Since the electronic data sheet doesn’t resemble the paper copy, be sure that each column has a “header” that identifies the contents of the column.
- Record the measurement units and metadata electronically, IN the electronic data file. For example, are the measurement units in centimeters, meters, inches, or feet? Be sure to document:
- measurement units
- the transect and quadrat dimensions
- spacing between measurements
- species codes
- if nested quadrats were used, be sure to indicate the dimensions of the nested quadrats and which species were measured in which quadrat.
Some of this information, such as sampling unit dimensions and measurement units, can be included at the top of the electronic data sheet. When there is a substantial amount of information that needs to be documented, it helps to create one or more separate worksheets that have informative names such as “Notes” or “Species Codes”. It is essential that these documentation worksheets are included IN the same electronic file as the data.
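As a rough illustration of keeping every observation tied to its paper original, the Python sketch below scans a combined electronic data sheet for rows that lack identifying information. The column names, the sample rows, and the function `rows_missing_ids` are all made up for this example; substitute whatever your own data sheets use.

```python
import csv
import io

# Hypothetical identifier columns; rename to match your own data sheets.
ID_COLUMNS = ["site_id", "sampling_unit_id", "observation_date", "observer_id"]

# Made-up stand-in for a combined electronic data sheet (exported as CSV).
SHEET = """site_id,sampling_unit_id,observation_date,observer_id,species_code,cover_cm
SITE_A,T1,2023-06-14,JD,SP1,42
SITE_A,T1,2023-06-14,JD,SP2,17
SITE_A,,2023-06-14,JD,SP1,35
"""


def rows_missing_ids(text, id_columns=ID_COLUMNS):
    """Return (line_number, missing_columns) for rows lacking an identifier."""
    reader = csv.DictReader(io.StringIO(text))
    problems = []
    # Line 1 is the header row, so the data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        missing = [c for c in id_columns if not (row.get(c) or "").strip()]
        if missing:
            problems.append((line_number, missing))
    return problems


print(rows_missing_ids(SHEET))
```

Any row reported here cannot be traced back to a specific field data sheet, so it is worth resolving before more data are entered.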
- How to handle missing data: Sometimes you will find a blank cell or line on the field data sheet. Unfortunately, missing data happens…. Be sure you know what to do when there is missing data – DO NOT insert a zero in place of missing data!
A Zen master of data management once famously said:
“Zeroes mean something – they mean nothing!”
When a piece of data is missing, you have no information about the actual measurement value. By inserting a zero for missing data, you are artificially selecting a value for that measurement.
How you handle missing data depends on the statistical software that you’ll be using to analyze the data. If the data will be analyzed in MS Excel, leave the cell that corresponds to the missing data blank. If you are using more sophisticated software such as SAS or SPSS, you may want to enter a period “.” in place of the missing data, or you may need to define a special value, such as “-99” and use that only in place of missing data. The point here is to know what to do when you encounter missing data during data entry, and ensure that everyone who enters data follows the same protocol.
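To see why a zero is not a neutral placeholder, here is a small Python sketch using made-up plant heights. The function `parse_cell` and the set of missing-value codes are assumptions for illustration; use whichever single code your own protocol specifies.

```python
from statistics import mean

# Assumed missing-data codes: a blank cell, a period, or the value "-99".
MISSING_CODES = {"", ".", "-99"}


def parse_cell(text):
    """Convert one entered cell to a number, or None if it marks missing data."""
    text = text.strip()
    return None if text in MISSING_CODES else float(text)


# Made-up plant heights (cm) from five quadrats; the third cell was left blank.
raw_cells = ["12", "15", "", "14", "11"]
heights = [parse_cell(cell) for cell in raw_cells]

# Missing value excluded: the mean is based on the 4 real measurements.
observed = [h for h in heights if h is not None]
mean_missing_excluded = mean(observed)

# Missing value replaced by zero: a measurement that was never taken is added.
zero_filled = [0.0 if h is None else h for h in heights]
mean_zero_filled = mean(zero_filled)

print(len(observed), mean_missing_excluded)  # 4 13.0
print(len(zero_filled), mean_zero_filled)    # 5 10.4
```

Filling in the zero changes both the sample size and the mean, even though nothing new was learned about the missing quadrat.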
- Verify the data for correctness: Once the data have been entered electronically, you need to compare the data from the field data sheets to the electronic data to verify the accuracy of the electronic data. Although this is time consuming, it is a very important step. Otherwise, how will you know that the data have been entered without mistakes? There are several ways to detect data entry errors:
- Crosscheck data in the electronic version to the field data sheets.
- Have one person read the electronic data out loud while another person reads the data on the corresponding field data sheet. This is the preferred approach, but it involves more time because two people are needed.
- If you don’t have the personnel available for the “team approach”, one person should visually compare the electronic data to the field data sheets, looking for mismatches between the two versions.
- Next, sort or filter the data to check for numerical abnormalities in the electronic data.
A Word of Caution! Before you sort or filter your data, be sure to save the file! This protects your data (and the hard work you did to enter it) in case you accidentally rearrange or move a subset of the data while you are sorting.
Look for the following values:
- Outliers, unusually large or small values compared to the other values in the data set. Flag or highlight the outliers and verify them in the original data sheets. For example, if most of the values range between 30 and 80, odd values such as 3 or 425 need to be verified for accuracy.
- Unusual formats, relative to the other entered values. This includes negative values, decimal points when the recorded values weren’t collected with decimal points, too many digits after the decimal point, two decimal points, alphabetical or alphanumeric data where numerical data are expected and vice versa, etc. Look for anything that could be related to a typo, and refer to the original data sheets to verify the correct value.
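The same checks can also be scripted. The Python sketch below is one possible approach, not a standard tool: it assumes the values were recorded as whole numbers and uses the 30 to 80 range from the example above. The function name `flag_suspect_entries` and the sample entries are invented for illustration.

```python
import re

# Assumption: values were recorded in the field as non-negative whole numbers.
WHOLE_NUMBER = re.compile(r"[0-9]+")


def flag_suspect_entries(cells, low, high):
    """Return (row, value, reason) for entries to verify on the field sheets."""
    flagged = []
    for row, cell in enumerate(cells, start=1):
        text = cell.strip()
        if text == "":
            continue  # blank cells are missing data and are handled separately
        if not WHOLE_NUMBER.fullmatch(text):
            # Catches decimals, negatives, letters, and doubled decimal points.
            flagged.append((row, cell, "unexpected format"))
        elif not low <= int(text) <= high:
            flagged.append((row, cell, "outside expected range"))
    return flagged


# Made-up column of entered values containing several typical typos.
entries = ["45", "62", "3", "425", "5.5", "-40", "7O", "4..5", "78"]
for item in flag_suspect_entries(entries, low=30, high=80):
    print(item)
```

A flagged value is not necessarily wrong. The script only tells you which rows to look up on the original data sheets.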
The take-home message: Take time to ensure that the electronic data accurately reflect the information that was recorded during field data collection.
- Protect the data! After all the work of collecting data in the field and entering and checking the electronic data, be sure that you don’t “lose” it! Photocopy or scan the hard copies of field data sheets (scanning saves paper), organize the original hard copies, and store them in a safe location. Similarly, be sure that electronic files are given appropriate and informative filenames to facilitate rapid retrieval, and make backup copies of electronic files that are stored on separate computers or servers to ensure that the files can be retrieved despite computer failures and crashes.
Data Analysis
Data analysis includes data conversion and manipulation, calculation of descriptive statistics and inferential statistical analyses.
A note of caution! You’ve gone to a lot of trouble to ensure that the electronic data accurately reflect the data that were collected in the field. The data are “raw”, or un-manipulated, at this point. You will probably need to manipulate the data somehow to prepare them for analysis, and this often involves multiple steps. Rather than working directly on the “raw” data, you may want to copy the worksheet and rename it “working” data before you start to do your data conversions. As you manipulate your data, paste the converted data into new cells rather than pasting into the cells where the data are currently located. This protects your work in case you make a mistake, manipulate only a subset of the data, or a cat runs across your keyboard while your back is turned!
Data conversion: Some data need to be converted using mathematical formulas.
For example, line-point measurements are collected as COUNTS along a line, and need to be converted to units of PERCENT. Biomass data collected in units of g/m² may need to be converted to units of kg/ha. Whenever you use a mathematical formula to manipulate data, verify that the formula is correct. Don’t just trust to luck or “wing it”, because it is easy to make calculation errors.
Follow this approach to avoid errors:
- Write the equation by hand, including all of the units in your equation.
- Cancel units by hand to ensure that the end result is in the desired units.
- Test your equation with a known quantity.
- Once you are certain that the equation is correct, enter it as a formula in the spreadsheet.
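The steps above can be sketched in Python for the two conversions mentioned earlier. The function names are invented for this example, and the percent calculation assumes a simple "hits divided by total points" definition of cover; check it against the method in your own protocol.

```python
G_PER_KG = 1000.0     # grams in one kilogram
M2_PER_HA = 10_000.0  # square meters in one hectare


def g_per_m2_to_kg_per_ha(g_per_m2):
    """Convert biomass from g/m^2 to kg/ha.

    Units: (g / m^2) * (10,000 m^2 / 1 ha) * (1 kg / 1,000 g) = kg / ha
    """
    return g_per_m2 * M2_PER_HA / G_PER_KG


def hits_to_percent_cover(hits, total_points):
    """Convert a count of line-point hits to percent cover (assumed definition)."""
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    return 100.0 * hits / total_points


# Test each equation with a known quantity before applying it to real data.
assert g_per_m2_to_kg_per_ha(1.0) == 10.0      # 1 g/m^2 is 10 kg/ha
assert hits_to_percent_cover(25, 50) == 50.0   # 25 hits out of 50 points

print(g_per_m2_to_kg_per_ha(150.0))   # 1500.0
print(hits_to_percent_cover(18, 50))  # 36.0
```

Writing the unit cancellation into the comment keeps the reasoning next to the formula, so a later reader can verify it without redoing the derivation.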
Data Manipulation: Depending on how you plan to analyze the data, you may need to rearrange the data to get it ready for statistical analysis. Again, take care while copying and moving data to ensure that you don’t accidentally omit or “double-copy” values from the data field.
Data Summarization: Calculate descriptive statistics and/or inferential statistics.
Interpretation
You have gone to a lot of effort to get to the point where you can interpret your results! Before you get too far with interpreting the results, be sure to do a reality check.
- Carefully review the results/output for red flags. Crosscheck the output to the input. Does the count or number of observations in the output match the count or number of observations in the input?
- Does the output make sense? This is your time for a reality check.
You should have a ballpark idea of what a reasonable result would be, based on your experience at the site (if you collected the data) or on photos from the site. This last reality check is important. If the output suggests a condition that does not seem reasonable based on your ballpark estimate, then check the data to find out why. Does the density estimate seem unrealistically high? Then figure out which sites or sampling units are responsible for the high estimate, and make sure that this reflects REALITY, not a data entry error.
For example, suppose that at your site the vegetation is quite patchy, but there is obviously a high percentage of bare ground. Given the patchiness of the vegetation, you could develop a ballpark estimate of 65%-90% bare ground. If the data analysis indicates that there is 25% bare ground, this would raise a red flag because it is a substantial departure from your ballpark estimate. Before interpreting this result, you should check the input, look for sampling units with low bare ground percentages, and check the electronic data and original data sheets to make sure that the results reflect REALITY, not a data entry error.
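A reality check like this one can be sketched in Python. Everything below is illustrative: the transect values are made up, the function `reality_check` is hypothetical, and the 65% to 90% range is the ballpark estimate from the bare ground example above.

```python
def reality_check(percent_by_unit, n_expected, ballpark_low, ballpark_high):
    """Compare a simple mean against a ballpark range and list units to verify."""
    values = list(percent_by_unit.values())
    estimate = sum(values) / len(values)
    return {
        # Does the number of observations analyzed match the number collected?
        "count_matches": len(values) == n_expected,
        "estimate": estimate,
        "within_ballpark": ballpark_low <= estimate <= ballpark_high,
        # Sampling units pulling the estimate below the ballpark range.
        "units_to_verify": sorted(
            unit for unit, value in percent_by_unit.items() if value < ballpark_low
        ),
    }


# Made-up percent bare ground for six transects.
bare_ground = {"T1": 78.0, "T2": 8.0, "T3": 71.0, "T4": 85.0, "T5": 7.0, "T6": 81.0}

result = reality_check(bare_ground, n_expected=6, ballpark_low=65.0, ballpark_high=90.0)
print(result)
```

Here the mean of 55% falls outside the ballpark range, and transects T2 and T5 are the ones to look up on the original data sheets. If the paper copies show 80 and 70, the low values were data entry errors. If they really do show 8 and 7, the result reflects reality and the ballpark estimate needs rethinking.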