“In God we trust. All others must bring data.” – W. Edwards Deming, Statistician
Development organizations and research consultants routinely collect field-based data for research and M&E. Ultimately, this data is used for evidence-based decision-making in order to optimize interventions and maximize project impact. In this post, we’ll run through 4 common mistakes made in the process of data collection that have the potential to seriously undermine the quality of the data you collect and therefore the soundness of your decisions and, in the end, the impact of your programs.
1. Inadequate Training
Having had the privilege of working with and supporting many field-based data collection projects, I can tell you this for sure: failing to train your enumerators properly is not an option. I’ve seen projects that rush to get started with data collection before thoroughly training their field staff, and it never ends well. Even experienced enumerators need orientation to your specific research project.
I will tell you why.
Firstly, it is important to ensure that all your enumerators not only interpret your questions in the same way, but that their interpretation is also consistent with your intentions. As an example, consider the common survey question: What is your marital status?
Enumerators might introduce all manner of biases in interpreting this question. For instance, is someone who’s been living with their boyfriend over the last 3 months married or not? Some enumerators will say yes, others will say no. For the purposes of your study, what is the correct interpretation? This is what you must clarify to your field staff, and you must endeavor to run through as many such grey areas as possible. As a bonus, this is often a fun exercise, and it is a great way to get your field team to bond.
Secondly, if you are deploying a software application for data collection, as you should be doing in 2018, your enumerators must get comfortable using it. Good data collection software should be intuitive and easy to use, but that cannot be an excuse for not training your enumerators. Some mobile data collection apps include highly useful productivity features that significantly increase efficiency, and it is important to point them out to your field staff. It may also be helpful to show them less frequently used but useful features, such as how to update their questionnaire if changes are made post-deployment, or how to restore their data if they switch to a new device.
2. Poorly Worded Questions
No amount of training can make up for a poorly designed questionnaire. As part of gathering high quality data and guaranteeing a seamless data collection process, it is important to ensure that your survey questions are worded as clearly and unambiguously as possible.
Take for example the question: Do you exercise regularly?
Even with the best of training, this question leaves too much to the imagination. Who is the regular exerciser: the person who walks 4 km to work every day, or the one who went to the gym for the first time last week to assuage his guilt over overindulging in nyama choma?
What if you reworded the question to read as follows: How many times last week did you exercise for at least 30 minutes, for example a gym session, brisk walking, cycling or swimming?
This way, you are likely to get much more meaningful answers for your research.
3. Mishandling Missing Data
How could you mishandle what is not there, you ask. Well, it turns out you can!
In this age of electronic data collection, my advice is to never allow missing data. Instead, you should only allow for it. It must not be possible for an enumerator to simply run past a question without actively entering or selecting a value, even if only to indicate that the data is missing.
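As a rough illustration of that rule, here is a minimal Python sketch of a validation check that refuses a blank answer but accepts an explicit "missing" choice. The `MISSING` marker and the `validate_answer` function are hypothetical names invented for this example; they are not taken from any particular data collection tool.

```python
MISSING = "MISSING"  # hypothetical marker the enumerator selects when a value is unknown


def validate_answer(value):
    """Return the cleaned answer, or raise ValueError if the question was skipped."""
    # A blank field means the enumerator ran past the question: reject it.
    if value is None or (isinstance(value, str) and value.strip() == ""):
        raise ValueError(
            "An answer is required; select the missing-data option if the value is unknown."
        )
    # Anything else, including the explicit MISSING marker, is accepted.
    return value.strip() if isinstance(value, str) else value
```

With a check like this, the enumerator can still record that a value is missing, but only by actively choosing to do so.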
Let’s say you’re asking for a respondent’s age, and they happen not to know it. There are 3 ways to solve this problem.
- You could leave the field blank.
- You could enter a pre-determined number to indicate that the information is missing, often 99.
- You could have a previous question that explicitly asks if the respondent knows their age, and if yes, ask them to specify it.
The only wrong way to do it is option 1. It encourages enumerators to neglect asking and entering a value for age, even when it is indeed available. It is the path of least resistance.
Option 2 is reasonable, until you encounter a legitimate 99-year-old, and now you cannot tell the difference between their age and a missing value. You may choose to use 999 instead of 99, but there may still be another problem. If your data collection platform automatically runs analysis on your data set, then the 999 will skew your results severely, and in that case, you might find option 3 to be a much better alternative.
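To see how badly a sentinel value can skew an automatic analysis, consider this small Python sketch. The ages, field names (`knows_age`, `age`) and function names are made up for illustration; the only thing taken from the text is the 999 code.

```python
from statistics import mean

MISSING_AGE = 999  # sentinel code for "age unknown" (option 2)

# Hypothetical data recorded with option 2: one respondent did not know their age.
ages_with_sentinel = [25, 40, 31, MISSING_AGE, 28]

# The same hypothetical respondents recorded with option 3:
# an explicit "does the respondent know their age?" question comes first.
responses = [
    {"knows_age": True, "age": 25},
    {"knows_age": True, "age": 40},
    {"knows_age": True, "age": 31},
    {"knows_age": False, "age": None},
    {"knows_age": True, "age": 28},
]


def naive_mean_age(ages):
    """Average every recorded value, as an automatic analysis might."""
    return mean(ages)


def mean_known_age(records):
    """Average only the ages of respondents who said they know their age."""
    return mean(r["age"] for r in records if r["knows_age"])


print(naive_mean_age(ages_with_sentinel))  # 224.6
print(mean_known_age(responses))           # 31
```

A single 999 among five respondents pushes the average age from 31 to 224.6, which is why keeping "unknown" out of the numeric field is the safer design.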
4. Failure to Monitor the Data
A good electronic data collection solution should allow you to monitor data collection right off the bat. Monitoring the process in this way is especially important in the first few days of the exercise as it helps you catch and correct any errors that slipped through the planning and training phases. For larger projects, it is often best to run a pilot exercise intended solely for this purpose.
Typical issues caught through this process include last-minute typos, problems with questionnaire structure, critical missing answer options, and cheating or under-performing enumerators.
This small investment can be the difference between getting things right the first time and accumulating so many anomalies that the entire exercise has to be repeated, at great cost in both money and time.
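If your platform lets you export submissions, even a very simple script can support this kind of early monitoring. The Python sketch below flags enumerators whose interviews are unusually short or who record many missing answers. The record layout (`enumerator`, `minutes`, `answers`), the `MISSING` marker and both thresholds are assumptions chosen for this example, so adjust them to your own survey.

```python
from collections import defaultdict
from statistics import median

MISSING = "MISSING"  # hypothetical marker for an explicitly recorded missing value


def flag_enumerators(submissions, min_median_minutes=10, max_missing_rate=0.2):
    """Return {enumerator: [reasons]} for enumerators whose data looks suspicious."""
    by_enumerator = defaultdict(list)
    for sub in submissions:
        by_enumerator[sub["enumerator"]].append(sub)

    flags = {}
    for name, subs in by_enumerator.items():
        reasons = []
        # Very short interviews can indicate rushed or fabricated responses.
        if median([s["minutes"] for s in subs]) < min_median_minutes:
            reasons.append("short interviews")
        # A high share of missing answers can indicate skipped questions.
        answers = [v for s in subs for v in s["answers"].values()]
        if answers and answers.count(MISSING) / len(answers) > max_missing_rate:
            reasons.append("many missing answers")
        if reasons:
            flags[name] = reasons
    return flags


# Hypothetical submissions from the first day of fieldwork.
submissions = [
    {"enumerator": "A", "minutes": 25, "answers": {"age": 34, "marital_status": "single"}},
    {"enumerator": "A", "minutes": 30, "answers": {"age": 51, "marital_status": "married"}},
    {"enumerator": "B", "minutes": 4, "answers": {"age": MISSING, "marital_status": "single"}},
    {"enumerator": "B", "minutes": 5, "answers": {"age": MISSING, "marital_status": "married"}},
]

print(flag_enumerators(submissions))
# {'B': ['short interviews', 'many missing answers']}
```

A flag is only a prompt to take a closer look, not proof of cheating; an enumerator working in an area where few respondents know their age would trip the same check.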
Happy data collection!