In the first part of this series, we discussed the importance of data verification as a component of DQA. In this installment, we shall talk about Microsoft Excel as a tool for conducting data verification. We shall discuss why it is currently the most popular choice and examine some of the critical challenges associated with it. If you haven’t already read the first part, please do so before proceeding, as this post builds on it.
By far the most popular electronic tool for conducting data verification is Microsoft Excel spreadsheets. And with good reason, too. Spreadsheets readily support not just data entry but also complex data analysis and visualization. In fact, spreadsheets are such a popular method for data verification that many DQA tools published by notable organizations are offered in this format.
The idea is simple. You prepare your DQA tools in Excel, embed the necessary formulas to analyze and visualize the data, and then hand out copies to the DQA teams visiting the sampled health facilities. These teams then fill out one workbook per facility. Once they are done making the entries, the workbook automatically analyzes the data and voilà! The DQA results for the health facility can be disseminated on the spot.
What could possibly be wrong with that?
Not much, actually, particularly if you are the health facility. You get your results instantly and can begin thinking about how to address any emergent quality issues there may be.
However, if you are at the sub-national or, worse, the national level, this methodology is an unmitigated nightmare. I will tell you why. Let’s say you have a sample of 500 facilities. How many workbooks do you now have floating around out there? 500 of them!
Now, consider a very simple indicator, like the Number of people tested for HIV. The value for that indicator at any given facility is easy to report. But what is the total for all sampled facilities together, at each step in the reporting pipeline? It turns out that this deceptively simple question is very difficult to answer. Anyone who has attempted it will tell you that merging data from 500-odd Excel workbooks is no mean feat. Add the fact that distributing 500 workbooks increases the chances of your formulas being tampered with, and you have an intractable situation. On the basis of this challenge alone, you can expect the national DQA report to take anywhere from several months to more than a year to prepare.
A frustrated data manager trying to merge hundreds of MS Excel files.
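To give a flavour of what the national-level aggregation question actually involves, here is a minimal Python sketch. The facility names, field names, and values are all made up for illustration; in reality, each row of this in-memory list would have to be dug out of a separate Excel workbook first, which is exactly where the pain lies.

```python
# Hypothetical per-facility DQA results for one indicator, AFTER someone
# has already extracted them from each facility's workbook. In practice,
# every row here lives in its own Excel file.
facility_results = [
    {"facility": "Facility A", "register": 50, "moh731": 50, "dhis": 50, "datim": 48},
    {"facility": "Facility B", "register": 30, "moh731": 28, "dhis": 28, "datim": 28},
    {"facility": "Facility C", "register": 75, "moh731": 75, "dhis": 70, "datim": 70},
]

def pipeline_totals(results):
    """Sum the indicator across all sampled facilities, at each step
    of the reporting pipeline (Register -> MOH 731 -> DHIS -> DATIM)."""
    steps = ["register", "moh731", "dhis", "datim"]
    return {step: sum(row[step] for row in results) for step in steps}

print(pipeline_totals(facility_results))
# {'register': 155, 'moh731': 153, 'dhis': 148, 'datim': 146}
```

The aggregation itself is trivial once the data sits in one place; with 500 standalone workbooks of possibly tampered-with structure, getting it into one place is the months-long part.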
The second problem is less severe but no less harmful. With spreadsheets, all data is entered manually. Again, taking the example of the indicator Number of people tested for HIV, field officers must manually enter the value reported in each of the four sources in the pipeline, i.e. the Register, the summary form (MOH 731), the DHIS and DATIM.
Why is this a problem?
Two reasons. Firstly, manual data entry introduces new opportunities for error. What if the Register says 50 and the DHIS says 50, but, in entering the DHIS value into the spreadsheet, the field officer makes a typo and enters 500? Now you have a situation where the quality audit process itself is introducing its own errors! Secondly, DHIS and DATIM values are already digitally recorded elsewhere. Entering them manually again is unnecessary double work.
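One way to catch such transcription errors is to cross-check every manually keyed value against the value already recorded in the electronic system. The sketch below is a hypothetical illustration (the function name and data shapes are my own invention, not part of any existing tool); it flags the exact 50-versus-500 scenario described above.

```python
def flag_entry_errors(entered, authoritative):
    """Compare manually entered values against the digital source of truth.

    Returns, per source, the (entered, authoritative) pair for every
    source whose keyed-in value disagrees with what is already recorded
    electronically (e.g. in DHIS or DATIM).
    """
    return {
        source: (entered[source], authoritative[source])
        for source in authoritative
        if entered[source] != authoritative[source]
    }

# The typo scenario from the text: DHIS actually says 50, but the field
# officer keys 500 into the spreadsheet's DHIS cell.
entered = {"dhis": 500, "datim": 50}
authoritative = {"dhis": 50, "datim": 50}
print(flag_entry_errors(entered, authoritative))
# {'dhis': (500, 50)}
```

Of course, a check like this only works if the audit tool can see the electronic values at all, which is precisely the argument against re-keying data that already exists in DHIS and DATIM.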
A better way?
These challenges notwithstanding, it is still easy to see why, in the absence of alternatives, spreadsheets have been the tool of choice for data verification. But not anymore!
In the third and final part of this series, I will explain how using our powerful mobile data collection and analysis platform solves these two problems while preserving every benefit that makes Microsoft Excel the de facto choice for conducting data verification.