How Important Is "Perfect Data" To Your Data Analytics Program?
Back in 2014 I published a guest post on the Open Data Institute’s web site titled “How important is the quality of open data?” Here’s a quote from that piece:
Setting aside for the moment how we define “quality” … database managers have always had to concern themselves with standards, data cleaning, corrections, and manual versus automated approaches to data quality control. In fact, I’m not convinced that “adding” quality control measures throughout the process will significantly add to costs; in fact, adding quality control measures may actually reduce costs, especially when data errors are caught early on.
I was reminded of this while participating earlier this week in a Data Analytics session sponsored by DorobekINSIDER: LIVE, as similar questions about data quality were raised by session attendees. (Note: when you click the “listen to the archive” link on the DorobekINSIDER LIVE page you need to register to listen to the audio.)
Unfortunately, there’s no simple answer to the question, “How clean does our data have to be to make use of it?” other than, perhaps, “Clean enough to support its intended use.”
If you’re doing exploratory data analysis to help you decide how much data prep might be needed to make your data public, that’s one thing. On the other hand, if you are using your data to calculate input to an invoicing system, or you’re preparing data to distribute publicly for re-use by others, that’s another.
Either way, before you can logically consider the consequences of analyzing, interpreting, and using your data, you’ll need to know what data you have and who owns it, from both technical and business perspectives. A data inventory should probably be one of the first things you do when you are considering an upgrade to your organization’s data analytics capabilities. Such an inventory should also include the interdependencies of the various systems involved, and the identities of who controls them.
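To make the inventory idea a bit more concrete, here is a minimal sketch in Python of what one might record: what each data asset is, who owns it on the business side, who controls it technically, and which other assets it depends on. Everything in it is hypothetical and for illustration only: the `DataAsset` fields, the `downstream_of` helper, and the sample systems and owners are assumptions, not a description of any particular organization or tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One entry in a (hypothetical) data inventory."""
    name: str
    business_owner: str        # who is accountable for the data
    technical_controller: str  # who runs the system that holds it
    depends_on: list[str] = field(default_factory=list)  # upstream assets


def downstream_of(inventory: dict[str, DataAsset], name: str) -> list[str]:
    """Return the assets that depend, directly or indirectly, on `name`."""
    affected: set[str] = set()
    frontier = [name]
    while frontier:
        current = frontier.pop()
        for asset in inventory.values():
            if current in asset.depends_on and asset.name not in affected:
                affected.add(asset.name)
                frontier.append(asset.name)
    return sorted(affected)


# Illustrative entries only; the names and owners are made up.
inventory = {
    asset.name: asset
    for asset in [
        DataAsset("customer_db", "Sales", "IT Operations"),
        DataAsset("invoicing", "Finance", "IT Operations",
                  depends_on=["customer_db"]),
        DataAsset("public_extract", "Communications", "Web Team",
                  depends_on=["invoicing"]),
    ]
}

# Which assets are affected if customer_db turns out to have quality problems?
print(downstream_of(inventory, "customer_db"))
```

Even a simple structure like this lets you ask who needs to be in the room when a data quality problem in one system could ripple into invoicing or a public release.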
The question of “where to start” on a data analytics program was another important one raised by the DorobekINSIDER attendees. I’m all for taking a strategic view of how best to organize and manage an organization’s data (for example, the view promoted by CMMI’s Data Management Maturity Model). But taking too “strategic” a view can lead you down the path of too much analysis in rapidly changing situations, especially if you have to document for the first time the degree to which your various systems and applications depend upon each other for data.
With that in mind here are a few suggestions to ponder when you are considering improvements in how you analyze and use data; note that “data quality” is only one of the many things you need to be considering:
- Be honest about costs. They’ll bite you if you aren’t.
- Focus first on solving important problems, not just on “low hanging fruit.”
- Look first at available data before proposing significant changes to your data management infrastructure.
- Don’t be surprised if process changes are more complex and time-consuming than buying new tools.
- Know what data you have, who owns it, and who controls it.
- Identify and engage with the data experts in your organization regardless of where they are organizationally. Encourage them to collaborate if they aren’t doing so already.
- Engage early and often with senior management, not just with your project’s sponsor.
- Don’t automatically assume that your IT department is the best place to control your analytics projects, even though they probably know more about project management than anyone else.
- Deliver useful analytical results early and often even if the underlying data are suspect. Since you’ll need to prioritize data corrections and standardization processes, the more that management understands about what you’re doing, the better.
- Don’t just recruit and hire data analysts. Also consider upgrading middle and upper management’s analytical skills.
- Don’t get so caught up in planning exercises that you’re accused of “analysis paralysis.” On the other hand, if you don’t engage in meaningful planning while developing initial deliverables, any early disappointments could stop you in your tracks.
- Expect the unexpected. If you already knew what analysis of your data would tell you, you wouldn’t need to analyze it, would you?
Copyright (c) 2016 by Dennis D. McDonald