Improved Data Access Requires More than Analytics and Technology

July 6, 2016 Dennis D. McDonald

Giving The People Control

I’ve been reading that giving people more control over the data that’s relevant to them personally is an important key to making data more accessible and useful. A good example is the U.S. Federal government’s Blue Button initiative for health records.

At the same time, we need to do a better job of making sure that people can understand and use their data. Not everyone is a data scientist or statistician. Reasonably intelligent people can be flummoxed by the intricacies of even a moderately sophisticated spreadsheet. Plus, the details of an individual’s financial or health records may require expert knowledge to interpret. Even though I have easy access to my own physical exam and test results from my physician’s web site, for example, I have no idea what most of the numbers actually mean.

Helping People Understand

Even as we make strides in “opening up” both personal and governmental data to public scrutiny through expanded “open data” initiatives and upgrades to Freedom of Information policies, let’s also give thought to what people need to know and do to manipulate and make sense of the data we give them.

Technologically, one avenue might eventually be to use 3D modelling, virtual reality, and augmented reality to help people understand and interpret data. Examples include three-dimensional data visualization and real-time data manipulation and interactivity. Videogamers especially might be receptive to such approaches. One thing to explore, for example, would be the use of color and motion to illustrate the interaction of different variables in different data streams.

Yet would such capabilities really help “non data scientists” understand potentially complex data relationships well enough to discover meaningful insights that would normally remain hidden? Or would expanding such access to more people prove too confusing by making it too easy for them to get lost in trivialities or blind alleys?

I don’t know the answers to such questions, but I do think that technology, while it may play an important role in expanding access to potentially valuable data, will not be the complete answer to making data accessible and useful.

Helping People Ask The Right Questions

What will be just as important as technology is the ability to easily and clearly formulate the questions or problems being addressed. Are we, for example, providing a mechanism to experienced analysts for analysis and exploration, or are we trying to help members of “the general public” solve personally relevant problems or make specific decisions?

The same data store may be relevant to both classes of use, but the implications for designing the supporting services that enable the user to understand and use the data are significant. Health researchers might be able to use a data lake of health records to construct a simulated clinical trial. Looking at the same lake, I wouldn’t be able to interpret the data in my own little slice of it without assistance.

I am reminded of some of the things I learned when working for a company that developed electronic repair manuals and parts catalogs for the aircraft, home appliance, truck, and computer hardware industries. I saw eyes light up when such tools were provided to service technicians and call center reps. They immediately recognized how much better their jobs would be with improved and streamlined data access. It wasn’t just speed and portability they craved. They also needed rapid searching and filtering as they clawed their way through closely related and confusing data to find the proverbial “needle in a haystack.”

Granted, call center reps, engineers, and repair technicians were already familiar with the context and meaning of what they were searching for. They knew how to formulate their queries and were usually trying to solve a problem with a specific solution, the significance of which they understood. However, as such systems eventually migrated to the web and then to public access, such contextual information and support have not always been provided. Online searches for parts and product repair information now come from a much wider variety of users, ranging from do-it-yourself home repair enthusiasts (like me) to grizzled veterans of the appliance repair wars. Both may be interested in a means to an end, but public access systems now have to support a variety of different search methods, each of which carries a different cost implication for search and support. “Fee or free” then becomes a real design issue given the expense of maintaining separate systems for amateurs and experts.

The situation we face today in opening up public data systems for improved access is somewhat analogous. Large amounts of data of various kinds (financial, transactional, health, medical, environmental, etc.) can be brought to bear on a very wide range of issues, with potential value to many kinds of users whose expertise in interpreting and manipulating data varies greatly. Some of these users will just require adequate documentation. Others will require better analysis and visualization tools. Still others will require expert help in formulating their questions and interpreting the results. Supporting such a variety of use cases is expensive.

We Need to Prioritize

It is likely that we need to prioritize the uses we intend to support. Then we can work back from there to make intelligent decisions about how meaningful and useful data-based services will be, given the varying levels of data sophistication of the different user groups and the varying costs involved in providing the services needed to support these users.

One question will be how any new data access services fit into existing organizational structures and data management lifecycles. Can improved data access be folded into existing operations and services? Or will new organizational structures be required to provide the needed management oversight and accountability?

I suspect that the need for separate or even nontraditional organizational structures will increase as organizations become more dependent on data for accomplishing their objectives.

Another factor influencing how services will be managed will be variety in data sources and data users. As variety increases, pressure on existing organizational and management structures will increase. Adding requirements for data services to become more “real time” may also increase the complexity of the services and how they are provided, especially in situations where data usage has been primarily report-based or historical.

These are important questions, regardless of whether you are considering private sector, public sector, R&D, or academic environments. Whatever the environment, new data management and analysis technology may be important, but success and sustainability will also be driven by how we manage that technology and by how successful we are in putting data users and their priorities in the driver’s seat.