Health Data Access, TANSTAAFL, and the VRDC

May 14, 2024 Dennis D. McDonald

Meredith Wadman’s article in the May 10 issue of SCIENCE, U.S. Wants to Change How Researchers Get Access to Huge Trove of Health Data. Many Don’t Like the Idea, reports on current issues surrounding the U.S. Federal Government's decision to exert more centralized control over researcher access to and storage of taxpayer-funded data on U.S. Medicare and Medicaid recipients and their healthcare.

Some institutions and researchers, having built up collections of anonymized data over the years, are pushing back against the government's decision to require access through a secure centralized data source, the Virtual Research Data Center (VRDC).

The issues, both pro and con, touch on the core of how healthcare data are financed, managed, and used. The issues fall into two general categories:

The need to address the complexities surrounding the confluence of data governance, AI governance, and cybersecurity.
That old bugaboo, TANSTAAFL (i.e., There Ain't No Such Thing As A Free Lunch).

I have a hard time arguing against the need to centralize management of such a sensitive data resource, given increasingly sophisticated cybersecurity threats, including the potential for AI tools to de-anonymize data. (For example, see Blake Murdoch’s Privacy and artificial intelligence: challenges for protecting health information in a new era.) Centralization is no panacea, but it does allow for more efficient implementation of access standards, requirements, and defenses that would otherwise fall, expensively, on individual distributed repositories. Plus, there is the possibility of improving access for researchers at smaller, less well-funded institutions.

As with many research access issues, a central question is who will bear the cost of changes to how the current system operates. Current research grant recipients, for example, are able to include data access costs (which are nontrivial) in grant funding requests.

As with any research and dissemination effort where Federal funding is involved, government and institutional subsidies occur all along the way. When a research grant recipient is required to ensure ongoing access to research data files after the original research is published, who bears the cost of maintaining such resources? And what responsibility should the original funder, the original researcher, and the researcher’s institution bear for data access?

Statements such as this one from the Cato Institute’s Michael Cannon suggest a lack of understanding about what secure ongoing data access costs in the real world. Cannon writes:

CMS is collecting these data anyway. Furnishing these data to academics—who are just about the only people trying to figure out what’s going on in these programs—scarcely costs taxpayers anything. CMS should be giving away Medicare and Medicaid data at least as freely as it shovels taxpayer dollars out the door to high‐cost, low‐quality health care providers.

Cannon’s “scarcely costs taxpayers anything” suggests ignorance about the real-world costs of making valuable data accessible and secure. Whoever ends up paying, shifting from a distributed health data access model to a centralized model while ensuring equitable access standards will be no trivial matter, and it could be very messy.

Text copyright (c) 2024 by Dennis D. McDonald. The above image was generated on May 14, 2024 by GPT-4o after several tries with different prompts in an attempt to illustrate the relationship between health data access and costs.
