Dennis D. McDonald (ddmcd@ddmcd.com) consults from Alexandria, Virginia. His services include writing & research, proposal development, and project management.

Comparing How 3 AI Tools Summarize Text: An Experiment

By Dennis D. McDonald

Introduction

Early in my career I made a living applying statistical and research methods to studying and consulting on professional communication and electronic publishing. These days I apply my project management and consulting experience to writing and content development in connection with tech- and data-related proposals.

My own experimentation with AI tools reflects that experience and has ranged from editing and grammar-checking text to generating project planning documents. Since I use my own website to promote my own consulting, writing, and research services, I’m very sensitive to how I use AI tools in connection with content development. I want what and how I write to reflect my own skills and experience.

A case in point: my recent website post titled Health Data Access, TANSTAAFL, and the VRDC. This went through several iterations: I first wrote it out longhand (I’m a fountain pen and cursive fan), rewrote it, then converted it to text through dictation and did a final edit before posting.

The Question

I’ve tinkered with the article since publishing it, and as I write this I’m STILL finding spelling errors. But I wondered: would there be any benefit to using AI tools to summarize the published text? I’ve written many “executive summaries” for research or consulting reports and could see summarization being a useful tool in my own consulting, if I could trust what was generated to accurately represent the original text.

The Experiment

My experiment: ask 3 AI tools to summarize my recent article’s text into ten bullet points, then compare the results.

Perplexity

Here’s what Perplexity generated (210 words):

Here are 10 bullet points summarizing the article:

  1. The U.S. government wants to centralize researcher access to taxpayer-funded healthcare data through a Virtual Research Data Center (VRDC).

  2. Some institutions and researchers are pushing back against this centralized data access model, as they have built up their own anonymized data collections over the years.

  3. The issues involve data governance, AI governance, cybersecurity concerns, and the costs associated with managing and providing access to sensitive healthcare data.

  4. Centralization could improve access standards, security defenses, and access for smaller, less-funded institutions, but it raises questions about who will bear the costs.

  5. Currently, researchers can include data access costs in grant funding requests, but centralization may shift these costs.

  6. There are debates around government and institutional subsidies for research data access and dissemination after publication.

  7. Some argue that providing academics with healthcare data "scarcely costs taxpayers anything," suggesting a lack of understanding of real-world data access costs.

  8. Shifting from a distributed to a centralized data access model while ensuring equitable access standards will be complex and potentially messy.

  9. Cybersecurity threats, including AI tools that can de-anonymize data, highlight the need for centralized data management.

  10. The confluence of data governance, AI governance, and cybersecurity complexities underscores the need for a centralized approach, despite the challenges.

Claude

Here is what Claude generated (222 words):

  1. The U.S. Federal Government plans to exert more centralized control over researcher access to and storage of taxpayer-funded data on Medicare and Medicaid recipients and their healthcare.

  2. Some institutions and researchers are pushing back against the government's decision to require access through a secure centralized data source, the Virtual Research Data Center (VRDC).

  3. The issues touch on data governance, AI governance, cybersecurity, and the costs associated with managing and accessing healthcare data.

  4. Centralization of data management is seen as a way to address cybersecurity threats, including the potential for AI tools to de-anonymize data.

  5. Centralization could improve access for researchers at smaller, less well-funded institutions.

  6. A central question is who will bear the cost for changes to the current system, as current research recipients include data access costs in grant funding requests.

  7. When research data must be maintained after publication, there are questions about the responsibility of the funder, researcher, and institution for ongoing access costs.

  8. Statements suggesting that providing data access "scarcely costs anything" demonstrate a lack of understanding of the real-world costs involved.

  9. Shifting from a distributed to a centralized data access model while ensuring equitable access standards will be a complex and potentially messy process.

  10. The article includes an AI-generated image illustrating the relationship between health data access and costs.

ChatGPT-4o

Here is what ChatGPT-4o generated (151 words):

  1. The U.S. Federal Government plans to centralize control over researcher access to taxpayer-funded Medicare and Medicaid data.

  2. Researchers and institutions with existing anonymized data collections are resisting this move.

  3. The debate involves how healthcare data are financed, managed, and used.

  4. Key issues include data governance, AI governance, and cybersecurity complexities.

  5. Centralizing data management could enhance security against sophisticated threats, such as AI de-anonymization.

  6. A centralized system may improve access for researchers from smaller, less-funded institutions.

  7. The centralization of data management raises questions about who will bear the costs of the changes.

  8. Currently, researchers can include data access costs in grant funding requests, but this may change.

  9. Statements like those from Michael Cannon of the Cato Institute underestimate the real costs of secure data access.

  10. Transitioning to a centralized health data access model will be complex and potentially disruptive, requiring careful implementation.

Discussion 

At first glance I think all three summaries are pretty good. Looking closer, there are some differences:

  1. Perplexity included in its output the links (not reproduced here) that I had incorporated into the original article, suggesting that source traceability may be better for that tool.

  2. Only Claude included a reference to the AI-generated image I had included in the article (which was generated by and credited in the article to GPT-4o).

  3. Of the three summaries, my personal assessment is that the GPT-4o summary is the most neutral and “flat” in tone. It’s also the one with the lowest word count.

Keep in mind that I used the same introductory prompt with all three tools: “please summarize the following article text I wrote into 10 bullet points:” It is likely the output would have been different had I included more details in the prompt, such as “make sure to include references to any links in the article” or “assume the reader is knowledgeable about data governance issues.”
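For anyone who wants to repeat this kind of comparison a bit more systematically, here is a minimal sketch in Python. It is not how the comparison above was done (that was done by reading). It assumes each summary has been pasted in as a plain string, and the function names (word_count, vocabulary, jaccard, compare) are made up for illustration. It reports a word count per tool and a rough vocabulary overlap between each pair of summaries. The sample strings are just the first bullet from each summary above.

```python
from __future__ import annotations

import re
from itertools import combinations


def word_count(text: str) -> int:
    """Count whitespace-separated tokens, roughly what a word processor reports."""
    return len(text.split())


def vocabulary(text: str) -> set[str]:
    """Return the set of lowercased words, ignoring punctuation and digits.

    Hyphenated words and words with a straight apostrophe stay in one piece.
    """
    return set(re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by all words used; 1.0 means identical vocabulary."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare(summaries: dict[str, str]) -> None:
    """Print a word count per tool and the vocabulary overlap for each pair."""
    for name, text in summaries.items():
        print(f"{name}: {word_count(text)} words")
    for (name_a, text_a), (name_b, text_b) in combinations(summaries.items(), 2):
        overlap = jaccard(vocabulary(text_a), vocabulary(text_b))
        print(f"{name_a} vs {name_b}: vocabulary overlap {overlap:.2f}")


if __name__ == "__main__":
    # Illustrative input only: the first bullet of each summary, not the full text.
    compare({
        "Perplexity": "The U.S. government wants to centralize researcher access "
                      "to taxpayer-funded healthcare data through a Virtual "
                      "Research Data Center (VRDC).",
        "Claude": "The U.S. Federal Government plans to exert more centralized "
                  "control over researcher access to and storage of "
                  "taxpayer-funded data on Medicare and Medicaid recipients "
                  "and their healthcare.",
        "ChatGPT-4o": "The U.S. Federal Government plans to centralize control "
                      "over researcher access to taxpayer-funded Medicare and "
                      "Medicaid data.",
    })
```

Vocabulary overlap is a crude measure. It says nothing about accuracy or tone, so it can only supplement a careful read of each summary against the original article.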

Conclusions

Of the three, all of which are pretty good I think, the one that most accurately and objectively reflects the content of my article is the one generated by ChatGPT-4o. It’s also the shortest and the one I think is the most boring to read.

Which one would I use? The answer to that question will depend on the intended use. If I’m trying to demonstrate writing skills I wouldn’t use any of the three without polishing. But if I were working on a government proposal where page count is limited, I might tend towards ChatGPT-4o.

Admittedly, though, this is a pretty limited test, so I will be interested in what others think based on their own experience.

Text copyright (c) 2024 by Dennis D. McDonald. The image at the top was generated on 5/17/24 by GPT-4o in response to the prompt, “Please express in graphic form how AI tools such as ChatGPT can intelligently summarize large amounts of complex text.”
