
On Using Synthetic Text Generation To Report Scientific Research

By Dennis D. McDonald

Scientific journals are scrambling to figure out how to deal with the likelihood that tools such as ChatGPT will be used to help generate research articles, according to the report “As scientists explore AI-written text, journals hammer out policies,” published in the February 22, 2023, issue of AAAS' Science magazine.

Developing tools to reliably detect synthetic text will be difficult, given that AI-based text generation tools will continue to grow in sophistication. A key problem is that it can be difficult, if not impossible, for automated means to reliably and consistently detect all errors in synthetically generated text.
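To make the difficulty concrete, here is a minimal sketch of the kind of statistical heuristic many detectors rely on: scoring text by how “predictable” a language model finds it. This is my own illustration, not anything described in the Science article; the model choice and the threshold are arbitrary assumptions.

```python
# A toy perplexity-based detector, assuming the Hugging Face "transformers"
# and "torch" packages are installed. The model and threshold are arbitrary
# illustrations, not a validated detection method.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token 'surprise' under GPT-2; lower means more predictable."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return mean cross-entropy.
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

# Hypothetical cutoff: machine-generated text often scores lower perplexity
# than human prose, but no principled fixed value exists.
THRESHOLD = 40.0

def looks_synthetic(text: str) -> bool:
    return perplexity(text) < THRESHOLD
```

The fragility is the point: the score shifts with genre, editing, paraphrasing, and model version, which is why no heuristic of this kind can be expected to catch every case.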

Some journals are requiring article authors to identify their use of AI-based tools. Other journals have banned the use of such tools outright. My own take:

  • There are legitimate uses for AI-based synthetic text generation in scientific research.

  • It will be impossible to “sniff out” such usage in all cases.

  • Based on my own professional experience as a researcher and data analyst, it is sometimes frustratingly difficult to unambiguously define what an “error” is.

  • Standards and practices for how research is conducted and reported will have to be hammered out by policymakers, publishers, and scientists themselves. No one should pretend this will be easy.

  • No matter what policymakers, funders, and scientific communities decide, some will flout the rules and attempt to bypass voluntary or even legislatively mandated controls over how AI-based tools are used in research.

Another concern is whether research consumers know enough to understand, without the aid of experts, how reported research was conducted. Social media might seem a promising way to build the trusted relationships needed to communicate about research reliability. Unfortunately, modern social media can be manipulated both politically and criminally.

Worrying about AI-generated journal articles may matter less than whether consumers of research findings can even understand and act on those findings, but that is a topic for another day.

In the short term, scientific communities, perhaps led by policymakers or an organization such as the National Academies, could openly explore and report on how best to use tools such as ChatGPT in conducting and reporting scientific research.

Meanwhile, funding agencies that now require researchers to make data available when research results are published could also consider rules requiring disclosure of how analytical tools and processes were used in the research (i.e., “Show your work!”).
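As a thought experiment (mine, not any agency's actual policy), a “show your work” rule could ask for a small machine-readable manifest alongside the data deposit. Every field name and value below is a hypothetical placeholder:

```python
# Hypothetical "methods manifest" a funder might require with a data deposit.
# The schema, field names, and values are illustrative placeholders only.
import json

manifest = {
    "dataset_doi": "10.0000/example-doi",  # placeholder identifier
    "analysis_tools": [
        {"name": "R", "version": "4.2.2", "purpose": "statistical models"},
        {"name": "ChatGPT", "version": "unspecified",
         "purpose": "first draft of the methods section",
         "human_review": "all output checked and edited by the authors"},
    ],
    "code_repository": "https://example.org/placeholder",  # placeholder URL
}

print(json.dumps(manifest, indent=2))
```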

Bottom line: tools that help organize, analyze, and interpret data are potentially useful. As with all tools, they can also be misused. We need to understand both the dangers and the opportunities.

Copyright 2023 by Dennis D. McDonald.
