Can research transparency & AI defend against fake science?
Increasing fakery
According to the recent Science article "Fake scientific papers are alarmingly common," there is increasing evidence that more fake scientific papers are making their way into refereed scientific research journals.
The exact extent of the problem is not known. Some researchers, by analyzing potential indicator variables associated with suspect published papers, are pointing to the work of shady "paper mills" that specialize in enhancing manuscripts with bogus findings, authorship, and institutional affiliation data. (Paper mills in China, Russia, and Iran had already been reported as a problem in the 2021 Nature article "The fight against fake-paper factories that churn out sham science.")
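To make the idea of "indicator variables" concrete, here is a minimal, purely illustrative sketch of how a rule-based screen might flag suspect manuscripts. The indicators, thresholds, and field names below are hypothetical assumptions invented for demonstration, not the actual criteria used by the researchers cited above.

```python
# Illustrative only: a hypothetical indicator-based screen for suspect papers.
# The indicators and thresholds are invented for this sketch and are NOT the
# actual criteria used by integrity researchers or publishers.

def screen_paper(paper: dict) -> list[str]:
    """Return human-readable flags for a paper's metadata.

    `paper` is assumed to carry keys like 'email_domains', 'turnaround_days',
    and 'image_reuse_score' -- all hypothetical fields for this example.
    """
    flags = []

    # Hypothetical heuristic: authors whose contact emails match no listed
    # institutional affiliation have been associated with paper-mill output.
    if any(d in {"gmail.com", "163.com"} for d in paper.get("email_domains", [])):
        flags.append("non-institutional author email domains")

    # Implausibly fast submission-to-acceptance times can suggest a
    # compromised review process (threshold invented here).
    if paper.get("turnaround_days", 999) < 14:
        flags.append("unusually short submission-to-acceptance time")

    # A high image-similarity score against previously published figures may
    # indicate recycled images (score and cutoff are illustrative).
    if paper.get("image_reuse_score", 0.0) > 0.9:
        flags.append("possible duplicated or manipulated figures")

    return flags


if __name__ == "__main__":
    suspect = {
        "email_domains": ["gmail.com"],
        "turnaround_days": 7,
        "image_reuse_score": 0.95,
    }
    print(screen_paper(suspect))  # prints all three flags
```

In practice, of course, no single indicator is conclusive; such screens can only prioritize papers for human scrutiny.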
Paper mills are not the only problem. Issues are already surfacing with the increasing use of large language model (LLM) tools, such as OpenAI's ChatGPT, to analyze data and synthesize the findings presented in some scholarly research papers. The challenge: how are reviewers to assess which analyses are legitimate and which are not? Use more AI?
The problem will probably get worse before it gets better, due to two factors inherent in how scholarly publishing operates.
Publish or perish
The first is that, like it or not, publishing in refereed scholarly journals is often required for getting ahead in the academic research world; "publish or perish," as the saying goes. It's not surprising that some will cut corners to get published, which is one reason the aforementioned "paper mills" exist.
A decentralized, volunteer-based system
The second stems from the very nature of the scholarly publishing system. Despite the concentration of so many journals under a relatively small number of commercial and society publishers, most journals still depend heavily on volunteer labor for editorial management, editing, peer review, and content production. Production, access, retrieval, and distribution systems are decentralized and depend on workers with many different institutional affiliations. Those who want to publish can therefore use readily available technologies to cut corners; both the motive and the means are widely available.
What’s the solution? The first challenge will be to measure the extent of the problem. Some propose using AI-based tools to help detect fraudulent information. Others are using AI tools, both legitimately and illegitimately, to synthesize information. The result is a kind of "information war" in which what’s "real" and what’s "fake" are becoming increasingly difficult to differentiate.
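As one hedged illustration of what an "AI-based" detection tool might look like at its very simplest, the sketch below trains a basic text classifier on a handful of labeled abstracts. The toy training data, the labels, and the assumption that surface wording distinguishes fabricated text are all invented for this example; real detection systems are considerably more sophisticated than this.

```python
# A minimal, illustrative text-classifier sketch (not a production detector).
# Assumes scikit-learn is installed; the toy training data is invented here.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled abstracts: 1 = suspected fake, 0 = presumed genuine.
texts = [
    "Novel results confirm remarkable efficacy in all tested conditions.",
    "We report mixed findings; effect sizes varied across replications.",
    "Groundbreaking outcomes prove the hypothesis beyond any doubt.",
    "Our data partially support the model, with noted limitations.",
]
labels = [1, 0, 1, 0]

# TF-IDF features feeding a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Score a new abstract; the output is the estimated probability
# that it belongs to the "suspected fake" class.
new_abstract = ["Unprecedented results conclusively prove all claims."]
print(model.predict_proba(new_abstract)[0][1])
```

The obvious irony, as noted above, is that the same generative tools being screened for can also be used to evade whatever classifier is deployed.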
Increased transparency
One solution to uncovering and discouraging fakery is increased process transparency of the kind that the United States’ National Institutes of Health (NIH) is recommending in its enhanced public access initiatives. Creating and strengthening trusted research communities may also help uncover and discourage fakery and artifice. Perhaps initiatives discussed at the recent Nobel Prize Summit "Truth, Trust and Hope" will stimulate further positive developments.
Increased vigilance
Part of an increase in research transparency, ironically, will be to help document legitimate uses of AI in the academic research process, of which there are many, especially in data- and analysis-intensive research that involves processing large data volumes.
One challenge, of course, is that as AI tools become more self-aware and more human-like, what's to stop them from fudging research results on their own?
Copyright (c) 2023 by Dennis D. McDonald. This article was originally published in slightly different form at ddmcd.stck.me.