As generative AI, such as ChatGPT, becomes capable of producing scientific articles that appear legitimate—particularly to those outside the field—how can we identify which are fake?
Ahmed Abdeen Hamed, a visiting research fellow at Binghamton University's Thomas J. Watson College of Engineering and Applied Science, has developed a machine-learning algorithm called xFakeSci, which can detect up to 94% of counterfeit scientific papers, a success rate nearly double that of conventional data-mining methods.
"My primary research focuses on biomedical informatics, but because I engage with medical publications, clinical trials, online resources, and social media mining, I’m constantly concerned about the authenticity of the knowledge being disseminated," said Hamed, a member of George J. Klir Professor of Systems Science Luis M. Rocha's Complex Adaptive Systems and Computational Intelligence Lab.
"The biomedical literature, particularly during the global pandemic, was significantly affected by the spread of false research."
In a recent study published in Scientific Reports, Hamed and collaborator Xindong Wu, a professor at Hefei University of Technology in China, generated 50 fake articles on three prevalent medical topics—Alzheimer's, cancer, and depression—and compared them with an equal number of genuine articles on the same subjects.
When requesting AI-generated papers from ChatGPT, Hamed said, "I used the same keywords I employed to retrieve literature from the National Institutes of Health's PubMed database to ensure a consistent basis for comparison. I suspected that there must be discernible patterns distinguishing fake content from genuine research, but I wasn’t sure what those patterns would be."
After conducting experiments, Hamed programmed xFakeSci to analyze two main features in the writing of these papers. The first feature was the frequency and usage of bigrams—two words that often appear together, such as "climate change," "clinical trials," or "biomedical literature." The second feature examined how these bigrams were connected to other words and concepts within the text.
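The two features described above can be illustrated with a minimal Python sketch. This is not the xFakeSci implementation, whose details the article does not give. The function names, the crude letters-only tokenizer, and the choice of "mean degree" as a measure of connectedness are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (crude)."""
    return re.findall(r"[a-z]+", text.lower())


def bigram_features(text):
    """Hypothetical helper: bigram counts plus simple network statistics."""
    tokens = tokenize(text)
    # Feature 1: adjacent word pairs stand in for the bigrams in the article.
    bigrams = Counter(zip(tokens, tokens[1:]))

    # Feature 2: treat each distinct bigram as an undirected edge between
    # its two words, then measure how connected the resulting network is.
    neighbors = defaultdict(set)
    for left, right in bigrams:
        if left != right:
            neighbors[left].add(right)
            neighbors[right].add(left)

    n_words = len(neighbors)
    n_edges = sum(len(linked) for linked in neighbors.values()) // 2
    return {
        "total_bigrams": sum(bigrams.values()),
        "distinct_bigrams": len(bigrams),
        "words_in_network": n_words,
        "edges": n_edges,
        # Assumed connectivity measure: average distinct neighbors per word.
        "mean_degree": (2 * n_edges / n_words) if n_words else 0.0,
    }
```

Called on the string "clinical trials support clinical trials", the helper reports 4 bigram occurrences, 3 distinct bigrams, and a mean degree of 2.0. Comparing such numbers between a suspected fake and a set of genuine papers on the same topic is one plausible way to look for the kind of pattern the researchers describe.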
"The most striking observation was that the number of bigrams in fake papers was significantly lower than in genuine ones, where bigrams were more abundant and varied," Hamed noted. "Moreover, despite the lower frequency of bigrams in fake papers, they were heavily connected to other parts of the text."
Hamed and Wu hypothesize that the writing styles differ because human researchers and AI systems have distinct objectives. Human researchers aim to report findings honestly and transparently, whereas AI systems such as ChatGPT, in their view, try to persuade readers by emphasizing specific terms and often lack the broad contextual depth that characterizes genuine scientific research.
"ChatGPT, constrained by its current knowledge, attempts to convince readers by focusing on the most impactful words," Hamed said. "A scientist's role isn't to convince but to accurately report experimental results and methodologies. While ChatGPT focuses on depth in a single area, genuine scientific research encompasses a broad scope."
Mohammad T. Khasawneh, Distinguished Professor and Chair of the Department of Systems Science and Industrial Engineering, commended Hamed’s work: "We’re thrilled to have Dr. Ahmed Abdeen Hamed as part of our team, working on such groundbreaking ideas. In an era where ‘deepfakes’ are increasingly prevalent, his research is incredibly timely and relevant. We eagerly anticipate further collaborations and advancements in his work."
Hamed plans to refine xFakeSci by extending it to topics beyond medicine, including engineering, other scientific disciplines, and the humanities. He also expects AI to keep evolving, making it increasingly difficult to distinguish authentic content from AI-generated content.
"We'll always be playing catch-up unless we develop a comprehensive solution," he said. "We have significant work ahead to identify a general pattern or create a universal algorithm that isn’t tied to a specific version of generative AI."
Although the algorithm detects up to 94% of AI-generated papers, Hamed emphasized the need for caution: "This means 6 out of 100 fake papers still slip through. We must remain humble about our achievements. While we've made significant strides in raising awareness, there’s much more to be done."