On April 25, Ethan Mollick, a professor at the University of Pennsylvania's Wharton School, made an observation on X with major implications for academia: today's AI agents can independently reproduce complex academic research results using only publicly described methods and data, without the original paper and without the original code. Mollick further noted that when these AI reproductions differ from the published papers, "the errors often come from the human-written papers themselves, not from the AI." This marks a concrete turning point for the reproducibility crisis in the age of generative AI: verification that once required expensive human peer effort can now be done at scale and at low cost by AI.
Claude reproduces multiple papers, with GPT-5 Pro as a double check
In his One Useful Thing blog post and in an accompanying tweet, Mollick described his experiment with Claude: he gave the AI an academic paper, had it open the replication archive, organize the files, automatically convert the Stata code used for the statistical analysis into Python, and then re-run the paper's findings one by one. After Claude finished, he used GPT-5 Pro to run a second round of checks on the same reproduction results. Multiple papers were tested this way, and the runs were generally successful, stalling only when the data files were too large or when the original replication data itself was flawed.
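To make the workflow concrete, here is a minimal sketch of what one step of such a reproduction check could look like once a Stata specification has been translated into Python. The file name, variable names, and reported coefficient are hypothetical placeholders, not details from Mollick's experiment; the pattern is simply to re-run the published specification and compare the estimate against the printed result.

```python
# Minimal sketch of one reproduction check: re-run a paper's main regression
# (originally `reg outcome treatment control1 control2, robust` in Stata)
# and compare the estimate to the published value. All names and numbers
# below are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

REPORTED_COEF = 0.042  # headline estimate as printed in the paper (assumed)
TOLERANCE = 0.001      # allowed drift from rounding / software differences

# Load the dataset from the (hypothetical) replication archive.
df = pd.read_stata("replication_archive/main_dataset.dta")

# Re-run the main specification with HC1 robust standard errors,
# which matches Stata's `, robust` option.
result = smf.ols("outcome ~ treatment + control1 + control2", data=df).fit(cov_type="HC1")
reproduced = result.params["treatment"]

if abs(reproduced - REPORTED_COEF) > TOLERANCE:
    print(f"MISMATCH: paper reports {REPORTED_COEF}, reproduction gives {reproduced:.4f}")
else:
    print(f"Reproduced within tolerance: {reproduced:.4f}")
```

A mismatch flagged at this step is exactly the kind of discrepancy that, per Mollick, more often traces back to the paper than to the agent.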
In academia, this process has typically taken research assistants weeks or even months. The timescale Mollick described ranged from an afternoon to a day, and the operating cost was just the token fees of a commercial LLM API.
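For a rough sense of scale, here is a back-of-the-envelope sketch of such a run's token cost; all prices and token counts are assumed for illustration, not figures from Mollick's account.

```python
# Back-of-the-envelope cost sketch for a single paper reproduction run.
# Prices and token counts are hypothetical; substitute your provider's rates.
INPUT_PRICE_PER_MTOK = 3.00    # USD per million input tokens (assumed)
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per million output tokens (assumed)

input_tokens = 2_000_000   # archive contents, code, and data excerpts read by the agent
output_tokens = 300_000    # converted Python code, logs, and the reproduction report

cost = (input_tokens / 1e6) * INPUT_PRICE_PER_MTOK \
     + (output_tokens / 1e6) * OUTPUT_PRICE_PER_MTOK
print(f"Estimated run cost: ${cost:.2f}")  # about $10.50 under these assumptions
```

Even if the real numbers differ by an order of magnitude, the cost sits far below weeks of research-assistant time.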
Most errors come from the human-written papers, not the AI
More controversial is Mollick's judgment about who is wrong. In his tweet, he stated plainly that when AI reproduction results do not match the original papers, in most cases the mistake is not the AI's: the original paper contained data-processing errors, misapplied its statistical model, or drew conclusions beyond what the data supports. Psychology, behavioral economics, management, and other social sciences have seen several major reproducibility crises over the past decade; the best known is the 2015 Open Science Collaboration's large-scale replication project, in which only about 36% of psychology results could be independently replicated. AI agents are shifting this kind of verification from something that requires dedicated human labor to something that can be run broadly and routinely.
Learned societies still ban AI from peer review; the system lags behind the technology
In another April 25 tweet, Mollick pointed out that the Academy of Management, the largest learned society in his field, still explicitly bans AI from the manuscript review process. He cited existing research indicating that AI peer review already outperforms some human reviewers on accuracy, consistency, and bias control, so a blanket ban could end up worsening the failures of the existing review system. This gap between institutions and the technology is a policy question that academia, learned societies, and funding bodies will have to confront in academic publishing over the next one to two years.
For readers, the debate is not confined to academia. When AI agents can verify research findings in near real time, academic evidence cited in industry research, policy reports, and financial decisions will face a new threshold of scrutiny: can the conclusions withstand independent AI reproduction? In a follow-up tweet, Mollick added that as these tools keep improving, he sees government as the only actor able to anchor this kind of verification, and that the complexity of the policy design involved will become a relatively overlooked thread in AI governance discussions.