AI agents can already independently reproduce complex academic papers: Mollick says most errors come from the original human text, not the AI

Chain News ABMedia

Ethan Mollick, a professor at the Wharton School of the University of Pennsylvania, made an observation on X on April 25 with far-reaching implications for academia: today’s AI agents can independently reproduce complex academic research results using only publicly described methods and data, without needing the authors’ original analysis code. Mollick further noted that when these AI reproductions differ from the original papers, “the errors often come from the human-written papers themselves, not from the AI.” This marks a concrete turning point in the research-reproducibility crisis of the generative-AI era: verification that previously required expensive human peer effort is now being completed at scale and at low cost by AI.

Claude reproduces multiple papers, then GPT-5 Pro performs a double check

In his One Useful Thing blog post and in the tweet, Mollick described his experiment with Claude: he gave the agent an academic paper, had it open the replication archive, organize the files, automatically convert the Stata code used for the statistics into Python, and then re-execute the paper’s findings one by one. After Claude finished, he used GPT-5 Pro to run a second round of checks on the same reproduction results. Multiple papers were tested this way, and the reproductions generally succeeded, stalling only when the data files were too large or when something was wrong with the original replication data itself.

In academia, this process has typically taken research assistants weeks or even months. The timescale Mollick described ranged from an afternoon to a day, with an operating cost of only the token fees of a commercial LLM API.

Most errors come from the original human text—not the AI

More controversial is Mollick’s judgment about who is at fault. In his tweet he stated plainly that when AI reproduction results don’t match the original papers, it is in most cases not the AI that erred; rather, the original paper contained data-processing errors, misapplied models, or conclusions that went beyond what the data supported. Psychology, behavioral economics, management, and other social sciences have seen multiple major reproducibility crises over the past decade; the best known is the 2015 Open Science Collaboration’s large-scale replication study, in which only about 36% of psychology results could be independently reproduced. AI agents are pushing this kind of testing from something that requires human staffing to something that can be run broadly and cheaply.
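At its core, the mismatch case described above is a tolerance comparison: an agent’s reproduced estimates are checked against the values reported in the paper, and anything missing or diverging beyond a threshold gets flagged for human review. A minimal sketch in Python; the function name, coefficient names, and the 1% tolerance are all illustrative assumptions, not Mollick’s actual tooling:

```python
# Hypothetical sketch: flag estimates whose agent-reproduced value diverges
# from the value reported in a paper. Names and tolerances are illustrative.

def flag_discrepancies(reported, reproduced, rel_tol=0.01):
    """Return names of estimates that are missing from the reproduction or
    differ from the reported value by more than rel_tol (relative difference)."""
    flagged = []
    for name, paper_value in reported.items():
        agent_value = reproduced.get(name)
        if agent_value is None:
            flagged.append(name)  # the agent could not reproduce this estimate
            continue
        denom = max(abs(paper_value), 1e-12)  # guard against division by zero
        if abs(agent_value - paper_value) / denom > rel_tol:
            flagged.append(name)
    return flagged

# Example: one coefficient matches within 1%, one diverges well beyond it.
reported = {"beta_treatment": 0.42, "beta_control": -0.10}
reproduced = {"beta_treatment": 0.421, "beta_control": -0.15}
print(flag_discrepancies(reported, reproduced))  # -> ['beta_control']
```

Flagged names would then go to a second reviewer (human or, as in Mollick’s workflow, a second model) to decide whether the paper or the reproduction is at fault.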

Learned societies still ban AI from peer review; the system lags behind the technology

In another April 25 tweet, Mollick pointed out that the Academy of Management, the largest learned society in his field, still explicitly bans AI from the manuscript-review process. He cited existing research indicating that AI peer review already outperforms some human reviewers on accuracy, consistency, and bias control, so a blanket ban could end up further exacerbating the failures of the existing review system. This gap between institutions and technology is a policy question that academia, learned societies, and funding bodies will have to confront in academic publishing over the next one to two years.

For readers, this debate is not limited to academia. When AI agents can verify research findings in near real time, academic evidence cited in industry research, policy reports, and financial decisions will face a new threshold of scrutiny: can the conclusions withstand independent AI reproduction? In a follow-up tweet, Mollick added that as these tools grow stronger, he believes governments are the only actors positioned to anchor such testing, and that the complexity of the resulting policy design will become a relatively overlooked thread in AI governance discussions.

This article, “AI agents can independently reproduce complex academic papers: Mollick says most errors come from the original human text rather than the AI,” first appeared on Chain News ABMedia.

