Science has a verification problem
If NeurIPS papers are published with hallucinated citations, we need to question the point of academia
Earlier this week, a report surfaced that genuinely disturbed me.
A damning analysis by GPTZero revealed that 53 papers accepted to NeurIPS 2025, the world’s most prestigious AI conference, contained over 100 hallucinated citations.
Let’s be clear about what this means.
These aren’t rejected drafts, and they aren’t preprints on arXiv.
These are publications that were submitted, peer-reviewed, accepted, and published as part of the scientific record.
And yet, they cite papers that do not exist. They list authors who never wrote them. They reference venues that never hosted them.
Citations are the moral contract of science. They are the “proof of work” that shows you have engaged with reality, that your ideas stand on the shoulders of actual giants, and that your claims can be traced and verified.
When that contract breaks, we aren’t doing science anymore. We are doing “content generation.”
The Loop Is Closed (and Broken)
Does AI assistance invalidate a research idea? No.
Does a hallucinated citation automatically make the math false? Also no.
But a paper with fabricated sources signals something far worse than sloppiness: no one in the loop actually read it.
The authors didn’t check their own references (likely relying on an LLM to “fill in the blanks”).
The reviewers didn’t check the references (likely relying on an LLM to summarize and score the paper).
The area chairs (ACs) didn’t catch the reviewers’ apathy.
We have quietly built a system where Claude or GPT writes the paper, and another AI reviews it. It is an automated loop of plausible-sounding noise.
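Breaking that loop doesn’t require exotic tooling. As a rough sketch (the function names, matching threshold, and workflow here are my own assumptions, not anything NeurIPS or GPTZero actually uses), a reviewer or author could check each bibliography title against a public index such as the Crossref API before submission:

```python
import difflib
import json
import re
import urllib.parse
import urllib.request


def normalize(title: str) -> str:
    """Lowercase and strip punctuation so titles compare robustly."""
    return re.sub(r"[^a-z0-9 ]+", "", title.lower()).strip()


def titles_match(cited: str, found: str, threshold: float = 0.9) -> bool:
    """Fuzzy-match a cited title against one returned by the index."""
    ratio = difflib.SequenceMatcher(
        None, normalize(cited), normalize(found)
    ).ratio()
    return ratio >= threshold


def crossref_lookup(cited_title: str, rows: int = 3) -> bool:
    """Query Crossref's public works endpoint and report whether any of
    the top hits plausibly matches the cited title (requires network)."""
    query = urllib.parse.urlencode(
        {"query.bibliographic": cited_title, "rows": rows}
    )
    url = f"https://api.crossref.org/works?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        items = json.load(resp)["message"]["items"]
    return any(
        titles_match(cited_title, t)
        for item in items
        for t in item.get("title", [])
    )
```

A title that matches nothing in any index isn’t proof of fabrication, but it’s exactly the kind of red flag a human should then chase down by hand, which is the point: the cheap check exists, and the loop skips it anyway.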
Can you imagine a paper winning a “Test of Time Award” in 2036, only for us to realize half its bibliography was a hallucination? It would be a stain on the entire field.
The Human Cost
The most painful part is what this does to the real people.
I have friends who truly love the craft of research. They love reading deeply, synthesizing ideas, and reviewing thoroughly. They tell me:
“Why bother? No one values the effort anymore.”
We are teaching the next generation of researchers that “looking correct” is more important than being correct.
The Future is Curation, Not Volume
AI didn’t kill science. It just exposed where we stopped protecting it.
In an age of infinite content, curation is the only scarce resource.
We need to stop measuring success by the number of submissions (which AI can inflate to infinity) and start measuring it by the quality of verification.
The future of academia cannot be “more papers, faster.” It must be:
Fewer, better-curated venues.
Reviewers selected for expertise, not availability.
Real compensation for peer review (because volunteer labor cannot scale against automated generation).
Explicit human accountability.
The Hard Truth
AI can assist writing. AI can assist discovery. AI can assist synthesis.
But AI cannot replace human judgment, responsibility, and taste.
Science is not just about generating ideas. It’s about choosing which ones deserve our attention.
If we outsource that choice to machines, we have forgotten why academia exists in the first place.
What do you think? Does the current peer review system need a complete overhaul, or can we fix this with better detection tools?
If you’re interested in how AI is reshaping research, peer review, and what happens when automation meets institutional knowledge, follow profrod.ai for more posts like this. I write about agent engineering, AI operations, and the systems that separate signal from noise.