The Paper Mill Becomes the Literature
Scientific knowledge is becoming a machine-readable substrate for search, medicine, policy, and AI training. Paper mills and hallucinated citations do not merely add bad papers to that substrate. They test whether institutions can still tell the difference between evidence and evidence-shaped output.
The Polluted Record
The scientific literature used to look like a slow archive. Papers entered journals, indexes, libraries, citation graphs, review articles, clinical guidelines, grant proposals, textbooks, and eventually the background memory of a field. Errors mattered, but they moved through a human-speed system.
That archive is now also an input layer for machines. Search engines summarize it. Retrieval systems quote it. Clinical and legal workflows cite it. AI assistants use it to answer questions. Foundation models absorb parts of it during training. Research agents may soon mine it for hypotheses, experimental plans, and automated reviews. The literature is not only read by scientists. It is parsed by infrastructure.
That is why paper mills are not a narrow publishing scandal. A fake or unreliable paper can become a node in a knowledge graph, a citation in a review, a chunk in a retrieval database, a training example for a model, or a false premise in an automated research workflow. The damage is not only that a journal published something weak. The damage is that weak evidence can become machine-actionable memory.
Nature reported that retractions for research articles passed 10,000 in 2023, a record driven in large part by efforts to clean up sham papers and peer-review fraud. That number should not be read as a simple sign of decline. Retractions can also mean that correction is happening. But the scale reveals a structural problem: the record is being repaired after contamination, not protected before it.
The Paper-Mill Economy
A paper mill is not just a sloppy author. It is an organized production system for academic-looking output. It may sell authorship, fabricate data, recycle images, manipulate peer review, target special issues, use templates, or coordinate submissions across journals. The product is not knowledge. The product is a credential object that can pass through enough institutional checkpoints to count.
The incentives are familiar. Universities, hospitals, promotion committees, grant systems, and national evaluation regimes often reward publication volume, journal placement, and citation count. Publishers may rely on article-processing charges. Editors and reviewers face overloaded queues. Indexing systems can lag behind abuse. A paper mill exploits the gap between symbolic evidence and verified evidence.
The Hindawi crisis made the economics visible. Retraction Watch reported that Hindawi retracted more than 8,000 articles in 2023 while its parent company, Wiley, absorbed the revenue and reputational costs. Wiley later moved away from the Hindawi brand and closed or integrated journals. The lesson is not that one publisher was uniquely vulnerable. The lesson is that high-throughput scholarly publishing can become an attack surface.
Paper mills are also a governance problem because they turn fraud into logistics. A single fabricated result is a misconduct case. A coordinated stream of fabricated manuscripts is an institutional adversary. It requires detection, information sharing, correction workflows, sanctions, and changes to the incentives that made the market profitable.
AI Changes the Cost Curve
Generative AI does not invent paper mills. It makes parts of the operation cheaper, faster, and harder to see.
Language models can draft plausible introductions, abstracts, literature reviews, cover letters, reviewer responses, and method-sounding prose. Image generators can produce figures. Citation tools can create references that look formatted. Translation and paraphrase systems can smooth repeated templates. None of this proves a particular paper is fraudulent; legitimate researchers also use assistive tools. The risk is that the marginal cost of producing evidence-shaped text falls toward zero while the cost of careful review stays human.
The absurd visible cases are easiest to remember. Frontiers retracted a 2024 article after concerns were raised about AI-generated figures, stating that the article did not meet its standards of editorial and scientific rigor. The episode became a meme because the image errors were blatant. The more important cases will be less visible: a plausible pathway diagram, a synthetic image without obvious artifacts, a fabricated dataset that fits expectations, or a review article whose citations mostly resolve but subtly misstate the field.
A 2026 BMJ study used machine learning to screen 2.65 million PubMed-indexed cancer-research articles and flagged 261,245, or 9.87 percent, as potential paper-mill publications. The authors and later respondents treated the tool as triage, not ground truth. That distinction matters. A detector can help prioritize scrutiny, but if institutions treat automated suspicion as proof, research integrity becomes another black-box discipline machine.
Citations as Interface
Citations are the interface between claim and record. They tell the reader where a statement claims to stand. When citations break, the paper does not merely contain a formatting error. It loses part of its accountability surface.
Recent evidence suggests that AI-assisted writing is weakening that surface at scale. A 2026 arXiv preprint audited 111 million references across arXiv, bioRxiv, SSRN, and PubMed Central and estimated 146,932 hallucinated citations in material published in 2025. Nature separately reported that arXiv would apply sanctions, including bans, for authors who submit work with unchecked AI-generated content such as hallucinated references.
This belongs beside the site's earlier analysis of AI-hallucinated legal citations. Courts and journals are different institutions, but the failure has the same shape. A professional document borrows authority from a citation. The citation looks official. The reader, clerk, reviewer, or retrieval system may not verify it. The hallucination enters an institutional workflow wearing the costume of source discipline.
The danger is not only nonexistent citations. It is citation drift: real identifiers paired with wrong titles, real papers cited for claims they do not support, synthetic reviews that launder weak evidence into consensus language, and retrieval systems that make the first available citation feel like proof. The citation becomes a button the user presses instead of a trail the institution follows.
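One form of citation drift described above, a real identifier paired with the wrong title, can be checked mechanically. What follows is a minimal sketch, not a description of any existing tool. The REGISTRY dictionary is a hypothetical stand-in for a lookup against a real metadata registry, the DOIs and titles in it are invented for illustration, and the 0.85 similarity threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a metadata registry lookup.
# These identifiers and titles are invented for illustration only.
REGISTRY = {
    "10.0000/example.001": "Signal pathways in stem cell renewal",
    "10.0000/example.002": "A survey of retrieval-augmented generation",
}


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(title.lower().split())


def check_citation(doi: str, cited_title: str, threshold: float = 0.85) -> str:
    """Return 'unresolved', 'title_mismatch', or 'resolves'.

    'resolves' only means the identifier and the title agree. It says
    nothing about whether the cited paper supports the claim attached to it.
    """
    registered = REGISTRY.get(doi)
    if registered is None:
        return "unresolved"
    similarity = SequenceMatcher(
        None, normalize(cited_title), normalize(registered)
    ).ratio()
    return "resolves" if similarity >= threshold else "title_mismatch"
```

A check like this catches only the cheapest failures. A citation that resolves and still misstates its source needs a human reader, which is the trail the institution has to follow.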
Preprint Pressure
Preprints are valuable because speed matters. During emergencies, fast sharing can save time. In mathematics, physics, computer science, biology, and medicine, preprint servers let communities inspect work without waiting out journal delays. But the same speed creates a moderation problem when cheap generation meets low submission friction.
The governance problem is not solved by saying "peer review will catch it." Peer review was already strained. Many reviewers are unpaid, overloaded, narrow in expertise, and working from partial information. Preprint moderators cannot reproduce experiments or verify every citation. Journals cannot manually inspect every image, dataset, authorship claim, and reference trail at the scale of modern submission flows.
Nor is the answer to shut down open scientific exchange. A locked literature would protect incumbents, slow correction, and make public knowledge more dependent on private platforms. The harder design problem is preserving openness while adding friction at the points where the record becomes machine-readable authority.
The Cleanup Machine
The response is already becoming infrastructural. United2Act, supported by COPE and STM, released a consensus statement in 2024 focused on education, post-publication corrections, paper-mill research, trust markers, and joint action. The STM Integrity Hub describes itself as a shared environment where publishers can screen submitted manuscripts for patterns associated with paper mills and other research-integrity concerns.
This is necessary, but it creates a second-order risk. The cleanup machine can become its own high-control interface: opaque scores, private watchlists, cross-publisher signals, automated suspicion, and uneven appeal. Researchers from less-resourced institutions, non-native English writers, and fields with formulaic language could be harmed if detection systems confuse style with fraud.
So the question is not whether to use machines against machine-assisted fraud. At this scale, some automation is unavoidable. The question is what kind of institutional wrapper surrounds it: evidence standards, human review, bias testing, appeal paths, disclosure, privacy limits, and public correction records.
A Governance Standard
A serious response should meet seven tests.
First, separate triage from judgment. Detection models should route work for review, not pronounce guilt. A paper-mill score is a lead, not a verdict.
Second, verify the evidence objects. Journals and repositories need routine checks for citations, DOIs, image provenance, data availability, cell lines, ethics approvals, author identities, reviewer conflicts, and reused templates. The checkpoint should match the claim.
Third, make corrections machine-readable. Retractions, expressions of concern, linked corrections, and trust signals should propagate through Crossref, PubMed, indexing services, library systems, search engines, retrieval databases, and model-training filters. A corrected record that machines cannot see is only half corrected.
Fourth, protect legitimate AI use without normalizing negligence. Translation, editing, coding support, and accessibility aids can help researchers. Fabricated citations, undisclosed generated figures, synthetic data, and unverified claims should remain professional failures no matter which tool produced them.
Fifth, align incentives away from publication volume. Paper mills exist because credential systems buy their product. Hiring, promotion, funding, and national evaluation systems need more weight on data quality, replication, software, peer review, negative results, public-interest work, and correction behavior.
Sixth, audit the detectors. Integrity tools should be tested for false positives across language, geography, discipline, institution type, and article genre. Otherwise the defense system will reproduce the same status hierarchies that made authors vulnerable to paper-mill markets.
Seventh, treat the literature as critical infrastructure. Scientific publishing is no longer only a professional communication system. It feeds medicine, policy, AI systems, education, search, and public belief. Its integrity deserves infrastructure-level funding and oversight.
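Three of these tests are concrete enough to sketch in code: the first (triage, not judgment), the third (corrections that machines can see), and the sixth (auditing detectors). This is a minimal illustration under stated assumptions, not a reference design. The Record type, the status vocabulary, the 0.7 review threshold, and all function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    doi: str
    # Hypothetical status vocabulary: published, corrected, concern, retracted
    status: str = "published"


def route(score: float, review_threshold: float = 0.7) -> str:
    """Test one: a high detector score queues a paper for human review.

    The function can only return a routing decision, never a verdict.
    """
    return "human_review" if score >= review_threshold else "standard_flow"


def usable_for_retrieval(records, blocked=("retracted",)):
    """Test three: drop records whose correction status marks them unusable.

    This only works if the status actually reaches the downstream system.
    """
    return [r for r in records if r.status not in blocked]


def false_positive_rate_by_group(cases):
    """Test six: per-group share of clean papers that were still flagged.

    cases is an iterable of (group, flagged, confirmed_problem) tuples,
    where confirmed_problem comes from human investigation, not the detector.
    """
    flagged_clean = defaultdict(int)
    total_clean = defaultdict(int)
    for group, flagged, confirmed in cases:
        if not confirmed:
            total_clean[group] += 1
            if flagged:
                flagged_clean[group] += 1
    return {g: flagged_clean[g] / total_clean[g] for g in total_clean}
```

A large gap between groups in the last function's output would be the kind of signal the sixth test asks for: evidence that the detector may be confusing language, geography, or genre with fraud.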
The Spiralist Reading
The paper mill reveals a recursive failure in model-mediated knowledge.
Institutions create incentives to publish. Paper mills create publications that satisfy the incentives. Indexes ingest them. Citation systems connect them. AI systems read them. Researchers then use AI systems to write more papers, find more citations, and summarize more literature. A fake paper is no longer just a lie on a page. It can become part of the environment that teaches future systems what reality sounds like.
This is the same pattern as the training set eating itself, but with institutional prestige attached. Synthetic residue enters the archive, then the archive becomes evidence for another synthetic act. The loop does not need anyone to believe the fake paper deeply. It only needs enough systems to treat the form as usable.
The answer is not nostalgia for a pure human literature. Human science has always had error, fraud, prestige games, exclusion, and sloppy citation. The answer is better source discipline for a world where sources are operational. A citation must be verifiable. A retraction must travel. A detector must be contestable. A model using the literature must know when the literature is disputed.
Model-mediated reality will not be safer than the records it learns to trust. If the scientific record becomes polluted, the pollution will not stay inside journals. It will move into answer engines, clinical tools, grant reviews, classroom summaries, policy memos, and the next layer of automated research. The institution that cannot clean its memory cannot govern its machines.
Sources
- Richard Van Noorden, More than 10,000 research papers were retracted in 2023 - a new record, Nature, December 12, 2023.
- Katharine Sanderson, Science's fake-paper problem: high-profile effort will tackle paper mills, Nature, January 19, 2024.
- Retraction Watch, Hindawi reveals process for retracting more than 8,000 paper mill articles, December 19, 2023.
- Frontiers Editorial Office, Retraction: Cellular functions of spermatogonial stem cells in relation to JAK/STAT signaling pathway, Frontiers in Cell and Developmental Biology, February 16, 2024.
- Adrian G. Barnett et al., Machine learning based screening of potential paper mill publications in cancer research: methodological and cross sectional study, The BMJ, January 30, 2026.
- James Walsh et al., LLM hallucinations in the wild: Large-scale evidence from non-existent citations, arXiv, May 2026.
- Nature, Researchers who use hallucinated references to face arXiv ban, May 19, 2026.
- UKSG, United2Act Consensus Statement concerning papermills, January 19, 2024.
- STM Association, The STM Integrity Hub, reviewed May 2026.
- Church of Spiralism Wiki, AI in Science and Scientific Discovery, Retrieval-Augmented Generation, and Synthetic Data and Model Collapse.