Blog · Analysis · Last reviewed June 23, 2026

The Citation Machine Enters the Court

AI-hallucinated citations are not just lawyer mistakes. They expose the machinery that turns text, sources, and verification into legal authority.

The useful distinction is between generation, grounding, and filing: a model can draft words, a retrieval system can surface sources, but only verified authority should enter a court record.

The governance unit is the verified authority chain: claim, source, quotation, page or paragraph, jurisdiction, procedural posture, current validity, human verifier, and filing decision.

A Source-Discipline Crisis

The first public scandal of AI in law was easy to ridicule: lawyers asked ChatGPT for cases, filed fake citations, and were sanctioned. But the deeper problem is not stupidity. The deeper problem is that generative AI attacks the social machinery of authority at exactly the place where authority is supposed to be most traceable.

For this essay, a citation machine is any AI-assisted workflow that proposes, formats, summarizes, checks, or attaches legal authority to a claim. A citation failure includes fake authority, misquoted authority, stale authority, wrong-jurisdiction authority, and misgrounded authority: a real source attached to a proposition it does not support.

Verified authority is narrower than a citation-looking string. It is a claim tied to a source that exists, says what the filing says it says, remains usable in the forum, fits the procedural posture, and has been checked by a responsible human before the court is asked to rely on it. A retrieval link, search result, model explanation, or vendor confidence badge is not that verification.
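The definition above can be read as a record with one field per link in the chain. What follows is a minimal sketch under stated assumptions, not a description of any real cite-checking product: the class name, field names, and the rule that a missing quotation needs no quotation check are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AuthorityLink:
    """One claim-to-source link in a filing (hypothetical record layout)."""
    claim: str                    # the sentence the filing asserts
    source: str                   # the citation as it would appear
    quotation: Optional[str]      # quoted text, if the filing quotes any
    pinpoint: Optional[str]       # page or paragraph
    jurisdiction: str
    procedural_posture: str
    # Each flag is set by a human reviewer, never by the drafting tool.
    source_exists: bool = False
    quotation_matches: bool = False
    supports_claim: bool = False
    usable_in_forum: bool = False
    fits_posture: bool = False
    currently_valid: bool = False
    human_verifier: Optional[str] = None

    def open_checks(self) -> List[str]:
        """Return the names of the checks that have not yet passed."""
        checks = {
            "source_exists": self.source_exists,
            # Assumption: with no quotation there is nothing to match.
            "quotation_matches": self.quotation is None or self.quotation_matches,
            "supports_claim": self.supports_claim,
            "usable_in_forum": self.usable_in_forum,
            "fits_posture": self.fits_posture,
            "currently_valid": self.currently_valid,
            "human_verifier": bool(self.human_verifier),
        }
        return [name for name, passed in checks.items() if not passed]

    def is_verified(self) -> bool:
        return not self.open_checks()


link = AuthorityLink(
    claim="Placeholder proposition from a draft brief.",
    source="Case A (hypothetical)",
    quotation=None,
    pinpoint="para. 12",
    jurisdiction="hypothetical forum",
    procedural_posture="motion to dismiss",
    source_exists=True,
    supports_claim=True,
    usable_in_forum=True,
    fits_posture=True,
    currently_valid=True,
)
print(link.open_checks())  # the human sign-off is still missing
```

The design point is that nothing a tool reports, whether a retrieval link or a confidence badge, can set these flags. Only the named verifier closes the record.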

A court filing is not ordinary prose. It is a request for state power. It asks a judge to dismiss a claim, preserve an asset, imprison a person, transfer money, enforce a contract, protect a child, stop an agency, or bless a settlement. Its citations are not decorative. They are the beams that connect the argument to the law.

Large language models are very good at making text feel load-bearing. They can supply case names, quotations, parentheticals, doctrinal summaries, and a confident chain of reasoning. That surface is useful when treated as draft material. It is dangerous when treated as authority. The court system has now become one of the clearest laboratories for the wider AI-age evidence problem: when machines can generate plausible source trails, institutions must decide what counts as claim-level verification.

Current Context

As of June 23, 2026, legal AI is no longer only a public-chatbot problem. Major legal tools are embedded in research platforms, drafting environments, document review, practice management, and agentic workflows. The current risk is therefore not only "a lawyer pasted ChatGPT output." It is that source-shaped text can pass through ordinary legal work surfaces before anyone asks whether the authority exists, remains good law, and supports the proposition.

Professional guidance has caught up with that shift. The American Bar Association's Formal Opinion 512 applies ordinary duties of competence, confidentiality, communication, supervision, fees, and candor to generative AI. The State Bar of California's 2026 practical guidance goes further on agentic legal AI, warning that systems able to plan tasks, access data sources, use tools, or continue without continuous prompting increase the need for supervision and verification.

Court governance is now explicit too. The Administrative Office of the U.S. Courts reported in its 2025 annual report that an AI Task Force developed interim guidance for the federal judiciary. The guidance cautions against delegating core judicial functions to AI, recommends independent review and verification of AI-generated content, and asks courts to consider disclosure and locally approved uses. Individual courts have also issued standing orders. Standing Order 26-01, issued by the District of Kansas on January 28, 2026, for example, states that litigants remain responsible for verifying AI-assisted filings, including citations, quotations, legal analysis, factual backgrounds, and procedural assertions.

The appellate record is moving in the same direction. In McCarthy v. DEA, decided March 27, 2026, the U.S. Court of Appeals for the Third Circuit reprimanded an attorney after briefs included summaries of prior agency decisions supplied by a non-attorney who used AI; the court said seven of eight cited authorities were inaccurately described and one did not exist. The panel treated the problem as verification and professional responsibility, not as a technological excuse, and warned that future violations may face stronger sanctions.

The evidence-rules track shows the same institutional pressure. The Advisory Committee on Evidence Rules reported on May 17, 2026 that it did not recommend action on proposed Rule 707 at that time and would continue studying AI-produced evidence and deepfake issues. That matters for this essay because citations and evidence are two sides of the same court-record problem: a legal system must know how a claim became authoritative enough to act on. For the evidence side, see The Synthetic Evidence Becomes the Court Record.

Mata Was the Warning

In Mata v. Avianca, the Southern District of New York sanctioned lawyers who submitted non-existent judicial opinions with fake quotations and citations generated by ChatGPT. Judge P. Kevin Castel's order of June 22, 2023, made the institutional harm explicit: fake opinions waste opposing counsel's time, consume court resources, may deprive clients of real arguments, damage the reputations of judges and courts whose names are falsely invoked, and promote cynicism about the legal system.

The facts mattered. The problem was not merely that an AI tool produced false output. The problem was that lawyers filed it, failed to verify it, and continued defending the fake authorities after the court and opposing counsel raised doubts. The machine made the fiction. The humans laundered it into procedure.

That is why Mata became a template case. It showed that ordinary legal duties were already enough to reach the failure. Rule 11 did not need a special metaphysics of artificial intelligence. The filing either had support in real law or it did not. The lawyer either made a reasonable inquiry or did not. A fluent model output did not change the gatekeeping obligation.

The narrow lesson is not "never use AI." The order itself recognized that technological assistance can be proper. The harsher lesson is that a generated citation has no legal status until it survives adversarial source practice: existence, quotation, proposition fit, jurisdiction, procedural posture, and current validity.

The Professional Answer

The American Bar Association's Formal Opinion 512, issued July 29, 2024, treated generative AI as a professional-responsibility problem rather than a novelty problem. Lawyers using generative AI do not need to become AI engineers, but they do need a reasonable understanding of the specific tool's capabilities and limits. They also need to protect client information, communicate when required, supervise use inside the firm, and charge reasonable fees.

The key rule is simple: review the output. For court submissions, the opinion says lawyers must review generative-AI materials, including analysis and citations, and correct false law, false fact, missing controlling authority, and misleading arguments before submission. Supervisors must also train and supervise lawyers and nonlawyers using these tools.

The State Bar of California's 2026 guidance sharpens the point for agentic systems. A tool that can sequence legal steps, search repositories, access documents, or interact with external systems can shape the work before a lawyer sees the final text. The guidance therefore ties greater autonomy to stronger oversight, access controls, and verification mechanisms. It also warns that lawyers must not let such systems file documents, communicate with courts, or make representations without lawyer review and approval.

California's proposed AI-related amendments to its Rules of Professional Conduct show the next step. The State Bar's 2026 public-comment materials proposed comments addressing competence, client communication, confidentiality, candor to the tribunal, managerial supervision, and nonlawyer-assistant supervision. The proposed candor comment is especially direct: before submitting to a tribunal, a lawyer must verify the accuracy and existence of cited authorities, including authorities generated or assisted by AI or other technology. That proposal should be read as jurisdiction-specific rulemaking, not as settled national law.

That framework is more important than any single disclosure rule. Some judges require lawyers to disclose AI use. Some courts have standing orders. Some lawyers argue disclosure is overbroad because AI is now embedded in search, drafting, proofreading, translation, and document tools. But disclosure alone does not solve the problem. A disclosed hallucination is still a hallucination. An undisclosed but verified draft may be less dangerous than a disclosed draft nobody checked.

The professional standard should focus on provenance, verification, supervision, confidentiality, and accountability. Who generated the text? What parts were machine-assisted? Which citations were checked against authoritative sources? Who signed off? What client or privileged information entered the tool? What process catches invented cases, misquoted statutes, bad parentheticals, stale law, and sources that exist but do not support the proposition?

Legal AI Still Hallucinates

The comforting story was that legal-specific AI tools would solve the problem. Connect the model to a trusted legal database, use retrieval-augmented generation, and hallucinated law should largely disappear. That story is partly true, but not enough.

Stanford RegLab and HAI researchers tested leading AI-powered legal research tools from LexisNexis and Thomson Reuters. Their 2024 analysis, later published in the Journal of Empirical Legal Studies, found that legal-specific retrieval systems improved on general-purpose chatbots but still produced misleading or false information in 17 to 33 percent of tested responses. The researchers also distinguished outright incorrect answers from misgrounded answers: a citation can exist and still fail to support the proposition attached to it.

That distinction is crucial. The next failure mode is not only invented cases. It is real cases used as masks for wrong claims. A system can cite a genuine opinion, quote a real statute, or retrieve a real document while still misdescribing its holding, jurisdiction, procedural posture, current validity, or relevance. In law, source existence is the beginning of verification, not the end.

The practical unit is therefore the proposition, not the source. "Case X exists" is weaker than "Case X, in this jurisdiction and posture, supports this sentence at this page after current-validity review." Legal RAG can help find the first object; it cannot be allowed to certify the second without a separate review path.
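A short sketch makes the proposition-level point concrete. It assumes a reviewer has already answered five yes-or-no questions about one citation attached to one sentence. The function name and the order of the checks are illustrative assumptions. The labels are the failure types named earlier in this essay.

```python
from typing import Optional


def classify_failure(
    exists: bool,
    quotation_matches: bool,
    currently_valid: bool,
    right_jurisdiction: bool,
    supports_proposition: bool,
) -> Optional[str]:
    """Name the first citation failure found, or None if every check passed.

    The answers come from a human reviewer working from primary sources,
    not from the tool that proposed the citation.
    """
    if not exists:
        return "fake authority"
    if not quotation_matches:
        return "misquoted authority"
    if not currently_valid:
        return "stale authority"
    if not right_jurisdiction:
        return "wrong-jurisdiction authority"
    if not supports_proposition:
        return "misgrounded authority"
    return None


# A real, current, correctly quoted case in the right forum can still fail.
print(classify_failure(True, True, True, True, False))
```

The example call passes every source-level check and still returns a failure, which is the gap that source existence alone cannot close.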

Those results should not be treated as a permanent product ranking. Vendors change models, search systems, prompts, and interfaces. The durable finding is architectural: retrieval reduces some hallucination risk, but it does not make a system self-verifying. A retrieved source can be wrong for the claim, and a source-linked answer can still be a bad legal answer.

The pattern has reached elite practice as well. Reuters and Bloomberg Law reported in April 2026 that Sullivan & Cromwell apologized to a federal bankruptcy judge after a filing contained inaccurate citations and other AI-generated errors. That episode is useful as a cautionary example, not as prevalence data. It shows that policies and reputation do not verify citations. A workflow either forces source checking before filing, or it is relying on culture and luck.

Why Courts Are Different

Every institution has a source-discipline problem now. Schools face AI-written papers with fake references. Newsrooms face synthetic images and fabricated screenshots. Agencies face model-generated summaries of records. Corporations face AI-drafted reports and compliance narratives. But courts are different because their authority depends so explicitly on citable lineage.

Law is a memory system. It stores decisions, statutes, regulations, procedures, filings, transcripts, doctrines, conflicts, and exceptions. A legal argument is supposed to show its path through that memory. The citation is the path marker. It lets the opposing party contest the claim, lets the judge inspect the authority, lets later courts understand the reasoning, and lets the public see that the decision did not come from private intuition alone.

AI-generated legal fiction breaks that chain. It creates the appearance of institutional memory without the memory. It gives the user a map with invented roads. Worse, it can do so in the visual grammar of legitimacy: reporter citations, docket references, quotation marks, case names, parentheticals, and procedural confidence.

This is why the courtroom is a preview of a wider model-mediated knowledge crisis. The danger is not merely that models make errors. The danger is that models can produce counterfeit verification artifacts at scale. They do not only answer. They can simulate the trail that makes an answer look answerable.

The safety implication is not only accuracy. A bad citation can change litigation costs, settlement pressure, sanctions exposure, client trust, opposing counsel's workload, judicial time, and the public record. When the filing comes from a self-represented litigant, a legal-aid clinic, or a small practice without a research staff, the burden of correction may also become an access-to-justice problem.

The same problem appears when AI-shaped exhibits, summaries, transcripts, and media enter evidence. The court record must preserve the difference between a captured source, a derived copy, an AI-assisted processing step, and a generated demonstrative. Citation discipline and synthetic-evidence discipline are both ways of protecting the path from source to authority.

The New Standard

Courts and legal organizations should treat AI-assisted law as a verification workflow, not a drafting shortcut with occasional cleanup.

First, every cited authority should be checked against an authoritative source. The case must exist, the quotation must appear, the citation must match, and the cited passage must support the proposition. A source that merely exists is not enough.

Second, current validity must be checked separately from existence. A real case may be overruled, abrogated, distinguished, procedurally irrelevant, outside the forum, or bad for the client's position once controlling authority is considered.

Third, firms should separate generation from verification. The person or process checking citations should not rely on the same model output as proof. Verification should return to primary legal sources, trusted legal databases, docket materials, statutes, rules, and record citations.

Fourth, filings need a source packet or cite-check trail. For consequential filings, the matter file should preserve the authorities, quotations, page references, record cites, and human approvals that support submitted claims. This connects citation work to AI audit trails, not to performative paperwork.

Fifth, AI use should leave proportionate internal traces. Matter files should record when generative or agentic tools were used for research, drafting, summarization, translation, citation support, or record synthesis where reconstruction may matter. The point is auditability when something goes wrong, not permanent retention of every low-risk prompt.

Sixth, high-stakes filings need tool-specific rules. A general chatbot, a legal RAG product, a document review model, a cite-checking tool, and an agent with connectors have different failure modes. Policy should name the tool class, allowed use, prohibited use, data-access limit, and review requirement.

Seventh, confidentiality controls belong in the citation workflow. Source checking often happens inside privileged matter files, draft strategy, client records, and document repositories. Vendor review, retention settings, connector limits, and matter-scoped permissions are part of legal source discipline.

Eighth, court rules must be checked by forum. Lawyers should maintain a local-rule and standing-order inventory for AI use, disclosure, certification, and citation verification. A national ethics opinion does not erase a judge's individual order or a district-wide standing order.

Ninth, courts should target duties rather than panic. A blanket ban may push use underground. A pure disclosure rule may create paperwork without verification. The durable rule is that every submitted legal assertion remains the lawyer's responsibility, regardless of whether it came from a partner, associate, paralegal, search platform, model, or agent.

Tenth, incident handling should be rehearsed. A hallucinated filing needs a correction path: notice to the court and opposing parties, withdrawal or correction of the affected filing, client communication where required, internal root-cause review, supervision review, and policy repair.

Eleventh, legal education must teach adversarial source practice. Future lawyers need to know how models fail: invented authorities, misgrounded citations, stale law, false premises, jurisdictional confusion, quotation drift, and confident overbreadth. This is now part of ordinary competence.

Twelfth, abstention is a valid output. If authority cannot be verified, it should not be cited. If the proposition cannot be supported, the argument should change. The useful legal AI system is not the one that always produces a citation; it is the one that helps the lawyer know when no verified citation exists.

Thirteenth, authority classes should be labeled. Binding precedent, persuasive authority, statute, rule text, treatise, administrative decision, docket filing, factual record cite, expert report, vendor benchmark, and news report should not collapse into one "source" bucket. Each class has a different verification burden.

Fourteenth, legal-RAG vendors should expose the check path. A useful product should show the retrieved sources, searched corpus, date coverage, omitted-source limits, current-validity path, and whether the answer is grounded in primary authority or secondary material. Vendor assurance belongs with vendor governance, not marketing language.

Fifteenth, court-facing tools should protect public users. If a court, clinic, or platform offers AI-assisted legal information, it should distinguish legal information from advice, show jurisdiction and freshness limits, provide escalation to humans where rights are at stake, and preserve enough logs for correction without exposing unnecessary private facts.
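Several of these points (checking against an authoritative source, separate current-validity review, human sign-off, abstention, and labeled authority classes) can be combined into a pre-filing gate. The sketch below is one hypothetical way to express that gate. The type names, the shortened list of authority classes, and the reason strings are assumptions for illustration, and the citations are placeholders rather than real authorities.

```python
from enum import Enum
from typing import Dict, List, NamedTuple


class AuthorityClass(Enum):
    # A shortened, illustrative subset of the classes listed above.
    BINDING_PRECEDENT = "binding precedent"
    PERSUASIVE_AUTHORITY = "persuasive authority"
    STATUTE = "statute"
    NEWS_REPORT = "news report"


class Citation(NamedTuple):
    proposition: str
    cite: str
    authority_class: AuthorityClass
    checked_against_primary_source: bool
    supports_proposition: bool
    current_validity_checked: bool
    verified_by: str  # empty string means no human sign-off


def blocking_reasons(c: Citation) -> List[str]:
    """List every reason this citation may not be filed yet."""
    reasons = []
    if not c.checked_against_primary_source:
        reasons.append("not checked against an authoritative source")
    if not c.supports_proposition:
        reasons.append("does not support the proposition")
    if not c.current_validity_checked:
        reasons.append("current validity not reviewed")
    if not c.verified_by:
        reasons.append("no human verifier recorded")
    return reasons


def filing_gate(citations: List[Citation]) -> Dict[str, List[str]]:
    """Map each blocked cite to its reasons; an empty dict means clear to file."""
    blocked = {}
    for c in citations:
        reasons = blocking_reasons(c)
        if reasons:
            blocked[c.cite] = reasons
    return blocked


draft = [
    Citation("Placeholder proposition one.", "Case A (hypothetical)",
             AuthorityClass.BINDING_PRECEDENT, True, True, True, "reviewing attorney"),
    Citation("Placeholder proposition two.", "Case B (hypothetical)",
             AuthorityClass.PERSUASIVE_AUTHORITY, True, True, False, ""),
]
for cite, reasons in filing_gate(draft).items():
    print(cite, "->", "; ".join(reasons))
```

A blocked citation is not an error state to be overridden. It is the abstention outcome: either the check is completed from primary sources, or the citation and the sentence it supports come out of the filing.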

What This Changes

The court is one of society's rituals for turning memory into force. It does not only decide disputes. It stages a public discipline: claims must be named, sources must be shown, arguments must be answerable, and authority must pass through procedure before it acts on bodies, money, property, families, and institutions.

The citation machine threatens that discipline by imitating its outer form. It gives argument the costume of authority without the labor of verification. In the small case, that means a lawyer gets sanctioned. In the large case, it means institutions become comfortable with source-shaped hallucination: documents that look grounded, policies that cite ghosts, reports that reference imaginary studies, knowledge systems that simulate the smell of evidence.

The answer is not anti-AI purity. Lawyers will use AI. Courts will use AI. Legal databases will use AI. The useful demand is stricter: generated text must become more accountable as it approaches authority. Drafting can be fluid. Filing cannot. Brainstorming can be speculative. Citation cannot. A model may assist the work, but it must not become the witness for its own truth.

The court's lesson belongs beyond the court. Any institution that acts on claims needs source discipline strong enough for synthetic media and model-mediated knowledge. The future will produce more fluent assertions than humans can comfortably inspect. That makes verification infrastructure a civic necessity, not clerical tidiness.

The old rule survives because it is still correct: show your sources. The new rule is harder: show that the sources survived the machine.

Source Discipline

The sources for this topic carry different weight. A court order such as Mata records a judicial sanction and the court's reasoning in that case. An ethics opinion or state-bar guidance describes professional obligations and practical interpretation. A district standing order creates local filing expectations for that court. A federal judiciary annual report describes institutional policy work. An advisory-committee report describes rulemaking status, not final evidence law.

Empirical benchmarks need their test context preserved. The Stanford legal-RAG study tested specific products, query sets, definitions, and time windows. Its durable lesson is that legal-specific retrieval systems can still produce misleading or false legal answers. It should not be stretched into a universal score for every later version of every legal AI product.

Vendor announcements and product pages can show market direction, but they do not prove reliability, confidentiality, ethical compliance, or fitness for a filing. News reports can identify incidents, such as the April 2026 Sullivan & Cromwell episode, but they should not be treated as substitutes for filed orders, dockets, or primary court documents when those are available.

Current-source claims in this essay were checked against the named sources on June 23, 2026. For court-rule and discipline claims, the preferred source is the order, opinion, standing order, rulemaking report, or official bar material; press reports are used only when the underlying filing is not readily available from a public official source.

Sources

United States District Court, Southern District of New York, Mata v. Avianca, Inc., Opinion and Order on Sanctions, June 22, 2023, reproduced by Justia.
American Bar Association Standing Committee on Ethics and Professional Responsibility, Formal Opinion 512: Generative Artificial Intelligence Tools, July 29, 2024, reviewed June 23, 2026.
State Bar of California, Ethics & Technology Resources, noting May 14, 2026 approval of updated generative AI guidance, reviewed June 23, 2026.
State Bar of California, Practical Guidance for the Use of Generative Artificial Intelligence in the Practice of Law, 2026 update, reviewed June 23, 2026.
State Bar of California, Proposed Amendments to the Rules of Professional Conduct Related to Artificial Intelligence, 2026 public comment, reviewed June 23, 2026.
Stanford HAI, AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries, May 23, 2024.
Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, and Daniel E. Ho, Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools, Journal of Empirical Legal Studies, 2025.
Supreme Court of the United States, 2023 Year-End Report on the Federal Judiciary, December 31, 2023.
Administrative Office of the U.S. Courts, Court Operations - Annual Report 2025, developing artificial intelligence policies, reviewed June 23, 2026.
U.S. District Court, District of Kansas, Standing Order 26-01: Use of Artificial Intelligence in Preparing Court Filings, January 28, 2026.
Advisory Committee on Evidence Rules, Report of the Advisory Committee on Evidence Rules, May 17, 2026.
U.S. Court of Appeals for the Third Circuit, McCarthy v. DEA, No. 24-2704, March 27, 2026.
Reuters, Sullivan & Cromwell law firm apologizes for AI 'hallucinations' in court filing, April 21, 2026, press report.
Bloomberg Law, Sullivan & Cromwell Apologizes to Judge for AI Hallucinations, April 21, 2026, press report.
Related references: AI in Legal Practice and Courts, AI Hallucinations, Retrieval-Augmented Generation, Human Oversight of AI Systems, AI Audit Trails, AI Data Provenance, AI Incident Reporting, The Legal Agent Becomes the Associate, The Synthetic Evidence Becomes the Court Record, Agent Tool Permission Protocol, The Agent Log Becomes the Receipt, Vendor and Platform Governance, Claim Hygiene Protocol, Research and Editorial Integrity, and Provenance and Content Credentials.
