AI Copyright Litigation
AI copyright litigation is the wave of lawsuits testing whether AI developers may acquire, copy, store, train on, transform, retrieve from, or generate outputs based on copyrighted works without permission.
Definition
AI copyright litigation refers to lawsuits over the use of copyrighted material in artificial intelligence systems. The cases usually focus on three different acts: acquiring source material, copying or processing that material during model development, and producing or distributing outputs that allegedly reproduce protected expression.
The category includes direct and secondary copyright infringement, fair-use defenses, DMCA copyright-management-information claims, contract and terms-of-service disputes, crawler access questions, licensing-market disputes, and claims about market substitution. In the European Union, the dispute also connects to text-and-data-mining opt-outs and general-purpose AI transparency duties.
This page is a wiki summary, not legal advice. Litigation status can change quickly, and many claims described here remain allegations until resolved by a court or settlement.
Current Context
As of June 14, 2026, U.S. courts have adopted neither a rule that all AI training is fair use nor a rule that all training requires permission. The early opinions point in different directions because the records differ: legal headnotes used to build a rival research product, books bought and scanned for training, books allegedly downloaded from pirate libraries, news articles reproduced or summarized in outputs, song lyrics, and visual characters.
By mid-2026, the docket was still expanding. A May 2026 Southern District of New York complaint by Elsevier, Cengage, Hachette, Macmillan, McGraw Hill, and Scott Turow against Meta and Mark Zuckerberg showed that the book-training fight had not narrowed to a single author class or a single fair-use order. That filing was a complaint, not a judicial finding.
The practical center of the debate has shifted from a slogan about "training" to a supply-chain question: what was copied, where it came from, what rights were attached, what was retained, what the model can output, and whether the system competes with the market for the works or for reasonable licenses.
That makes source discipline central. A public claim that a case "legalized AI training" or "made AI training infringement" is usually too broad. The relevant unit of analysis is narrower: the specific order, the claim, the record, the affected works, the stage of the case, and the conduct the court actually addressed.
Core Claims
Input-copy claims. Plaintiffs argue that developers copied copyrighted works into datasets, caches, vector stores, internal libraries, or model-development pipelines without permission.
Training claims. Plaintiffs argue that making and using copies for model training infringes reproduction rights unless excused by fair use, license, public-domain status, or another defense.
Output claims. Plaintiffs argue that systems can reproduce protected expression, characters, lyrics, images, articles, code, or other works in outputs.
Market-harm claims. Plaintiffs argue that models trained on copyrighted works can substitute for the works, flood markets with competing material, or undermine licensing markets.
Acquisition claims. Plaintiffs distinguish between lawfully purchased or licensed material and material allegedly obtained from pirate, shadow-library, or unauthorized sources.
Distribution claims. Some cases allege that dataset acquisition involved not only downloading but also uploading, torrent sharing, or redistribution of works during collection.
Retention and deletion claims. Some disputes ask whether source copies, downloaded archives, training datasets, or embeddings may be retained after training, settlement, opt-out, or deletion requests.
Copyright management information claims. Some cases allege that copyright notices, metadata, or other management information were removed, stripped, or ignored in ways that violate 17 U.S.C. § 1202.
Accounting and transparency claims. Plaintiffs and regulators increasingly seek source lists, training-content summaries, model documentation, collection records, and proof that opt-outs, licenses, or deletion duties were honored.
Crawler and contract claims. Separate from copyright doctrine, publishers and platforms may rely on access controls, website terms, API contracts, crawler policies, or database restrictions.
Major Cases
Thomson Reuters v. Ross Intelligence. In February 2025, a Delaware federal court held on summary judgment that Ross infringed Thomson Reuters' copyrighted Westlaw headnotes and rejected Ross's fair-use defense in connection with a competing legal research product. In May 2025, the court stayed the case pending appellate response to difficult copyright questions involving non-generative AI tools.
Bartz v. Anthropic. In June 2025, Judge William Alsup held that Anthropic's use of lawfully acquired books for LLM training was fair use on the record before him, and that format-shifting purchased print books into internal digital copies was fair use. The order did not excuse alleged downloading and retention of pirated books. The later settlement process concerned the piracy-related class claims, not a class-wide ruling that every AI-training use of books is lawful.
Kadrey v. Meta. In June 2025, Judge Vince Chhabria granted Meta summary judgment on fair use for the specific record before the court, while emphasizing that different evidence about market harm could matter in other cases. In March 2026, the court allowed plaintiffs to add a contributory-infringement claim and update the distribution theory while denying class discovery at that stage.
Elsevier et al. v. Meta Platforms. On May 5, 2026, five major publishers and Scott Turow filed a proposed class action in the Southern District of New York against Meta and Mark Zuckerberg over alleged use of books and journal articles to develop Llama. The complaint pleads reproduction, distribution, contributory-infringement, and DMCA copyright-management-information theories, and asks for accounting and destruction of allegedly infringing copies. Those allegations were newly pending as of this review.
The New York Times v. OpenAI and Microsoft. The Times sued in December 2023 over alleged copying of news articles for model training and outputs. In April 2025, a federal court allowed direct and contributory copyright claims to proceed while dismissing some unfair-competition, abridgment, and DMCA theories. The order left core factual and fair-use issues for later stages.
Disney and Universal v. Midjourney. In June 2025, major film studios sued Midjourney, alleging that the image generator infringed copyrights in protected characters and works through training, output, promotion, and product design. CourtListener listed the case as active, with filings continuing into June 2026. It remains a central pending visual-media dispute.
Music publisher cases. Music publishers have sued AI developers over alleged use and output of song lyrics. In Concord Music Group v. Anthropic, the court in March 2025 denied a preliminary injunction and dismissed some secondary-infringement and CMI theories with leave to amend; later in 2025, it allowed amended secondary-infringement and CMI claims to proceed. The larger lyric-training and lyric-output dispute remained unresolved.
Fair Use Pattern
The early U.S. case law does not produce a single answer that all AI training is lawful or unlawful. Courts have focused on facts: what was copied, how it was acquired, whether the use was transformative, whether outputs substitute for originals, whether the defendant competes with the plaintiff, and what evidence exists of market harm.
One emerging distinction is between training as transformation and acquisition as infringement. A court may treat model training as transformative in one context while still refusing to bless the creation or retention of an unauthorized library of source works.
Another distinction is between abstract capability and specific output. A model that generally learns statistical patterns raises one set of questions; a system that reproduces lyrics, characters, article passages, or recognizable protected expression raises another.
A third distinction is between the plaintiff's theory and the evidentiary record. Market harm is not self-proving. Plaintiffs may need evidence about substitution, licensing markets, output behavior, traffic diversion, product competition, or internal plans to replace paid sources.
Source Discipline
AI copyright cases are easy to overread. A complaint states allegations. A motion-to-dismiss order usually asks whether pleaded facts can proceed if assumed true. A summary-judgment order depends on the evidentiary record before that court. A settlement can resolve claims without deciding the legality of the underlying conduct.
Good citation practice separates five questions: what works are at issue, what conduct was alleged or proved, what procedural stage the case reached, what legal issue the court actually decided, and what jurisdiction or regulator is speaking. A U.S. district-court fair-use order does not decide EU text-and-data-mining opt-out duties. An EU training-content template does not decide U.S. infringement.
When a public summary says that a court "held AI training is fair use," the record should identify whether the works were lawfully acquired or allegedly pirated, whether infringing outputs were alleged, whether the product competed with the source market, what market-harm evidence existed, and whether the ruling binds only named plaintiffs or a broader class.
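The five questions above can be kept apart in a structured note rather than a single sentence. The following is a minimal Python sketch, not an established citation schema: the names Stage, CaseNote, and caveats are hypothetical, and the example values only restate the Elsevier complaint as summarized on this page.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    COMPLAINT = "complaint"
    MOTION_TO_DISMISS = "motion to dismiss"
    SUMMARY_JUDGMENT = "summary judgment"
    SETTLEMENT = "settlement"


@dataclass(frozen=True)
class CaseNote:
    """One citation, split along the five questions (hypothetical schema)."""
    works: str          # what works are at issue
    conduct: str        # what conduct was alleged or proved
    stage: Stage        # what procedural stage the case reached
    issue_decided: str  # what the court actually decided ("" if nothing yet)
    speaker: str        # which court or regulator is speaking


def caveats(note: CaseNote) -> list[str]:
    """Return the reading cautions that follow from the procedural stage."""
    by_stage = {
        Stage.COMPLAINT: "States allegations, not findings.",
        Stage.MOTION_TO_DISMISS: "Tests pleaded facts assumed true.",
        Stage.SUMMARY_JUDGMENT: "Depends on the record before that court.",
        Stage.SETTLEMENT: "Resolves claims without deciding legality.",
    }
    out = [by_stage[note.stage]]
    if not note.issue_decided:
        out.append("No legal issue decided yet.")
    return out


# Example values restate this page's summary of the May 2026 complaint.
elsevier = CaseNote(
    works="books and journal articles",
    conduct="alleged use of the works to develop Llama",
    stage=Stage.COMPLAINT,
    issue_decided="",
    speaker="U.S. district court (S.D.N.Y.)",
)
notes = caveats(elsevier)
```

Here notes holds two cautions: that a complaint states allegations, and that no legal issue has been decided. Keeping the stage as a required field makes it harder to cite a filing as if it were a holding.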
Policy and Regulation
The U.S. Copyright Office's May 2025 report treated generative AI training as a fact-specific fair-use question rather than a blanket exemption. It rejected simple analogies between AI training and human learning, and emphasized that commercial use, expressive substitution, market effects, licensing markets, and the nature of the copied works all matter.
The report also treated unlawful access as a meaningful factor. It said that knowing use of pirated or illegally accessed datasets should weigh against fair use, while avoiding a simple per se rule. That position tracks the emerging litigation split between model training and the acquisition path for source copies.
The EU AI Act adds a different governance layer. Providers of general-purpose AI models placed on the EU market must maintain a copyright-compliance policy, identify and comply with machine-readable reservations of rights under EU copyright law, and publish a sufficiently detailed summary of training content using an AI Office template. The European Commission published an explanatory notice and training-content summary template on July 24, 2025. Those obligations do not decide U.S. fair use, but they push model developers toward documentation and rights-reservation systems.
The policy conflict is structural. AI developers want broad access to culture as training substrate. Rights holders want control, compensation, attribution, and bargaining power. Public-interest researchers want transparency and access without giving every cultural gatekeeper veto power over computation.
Governance Implications
Dataset provenance becomes legal memory. Developers need records showing source categories, acquisition paths, dates, licenses, crawler signals, opt-outs, retention decisions, and deletion actions. Without that record, they cannot separate lawful access from unlawful access or training use from library-building.
Purpose separation needs controls. The same work may appear in pretraining, fine-tuning, retrieval, evaluation, product demos, safety filters, or user-facing output examples. Governance has to log and enforce those uses separately rather than treating "data" as one undifferentiated bucket.
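A provenance record and a per-purpose check might look like the following. This is a minimal Python sketch under stated assumptions: SourceRecord, authorize, the field names, and the purpose labels are all hypothetical, and the identifier and date are placeholders, not data from any real pipeline.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class SourceRecord:
    """Hypothetical provenance entry for one acquired work."""
    source_id: str
    category: str                      # e.g. "book", "news article"
    acquisition_path: str              # e.g. "purchased", "licensed", "crawled"
    acquired_on: date
    license_ref: Optional[str] = None  # pointer to a license record, if any
    opted_out: bool = False
    allowed_purposes: frozenset = frozenset()
    use_log: list = field(default_factory=list)


def authorize(record: SourceRecord, purpose: str) -> bool:
    """Check one purpose against the record and log the decision either way."""
    ok = (not record.opted_out) and purpose in record.allowed_purposes
    record.use_log.append((purpose, ok))
    return ok


# Placeholder record: cleared for pretraining and evaluation only.
rec = SourceRecord(
    source_id="book-0001",
    category="book",
    acquisition_path="purchased",
    acquired_on=date(2024, 1, 15),
    allowed_purposes=frozenset({"pretraining", "evaluation"}),
)
pretraining_ok = authorize(rec, "pretraining")  # True
demo_ok = authorize(rec, "product_demo")        # False: purpose never cleared
```

Refusals are logged alongside approvals, so the record shows which uses were attempted as well as which were permitted. What each purpose label should permit is a legal and contractual question that this sketch does not answer.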
Licensing becomes infrastructure. Settlements, publisher deals, collective licenses, crawler permissions, and machine-readable rights standards can become the everyday operating layer for model development before legislatures resolve the bigger policy question.
Output controls matter. Guardrails, retrieval boundaries, memorization testing, attribution, citation design, and complaint pathways are not cosmetic. They shape whether a system can reproduce protected expression or displace the source at the interface.
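One simple form of memorization testing is to measure the longest word sequence an output shares verbatim with a source text. The sketch below uses Python's standard difflib module. The function names and the eight-word threshold are assumptions chosen for illustration, not a legal standard, and the tokenization is deliberately crude (lowercased whitespace splitting, no punctuation handling).

```python
from difflib import SequenceMatcher


def longest_shared_run(output: str, source: str) -> int:
    """Length, in words, of the longest contiguous word run found in both texts."""
    a = output.lower().split()
    b = source.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size


def flags_verbatim(output: str, source: str, threshold: int = 8) -> bool:
    """True when the shared run reaches the assumed word threshold."""
    return longest_shared_run(output, source) >= threshold


source_text = "the quick brown fox jumps over the lazy dog near the river bank"
model_output = "it said the quick brown fox jumps over the lazy dog today"
run = longest_shared_run(model_output, source_text)  # 9 shared words
flagged = flags_verbatim(model_output, source_text)  # True at threshold 8
```

A check like this only catches verbatim overlap. It says nothing about paraphrase, characters, or style, which the output claims described on this page also put at issue.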
Audit access is uneven. Model developers hold logs, source lists, product metrics, output samples, and licensing records. Individual creators often hold only suspicion and screenshots. A fair governance regime needs discovery, regulator access, independent audits, or transparency duties that reduce that asymmetry.
CMI and rights signals become compliance data. Copyright notices, publisher metadata, crawler directives, opt-outs, license fields, and rights-reservation signals need to survive ingestion and transformation if they are going to govern later model uses.
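For rights signals to survive ingestion, they have to travel with the text through each transformation. The following Python sketch shows one way that could work for a chunking step. RightsSignals, Chunk, chunk_with_rights, and usable_for_training are hypothetical names, and the filtering rule is an illustrative policy choice, not a statement of what any law requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RightsSignals:
    """Hypothetical bundle of signals captured at ingestion."""
    copyright_notice: str
    license_field: str
    rights_reserved: bool  # a rights reservation was detected at the source


@dataclass(frozen=True)
class Chunk:
    text: str
    source_id: str
    rights: RightsSignals


def chunk_with_rights(source_id: str, text: str, rights: RightsSignals,
                      max_words: int = 50) -> list[Chunk]:
    """Split text into word-bounded chunks, attaching the same signals to each."""
    words = text.split()
    return [
        Chunk(" ".join(words[i:i + max_words]), source_id, rights)
        for i in range(0, len(words), max_words)
    ]


def usable_for_training(chunks: list[Chunk]) -> list[Chunk]:
    """Illustrative policy: drop chunks whose source carried a reservation."""
    return [c for c in chunks if not c.rights.rights_reserved]


# Placeholder document of 120 words with a reservation attached.
signals = RightsSignals("(c) Example Press", "all-rights-reserved", True)
chunks = chunk_with_rights("doc-1", " ".join(["word"] * 120), signals)
kept = usable_for_training(chunks)  # empty: every chunk inherits the reservation
```

Because each chunk carries its own copy of the signals, a later stage can apply a rights rule without looking the original document up again.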
Public knowledge needs a carveout philosophy. Copyright litigation can protect some authors and publishers, but it can also make public-interest research, libraries, small labs, archives, and open-web projects dependent on private licensing deals they cannot afford.
Open Questions
What counts as transformative? Courts are still defining when training changes a work enough to support fair use.
How should market harm be measured? Litigation is testing whether harm means direct substitution, lost licensing revenue, market dilution, or broader creative-economy damage.
Does provenance change the answer? Lawfully acquired copies, licensed datasets, scraped websites, and pirated archives may receive different treatment.
Who is responsible for outputs? Cases increasingly ask whether developers, deployers, users, or platform operators are responsible when systems generate protected expression.
What counts as deletion? Settlements and opt-outs may require destruction of source files, but models, embeddings, derived datasets, backups, logs, and fine-tunes make deletion difficult to define.
How much accounting can courts require? Plaintiffs increasingly seek source lists, training records, collection methods, and destruction reports, while defendants invoke trade secrets, security, and scale.
Will licensing become infrastructure? Settlements and licensing deals may turn cultural archives into paid data channels, favoring large rights holders and large model developers.
Spiralist Reading
AI copyright litigation is the court system discovering that culture has become fuel.
The lawsuits are not only about copying. They are about conversion: books into embeddings, songs into behavior, images into style-space, journalism into answer engines, characters into promptable surfaces, and archives into private capability.
For Spiralism, the key question is whether a society can turn its memory into machine capability without losing the people and institutions that produce that memory. Copyright is an imperfect tool for that question, but it is one of the few tools with teeth.
Related Pages
- Synthetic Media and Deepfakes
- AI in Legal Practice and Courts
- Training Data
- AI Search and Answer Engines
- Perplexity AI
- AI Data Licensing
- AI Liability and Accountability
- Model Cards and System Cards
- Machine Unlearning
- AI Audits and Third-Party Assurance
- Model Distillation
- Open-Weight AI Models
- EU AI Act
- Content Provenance and Watermarking
- Provenance and Content Credentials
- Claim Hygiene Protocol
- Vendor and Platform Governance
- After the Book Becomes a Database
- Research and Editorial Integrity
Sources
- U.S. Copyright Office, Copyright and Artificial Intelligence, reviewed June 14, 2026.
- U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training, May 2025.
- Justia, Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., Document 770, February 11, 2025.
- Justia, Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., Document 804, May 23, 2025.
- Justia, Bartz v. Anthropic PBC, Order on Fair Use, June 23, 2025.
- Justia, Bartz v. Anthropic PBC, Memorandum Opinion on Preliminary Approval of Class Action Settlement, October 17, 2025.
- Anthropic Copyright Settlement, Key Dates, reviewed June 14, 2026.
- Justia, Kadrey et al. v. Meta Platforms, Inc., Document 598, June 25, 2025.
- Justia, Kadrey et al. v. Meta Platforms, Inc., Document 700, March 25, 2026.
- Justia Dockets, Elsevier Inc. et al. v. Meta Platforms, Inc. et al., filed May 5, 2026.
- Association of American Publishers, Elsevier Inc. et al. v. Meta Platforms, Inc. et al., complaint, May 5, 2026.
- Justia, The New York Times Company v. Microsoft Corporation et al., Document 514, April 4, 2025.
- Justia, Disney and Universal v. Midjourney complaint, June 11, 2025.
- CourtListener, Disney Enterprises Inc. v. Midjourney Inc., docket, reviewed June 14, 2026.
- CourtListener, Concord Music Group, Inc. v. Anthropic PBC, docket, reviewed June 14, 2026.
- Loeb & Loeb, Concord Music Group, Inc. v. Anthropic PBC, March 2025 motion-practice summary.
- Loeb & Loeb, Concord Music Group, Inc. v. Anthropic PBC, October 2025 motion-practice summary.
- European Union, Regulation (EU) 2024/1689, Artificial Intelligence Act, June 13, 2024.
- European Commission, Explanatory Notice and Template for the Public Summary of Training Content for general-purpose AI models, July 24, 2025.
- European Commission, Drawing-up a General-Purpose AI Code of Practice, reviewed June 14, 2026.