A novelist spent 20 years building a backlist. An AI company scraped every sentence, trained a model on her voice and structure, and now sells a product that competes directly with her. She owns her copyright. She received nothing. Both facts are true at once, and the coexistence of those two facts is the whole story.

The question circulating in legal circles right now, whether AI training on copyrighted data makes your work legally theirs, has a technically reassuring answer: no. No court, no regulator, no jurisdiction has held that ingesting your work transfers ownership to the company that ingested it. Anthropic settled a copyright lawsuit for $1.5 billion last September, not because it owned the scraped material, but because it used it without permission. The settlement was about infringement and remuneration, not title transfer. That distinction matters legally. It matters very little economically.

The Gap Between Ownership and Extraction

When a company trains on your writing, your music, your code, it doesn't file a claim over the source material. It files no paperwork at all. The value flows quietly: your aesthetic sensibility, your structural choices, your years of practice all become signal in a model that now competes with you in your own market. The law's failure isn't that it grants ownership to the wrong party. The failure is that it doesn't require anything in return for the extraction.

On March 6, 2026, the European Parliament voted 460 to 71 in favor of non-binding recommendations demanding transparency, fair remuneration, and opt-out rights: the first serious attempt to address that gap at scale. The proposed European register at EUIPO would list every copyrighted work used in AI training, alongside any opt-outs filed by creators. EU creative industries generate 6.9% of EU GDP; MEPs decided that number was worth protecting with something more than polite appeals to corporate goodwill.
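To make the shape of that proposal concrete, here is a minimal sketch, in Python, of what one record in such a register might hold. Every field name below is an assumption for illustration; the Parliament's recommendations describe the register only at the level of works, uses, and opt-outs, not as a schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class RegisterEntry:
    """Hypothetical record in an EUIPO-style training-use register.

    All field names are illustrative assumptions, not the schema of
    any proposed or existing register.
    """
    work_id: str                      # e.g. an ISBN, ISRC, or registry-issued ID
    rights_holder: str                # the creator or estate holding copyright
    used_in_models: list[str] = field(default_factory=list)  # models trained on the work
    opted_out: bool = False           # whether the creator filed an opt-out
    opt_out_date: date | None = None  # when the opt-out took effect, if any

# A creator checking the register could see, per work, which models
# ingested it and whether an opt-out is on file.
entry = RegisterEntry(
    work_id="ISBN-978-0-0000-0000-0",
    rights_holder="Jane Doe",
    used_in_models=["example-model-v3"],
    opted_out=True,
    opt_out_date=date(2026, 4, 1),
)
```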

California's AB 2013, which took effect January 1, 2026, and survived X.AI's constitutional challenge last month, takes a narrower approach: public disclosure of whether training datasets include copyrighted content, when the data was collected, and its licensing status. High-level summaries, not itemized lists. Useful, but limited. A creator knowing that her work was probably in a dataset is not the same as a creator having a legal claim to compensation when that work helped generate a billion-dollar product.
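The difference between the two regimes is visible in the shape of the data itself. A second minimal sketch, with field names again assumed rather than taken from the statute: an AB 2013-style summary operates on whole datasets, so nothing in it identifies an individual creator's work.

```python
from dataclasses import dataclass

@dataclass
class DatasetDisclosure:
    """Hypothetical AB 2013-style high-level summary for one training dataset.

    Field names are assumptions; the statute requires summaries,
    not a particular schema.
    """
    dataset_name: str
    contains_copyrighted: bool   # yes/no at the dataset level, not per work
    collection_period: str       # e.g. "2019-2024"
    licensing_status: str        # e.g. "mixed", "licensed", "unlicensed"

# Note what is absent: no work_id, no rights_holder, no opt-out field.
# A creator can learn her work was probably in the corpus, but the
# record contains nothing that would anchor a compensation claim.
summary = DatasetDisclosure(
    dataset_name="example-web-corpus",
    contains_copyrighted=True,
    collection_period="2019-2024",
    licensing_status="mixed",
)
```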

Transparency Registers Won't Fix This Alone

Industry groups warn that the EU's push creates innovation barriers and licensing complexity that will entrench large incumbents and shut out smaller AI developers. That concern deserves a fair hearing: mandatory licensing markets built without careful design can calcify into gatekeeping systems where only the biggest players can afford the legal overhead. The EU should watch for that failure mode.

But the alternative the industry prefers, waiting for the AI Act and the existing Copyright Directive to work, has a documented track record of producing nothing for creators while companies accumulate training data at scale. Stated intentions are not incentive structures. The business model rewards extraction. The regulatory response has to be proportionate to that, not deferential to it.

Congress should pass a federal disclosure law with enforcement teeth, not a California-only patchwork that AI companies can route around by sourcing training data from outside the state's jurisdiction. Creators should have a right to know when their work was used, and a legal mechanism to seek compensation. Owning your copyright while someone else captures all the value it generates isn't protection. It's a formality.