While traditional publishers negotiate AI training deals worth hundreds of millions, Amazon has made no public statement about whether it can—or will—use the millions of books on Kindle Direct Publishing for the same purpose.

When Taylor & Francis announced in 2024 that it had signed AI licensing deals worth $75 million, academic authors discovered the news the same way the rest of the world did: through a corporate trading update spotted on social media. The publisher had licensed approximately 3,000 academic journals to Microsoft and another unnamed AI company without consulting a single author. The authors received nothing.

John Wiley & Sons followed with $44 million in similar deals and explicitly stated it would not offer opt-outs, arguing this would “erroneously support AI developers’ specious claim that licensing is not scalable.” Cambridge University Press took a different approach, contacting 20,000 authors individually and requiring consent before licensing any work.

Among trade publishers, only HarperCollins has signed a confirmed deal—offering authors and publishers $5,000 per nonfiction title, split evenly between them, for a three-year license with Microsoft, with authors required to opt in. Penguin Random House took the opposite stance, adding explicit copyright language to all books globally: “No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems.”

But while traditional publishers negotiate and authors’ organizations advocate for consent-based licensing, Amazon—which hosts millions of independently published books through Kindle Direct Publishing—has made no public statement about its intentions regarding KDP content and AI training. This silence persists even as the legal landscape shifts and other platforms explicitly claim AI training rights in their terms of service.

The academic publishing deals established a pattern that would prove instructive.

Taylor & Francis’s initial $10 million deal with Microsoft in May 2024 gave the tech company access to research that authors had submitted under contracts predating AI technology. When the company announced a second deal that brought total AI revenue to $75 million—boosting underlying revenue growth from 3% to 15%—the academic Dr. Ruth Alison Clemens spotted it in a trading update and posted about it on social media. Her thread went viral. Scholars expressed outrage that publicly funded research was being monetized without their knowledge, and that they’d learned about the deal the same way everyone else did: through corporate earnings reports spotted on X.

Wiley’s approach was even more explicit. The publisher signed two deals totaling $44 million and stated in its investor materials that it would not offer opt-outs—claiming authors were compensated “in accordance with contractual terms,” though academic authors typically receive only single-digit royalty percentages on books and zero royalties on journal articles. The company’s justification was revealing: creating opt-outs would “erroneously support AI developers’ specious claim that licensing is not scalable.”

The pattern revealed how publishers interpreted contracts written before AI existed. Standard academic publishing agreements often transfer full copyright to the publisher, giving publishers legal authority to license works for purposes never contemplated when authors signed. The authors had no say. They received nothing.

The Authors Guild responded by establishing clear guidelines: AI training rights are “not book or excerpt rights” and require separate authorization. The organization argues authors should receive 75-85% of licensing revenue, with publishers taking only the equivalent of an agent’s fee.

This position stems from the Guild’s legal analysis of standard publishing contracts. Subsidiary rights clauses typically cover foreign editions, serializations, and readable digital formats—not machine learning training. While some publishers invoke “other digital” or “other electronic” language, the Authors Guild calls this interpretation “erroneous.” Their position is unambiguous: rights that are not expressly granted are generally retained by the original copyright owner.

HarperCollins acknowledged this principle by noting its AI licensing was “outside the original publishing agreement” and requiring authors to opt in. The deal offers $5,000 per nonfiction title, split evenly between author and publisher, for a three-year license with Microsoft. The 50-50 revenue split drew criticism from the Authors Guild, which argued publishers taking half of AI licensing revenue—for rights that don’t traditionally belong to them—gives “far too much to the publisher.” But the opt-in model itself? That established something important. It acknowledged the rights belonged to the author in the first place.

Penguin Random House, the world’s largest trade publisher, went further. In October 2024, the company added explicit AI training prohibitions to all books globally: “No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems.” Tom Weldon, CEO of Penguin Random House UK, stated the company would “vigorously defend the intellectual property that belongs to our authors and artists.”

Cambridge University Press took perhaps the most labor-intensive approach, contacting 20,000 authors individually and requiring consent before licensing any work. Managing director Mandy Hill called it a “huge extra investment” that made the process “harder”—but argued the author relationship was “too important” to treat any other way.

Three different responses from three major publishers. All three acknowledged the same underlying principle: authors have a say.

In September 2025, Judge William Alsup gave preliminary approval to a $1.5 billion settlement in the class action lawsuit against Anthropic—the largest copyright recovery in U.S. history.

The court’s earlier ruling created a distinction that matters enormously for the question of KDP. Training on legally purchased books, the judge found, may constitute transformative fair use. He described the practice as “quintessentially transformative” and “among the most transformative we will see in our lifetimes.” But downloading and training on pirated copies from shadow libraries like LibGen and PiLiMi was ruled “inherently, irredeemably infringing.” Anthropic agreed to pay approximately $3,000 per work for the roughly 500,000 books it obtained from piracy sites, and to destroy the datasets containing that material.

This creates a legal framework with significant implications. Entities that legally possess copyrighted works have a stronger fair use argument than those that scraped pirated content. For AI companies that downloaded Books3 or LibGen datasets, the Anthropic settlement suggests significant liability.

But for entities that already possess books through legitimate channels—such as platforms where authors uploaded their own work—the legal questions become more complex.

What KDP Terms Actually Grant Amazon

Amazon’s Kindle Direct Publishing Terms and Conditions, last updated September 27, 2024, grant Amazon specific rights that indie authors agree to when uploading their books.

Section 5.5, titled “Grant of Rights,” states that authors grant Amazon “a nonexclusive, irrevocable, right and license to print (on-demand and in anticipation of customer demand) and distribute Books, directly and through third-party distributors, in all formats you choose to make available through KDP by all distribution means available.” The section continues with specific permissions, including the right to “reproduce, index and store Books on one or more computer facilities, and reformat, convert and encode Books,” to “display, market, transmit, distribute, sell, license and otherwise make available all or any portion of Books through Amazon Properties,” and to “transmit, reproduce and otherwise use (or cause the reformatting, transmission, reproduction, and/or other use of) Books as mere technological incidents to and for the limited purpose of technically enabling the foregoing.”

One word in that language deserves attention: irrevocable.

These rights survive even after an author removes their books from the platform or terminates their account. Section 3 explicitly states that following termination or suspension, Amazon “may continue to maintain digital copies of your Digital Books in order to provide continuing access to or re-downloads of your Digital Books.” The rights don’t end when the author leaves.

The Unanswered Question

The language in KDP’s terms differs significantly from traditional publishing contracts. Where standard agreements grant rights to “publish and distribute” works in various formats for human consumption, KDP’s terms include the right to “reproduce, index and store Books on one or more computer facilities” for purposes of “technically enabling” distribution and sale.

Whether this language could extend to AI training is a question without public answer.

A search for published legal analysis specifically examining whether KDP’s Section 5.5 grants AI training rights yielded no results. This stands in contrast to traditional publishing contracts, which the Authors Guild and publishing attorneys have analyzed extensively and determined do not grant such rights. The absence of analysis is notable because the question affects millions of works.

Amazon uses machine learning extensively across its services—advertising, recommendations, search functionality, book categorization. All of it relies on processing content at scale. The question is where “reproduce, index and store” for purposes of “technically enabling” distribution ends and AI training begins.

Other platforms have addressed this explicitly. In November 2024, X updated its terms of service to state users grant the platform rights to use their content “for use with and training of our machine learning and artificial intelligence models.” The language is unambiguous.

Amazon’s KDP terms contain no such explicit language about AI training. They also contain no explicit prohibition.

Amazon’s Conspicuous Silence

Amazon has not been silent on AI in general. The company has developed disclosure policies, consulted with the Authors Guild, and says it monitors how generative AI affects reading, writing, and publishing. But its policy activity runs in one direction only: it governs what authors do with AI, not what Amazon might do with author content.

In September 2023, the company announced new guidelines requiring authors to disclose AI-generated content when publishing through KDP. The policy, developed after months of discussions with the Authors Guild, distinguishes between “AI-generated” content (text, images, or translations created by AI tools) and “AI-assisted” content (human-created content refined with AI assistance). Authors must check a box during upload indicating whether their book contains AI-generated material. Amazon stated it is “actively monitoring the rapid evolution of generative AI and the impact it is having on reading, writing and publishing” and will use author disclosures to “help shape policies around AI-generated content going forward.”

But Amazon has made no equivalent disclosure about its own potential use of KDP content. The company has not publicly stated whether it interprets KDP terms as granting AI training rights, whether it intends to use KDP content to train AI models, whether it would offer authors opt-out mechanisms, or whether it would compensate authors if it licenses their work.

Nothing.

This silence is notable because clarification would appear straightforward. If Amazon has no plans to use KDP content for AI training, saying so publicly would presumably build author trust at minimal cost. If Amazon believes its terms don’t grant such rights, saying so would align with the Authors Guild’s interpretation and industry standards. If Amazon intends to seek separate authorization, announcing consent-based licensing would follow HarperCollins’s model. Any of these statements would be simple. Amazon has made none of them.

What Makes KDP Different

Traditional authors have representation. The Authors Guild negotiates on behalf of members, establishes contract standards, and pursues collective legal action when needed. Literary agents negotiate publishing contracts and can push back on unfavorable terms. Authors can organize—as approximately 70 authors did in June 2025 when they petitioned Big Five publishers to refuse AI-written books built on copyrighted content without consent. Indie authors on KDP have none of this leverage.

Amazon’s terms are take-it-or-leave-it. Authors cannot negotiate, modify terms, or add contractual restrictions. There is no collective bargaining, no agent to advocate on their behalf. The platform itself creates dependency—Amazon controls 68% of the U.S. ebook market and an estimated 83% when including Kindle Unlimited subscriptions. For many indie authors, KDP isn’t optional. It’s how they reach readers. Going “wide” to other platforms means accepting dramatically reduced reach and income.

The Anthropic settlement adds another dimension to this dynamic. Because Anthropic obtained books through piracy, authors had clear grounds to sue for copyright infringement. But every KDP book was uploaded voluntarily by its author under terms that grant Amazon legal possession and a license to use the content for specified purposes. The terms also state these rights are irrevocable and survive account termination. An indie author who uploaded their debut novel to KDP in 2015 may have granted Amazon rights that continue indefinitely—and unlike traditional publishing contracts that expire or allow rights reversion, those KDP rights persist regardless of whether the book remains available for sale.

What’s Documented and What Remains Unknown

Here’s what the public record shows:

  • Academic publishers have licensed author works to AI companies for over $119 million, often without author consent or compensation, under contracts that predated AI technology.
  • Traditional publishers have established that AI training rights require separate authorization, either through opt-in deals like HarperCollins’s or explicit prohibitions like Penguin Random House’s copyright notice.
  • The Authors Guild has analyzed traditional publishing contracts and determined they don’t grant AI training rights because such rights are “not book or excerpt rights” and require explicit separate agreements.
  • The Anthropic settlement established that training on pirated books constitutes copyright infringement, while training on legally purchased books may qualify as fair use under certain circumstances.
  • Amazon possesses every KDP book legally through voluntary author upload and holds irrevocable rights to “reproduce, index and store” them on computer facilities for purposes of “technically enabling” distribution.
  • Amazon requires authors to disclose AI use in their content but has made no public statement about its own use of author content for AI training.
  • Other platforms like X have explicitly updated their terms to include AI training rights, while Amazon’s KDP terms contain neither explicit permission nor explicit prohibition.

Here’s what remains publicly unknown:

  • Whether Amazon interprets its existing KDP terms as granting AI training rights or considers such use outside the scope of Section 5.5’s language.
  • Whether Amazon intends to use KDP content for AI training, either for its own models or through licensing deals similar to those signed by traditional publishers.
  • Whether Amazon would offer opt-out mechanisms or compensation if it does license KDP content for AI training.
  • How courts would interpret whether KDP’s “reproduce, index and store” language for purposes of “technically enabling” distribution extends to AI training.
  • What recourse indie authors would have given the irrevocable nature of rights granted and their lack of negotiating power compared to traditionally published authors.

Why This Gap Matters

Academic authors discovered their works were licensed through social media posts about corporate earnings. Traditional authors are now negotiating consent-based deals with explicit terms and revenue splits. But indie authors—who collectively represent millions of works uploaded to KDP—have received no information about what might happen to their content.

The publishing industry has established a consensus: AI training rights are separate from traditional publishing rights and require explicit authorization. The Authors Guild, HarperCollins, and Penguin Random House all acknowledge this principle. Academic publishers who proceeded without consent faced widespread backlash and damaged author relationships. If Amazon doesn’t intend to use KDP books for AI training, saying so would cost nothing and align with the transparency other publishers are providing. If Amazon believes its terms don’t grant such rights, stating that position would clarify the situation for millions of authors. If Amazon’s interpretation is that the terms are ambiguous enough to support multiple readings, that ambiguity affects authors who had no ability to negotiate different language.

Clarification would be simple. Its absence leaves indie authors in a different position from their traditionally published counterparts. Traditional authors know where their publishers stand. Indie authors know only that the question exists, that the contract language is open to interpretation, that Amazon has chosen not to address it publicly.

And that the rights they granted are irrevocable.


