The U.S. Copyright Office (USCO) recently released its highly anticipated Report on Generative AI Training (the third and final part in the USCO’s AI and copyright series) in a pre-publication format. At a time when headlines dominate the conversation, and despite the leadership changes underway at the USCO, the report represents a substantial body of work and analysis. It explores how copyright law applies to the training of generative AI systems with a level of nuance that reflects the expertise of its drafters and a clear understanding of the importance of both the copyright and AI sectors to the broader innovation ecosystem.
The report’s findings reinforce a key principle long supported by CCC: artificial intelligence and copyright protection both serve important roles and can work together. The future of generative AI depends on licensing frameworks that reward rightsholders for creating content while enabling innovation. The report’s conclusions and recommendations underscore the need for thoughtful licensing solutions that support both sustainable technological advancement and creative ecosystems.
The importance of this balance cannot be overstated, especially when considering the broader economic impact of intellectual property. According to a 2024 U.S. Chamber of Commerce study, in the U.S. alone, IP-intensive industries support over $5 trillion in economic activity and millions of high-quality jobs, and contribute more than $140 billion in IP-related exports. These industries are the backbone of U.S. innovation: they power R&D, raise wages, and support communities across every state.
CCC has long emphasized that the future of AI is not anti-copyright. It’s built on copyright. It is possible and indeed imperative to be both pro-AI and pro-copyright: building smart, scalable licensing solutions that enable AI and creators.
If you’re not reading all 100+ pages of the report, here’s a curated selection of what the Office says, in its own words:
Setting the Stage:
- “The public interest requires striking an effective balance, allowing technological innovation to flourish while maintaining a thriving creative community.” (p. 1)
On Prima Facie Infringement (Is it Copying?):
- “The steps required to produce a training dataset containing copyrighted works clearly implicate the right of reproduction.” (p. 26)
- “The training process also implicates the right of reproduction.” (p. 27)
- On Retrieval-Augmented Generation (RAG): “RAG also involves the reproduction of copyrighted works.” (p. 30)
- On Outputs: “Generative AI models sometimes output material that replicates or closely resembles copyrighted works.” “Such outputs likely infringe the reproduction right and, to the extent they adapt the originals, the right to prepare derivative works.” (p. 31)
On Fair Use – The Central Question:
- “To the extent that acts involved in developing and deploying a generative AI model constitute prima facie infringement, the primary defense available is fair use.” (p. 32)
- Factor 1 (Purpose & Character):
- “In the Office’s view, training a generative AI foundation model on a large and diverse dataset will often be transformative.” (p. 45) “But transformativeness is a matter of degree, and how transformative or justified a use is will depend on the functionality of the model and how it is deployed.” (p. 46)
- “Where the resulting model is used to generate expressive content, or potentially reproduce copyrighted expression, the training use cannot be fairly characterized as ‘non-expressive.’” (p. 48)
- “Nor do we agree that AI training is inherently transformative because it is like human learning.” (p. 48)
- “In the Office’s view, the knowing use of a dataset that consists of pirated or illegally accessed works should weigh against fair use without being determinative.” (p. 52)
- “Copyright owners have a right to control access to their works, even if someone seeks to obtain them in order to make a fair use. Gaining unlawful access therefore bears on the character of the use.” (p. 52)
- Factor 2 (Nature of Work):
- “Where the works involved are more expressive, or previously unpublished, the second factor will disfavor fair use.” (p. 54)
- Factor 3 (Amount & Substantiality):
- “In the Office’s view, while there are meaningful distinctions from the intermediate copying cases, their logic suggests that the third factor may weigh less heavily against generative AI training where there are effective limits on the trained model’s ability to output protected material from works in the training data.” (p. 59)
- “Where a model can output expression, however, the question is whether, like Google Books, the AI developer has adopted adequate safeguards to limit the exposure of copyrighted material.” (p. 59)
- Factor 4 (Effect on the Market):
- “Where licensing markets are available to meet AI training needs, unlicensed uses will be disfavored under the fourth factor. But if barriers to licensing prove insurmountable for parties’ uses of some types of works, there will be no functioning market to harm and the fourth factor may favor fair use.” (p. 71)
- “The copying involved in AI training threatens significant potential harm to the market for or value of copyrighted works. Where a model can produce substantially similar outputs that directly substitute for works in the training data, it can lead to lost sales. Even where a model’s outputs are not substantially similar to any specific copyrighted work, they can dilute the market for works similar to those found in its training data, including by generating material stylistically similar to those works.” (p. 73)
- “The assessment of market harm will also depend on the extent to which copyrighted works can be licensed for AI training… Where licensing options exist or are likely to be feasible, this consideration will disfavor fair use under the fourth factor.” (p. 73)
- Weighing the Factors & Litigation:
- “We observe, however, that the first and fourth factors can be expected to assume considerable weight in the analysis.” (p. 74)
- “The Office expects that some uses of copyrighted works for generative AI training will qualify as fair use, and some will not. On one end of the spectrum, uses for purposes of noncommercial research or analysis that do not enable portions of the works to be reproduced in the outputs are likely to be fair. On the other end, the copying of expressive works from pirate sources in order to generate unrestricted content that competes in the marketplace, when licensing is reasonably available, is unlikely to qualify as fair use. Many uses, however, will fall somewhere in between.” (p. 74)
On Licensing & Market Development:
- “These developments demonstrate that voluntary licensing may be workable, at least in certain contexts… The Office recognizes, however, that practical challenges remain in many areas.” (p. 103)
- “Collective licensing can play a significant role in facilitating AI training, reducing what might otherwise be thousands or even millions of transactions to a manageable number… Although collective licensing presents its own logistical and organizational challenges, it affords copyright owners and licensees flexibility to tailor agreements to their needs.” (p. 104)
- Regarding opt-outs: “As to the possibility of an opt-out mechanism, the Office agrees that requiring copyright owners to opt out is inconsistent with the basic principle that consent is required for uses within the scope of their statutory rights.” (p. 105)
The USCO’s Conclusion (for now):
- “In applying current law, we conclude that several stages in the development of generative AI involve using copyrighted works in ways that implicate the owners’ exclusive rights. The key question, as most commenters agreed, is whether those acts of prima facie infringement can be excused as fair use.” (p. 107)
- “But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries.” (p. 107)
- “Given the robust growth of voluntary licensing, as well as the lack of stakeholder support for any statutory change, the Office believes government intervention would be premature at this time.” (p. 107)
- “American leadership in the AI space would best be furthered by supporting both of these world-class industries that contribute so much to our economic and cultural advancement.” (p. 107)
As Maria Pallante (AAP) aptly said, “Intellectual property and technology are twin superpowers that work best when they work together.”