Over the last two years, there has been an explosion of new and easily accessible artificial intelligence (AI) tools, based on large language models (LLMs), like ChatGPT, Claude, and Gemini. These tools have captured the attention of the public and hold great promise to improve many areas of our lives. At the same time, they raise important questions about how AI technologies and copyright work together.
The recent article The Heart of the Matter: Copyright, AI Training, and LLMs, forthcoming in the Journal of the Copyright Society, explains how copyright and AI intersect. It addresses several key areas of the relationship between copyright and generative AI, discussing the way copies are made and used in LLMs, the significant copyright liability issues that can arise from these uses, and the inconsistent international landscape, which is subject to court decisions in dozens of ongoing cases. The article finds that licensing is a logical solution to these challenges, with direct and voluntary collective licensing both playing important roles in enabling copyright owners and users to work together and innovate.
Using Copyrighted Works in LLMs
LLMs use massive numbers of textual works, many of which are protected by copyright. In doing so, LLMs make copies of the works they rely on, which implicates copyright in several ways, such as:
- Using copyright-protected material in the training datasets of LLMs without permission can result in the creation of unauthorized copies: copies generated during the training process and copies in the form of representations of the training data embedded within the LLM after training. This creates potential copyright liability.
- Outputs (the material generated by AI systems like LLMs) may create copyright liability if they are the same as, or too similar to, one of the copyrighted works used as an input, unless an appropriate copyright exception or limitation applies.
Licensing is Crucial
While copyright is clearly at issue, how it applies is considerably murkier. How these issues are addressed varies by where the technologies and copyrighted works are used. Some countries have enacted or are considering laws specific to AI (or to uses often incorporated in AI, like text and data mining), while others have not yet enacted any laws in this area. Those countries that have addressed some of these issues through legislation have not done so in a consistent manner, so the legal landscape is complex and requires careful review. This is compounded by dozens of lawsuits in different jurisdictions, each at a different stage of litigation. These cases most likely will not be decided in the same way.
There is no single global copyright law, countries vary significantly in their approach to copyright and AI-related issues, and no single court will hand down all decisions on copyright and AI. Global licenses, however, can harmonize how copyright owners and users agree to use copyrighted works. Licenses enable innovation and progress by permitting consistent and responsible copyright uses in support of untold scientific and cultural advancements. Licenses have the potential to put an end to much of the uncertainty surrounding pending and future litigation by setting agreed guidelines for what can and cannot be done with copyrighted material when training and using LLMs.
Both direct and collective licenses are valuable in reducing uncertainty and establishing a viable ecosystem going forward. Direct licensing (agreements between one or more copyright owners and users) allows the parties to be flexible in defining terms such as payment and timing and in addressing specific, bespoke use cases. Specific high-value or individual uses based on defined sets of copyrighted materials are well-suited for direct licensing and benefit both copyright owners and users.
Voluntary collective licensing is also critical in solving the licensing puzzle. This type of license enables users to obtain a single license that can cover a wide range of copyrighted works from multiple copyright owners without having to negotiate with each copyright owner individually. This approach is highly beneficial for both copyright owners and users because it provides an efficient mechanism to grant and obtain permission for using copyrighted works. Importantly, it provides a harmonized set of rights that can be used globally, regardless of the differences in legislation from country to country. At the same time, a collective license provides copyright owners with the ability to reach and serve a larger number of users than they may be able to work with directly.
In summary, licensing provides compliant access to high-quality copyrighted works, leading to innovative uses. From the invention of the photocopier to the introduction of the World Wide Web, licensing has consistently provided an outlet for innovation while respecting copyright, and it is poised to do the same with AI.
The Heart of the Matter: Copyright, AI Training, and LLMs is authored by Daniel Gervais (Milton R. Underwood Chair in Law, Vanderbilt University), Noam Shemtov (Professor in Intellectual Property and Technology Law/Deputy Head of CCLS, Queen Mary University of London), Haralambos Marmanis (Executive Vice President and CTO, CCC), and Catherine Zaller Rowland (Vice President and General Counsel, CCC).
Visit CCC’s AI, Copyright & Licensing resource page for recent insights and articles.