Licensing Is the Key to Unlocking the Full Potential of Artificial Intelligence: Part 1


This article originally appeared on the International Trademark Association (INTA) website. Re-published with permission.

The applications and uses of artificial intelligence (AI) continue to expand. Technological leaps forward are being seen across all sectors, including medicine, education, policing, aerodynamics, and the creative industries. Yet, AI technology is merely in its infancy compared to its potential and the possibilities for its utility. The licensing of published copyright-protected source materials is critical to the development of powerful, trustworthy AI models and systems that can realize their full potential.

Part 1 of this article covers the commercial and legal requirements for licensing copyright-protected content for use by AI, and Part 2 will examine whether any copyright exceptions apply in the United States, the European Union, and the United Kingdom, as well as what the transparency requirements are for using copyright-protected materials in AI training.  

Do AI developers need permission to use copyright-protected content? To answer this question, let's consider the commercial and copyright factors. 

Commercial Considerations 

When companies build AI models and tools, the quality of the output depends heavily on the quality of the input. Working with licensed copyright-protected works ensures that the AI system is trained on high-quality content, which in turn improves the quality of the output. Moreover, AI developers can reduce the time and cost of cleansing and organizing data by working with rights holders, who can supply high-quality, well-organized material.  

The long-term sustainability of AI systems depends on a continuing supply of high-quality information. AI companies therefore need to maintain a pipeline of high-quality source material to sustain the quality of their models. Licensing copyright-protected works, and relying on a copyright system that already encourages the creation and dissemination of content, can achieve this. Licensing, in other words, allows the copyright system to continue to support the production of high-quality content. If AI systems do not license, however, they will undermine the copyright system, diminishing the quality of content over time and, in turn, eroding the sustainable supply of quality information that copyrighted works provide for AI systems to use.  

In addition, working in collaboration with rights holders will improve the marketability of the AI tool or model, since it will be a trusted service and can benefit from direct access to a relevant market, which the rights holder will already have.  

When YouTube first launched, it believed that it did not have to obtain a license for the music that users shared on the platform. The rights holders, in this instance the music industry, sued to enforce their rights against the alleged infringement, and the litigation went on for many years. Eventually, however, YouTube realized that it could benefit from working with, and not against, the rights holders, and it negotiated licensing deals. By collaborating with the music-industry rights holders, paying royalties for the use of music on the platform, and in exchange receiving the benefits of the partnership and upholding the copyright system, YouTube has become the biggest music platform in the world. 

The parallels between the YouTube story and the current situation with AI companies and copyright industries are undeniable. This history demonstrates how licensing can avoid lengthy and expensive court cases while at the same time improve the tool or model that is engaging with the copyrighted content.  

Copyright Considerations 

Copyright protects original artistic, literary, musical, and dramatic works, as well as sound recordings, films, broadcasts, published editions, and databases. The basic principle of copyright is that, in many cases, users must get permission to use these works. Therefore, when AI models and tools use datasets containing copyright-protected materials, whether a book, a piece of music, a photograph, or a database, copyright law starts from the position that this requires a license, unless an exception applies (discussed in Part 2).  

The purpose of copyright is, in short, to encourage the creation and dissemination of culture and knowledge through economic rewards. It does this by granting rights holders a set of exclusive rights, while balancing these rights with limitations, such as the term of protection, and with exceptions in special circumstances. Although copyright laws are territorial, and different countries have nuances in their rules, the core principles and purpose of copyright are aligned. International treaties such as the Berne Convention, which 182 countries have ratified, set out minimum standards that encompass these principles. As such, under the core principles of copyright, the use of copyright-protected works in AI training or programs requires permission in the first instance.  

AI systems engage in at least three activities that have copyright implications. The first is the sourcing of the training content.  

Because the quality of the source materials is integral to the quality of the training and operation of the AI system, sourcing materials without the cooperation of the copyright holder will lead to lengthy and costly data cleansing, as well as lower-quality data. To use the copyright-protected content, the AI system will often make and store a copy of the content, which, under general copyright principles, would require a license to avoid copyright infringement claims. This would apply even to systems that do not store a copy of the original copyright-protected materials, since they would still have made an initial copy. 

Another risk is that even where certain AI systems may benefit from a copyright exception for their particular use, the exception might not apply if any of the content used in the training datasets is an unauthorized copy. For example, in the U.S. case of Bartz v. Anthropic PBC, 3:24-cv-05417 (N.D. Cal. June 23, 2025), AI company Anthropic initially trained successive large language models (LLMs) on pirated books, downloading unauthorized copies to build a central library that was not justified by fair use.  

Not only did Anthropic end up spending many millions of dollars on books after realizing that the quality of the unauthorized copies would not suffice, but it also faced the possibility of large copyright damages, potentially in the billions of dollars. To resolve this risk, Anthropic and the authors reportedly agreed to settle the case for at least US $1.5 billion.  

The second point at which AI systems engage with copyright is in the training of the AI model using copyright-protected content.  

While there are nuances across jurisdictions, the test for primary copyright infringement can generally be summarized as: (1) was there copying, and (2) if so, were the parts copied substantial? Unsurprisingly, the right to copy the work is at the core of the privileges that rights holders obtain. Copying is therefore a broad legal concept, meaning reproducing the work in any material form, including storing the work in any medium by electronic means.  

Copyright law is, for the most part, not intended to be technologically specific; rather, it aims to be technologically neutral, to give the legislation endurance and to give lasting meaning to the principles of copyright. A useful parallel for understanding what this means in practice is the UK case Navitaire Inc v. EasyJet Airline Company Ltd, Bullet Proof Technologies Inc., [2003] EWHC 3487 (Ch), involving the copying of a website user interface.  

In that case, Mr Justice Pumfrey acknowledged that the coding of the two programs was different, but that the outcome on screen, from the user's perspective, appeared the same. He likened this to taking the plot of a book:  

In the same way that copyright subsisting in a literary work may be infringed by a change in medium in which all that is taken is the plot, so also, it is said, may the copyright in computer software be infringed when the functional structure of the code is appropriated by writing different code which, put crudely, works in the same way.  

Therefore, though the code behind the websites was different, the High Court of England and Wales found copyright infringement because the websites looked the same from the users' perspective. Ultimately, it matters not how a copy is made; it is enough that a protected work is used without permission. Copyright is not about regulating technology; it is about encouraging the creation and dissemination of creativity, and it therefore regulates copying in any material form.  

To train and run AI systems using copyright-protected works, the programmer needs to make copies of those works. Some AI firms argue that in their particular system the copy is merely temporary: once the content is tokenized (that is, the text is broken down into tokens, which are usually words, subwords, or characters), the original copy is, they assert, discarded. This argument misunderstands both the breadth of the legal definition of copying and the purpose of copyright. After all, tokenizing still engages the reproduction and adaptation rights that copyright provides.  
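To see why the "temporary copy" argument sits uneasily with how these systems work, consider a minimal tokenization sketch. This is illustrative only: real systems use subword tokenizers such as byte-pair encoding, and none of the names below come from any vendor's actual pipeline.

```python
def tokenize(text: str) -> list[str]:
    """A toy word-level tokenizer, standing in for subword schemes like BPE."""
    return text.lower().split()

# The whole work must first be read into memory: a reproduction is made
# at this step, before any tokens exist.
work = "The quick brown fox jumps over the lazy dog"

tokens = tokenize(work)  # tokenization operates on that copy

del work  # discarding the copy afterwards does not undo its having been made
print(tokens[:4])  # ['the', 'quick', 'brown', 'fox']
```

However briefly the variable `work` exists, the full text is reproduced before any token is produced; the later deletion changes only what is retained, not what was copied.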

In any event, many AI systems store the copies of the content used, which renders the question of copying undeniable. However, even where copies are not stored, the AI system would still have copied the work under copyright law and therefore, without an exception, would require the permission of the rights holders to do so.  

The third point of copyright engagement is the output of the AI system. Whether an AI output infringes a copyrighted work depends on the content used as input and the type of output. For example, a deepfake or digital replica is intended to mimic the input content and therefore, by its nature, would require the taking of a substantial or otherwise significant part of the original.  

As mentioned, it is widely recognized that the quality of AI output depends on high-quality input, and the best way to achieve this is by engaging with copyright holders. Moreover, when the input material is licensed, similarities between the licensed works and the content generated by the AI system can be managed within the terms of the license.  

Criminal copyright infringement may also be relevant, since one can commit an offense by making, for sale or hire, an article which is an infringing copy of a copyrighted work. For example, an offense under the UK Copyright, Designs and Patents Act 1988 (CDPA 1988), Section 107(1)(e), requires that the person knows, or has reason to believe, that the articles made for sale or hire are infringing copies.  

Section 107(2) of the CDPA 1988 also provides that it is a criminal offense to specifically design or adapt something for making copies of a copyrighted work, where the designer knows, or has reason to believe, that it will be used to make infringing copies. On the question of "reason to believe," the UK Magistrates' Court has held that broad knowledge of the items complained of, and of the nature of the infringement, is sufficient. 

Additionally, secondary copyright infringement occurs when a person or company deals in infringing copies. For example, under Section 22 of the CDPA 1988, secondary copyright infringement includes importing into the UK an article which is, and which the importer knows or has reason to believe is, an infringing copy of the work.  

Likewise, under Section 23, secondary infringement includes possessing in the course of a business, selling or letting for hire, offering or exposing for sale or hire, exhibiting in public, or distributing an article which the person knows or has reason to believe is an infringing copy, including distribution outside the course of a business to such an extent as to prejudice the owner of the copyright. The issue was tested in the case of Getty Images US Inc v. Stability AI Ltd, [2025] EWHC 38 (Ch). The court concluded that Stability AI's Stable Diffusion models do not contain or store reproductions of the relevant works on which they were trained and thus are not "infringing copies" for the purposes of secondary copyright infringement. (A companion case is under way in the Northern District of California, Getty Images (US), Inc. v. Stability AI, Inc., 1:23-cv-00135-UNA, in which the Getty Images complaint focuses on copyright infringement, trademark infringement, breach of its terms and conditions, and circumvention.)  

Getty has said that the UK finding that the training of Stable Diffusion copied its content will assist it in its U.S. claim (although the finding did not help in the UK matter, since the training took place outside the jurisdiction). In the subsequent judgment in GEMA v. OpenAI (I 42 O 14139/24, Nov. 11, 2025), a German court found that because an AI model was trained on content, in this case song lyrics, and could reproduce that content as an output, the model had embodied the content; it was irrelevant whether or not the model held a literal copy within it.  

Therefore, application developers, providers, and hosts of AI systems that infringe copyright may be committing primary, secondary, or criminal copyright infringement. 

Case Study: The UK Case in Getty Images v. Stability AI  

While Stability AI operates mainly in the United States, the company is being sued in the UK under the law of England and Wales (as well as in the U.S. case mentioned above). Stability AI initially argued that the UK courts lacked jurisdiction to hear the case; however, since Stability AI's Stable Diffusion model was available in the UK, the case proceeded to trial in London. At the trial, in June 2025, Getty accepted that the AI training process took place in the United States, and so the focus became whether Stability AI committed secondary copyright infringement by making a U.S.-trained model available in the UK. The court therefore had to decide a key issue of statutory interpretation: whether Stability AI and its Stable Diffusion system fall within the scope of Sections 22, 23(a), and 23(b) of the Copyright, Designs and Patents Act 1988.  

This means that while the outcome of the case gives some guidance and clarification for certain AI systems, it will by no means be applicable to all. The court also had to resolve the issue of liability: Getty argued that liability falls on Stability AI, while Stability AI argued that it should fall on the individual users. 

Both the UK court in Getty and the German court in GEMA held that the AI developers are liable for any infringement, not—as the AI developers had argued—the users who select a prompt which the model then responds to by generating content. 

The key reasons for licensing copyrighted works for use in AI systems include:  

  1. Ensuring high-quality sources for training AI models that will result in better quality output and more trusted AI tools. 
  2. Allowing the copyright system to continue to support the production of high-quality content. If AI systems do not license, they undermine the copyright system. This will diminish the quality of content, which in turn will negatively impact the sustainable source of quality information for AI systems provided by copyrighted works.  
  3. Ensuring the predictability of costs, rather than facing uncertain and potentially significant copyright infringement damages down the line.  

Part 2 will examine whether any copyright exceptions apply in the United States, the EU, and the UK, as well as the transparency requirements applicable when using copyright-protected materials in AI training.

Author: Hayleigh Bosher

Dr Hayleigh Bosher is an Associate Professor in Intellectual Property Law at Brunel University of London, where she is a member of the Centre for Artificial Intelligence: Social and Digital Innovation. She is a legal consultant in the creative industries, and currently a research fellow in the UK Department for Culture, Media and Sport.