This article was originally published on INTA’s website and republished with permission.
In Part 1 of this two-part feature series, we cover the commercial and legal requirements for licensing copyright-protected content for use by Artificial Intelligence (AI). Here, we look at whether any copyright exceptions apply across the European Union, the UK, and the United States, as well as what the transparency requirements are for using copyright-protected materials in AI training.
Copyright Exceptions
Copyright exceptions are specific circumstances where a copyright-protected work can be used without permission. This section explores the applicability of relevant exceptions in the EU, UK, and United States.
European Union
The EU Directive on Copyright in the Digital Single Market 2019/790 (DSM), adopted in 2019, provides two text-and-data-mining exceptions: one for research, under Article 3, and another for any purpose, under Article 4.
The research exception permits text and data mining by research organizations and cultural heritage institutions, which, generally speaking, would not apply to commercial AI companies. The second exception, which applies more broadly, comes with an important caveat: copyright holders can opt out. It is important to note that while text and data mining may be a predicate act to AI training, the exception does not apply to other copyright-relevant acts that may occur in AI development and use.
Some have raised concerns about the effectiveness of the exception, both in terms of its ability to protect the rightsholder’s position and to support the advancement of AI developments in the EU. It is unclear how the exception requirements will play out in practice, causing uncertainty for rightsholders and AI developers alike.
The future of the exceptions has even been called into question, creating further uncertainty. The European Parliament’s Policy Department for Citizens’ Rights and Constitutional Affairs commissioned a study in 2025 called Generative AI and Copyright: Training, Creation, Regulation. This stated that the current EU text-and-data-mining exception was not designed to accommodate the expressive and synthetic nature of generative AI training, and that its application to such systems risks distorting the purpose and limits of EU copyright exceptions.
Unsurprisingly, a court in a member state has made the first referral to the Court of Justice of the European Union (CJEU) on various DSM issues. In Like Company v. Google Ireland (Case C-250/25), concerning the use of Google’s chatbot in Hungary, the Budapest Environs Regional Court (Budapest Környéki Törvényszék) is asking the CJEU for clarification on several issues, including:
- Whether the display, in a chatbot, of content that is identical to the protected content found on a publisher’s website is an act of (a) reproduction and (b) making available to the public, and, if so, whether it matters that a chatbot’s response results from a process in which the chatbot merely predicts the next word on the basis of observed patterns;
- Whether AI training engages the right of reproduction; and, if so,
- Whether the text-and-data-mining exception in Article 4 of the DSM Directive applies.
While the EU has a copyright exception that may apply to some activities of some AI systems, the scope of the exception is unclear. There is also an important caveat that rightsholders can opt out of the scheme, meaning that their copyright-protected materials do not fall within the scope of the exception and permission would be required.
United Kingdom
No equivalent exception applies for commercial text and data mining in UK law. The text-and-data-mining exception provided by Section 29A of the Copyright, Designs and Patents Act 1988 only applies to non-commercial uses.
The UK government has considered extending the exception to include activities of AI systems. The previous Conservative government held a consultation in 2022, which set out its preferred option of an opt-out exception similar to that in the EU Directive. The majority of responses—75 out of 88—objected to the proposals.
A second consultation took place in 2024 and received much more interest, with more than 11,500 responses. The Labour government has yet to publish these responses and, in light of the creative industries’ “Make It Fair” campaign in particular, has said it is reconsidering its preferred option. The campaign, launched by the UK creative industries and supported by organizations including Sony Music UK, PRS for Music, and Warner Music Group, aims to stop AI companies from training models on copyrighted content without permission, credit, or compensation.
While the government considers what it will do, the Getty Images v. Stability AI case mentioned in Part 1 of this article continues, and no exception applies under UK copyright law. This means AI companies must license the use of copyright-protected works.
United States
Copyright law in the United States provides a number of exceptions, including for fair use. Fair use is fact-dependent, meaning that it is decided on a case-by-case basis. Under Section 107 of the Copyright Act, four non-exhaustive factors are relevant to whether the use of a copyright-protected work is fair use:
- The purpose and character of the use, including whether it is of a commercial nature;
- The nature of the copyright-protected work;
- The amount and substantiality of the portion used with respect to the work as a whole; and
- The effect of the use upon the potential market for or value of the work.
There are more than 50 cases under way in the U.S. courts at the moment, with three preliminary decisions made and one proposed settlement, highlighting the uncertainty as to whether AI activity—because of the breadth of different AI uses—falls within the scope of fair use. As diverse as these cases are, they generally concern training of AI systems. It is worth remembering that this is simply one aspect of the use of copyrighted materials in AI applications.
In Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence, Inc., 1:20-cv-613-SB (D. Del. 2025), the court found the use of copyrighted materials for AI training not to be transformative, and therefore infringing, particularly because the AI system was used to develop a product that competed with that of the rightsholder.
However, in Bartz v. Anthropic, 3:24-cv-05417 (N.D. Cal. June 23, 2025), a different court considered the AI training that used authorized copies to be transformative in those particular circumstances, though the defendant’s use of pirated works for training was found not to fall within fair use. Anthropic and the authors reportedly agreed to settle the case for at least US $1.5 billion.
In Kadrey v. Meta Platforms, Inc., 3:23-cv-03417 (N.D. Cal. 2025), which involved Meta’s downloading of fiction writers’ books for the purpose of training its large language model, LLaMA, the court granted summary judgment for Meta on fair use, while indicating that AI training of this kind would in many cases not be fair use, and that the outcome here turned on the authors’ lawyers failing to plead their case correctly.
There, the judge said that had the lawyers presented a meaningful argument on market dilution, the fourth fair use factor would have needed to go to trial before a jury. He specifically noted that: “This ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.”
As such, much legal uncertainty exists as to whether AI training can benefit from fair use. The litigation of these cases will continue for many years, with decisions on them and other cases resting on their specific facts. This means developers will still be taking risks if they rely on fair use where their system operates in a different context.
The Transparency Requirements for the Materials Used in AI Training and Processing
Transparency obligations for AI companies regarding the training data of their AI systems come under the EU AI Act (Regulation (EU) 2024/1689). The regulation uses a risk-based system that imposes stricter obligations on high-risk systems and on general-purpose AI models. It requires:
- The disclosure of content generated by AI;
- The publication of sufficiently detailed summaries of the copyrighted materials used for training; and
- That developers prevent their models from generating illegal content.
The UK government is under pressure from rightsholders to implement similar measures, as transparency is a necessary precondition for the enforcement of their rights. However, the House of Commons and the House of Lords could not agree on provisions covering these issues in the Data (Use and Access) Bill, which became the Data (Use and Access) Act 2025 without the transparency provisions included. The issue therefore remains unresolved in the UK but is on the agenda for the forthcoming parliamentary term.
Building AI systems with transparency in mind will foster the sustainability of the model and the company. Regulation requiring transparency is already in place across the EU and is likely to follow in the UK and elsewhere. Working in partnership with rightsholders would enable AI companies to comply with transparency rules.
Licensing as a Sustainable and Successful Business Model
This article has highlighted the copyright and commercial benefits that AI systems developers gain from collaborating with rightsholders. Licensing offers a solution to the legal uncertainty seen across the EU, UK, and United States. Uncertainty arising from unresolved litigation, statutory interpretation, and policy changes will continue for many years. Partnering with rightsholders offers AI developers certainty, longevity, and high-quality content for training purposes, while also helping to sustain a reliable pipeline of that content.
Licensing copyright content protects AI companies from legal uncertainty across jurisdictions. Moreover, partnering with rightsholders enables AI systems to benefit from high-quality, authoritative, and properly documented content for training. This not only allows for the building of sustainable and trusted models but also empowers the copyright system to continue to encourage creativity, which will in turn benefit the information pipeline for AI companies.
Key reasons for licensing copyrighted works for use in AI systems include:
- AI developers benefit directly, as collaboration with rightsholders helps ensure high-quality sources for training AI models, resulting in better-quality outputs and more trusted AI tools.
- AI systems benefit in the longer term by allowing the copyright system to continue to support the production of high-quality content. If AI systems do not license, they undermine the copyright system, which in turn diminishes the quality of content and erodes the sustainable supply of quality information that copyrighted works provide to AI systems.
- While copyright litigation is under way in several countries, the core global copyright principles point toward the need for the content used in AI systems to be licensed.
- The litigation will be under way for many years, and licensing copyright now will enhance AI development in the meantime.
- Even where courts have issued preliminary judgments, copyright decisions are made on a case-by-case basis, meaning that judgments are case specific and do not bring certainty to other situations. It will take many years for a sufficient body of jurisprudence to develop that can offer reliable guidance, and even then, certainty is not guaranteed.
- Building licensing into the business model of an AI system ensures predictable costs, rather than leaving developers to face uncertain and potentially significant copyright infringement damages down the line.
