
Copyright Liability for LLM Outputs


The following is an excerpt from the article “The Heart of the Matter: Copyright, AI Training, and LLMs,” authored by Daniel Gervais (Milton R. Underwood Chair in Law, Vanderbilt University), Noam Shemtov (Professor in Intellectual Property and Technology Law/Deputy Head of CCLS, Queen Mary University of London), Haralambos Marmanis (Executive Vice President and CTO, CCC), and Catherine Zaller Rowland (Vice President and General Counsel, CCC).

The full article can be read in the Journal of the Copyright Society.

In addition to copyright liability for using copyrighted works as inputs without permission, there is considerable discussion about how to treat outputs, that is, the material generated by AI systems built on training that involves copyrighted works.

Understanding copyright liability for outputs generated by LLMs can be complex due to the multiple copyright rights involved. A prominent concern is the right of reproduction. The primary question here is whether the LLM has created something that is indistinguishable from, or substantially similar to, an existing copyrighted work. If so, infringement may occur unless an exception applies or the LLM did not have access to the original work.1

Another key right is the right to create derivative works, which include adaptations and translations.2 For example, consider an LLM that translates a recent Booker- or Goncourt-winning novel into another language, such as Japanese or Spanish.3 This action would violate the right to translate, which is a specific aspect of the broader right to create derivative works.4 It would also infringe the Berne Convention’s exclusive translation right, because the Convention text “does not distinguish among means of translation.”5 Such a translation could also violate the rights to reproduce and distribute the original work, not to mention the moral rights of the author, especially if the translation uses material without proper attribution.6 It is reasonable to expect that a court would not only enjoin the distribution of an unauthorized translation but also potentially award damages to the rightful copyright holder. As discussed below, there could be a separate violation if rights management information is removed.7

This does not, however, fully answer hard questions about the right to prepare derivative works under US law. The Copyright Act provides an exclusive right “to prepare derivative works based upon the copyrighted work” and defines “derivative work” in part as any work “based upon one or more preexisting works.”8 Applied to the creation of literary and artistic productions, this definition could loosely serve as a definition of machine learning, because AI machines can produce literary and artistic content (output) that is almost necessarily “based upon” a dataset consisting of preexisting works.9 The definition cannot literally mean what it says, however, because human creations are often, if not almost always, “based upon” some other work that the author has read, seen, consulted, experienced, or been influenced by in some other way.10 As Isaac Newton put it in a nutshell, we “stand on the shoulders of giants.”11

The broad language of the first part of the statutory definition (the “based upon” clause) can be restrained by the enumeration that follows, in application of the ejusdem generis rule.12 One can argue that the list captures the major forms of derivation that come under the derivative work umbrella, and that the opening clause may then capture only what has elsewhere been labelled “penumbral derivatives,” which one could define as works covered by the broad opening words of the statutory definition of “derivative work” (“a work based upon one or more preexisting works”) but not mentioned in the list of illustrations.13 Other arguments to limit the reach of the right exist, and this has been a long-standing question in copyright law. Professor Paul Goldstein, for example, has argued that, in light of the enumeration, the statutory text is intended primarily to protect certain licensing markets.14 It can be argued that the massive copying of protected works to train and fine-tune LLMs gives rise to a significant licensing market, a matter to which the article returns below.

Another controversy that the production of literary and artistic material by LLMs elevates to a core issue is the originality controversy.15 It is beyond cavil that, to be protected as a derivative work, a literary or artistic production must meet the originality condition applicable to other works of authorship. But does that mean that, to infringe the derivative work right (belonging to a third party), the derivative work must also be original?16 This, too, has been a long-standing question in copyright. Professor Goldstein opined that the derivative work right may be infringed even if the derivative production would not qualify for protection as a work, and the Ninth Circuit agrees.17 Professor Nimmer and the Seventh Circuit have taken a different view, however, though not in the context of AI.18

Legal adaptation of the copyright framework to LLMs will happen in several ways, of which an amendment to the copyright statute is only one.19 Courts will also play their customary role.20 As of this writing, more than two dozen court cases were pending in the United States and elsewhere that would determine, in particular, the scope of exceptions such as fair use in the United States or those in the 2019 EU Directive on Copyright in the Digital Single Market.21 Finally, private ordering is likely to play a prominent role in ending or avoiding litigation and increasing certainty for all parties involved, probably in the form of licensing arrangements that determine what can and cannot be done with copyrighted material used for commercial training purposes. The production of certain derivative works may be a prime target for such contractual arrangements, given the fuzziness of the borders of the derivative work right.

A final issue is that a number of LLMs are trained on large amounts of data, such as material available online.22 In this case, even if the creators of an LLM claim not to know exactly what the model was trained on, it can be argued that they knew or should have known that some portion of the material was copyrighted.23 Whether rights in a particular work in the dataset have been infringed will then presumably follow the traditional infringement analysis.24

  1. For example, an unpublished paper manuscript. See 17 U.S.C. § 106(1); 4 Melville B. Nimmer & David Nimmer, Nimmer on Copyright § 13.03 (“[W]hat is required by the traditional standards of copyright law […] for decades prior to adoption of the 1976 Act and unceasingly in the decades since, has included the requirement of substantial similarity.”).
  2. See supra notes 66 and 67 [read full article here].
  3. The US Copyright Act provides an illustrative list of works that constitute a derivative work: translations, musical arrangements, dramatizations, fictionalizations, motion-picture versions, sound recordings, art reproductions, abridgments, and condensations. 17 U.S.C. § 101 (definition of “derivative work”).
  4. See id.
  5. Ginsburg & Ricketson, supra note 49, ¶ 11.27 [read full article here].
  6. See Jane C. Ginsburg, The Most Moral of Rights: The Right to be Recognized as the Author of One’s Work, Geo. Mason J. Int’l Comm. L. 44 (2016) (noting that a moral right of attribution on all categories of works is recognized in the copyright laws of Berne Convention member States and that it is a U.S. obligation under Art. 6bis of the Berne Convention).
  7. 17 U.S.C. § 1202(a).
  8. 17 U.S.C. §§ 101, 106(2) (emphasis added).
  9. Otherwise, the LLM could not produce more content of this type, as explained in supra Part II.B [read full article here].
  10. For example, it is well known that, to learn creative writing or art, humans learn from existing masterpieces and other works. See Daniel Gervais, The Derivative Right, or Why Copyright Protects Foxes Better than Hedgehogs, 15 Vand. J. Ent. & Tech. L. 785, 851 (2013) (“By copying a master’s work, the ‘pupil’ might at least get a glimpse of the great author’s mind, which would seem like a normatively desirable process. ‘L’art naît d’un regard sur l’art,’ as the French would say: art is born from a view on existing art.”).
  11. See generally Robert K. Merton, On the Shoulders of Giants 8–12 (1993). Professor Bridy, for example, has argued along those lines that “all cultural production is inherently derivative.” Annemarie Bridy, Coding Creativity: Copyright and the Artificially Intelligent Author, 2012 Stan. Tech. L. Rev. 5, 12 (2012).
  12. On the ejusdem generis rule, see Garcia v. United States, 469 U.S. 70, 74 (1984). When general terms follow an enumeration of persons or things, such general words are not to be construed in their widest extent, but are to be held as applying only to persons or things of the same general kind or class as those specifically mentioned. In 17 U.S.C. § 101, of course, the general words of the “based upon” clause precede rather than follow the enumeration, but the canon could still be invoked. The canon, however, “cannot be used to ‘obscure and defeat the intent and purpose of Congress’ or ‘render general words meaningless.’” United States v. Kaluza, 780 F.3d 647, 661 (5th Cir. 2015).
  13. For example, mounting pages from art books on tiles, as in Mirage Editions, Inc. v. Albuquerque A.R.T. Co., 856 F.2d 1341 (9th Cir. 1988).
  14. Paul Goldstein, Goldstein on Copyright § 7.3 (3d ed. 2012); Paul Goldstein, Derivative Rights and Derivative Works in Copyright, 30 J. Copyright Soc’y U.S.A. 209, 221 (1983) (noting that “[i]t is no coincidence that the principal cases establishing broad rights against infringement by derivative works characteristically involve situations in which the alleged infringer had at some earlier point sought a license.”).
  15. See Daniel Gervais, AI Derivatives: The Application of the Derivative Work Right to Literary and Artistic Productions of AI Machines, 52:4 Seton Hall L. Rev. 1111 (2022) (discussing the requirement applied by some US courts that a defendant’s production be original to qualify as an infringing derivative work). Interestingly, even if the defendant’s production is original, it would generally not be protected by copyright if it is infringing. See 17 U.S.C. § 103(a).
  16. See id.
  17. Paul Goldstein, Derivative Rights and Derivative Works in Copyright, 30 J. Copyright Soc’y U.S.A. 209, 231 n.75 (1983) (“[T]he Act does not require that the derivative work be protectable for its preparation to infringe.”); see also Mirage Editions, Inc. v. Albuquerque A.R.T. Co., 856 F.2d 1341, 1342 (9th Cir. 1988); Munoz v. Albuquerque A.R.T. Co., 38 F.3d 1218 (9th Cir. 1994). In a 1909 Act case, the Ninth Circuit found that it made “no difference that the derivation may not satisfy certain requirements for statutory copyright registration itself.” Lone Ranger Television, Inc. v. Program Radio Corp., 740 F.2d 718, 722 (9th Cir. 1984).
  18. Lee v. A.R.T. Co., 125 F.3d 580, 582 (7th Cir. 1997); 1 Melville B. Nimmer & David Nimmer, Nimmer on Copyright §§ 3.01–3.03.
  19. Christopher T. Zirpoli, Cong. Research Serv., LSB10922, Generative Artificial Intelligence and Copy. Law (Sept. 30, 2023) (“Congress may consider whether any of the copyright law questions raised by generative AI programs require amendments to the Copyright Act or other legislation.”).
  20. See id. (“Given how little opportunity the courts and Copyright Office have had to address these issues, Congress may adopt a wait-and-see approach. As the courts gain experience handling cases involving generative AI, they may be able to provide greater guidance and predictability in this area through judicial opinions.”).
  21. See infra Part IV [read full article here].
  22. See supra Part II [read full article here].
  23. See Sean Hollister, Microsoft’s AI Boss Thinks It’s Perfectly Okay to Steal Content If It’s on the Open Web, The Verge (June 28, 2024), www.theverge.com/2024/6/28/24188391/microsoft-ai-suleyman-social-contract-freeware.
  24. It is not necessary to show that the defendant intended to copy a specific work. The case law has recognized “unconscious copying” as sufficient, for example. See, e.g., ABKCO Music, Inc. v. Harrisongs Music, Ltd., 722 F.2d 988, 998 (2d Cir. 1983).


Author: CCC

A pioneer in voluntary collective licensing, CCC advances copyright, accelerates knowledge, and powers innovation. With expertise in copyright, data quality, data analytics, and FAIR data implementations, CCC and its subsidiary RightsDirect collaborate with stakeholders on innovative solutions to harness the power of data and AI.