This article was originally published in The Scholarly Kitchen.

As a person involved in copyright on a daily basis, I’ve observed a number of events and requests for comment over the last few years on the issue of whether artificial intelligence (AI) systems can be “authors” in the copyright sense (or inventors of patents). I can see the appeal of this question, as it is fundamentally interesting and futuristic. I have often felt, however, that these issues were a bit of a misdirection, with at least part of the tech community treating the copyright community like dogs distracted by squirrels. After all, while we are pondering the weighty issue of future ownership, we are not focusing on the fundamental issue of wholesale copying of works to train AI in a wide variety of situations.

This, of course, could be an accident based on true intellectual curiosity, but I do not believe it. Regardless, as of this writing there are now five cases that may provide some clarity on this less frequently discussed but foundational issue of the unauthorized use of copyrighted materials as training data for AI (I use “AI” here as a shorthand which also includes text and data mining and machine learning). Each of these cases is unique, fact dependent, and likely, if fully litigated on the merits, to shed light on different aspects of copyright law. Below are my thoughts on what is interesting about these cases. Please note that this is in no way meant to be a comprehensive analysis of the lawsuits.

Case 1- Doe 1 v. GitHub Inc., N.D. Cal., No. 3:22-cv-06823– Whither transformative?

As I mentioned in my January 5, 2023 post on this case, plaintiff’s attorneys filed this class action under the theory that using openly licensed code without retaining the license and credit language is a violation of the Digital Millennium Copyright Act (DMCA), among other things, but did not allege the obvious claim of copyright infringement per se. I speculated that this was an attempt to avoid a messy fair use dispute.

As I also mentioned, Microsoft’s lawyers seem to think that fair use excuses copying for AI purposes everywhere, so I would expect Microsoft to try that defense here, given its lack of other arguments. One core concept in AI-relevant cases that both find for, and against, fair use (Google Books and Fox v. TVEyes respectively) is the reliance by Defendants on claims of “transformative use.” “Transformative use” is not mentioned in Section 107 of the Copyright Act but has been read into the first of four fair use factors. It is somehow different from the right to make transformative derivative works (where the word “transformed” is used in Section 101) such as film adaptations of books, which clearly require copyright owner consent. If you are confused by the difference between transformation that excuses infringement and transformation that is the exclusive right of the creator, welcome to my world.

As a lawyer by training, I am interested in the art of lawyering. If Microsoft defends on the basis of fair use, assuming it is even relevant to the DMCA claim, it will of course want to assert that the use is “transformative.” Unlike scanning books to perform semantic analysis on the evolution of language (found transformative in Google Books), the functional code alleged to be copied by Microsoft, et al., is being used as code. I look forward to the creativity that will be on display.

Case 2- Andersen, et al. v Stability A.I. Ltd, et al.- Real market harm

Filed January 13, this class action pits illustrators against generative AI companies who, according to the Complaint, used images without permission as training data and allowed people to create works in their “style” without compensation. The Complaint is well drafted, and while I wouldn’t have filed it in the Northern District of California, given that court’s reputation as being unsympathetic to copyright holders, it is worth reading for its detailed but clear explanation of the technology, seemingly sincere accusations of “betrayal” by Defendant DeviantArt, and the addition of a Lanham Act claim.

It is clear from the Complaint that the Plaintiffs are expecting a fair use defense based almost entirely on the issue of transformativeness, and as a result break down the technology in a manner which shows why output from Defendants, in their telling, merely creates “unauthorized derivative works.” Given that the infringements alleged are commercial (Factor 1), the infringed works are highly creative (Factor 2), the works were copied in their entirety (Factor 3), and the infringing output seems to compete in the market with the originals (Factor 4), Defendants need to somehow hit transformative out of the park. Even then, they may still lose as happened to the Defendants in the TVEyes case, where a finding of transformative use did not overcome liability.

I will mention, however, that Cases 1 and 2 are both class action lawsuits, and class actions are strange beasts with complicated rules which often yield unusual results. This raises the possibility that the courts might not get to the substantive copyright issues at play.

Case 3- Thomson Reuters Enterprise Center GmbH and West Publishing Corp. v. Ross Intelligence, Inc. – (Some) answers coming soon

This case, which involves the alleged surreptitious copying of the entire Westlaw database (after having been denied a license) in order to create an allegedly competing product, is already significant in that the Complaint survived a motion to dismiss. In other words, alleging infringement by making copies for training purposes, even where the competing product does not itself display the copyrighted content, states an actionable claim.

As the case approaches its third birthday, we are now in the summary judgment phase. The Defendant argues (1) that breach of contract (essentially downloading in violation of the terms and conditions) is preempted by copyright law, and (2) that the copying was fair use. I don’t see the court buying the preemption argument, so we may get an on-point fair use ruling. Summary judgment can only be granted if there is no dispute as to material facts and therefore no fact finder could legally rule against the moving party. In other words, summary judgment is only granted if the law is settled and the parties aren’t arguing over the facts at issue but, instead, dispute how the law should apply to the facts both parties agree are true. Given the Defendant’s alleged bad faith, the wholesale copying, and the competition between the products, summary judgment seems unlikely. Regardless of who prevails on summary judgment, a court decision on the issue will have a ripple effect on the other US cases discussed here.

Cases 4 and 5- Getty Images v Stability AI – Clean facts in two jurisdictions

Getty Images has filed two parallel cases as of this writing: one in the US and one in the UK. I know little about the UK case other than what is in this press release. That does not, however, diminish my excitement. While US law on training data and AI may be complex (e.g., trying to square Google Books with TVEyes; trying to square the definition of transformative under Section 107 with transformative under 101), UK law is clear. The UK was in the vanguard of creating a non-commercial research exception for TDM, and, as I wrote in the Scholarly Kitchen last July, the UK Intellectual Property Office recently mooted an expansion to commercial use. This proposed expansion of the exception was recently rejected by the UK government. In other words, in the UK there is a copyright exception for non-commercial research, everything else requires a license, and there is little if any ambiguity.

While UK law arguably offers more certainty, the US offers statutory damages. In the US case, Getty alleges millions of works were copied by Stability AI, and it specifically cites 7,216 works for which it has copyright registrations. Minimum statutory damages are $750 per work infringed; maximum damages are $150,000 per work if the infringement is found to be willful. Thus, as long as infringement is found, minimum damages are $5,412,000 and maximum are $1,082,400,000, plus possibly an award of attorneys’ fees.
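For readers who want to check the arithmetic, here is a minimal sketch in Python. It assumes one statutory award per registered work and uses only the figures quoted above; the function and variable names are my own illustration, and a court has discretion to award any amount within the range.

```python
# Figures quoted in the text; all names here are illustrative assumptions.
REGISTERED_WORKS = 7_216          # works Getty cites with copyright registrations
MIN_PER_WORK = 750                # minimum statutory damages per work (USD)
MAX_PER_WORK_WILLFUL = 150_000    # maximum per work if infringement is willful (USD)


def statutory_damages_range(works: int, per_work_min: int, per_work_max: int) -> tuple[int, int]:
    """Return the (minimum, maximum) total, assuming one award per work."""
    return works * per_work_min, works * per_work_max


low, high = statutory_damages_range(REGISTERED_WORKS, MIN_PER_WORK, MAX_PER_WORK_WILLFUL)
print(f"Minimum: ${low:,}")
print(f"Maximum: ${high:,}")
```

The two totals match the $5,412,000 and $1,082,400,000 figures given above; attorneys’ fees, if awarded, would be in addition.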

Almost as interesting is the trademark/Lanham Act claim. In the Complaint, Getty is including images which show AI-generated distortions of Getty’s trademarks and watermarks on images created by the Defendant’s system, presumably trained using Getty works without consent from Getty. This will be hard to defend.

If Case 2 were brought in a jurisdiction that recognized more traditional moral rights, that would provide another basis for a claim. Will a lawsuit in an EU jurisdiction be next?

What does this mean for the future of AI?

These cases are not about the future of AI itself, and even if all of the Defendants are found liable, AI innovation will not cease. While training AI usually involves large data sets, significant AI innovation occurs today by virtue of tech companies (and others) using large datasets licensed by entities such as Getty, STM publishers, and news outlets, among others.

These cases are not against AI. Rather, they will determine whether those who create works have a voice in the use of those works by commercial entities, some of whom compete with the original creators. As such, innovation through AI is not at risk, but these cases may have a long-term impact upon the rules governing reuse of copyrighted, valuable and reliable inputs and the incentives of ongoing creation.


Author: Roy Kaufman

Roy Kaufman is Managing Director of both Business Development and Government Relations for CCC. He is a member of, among other things, the Bar of the State of New York, the Authors Guild, and the editorial board of UKSG Insights. Kaufman also advises the US Government on international trade matters through membership in International Trade Advisory Committee (ITAC) 13 – Intellectual Property and the Library of Congress’s Copyright Public Modernization Committee. He serves on the Executive Committee of the United States Intellectual Property Alliance (USIPA) Board. He was the founding corporate Secretary of CrossRef, and formerly chaired its legal working group. He is a Chef in the Scholarly Kitchen and has written and lectured extensively on the subjects of copyright, licensing, open access, artificial intelligence, metadata, text/data mining, new media, artists’ rights, and art law. Kaufman is Editor-in-Chief of "Art Law Handbook: From Antiquities to the Internet" and author of two books on publishing contract law. He is a graduate of Brandeis University and Columbia Law School.