AI & Copyright: What is Old is New Again?

Back in early 2018, I wrote a short “think piece” blog post (which was picked up by TechCrunch), entitled “How AI and copyright would work.” The attention it received at the time was, of course, gratifying. Basically, I argued that, absent any discernible intent on the part of the algorithm producing a work to express something (anything, really), it would not be appropriate for society to recognize any copyright in that work as belonging to the AI. I believe that, as with the “monkey selfie” case from a few years ago, current circumstances do not meet the requirements set out in US copyright law and regulation for such machine-generated works to be recognized as bearing any copyright. The US Copyright Office (USCO) doesn’t think so either: it applies a “human authorship requirement,” quite appropriately in my view.

In November of that year, for the annual Donald C. Brace Memorial Lecture on copyright, former District Judge Katherine Forrest weighed in on “Copyright Law and AI: Emerging Issues.” Her analysis was focused on addressing two critical questions:

“First, AI that acts as a creator or ‘author’ of works may lead to complex issues of ownership. Second, actions by machines best described as AI, particularly distributed AI, can invoke questions of agency. . . .”

That still seems spot on to me. Since then, there has been a lot of activity, and gigabytes of online discussion, on the topic of AI and Intellectual Property (IP), and specifically on questions of copyright and AI. It’s become a bit of a hot topic in IP circles. This is probably to the good, and now is the right time to be thinking these things through – before an AI comes knocking at the door, demanding its rights.

Recently, in response to public requests for comments (RFCs), Copyright Clearance Center (CCC) has twice provided its perspective on these questions. The first was in response to a broad set of questions issued by the US Patent and Trademark Office (USPTO).

In our January 10th Comment to the USPTO, we addressed a few key points:

The term “Artificial Intelligence” or “AI” covers a broad range of technologies, and there is no broad, commonly accepted definition. For our purposes, we propose the following working definition: “AI systems facilitate the automation of tasks, normally performed by humans, by incorporating information from the data that they process in order to adjust the outcome of the task.” In the past decade major advances have been made in a subfield of AI that is called machine learning and, in particular, a subfield of machine learning called deep learning, which is a class of algorithms in which data is processed through layers from raw input to greater and greater levels of abstraction, at each layer providing a better representation of reality (and thereby enabling the machine to perform, and to teach itself to perform, more and more sophisticated tasks).
Some of these successful applications of machine learning and deep learning relate to tasks that rely on the processing of copyrighted materials, such as photographs, audio recordings, videos, books, journal articles, and other digital assets. In data terms, these types of content are generally regarded as “non-structured” and therefore more difficult to analyze (as compared to “structured” data, usually in the form of tables and graphs of numbers, which are more readily analyzed by traditional software), and the higher quality the unstructured data are, the higher quality the ultimate uses of them (for example, through deep learning) are likely to be.
… CCC is of the view that natural persons will continue to play the predominant role in engineering and directing AI projects and thus will continue to be authors and contributors to copyrighted works produced through an AI mechanism to the extent that such works are fixed and otherwise non-functional and therefore copyrightable. AI works that involve multiple contributors should be analyzed as collective or joint works under U.S. copyright law and will often be dealt with as a matter of contract or work for hire status. CCC is also of the view that many “data-driven” AI projects will involve the ingestion of the copyrighted works of third-party rightsholders as such works are often specialized and otherwise useful for such projects. Licensing is the obvious market solution for the ingestion of professionally-produced and -curated copyrighted works for commercial purposes (otherwise, such use amounts to infringement) or where the use supplants existing markets.

In February, CCC also submitted Comments in response to the World Intellectual Property Organization’s (WIPO) “Conversation on IP and AI.” Among the (admittedly related) points we addressed in that context are:

… [M]any AI practices involve the ingestion of copyrighted content, including the content found in journals, newspapers, books and databases, the rights for which comprise CCC’s repertories available for licensing. The result of significant ideas and research, thoughtful analysis of facts and theories, and conscientious and (hopefully) clear writing skills, this kind of copyrighted content has driven scientific, political, economic and business decision-making for hundreds of years. And it is the qualities of this type of content that make it most desirable for training and as datasets in various forms of AI applications, just as it has been used for the training of humans since (at a minimum) the advent of writing and has formed the “datasets” (usually called research materials) for those humans. This point about quality is widely recognized: for example, a September 2019 WIPO conversation on AI and intellectual property reported that “[a] common misunderstanding is about the quantity of data needed for machine learning when in reality the quality of data is really the key” (emphasis added). In fact, quality data inputs, including inputs of copyrighted content, are now widely considered one of the most valuable assets for businesses and other organizations, deployed to operate successfully and efficiently.
… [O]ur view is that existing copyright law does not favor wholesale and/or systematic ingestion of copyrighted content for clearly commercial purposes, when, regardless of ultimate use, (i) such “ingestion” is merely a form of copying – an act reserved to the rightsholder for the entire history of copyright law, (ii) the content has been made available to the public specifically for purchase, subscription or licensing, and (iii) the content copied has been specifically chosen for such purpose because of its value for such purpose. Such activities amount to direct copyright infringement – unless licensed. It is worth noting that, to a substantial extent, the acts performed by AI systems are similar to those performed by humans, although typically on a different (vastly greater) scale. Thus, if it is, in the normal case, an infringement for organizations to make unauthorized copies of entire works for humans to learn from, i.e., to study and to read, it is a fortiori an infringement to do so at scale for similar machine use.

Obviously, analysis of such issues is still at an early stage, and even six months from now we may all be viewing them from a different perspective. I’m looking forward to going through the roundup of Comments submitted to the USPTO (once they are published) and to the ongoing WIPO inquiries.

Most recently, as I mull these things over in the light of new information, I’ve gone from thinking that the impact of AI on copyright was slightly over the horizon to invoking the old saw, “Objects in your future are closer than they appear.” Kidding aside, I am starting to think that the impact is pretty close, especially in some media sectors, such as music and the visual arts.