Copyright and Artificial Intelligence: An Exceptional Tale

This post originally appeared on Medium. Reprinted with permission.

As the US government begins to consider some of the legal implications for copyright in connection with the development and deployment of artificial intelligence, it is important to first step back to ensure that we are guided by context and a proper understanding of our goals, grounded in an informed grasp of the relationship of copyright to the development of AI and a fair observation of the state of legal developments around the world. Far too many observers have oversimplified how various countries have addressed the relationship between copyright and AI. The reality is that every country that has addressed the issue has rejected the notion that copyright is not implicated, and has developed legal norms that carefully limit the scope of any exceptions with an eye towards facilitating licensing, even when it seeks to expand the development of AI as a national economic imperative.

I have written about the approach taken by the EU in the updated Copyright Directive, and note here that, despite claims about Japan’s legislation, even its provisions, as manifested in the 2018 amendments, are designed to avoid conflict with the legitimate interests of copyright owners. While I don’t necessarily agree with Japan’s approach, it is important to highlight that even its exceptions, as I understand them: (1) recognize that text and data mining/machine learning does in fact implicate copyright; (2) apply only to materials that have been lawfully acquired; (3) require that the use of each work be “minor” relative to the TDM effort; and (4) provide that license terms must be honored. While it remains unclear to me that Japan’s goal of respecting copyright as required by international law has been achieved, it is important to understand that claims that Japan has removed copyright as an issue that must be addressed in the development of AI are inaccurate. The Japanese legislature has taken pains to try to ensure that any exceptions conform to the requirements of the Berne Convention and TRIPS, and neither conflict with a normal exploitation of the work nor unreasonably prejudice the legitimate interests of the copyright owner.

Returning to the current examination by agencies of the US government, my strongest guidance is to avoid the sense that we’re in some kind of race that eliminates the luxury of maintaining core values. It’s not clear to me that the legal situation requires the US to take any present action with respect to text and data mining to facilitate the development of AI. We should endeavor, to the maximum extent possible, to ensure and expand the role of consent in building databases, and should resist the notion that any and all ingestion of copyrighted works is protected by fair use. I’ve written separately on that point. Policy makers should look closely at what the EU did in the updated Copyright Directive which, while imperfect, is thoughtful, particularly inasmuch as it contemplates the right to prohibit uses at the copyright owner’s discretion.

An even more fundamental starting point for consideration of AI and copyright should be to unpack some of the words used in this exercise. Artificial intelligence. Machine learning. There is no artificial intelligence, only intelligence, and computers are not capable of possessing it, at least not in the way that we generally associate intelligence with judgment. Computational efficiency, meaning the ability to discern patterns through mountains of data so as to be able to predict outcomes or to generate new patterns based on the past, is not an exercise in intelligence, and the anthropomorphism employed here affects the way we think about these issues. We need to exercise great care at the outset. Intelligence isn’t reducible to stimulus-response, regardless of the sophistication and complexity of the experiment. Aggregations of data can tell us where we’ve been. Perhaps also where, based on the past, we are likely to go. They can find patterns that may escape human perception and that can inform future decision-making, but they necessarily are engaged in measuring the past, not dreaming about the future. They’re incapable of being inspired. So let’s be really careful here and not entrust our future to the guardians of the past. I will leave it to philosophers to debate the metaphysics of the meaning of intelligence. I think it will suffice here to observe that it involves more than mere computation, which is why we employ the adjective “artificial.” But here, the adjective actually negates the thing it’s supposed to be modifying and therefore philosophically defines a null set. That must be borne in mind as we proceed.

The same notions apply to “machine learning,” a rhetorical construction that I think we should cease employing immediately. Learning implies a kind of comprehension and relationship to facts that eludes machines, regardless of sophistication. Learning involves non-computational abstraction. If we bear this in mind, perhaps we retain greater agency over the output of machines, and are in a better position to avoid the learned helplessness of those who would frame machine learning and AI as our inevitable future, shaped by forces outside of our control. Perhaps we will also stop and think about the trade-offs when certain champions of a tech manifest destiny declare the benefits of efficiency. Efficiency is great, but only in support of goals that we deem valuable. Perhaps it’s just me, but I don’t value greater efficiency in the dehumanization of our existence. Our choices reflect our values, which are necessarily competing and constantly evolving. But we must not lose sight of the fact that we are making choices, and that we decide what conditions should attach to developments in technology. Technological capacity is not self-fulfilling and decisions don’t take place by default. Inaction based on a perceived lack of agency is our enemy, and it is fueled by the use of anthropomorphic language to describe the operations of the machine age. So let’s not do that.

Of course, the preceding merely describes the awareness with which we should embark as we consider what values we want to observe and empower in our approach to employing technology in advancing the needs of society. Building sophisticated databases that can be used to understand aspects of the world that would otherwise elude human observation and capacity is obviously essential and salutary. But it isn’t necessarily beneficial without regard to how it is employed, or the means by which it is constructed. These are all decisions to be made by humans. The development of AI related to facial recognition is a great case in point, illuminating how decisions about the collection of underlying data and the use of that data will affect our views on the cultivation of appropriate norms of conduct, and the construction of guardrails to ensure compliance with normative decisions. The existence, or lack thereof, of appropriate safeguards may well be determinative of whether certain technologies should be developed. The facial recognition company Clearview AI was recently in the news both for the fact that it scraped Facebook and other sites for images to populate its database, and for the likelihood that the resulting information would reinforce existing injustice, to say nothing of ending privacy, as Kashmir Hill brilliantly observed in her NY Times piece entitled “The Secretive Company That Might End Privacy As We Know It.”

The Clearview AI example highlights a number of issues that should be kept in mind as the Copyright Office continues its examination of the legal principles that should undergird the development of AI. The most obvious one, although somewhat outside the direct purview of the Copyright Office, is to bear in mind that the aggregation of data, in and of itself, is not necessarily beneficial or something to be encouraged. The related, but more direct, implication for the Copyright Office is to resist the involuntary ingestion of data to build the databases that enable the analysis of such data known as AI. There was widespread condemnation of the scraping of images used in the Clearview AI dataset, including from many who are not generally perceived as fans of copyright. And of course, there has been an ongoing discussion about whether it might make sense to establish some form of property right in data which would address the uncompensated, unpermissioned use of data to build systems necessary for AI. But private ownership of “data” in the sense of original expression already exists in the form of copyright. Most observers intuitively understand that private or personal images (even when shared on public-facing social media sites) should not be harvested without consent. I agree. But at the same time, a variety of organizations and companies press for the adoption of rules that would permit the uncompensated, unpermissioned use of expressive works protected by copyright in order to build databases to facilitate the development of AI. These arguments take various forms: that consent would impede innovation and social progress; and that consent would interfere with our ability to compete with a China that is unconstrained by observance of copyright.

Let’s start with the most philosophical of these arguments. Let’s assume, just for the sake of argument, that securing consent would complicate the building of databases and therefore slow innovation. Here’s where we confront the issue of values. Do we want to build a society which champions lack of consent as a virtue? Where consent must be forgone to achieve progress? Where personal (and state) sovereignty is decried as a rebuke of modernity? I for one don’t want that world. Let’s eschew a race towards dehumanization and the erosion of free will. We don’t need to chase China down the rabbit hole of technology that offends our values. (Side note: this is not intended to suggest that China is in fact embracing such developments, only to respond to those who suggest we need to avoid complexities like consent in order to compete with China.) Let’s ensure that our technologies reflect our values, or we risk building a world that we don’t care to inhabit.

Most importantly, remember that we are constructing the rules for an emerging digital society. If those rules fail to give effect to core values of personhood and choice, we will be facilitating the cultivation of a dehumanized and dehumanizing society. To what end?