Artificial Intelligence and Copyright Today | Scholarly Communication and Publishing | University Library

©hat

Artificial Intelligence and Copyright Today

Transcript:

Speaker 1 (00:05):

Hello and welcome to another edition of Copyright Chat. It’s been a while, things have gotten hectic. Technology is changing, and it seems like artificial intelligence is everywhere. Not only is it everywhere, it seems like everyone’s talking about it. They’re talking about how to use it in their classrooms. They’re talking about how to use it as students. They’re talking about how to use it for research. They’re talking about how to use it for corporate development. It seems like everyone’s talking about artificial intelligence. So today I want to break down a few things about artificial intelligence, at least from a copyright perspective. And you know who else is talking about copyright and artificial intelligence? The United States Copyright Office. They too are talking about copyright and artificial intelligence because they are some of the first who have to deal with it, in terms of whether a work that was created using artificial intelligence can be registered or not.

(01:22):

And so I wanted to address some of the copyright considerations, but I also want to note that this is an evolving topic. So what I tell you today may be irrelevant pretty soon, maybe even tomorrow. I doubt it. I doubt that all of a sudden tomorrow we’re going to have a slew of court cases, legislation and Copyright Office administrative regulations governing AI. But I will say that it is a very, very developing topic and there are no real clear answers on many of the questions, at least for now. But I can at least summarize what I know at this point in time, which hopefully is helpful to you, and where I’m looking for different pieces of information about the developing world of copyright and AI. So to begin with, how does generative AI get started? I mean, obviously someone writes a script and then the artificial intelligence can read a bunch of data.

(02:38):

Usually it could be pictures, it could be text, but the key here is that the artificial intelligence has to learn from looking at a bunch of other work. And the question is, is that legal? And the answer is, as in many cases, it depends. One of the things that you can think of here is the HathiTrust Digital Library case in the Second Circuit. In that case, HathiTrust was digitizing entire copies of books and using them for text and data mining and not showing all parts of the book. Right? When a researcher wanted to see if a book was relevant to their research, they might put in a search term such as anaphylactic shock, and then see where in the book that term appeared. Let’s say it didn’t appear at all. They would know that this book likely is not relevant to their research.

(03:48):

Or let’s say it appeared on 30 different pages in the book and the term appeared a hundred different unique times. That might tell them that this book is highly relevant to their research. And so in that case, the court said that HathiTrust’s use of this type of corpus was a quintessentially transformative use. Why? Because they weren’t producing books for the sake of reading or what we call consumptive use. They were producing these books for the purpose of using them in a new way, for research and text and data mining. Building on that logic, many lawyers think, and I agree, although I will give you the disclaimer that you shouldn’t take any of this as legal advice, that most training of AI on a corpus, such as books or pictures, might be considered a fair use, right? Because they’re not simply trying to read or consume that data.

(05:04):

The entire purpose of generative AI is to try to come up with something new, something different, right? You tell the AI, create a song in the genre of this, or create a play about a monkey written in the form of Shakespeare, or something of that sort. And so it’s not meant to merely duplicate or just read or just simply copy the work. It is trying to learn to create, to do something new. That might be all fair and well and good. In the case of HathiTrust, we (I say we because academic and research libraries were doing the scanning) were scanning physical books. What’s the difference between what HathiTrust was doing and generative AI today? Well, there are a few differences. One of the key differences is that they were using physical books. And physical books carry no licensing, right? There’s nothing restricting me from copying besides copyright law. There’s no contract restricting me from making that copy. The other difference, obviously, is they weren’t trying to train using the corpus of books. So that’s a leap you have to make from text and data mining uses, which the court in the Second Circuit at least thinks are quintessentially transformative, to generative AI, and whether a court would agree that generative AI is also quintessentially transformative, or transformative at all.

(06:46):

On the leap about whether it’s transformative, I think most experts agree that training generative AI on that type of corpus is likely a fair use. The part that we mostly get hung up on is this licensing aspect or contractual restriction aspect. And that’s because you can get rid of all sorts of rights that you have, even fair use, with a contract. And we get into contracts all over the place when we’re trying to access books on the web, when we’re trying to access images on the internet, when we’re trying to access library databases. There are a lot of folks who want to do AI work using library databases. However, libraries, of course, license their databases on behalf of their patrons, and the patrons too are bound by the terms of that license. If it does not allow large-scale scraping, then it’s not going to allow generative AI, because you would need to scrape the data.

(08:01):

And so I think there’s a huge caveat here, which is the whole licensing aspect, and we really haven’t seen what the courts are going to say about that necessarily, and how that’s going to play out in different ways. I mean, in many cases courts seem to think that so-called browsewrap contracts, terms of service, for instance, that are buried on a website, are not as enforceable as, say, click-through licenses where you have to actually click your agreement or assent. But in any event, these lawsuits are likely to happen. And in fact, one is pending already about ChatGPT. Fiction authors have claimed that ChatGPT was trained on their corpus through books illegally uploaded to pirate websites. And so they’re saying, hey, this was not a legal thing in the first place. These were illegal copies and violated our copyrights, and then the AI was trained using them.

(09:07):

So we will see. There’s a variety of different lawsuits that could come out on the training side, even though, as I said, many folks think that side of the coin may be a fair use in many instances. So when creating generative AI, you have to get a corpus that may or may not create many copyright issues for you, depending on the licensing issues and whether the corpus was illegally uploaded to the internet in the first instance. Those things can create a lot of problems. And then after you’ve created the generative AI process, after your code has read all of this data and trained, then you have the output. Then you have someone, let’s say myself. I try to engage with ChatGPT, and I put in a prompt, and I want to own copyright in whatever it spits out. So I say, write me a play about a gorilla in the genre of Shakespeare, and that’s all I say.

(10:21):

And then it spits out 20 pages, and I want to own that. And that is where we have some guidance, at least from the US Copyright Office, because they have had to deal with some attempted registrations. And most of these have dealt with art through Midjourney and other AI generators. And what happens is we know, and this is not a novel concept, but we know that copyright law, at least in the United States and pretty much everywhere, requires a human author, right? The human author has to own the copyright. We know that there’s the famous Monkey selfie case where no, the court would not allow a monkey to own their own selfie because they’re an animal, they’re not a human. Can’t own it. Well, the same goes for machine generated or computer generated art.

(11:22):

If the applicant or the registrant, when they turn in their registration to the Copyright Office, says, I asked AI to generate this picture, and here it is, then that’s likely to get rejected because there’s not enough originality in terms of creativity from the human there to create a copyright. That is where we get into, well, is it simply because you used a machine to create it? What if it’s human creativity but aided by a machine? Well, that answer we’ve had for a very long time. The Supreme Court pointed that out in the case of Sarony, when we had a photographer taking a picture of Oscar Wilde, and the debate was, well, can the human own this copyright? Because they were aided by a camera. And of course we know we can own copyrights in photos. But back in the 1880s when the case was decided, that was a question.

(12:41):

And the court said, of course there’s human creativity: in the pose they put Oscar Wilde into, in the lighting, the time of day when they took the photo, all of these different things. And that’s the key question with AI today. How much human creativity went into the arrangement of this AI-generated thing? So potentially you could have a thin layer of copyright in some sort of creative arrangement if your prompt is sufficiently creative, or if you have a derivative work from an AI-created work, and you’ve put enough creative choices into it. So there are ways, of course, using AI to still own a copyright. But if you simply say to the Copyright Office, I put in this really simple prompt and this is what it spat out at me, that’s not going to cut it. And then there’s one more aspect that has to be considered here, and that is what results. That picture or that text needs to be different, right? Because we know that the AI trained on a corpus, and if the image, let’s say, that it spits out is too substantially similar to the input, to the image that was put in, then we’re still going to face an issue. We’re going to have a problem with a potential copyright infringement claim.

(14:29):

I’m just thinking in my head of an example. Let’s say I’m a very big fan of Mark Rothko, and I say to the AI generator, show me an exact replica of Mark Rothko’s painting X. Maybe Mark Rothko isn’t the best example, as most of his paintings were just blocks of color. But you get the point. If you pick a modern contemporary artist who does own copyright in their work, their work was inputted into the generator, and you tell it to give you an exact replica, and it does, well, that’s problematic, and that’s pretty obvious to see. But you may not even ask for an exact replica. If the AI generator is not doing a good job in creating new things, in learning how to be unique and change different language models, then you just may end up with some copyright infringement on your hands.

(15:37):

So there are three different things to think about with AI and copyright, right? First, there’s the training: the data, the corpus, the stuff you’re training your AI on, is that legal? And the answer there is, it depends, right? Are there contracts at play? If not, maybe it’s a fair use. And then you have the second question: can you own the output, or is the output just not copyrightable because it’s computer generated? And in addition to that, is there any copyright infringement? Is whatever is spit out too similar to the corpus it was trained on? So these are the kinds of things that I fully expect to see crop up in court cases. I think we’re going to have a lot more of those about all of these different issues. We’re going to have more guidance, of course, from the Copyright Office as more and more folks try to register their work, and we’re going to have lawsuits over illegal training of AI on data and over infringement on the back end.

(16:57):

So all of these things are things to watch for. One of the places that I keep an eye on in terms of trying to stay up to date on these AI issues is the Copyright Office, which has a dedicated site about artificial intelligence. So I watch that space. They had a recent call for comments from the public, and I expect that they will be putting out a discussion document quite soon. Another place that’s really great to follow is Georgetown University, which has an AI database of all of the cases about AI in the law today, and you can filter it by copyright infringement related cases or fair use related cases. And it’s a really nice way to just try to keep up on what types

(17:48):

of cases are currently pending. So those are the sources I would recommend to you, my listeners. That’s what I’m trying to keep up on. I hope that this brief introduction was helpful and food for thought and has you thinking about the issues as they develop. Until next time, I hope you stay well. Thanks for listening.

Updated on January 29, 2026 by
