Copyrighted Material and AI Training

The newest and noisiest area of AI development is, without a doubt, generative AI.

The term refers to an area of artificial intelligence that focuses on systems that can generate content, such as DALL-E and Midjourney (which generate images), GameGAN (which generates game levels), ChatGPT (which generates text), Copilot (which generates code), and so forth.

On a few occasions, images generated by Midjourney have been used to illustrate media publications, and one even won an art competition.

Of course, the ability of a machine to generate images that are on par with, or sometimes better than, those produced by humans has worried creators enough to go beyond public outcry and file a lawsuit.

For example, Sarah Andersen, the author of a reasonably funny comic strip loosely based on her own challenges of having to deal with a reasonably easy life, joined forces in January this year with a few other authors to sue prominent operators of generative AI.

They claim that their works cannot be used to train AI without explicit permission. In other words, they insist on excluding the training of AI from the concept of fair use.

Fair Use and the Israeli Government

Fair use is a legal doctrine that allows the use of copyrighted material without the owner’s permission under certain circumstances. It is not a right but rather a defense in court: if one is sued by the owner of the rights, the court may decide that the use was “fair” and rule in the defendant’s favor.

Fair use may include using a copyrighted work for criticism, comment, news reporting, teaching, scholarship, or research. The fair use defense stands a better chance if the work in question is not a work of art (which isn’t the case here).

An additional criterion for fair use is the effect of the use on the market value of the original work.

Cue the Israeli government.

With a portion of cutting-edge AI research happening in Israel, the Israeli Ministry of Justice has decided to reduce uncertainty for startups, mature companies, and the government by doing something it rarely does: issuing a proactive legal opinion on the use of copyrighted material (the document is in Hebrew, but there is an English summary).

Its key conclusion is that, according to Israeli law and under the existing copyright doctrines, copyrighted materials can be used to train ML models.

The only exception is if the ML model is trained on the works of a single author with the goal of competing with them in their current market.

However, there’s a catch.

The “fair use” coverage applies only to training the model. That is a potentially serious limitation: if the AI is generative and one of its outputs is close enough to a work by a human author, the owner of the AI can be liable for copyright infringement.

In other words, a company can teach the AI how to paint, but if it generates a painting that is too similar to that of an actual artist, the company can get sued and lose, at least in Israel.

The AI itself cannot hold copyright. The US Copyright Office has clearly stated that non-human entities are ineligible for copyright protection (that includes animals, divine entities, and, recently, computer programs).

So a company that trains generative AI will have a lot of training material at its disposal, but it will have to make sure that the outputs are not too similar to the works of existing authors.

To complicate matters further, the AI outputs can themselves be a copy of a licensed work. Take Copilot: it was trained on a vast volume of code from various open-source projects. Some of that code is under the BSD license: you can do whatever you want with it, but you have to mention the origins of the code somewhere. At some point, someone will use Copilot to generate a piece of code that is identical or very close to an original under the BSD license. The generated code will go into someone else’s codebase without the required notice, violating the license.
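To make the Copilot scenario concrete, here is a minimal sketch of how a team might flag generated code that closely matches a known licensed snippet. Everything in it is an assumption made for illustration: the `LICENSED_SNIPPETS` corpus, the `example-bsd-project` path, the helper names, and the 0.9 threshold are all hypothetical, and a high textual similarity score is only a hint that a license notice may be needed, not a legal test of infringement.

```python
from difflib import SequenceMatcher

# Hypothetical corpus: snippets known to carry a license that requires attribution.
LICENSED_SNIPPETS = {
    "example-bsd-project/clamp.py": (
        "def clamp(value, low, high):\n"
        "    return max(low, min(value, high))\n"
    ),
}


def normalize(code: str) -> str:
    """Drop blank lines and per-line indentation so formatting does not hide a match."""
    return "\n".join(line.strip() for line in code.splitlines() if line.strip())


def find_near_copies(generated: str, threshold: float = 0.9):
    """Return (source, ratio) pairs for licensed snippets at or above the threshold.

    The threshold is an arbitrary illustrative value, not a legal standard.
    """
    candidate = normalize(generated)
    matches = []
    for source, snippet in LICENSED_SNIPPETS.items():
        ratio = SequenceMatcher(None, candidate, normalize(snippet)).ratio()
        if ratio >= threshold:
            matches.append((source, ratio))
    return matches


if __name__ == "__main__":
    # Same code as the licensed snippet, only reformatted.
    suggestion = "def clamp(value, low, high):\n\n        return max(low, min(value, high))\n"
    for source, ratio in find_near_copies(suggestion):
        print(f"Possible copy of {source} (similarity {ratio:.2f}); check its license notice.")
```

A character-level comparison like this will miss copies where identifiers were renamed, so it should be read as a first-pass filter at best.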

TL;DR

In summary, according to the Israeli Ministry of Justice, using copyrighted material to train machine learning models is permitted under Israeli law, unless the model is trained on the works of a single author in order to compete with that author. However, the “fair use” coverage extends only to training, not to the outputs, which can still infringe existing copyrights. Additionally, AI outputs can themselves be copies of licensed works, which further complicates the legal landscape.

While there is still much legal debate ahead, these are exciting times for the field of generative AI.