Searchable Database Containing Book Titles Used to Train Artificial Intelligence

Author
The Writers' Union of Canada

This week, the US publication The Atlantic published a searchable database containing the titles of books used to train a number of US-based artificial intelligence projects. Many of those titles are by Canadian authors. You can see this database and search it for your own works here.

Permission to use these books as AI training material was not sought from the rightsholders and, in fact, it seems increasingly likely that many of these works were accessed from pirate websites. This is an outcome The Writers’ Union of Canada (TWUC) and our international colleague organizations have been warning governments, industry, and regulators about for more than a decade. It is the inevitable result of the systematic weakening of copyright laws and regulatory structures, fueled by powerful high-tech lobbying, since the beginning of the digital age.

When the world’s writing organizations stood against the Google Books project, the HathiTrust, and most recently the Internet Archive’s so-called “controlled-digital-lending” project, the potential use of large unpermitted datasets for machine-learning and artificial intelligence training was always part of our complaint. For the most part, sadly, governments and regulators downplayed or even ignored author concerns.

And here we are.

Our US counterparts at the Authors Guild have launched a class action lawsuit (along with several high-profile US authors) against the San Francisco-headquartered company OpenAI. As most AI development companies operate out of the United States, the initial use of the courts to rein in this damaging and illegal practice must take place there. It is not yet clear if the dataset published by The Atlantic was part of the training protocols at OpenAI. Details like that will be revealed as the case moves forward.

To be clear, this is a US-based lawsuit involving US authors. Canadian authors cannot financially benefit from any finding in this case as it has been presented to the court. However, the indirect benefit to Canadian authors is clear: if this practice is stopped at its source, all authors will be better protected. Whether there is also recourse to Canadian courts on this issue is something TWUC is closely studying.

We have also developed new wording for our Model Trade Book Contract aimed at reserving the author’s right to deny permission for AI training and the use of AI in publication. That wording will be presented to members by National Council soon.

TWUC is in close contact with the Authors Guild and stands ready to support their case with international witness testimony through amicus filings. We will keep members informed as this case advances through the courts.

See the Union’s advocacy page on artificial intelligence for more background on this issue.

As well, both the US Authors Guild and the UK’s Society of Authors have recently published advice for authors who find their work in AI databases. See these excellent resources at the links above.

DATE: September 28, 2023