Playing catch-up on copyright
Generative AI technology has raced ahead of the law. In response, news outlets have filed a lawsuit to protect the content they produce, shedding light on a legislative grey area in the process
In a case that could help the legal system catch up with a technology racing ahead of the law, six Canadian media companies have filed a joint lawsuit against OpenAI, the creator of ChatGPT, for copyright infringement.
The outlets, CBC/Radio-Canada, the Canadian Press, the Globe and Mail, Postmedia, the Toronto Star and Metroland, filed their statement of claim in the Ontario Superior Court of Justice last week.
It alleges that OpenAI has taken significant portions of news content for its training data without regard for copyright or terms of use, with consequences for the outlets involved in the suit and benefits for OpenAI, which is currently valued at roughly $157 billion.
“OpenAI has capitalized on the commercial success of its GPT models, building a suite of GPT-based products and services, and raising significant capital—all without obtaining a valid licence from any of the news media companies,” Sana Halwani, partner at Lenczner Slaght LLP and lead counsel on the claim, said via email.
Canadian outlets are seeking punitive damages and any profits derived from the use of the copyrighted work.
“[The outlets] simply want to be compensated for what OpenAI has taken and to stop OpenAI from continuing to take their content without authorization,” Halwani added.
The case is the latest development in a string of lawsuits as media outlets and other content providers in Canada and beyond attempt to navigate the uncharted waters of generative AI.
Currently, Canadian copyright law does not explicitly address artificial intelligence. Publishers and creators have repeatedly criticized the unauthorized use of their content to train generative AI systems and urged the government to act. While there have been public consultations about whether the Copyright Act should be amended to address it, no changes have been made.
That doesn’t mean the legislation isn’t relevant to this lawsuit, says Jay Kerr-Wilson, a partner at Fasken in Ottawa and leader of the firm’s copyright practice group. Beyond the right of exclusive reproduction, relevant parts of the legislation include the provisions on temporary reproductions for technological purposes, as well as unauthorized copying under fair dealing for specific purposes.
In some cases, using content for training data can violate copyright if the output is a direct copy of some of the content used in the training, Kerr-Wilson says. This type of copying was cited in the New York Times lawsuit against OpenAI filed in late 2023. That complaint included examples where the chatbot produced near-verbatim excerpts of the newspaper’s stories (OpenAI responded with a statement calling this ‘regurgitation’ and said it was working on it).
Kerr-Wilson says it gets trickier in situations where the output was heavily influenced by the content used in training but isn’t an exact copy. This appears to be the case in the Canadian lawsuit, as the claim doesn’t include examples of verbatim copying. It asserts that news content was used in training data, though it says the full particulars are within OpenAI’s knowledge, not the news outlets’.
In a statement provided to CBA National, OpenAI said its models are trained on publicly available data and “grounded in fair use and related international copyright principles that are fair for creators and support innovation.”
Michael Duboff, an entertainment lawyer with Edwards Creative Law, says people sometimes equate fair use (which applies in the US) with fair dealing. The fact that this is a Canadian lawsuit is significant because fair dealing is a much more restrictive principle than fair use. In the US, the list of allowances is illustrative, but the Canadian counterpart—which includes research, education, criticism, and news reporting—is exhaustive.
“It's a much more rigid system where the use has to fall into one of these defined categories to not constitute copyright infringement,” he says.
“That ultimately may have a large part to play in how this claim proceeds, and if it does go to judgment, how the judgment ends up lying.”
That said, what counts as fair dealing in Canada is unclear. Some legal experts have argued that OpenAI’s use of content to turn a profit ultimately excludes it from fair dealing. Duboff notes that one factor in determining fair dealing is how detrimental that use is to the original work.
“That would be a pretty fundamental aspect in these cases because OpenAI and ChatGPT are for-profit,” he says.
However, just because a company is profiting doesn’t necessarily mean that making a copy — for instance, for research purposes — is disallowed, says Robert Diab, a law professor at Thompson Rivers University who writes about technology.
“To decide whether certain conduct falls within one of the statutory fair dealing exceptions, the court employs a set of factors.”
He points to a 2004 case brought against the Law Society of Upper Canada by legal publishers. At issue was the law society photocopying case books at the request of firms. Even though the firms’ purposes were commercial, the Supreme Court found the copying was a kind of research that counted as fair dealing.
Fundamentally, the court will have to decide two questions, Diab says. First, whether the scraping of data constitutes copying — versus reading, the way a person might — and second, whether that copying falls under fair dealing.
“If I visit your website and cut and paste its content to train my language model, am I engaging in unauthorized use? This is not a question courts anywhere in the common law world have dealt with directly,” he says.
Underpinning this is the murkiness of large language models themselves. When even their creators don’t fully understand how they work, it will be difficult to determine what degree of influence is legally defensible and how much a work can be informed by other content before it becomes a copy, Kerr-Wilson says.
“AI is going to challenge a lot of the notions of copyright.”
Diab says news outlets are facing an uphill battle in this realm. Given the financial pressure they are under, there’s an incentive to settle, as outlets in the US have done.
“They would have to prove their material was scraped in the process of creating the various language models and…how much of the profits that OpenAI has made can be attributed to that. That’s very complicated.”
Despite this, there’s a good reason for news outlets to pursue a judgment. Their case is just one of several underway at the moment. In addition to the Times’ suit, others have been brought by visual artists, publishers, and music labels. There’s also a test case seeking the Federal Court of Canada’s determination as to whether AI can be considered an author under Canadian copyright law. South of the border, courts have already ruled that it can’t.
Without a judgment in these cases, Duboff says we’re left in a grey area, without a clear answer for how copyright law applies in each jurisdiction. Technology has moved well beyond the laws meant to protect news outlets and other human sources of art and information.
“[These lawsuits] are the mechanism for how the law catches up.”