The gloves are off. The New York Times, one of the most respected publications in the US with a global footprint, has sued ChatGPT-maker OpenAI and its partner Microsoft, accusing them of multiple counts of copyright infringement.
The Times becomes the first major publication to open a legal front against ChatGPT, which has taken the world by storm since its debut late last year — and has also triggered existential questions around the future of workplaces and certain industries, including the media.
The outcome of the case, filed in a US court, could shape how established publications and artificial intelligence tools might co-exist at a time when the media across the world is feeling the revenue squeeze in a cut-throat digital age.
What Does The Lawsuit Say?
The Times argues that:
• Independent journalism is “increasingly rare and valuable”, and The Times has given the world “deeply reported, expert, independent journalism” over the years
• Defendants’ “unlawful use of The Times’s work to create artificial intelligence products that compete with it threatens The Times’s ability to provide that service”
• Using the valuable intellectual property of others without paying “has been extremely lucrative for defendants”
• The US “Constitution and the Copyright Act recognise the critical importance of giving creators exclusive rights over their works”, but the defendants “have refused to recognise this protection”
• The law does not permit the kind of systematic and competitive infringement that defendants have committed
What Is At The Heart Of The Dispute?
At the heart of the dispute is the original content created by news publishers (The Times in this case) and its alleged copyright violation through free dissemination by AI chatbots (ChatGPT).
The most critical question at this juncture is: how exactly has ChatGPT threatened The Times’s journalism? The answer lies in how AI chatbots, which are built on what are known as Large Language Models (LLMs), acquire their knowledge.
By now, we know the amazing things ChatGPT can do: help us write and edit, create a fairytale from scratch, do maths, write code, and even translate text from one language to another. To be able to do all that, bots such as ChatGPT need a lot of training involving huge amounts of data.
Critics say AI-makers have not been very transparent about where this data comes from. Whatever the source, the chatbots process the data through a complex neural network. Transformers, for example, are a popular neural network architecture that can read volumes of text and find patterns, growing more capable with more and more training and acquiring the ability to hold human-like conversations.
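The pattern-finding described above ultimately boils down to predicting the next word from the words that came before. As a rough sketch only (this is not OpenAI’s code, and all names here are illustrative), a toy bigram model in Python conveys the basic idea of learning statistical patterns from a body of text and then generating new text from them:

```python
import random
from collections import defaultdict

# Toy illustration of how a language model "finds patterns":
# count which word tends to follow which (a bigram model).
# Real LLMs such as ChatGPT use transformer neural networks at
# vastly larger scale, but the core task is the same: predict
# the next token from the ones that came before.

def train(text):
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length=5, seed=0):
    random.seed(seed)
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # no pattern learned for this word
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(random.choices(words, weights=weights)[0])
    return " ".join(out)

corpus = "the model reads the text and the model finds patterns in the text"
model = train(corpus)
print(generate(model, "the"))
```

The more text such a model is trained on, the richer the patterns it captures — which is precisely why training data, and who owns it, is at the centre of this lawsuit.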
The Times’s lawsuit objects to the use of its data (read: its journalism), free of cost, for the training of ChatGPT. It alleges: “Defendants’ generative artificial intelligence (‘GenAI’) tools rely on large-language models (‘LLMs’) that were built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more.”
The Times website is behind a paywall – and users must pay subscription fees to access its online offering. The fear is that if the same content is available free through AI chatbots, The Times will lose subscribers – and, in turn, revenues. If this becomes a wider pattern, the already-struggling media industry could face massive consequences.
What Does OpenAI Say?
OpenAI’s decision not to contest the lawsuit is equally significant. In fact, the company has been trying to settle the case out of court.
The truth is that AI cannot be sustained and improved without the data it is trained on, and that is where the copyright question comes into play.
For its part, OpenAI stresses that ChatGPT is developed using:
• Information that is publicly available on the internet
• Information that they license from third parties
• Information that their users or human trainers provide
On its website, OpenAI says ChatGPT does not copy or store training information in a database, and goes on to add:
“…we only use publicly available information that is freely and openly available on the Internet – for example, we do not seek information behind paywalls or from the “dark web.” We apply filters and remove information that we do not want our models to learn from or output, such as hate speech, adult content, sites that primarily aggregate personal information, and spam. We then use the information to teach our models.”
What Is The Counterargument?
The Times lawsuit cites several conversations with ChatGPT to counter this. In one of these, it shows – using a screenshot – how ChatGPT allegedly quoted “part of the 2012 Pulitzer Prize-winning New York Times article” Snow Fall: The Avalanche at Tunnel Creek. The Times says it was “generated in response to a prompt” by a user who complained of not being able to access the article behind the paywall.
The Times also cited Microsoft’s Bing search index, “which copies and categorises The Times’s online content, to generate responses that contain verbatim excerpts and detailed summaries of Times articles that are significantly longer and more detailed than those returned by traditional search engines”. Microsoft is a major backer of OpenAI with a substantial investment in the AI start-up.
The Times has sought damages and requested that the defendants be directed to stop using its content. “This action seeks to hold them responsible for the billions of dollars in statutory and actual damages that they owe for the unlawful copying and use of The Times’s uniquely valuable works,” The Times says, without specifying the amount.
The Times alleges that the defendants seek to free-ride on its “massive investment in its journalism by using it to build substitutive products without permission or payment” and that “Microsoft’s deployment of Times-trained LLMs throughout its product line helped boost its market capitalization by a trillion dollars in the past year alone”.
Microsoft has yet to react to the lawsuit. OpenAI says it respects “the rights of content creators and owners” and is “committed to working with them to ensure they benefit from A.I. technology and new revenue models”.
Have There Been Similar Cases?
The Times lawsuit is an important episode in a fast-evolving chapter, with new ethical and operational questions surrounding the use of AI tools coming to the fore frequently.
In September, A Game of Thrones author George R.R. Martin and other noted writers sued OpenAI for copyright infringement. The very next month, Universal Music Group and other music publishers sued AI firm Anthropic for distributing what they said were copyrighted lyrics.
On the other end of the spectrum is the Associated Press news agency, which has struck a deal with OpenAI “for the artificial intelligence company to license AP’s archive of news stories”.
And then there is Axel Springer, the publisher of Business Insider and Politico, which has tied up with OpenAI. The German media group will be paid for allowing ChatGPT to summarise its articles in its responses.
Good or bad, AI’s impact on journalism and the media industry is unmistakable — and changes are inevitable.
“It’s easy to reflexively dismiss legal filings as an inevitable marker of a tech boom — if there’s hype and money, lawyers are going to follow. But there are genuinely interesting questions at play here — about the nature of intellectual property and the pros and cons of driving full speed into a new tech landscape before anyone knows the rules of the road. Yes, generative AI now seems inevitable. These fights could shape how we use it and how it affects business and culture,” says an article on the Vox website.