ANI vs OpenAI: Copyright, AI, and the future of text data mining

By Shreya Kapoor and Abir Roy

The rapid rise and adoption of artificial intelligence (“AI”) has fundamentally changed how we live and work by putting vast volumes of information at our fingertips. Perhaps the most visible use of AI is in chatbots like ChatGPT, which run on generative AI trained on large volumes of data. Generative AI has great potential to help users, but it also raises important questions about the rights of those who own the data used to train it.

In this context, the recent legal dispute between Asian News International (“ANI”) and OpenAI, the company behind ChatGPT, represents a landmark development in the evolving legal landscape. While similar cases have already been initiated in the USA, the UK and elsewhere, this is the first case in India that involves an interpretation of intellectual property rights related to AI.

In this article, we discuss the dispute between ANI and OpenAI and examine the issues it raises.

Text and Data Mining: Copyright Infringement Issue?

To give effective results, generative AI applications require large volumes of data to train the Large Language Model (“LLM”) that generates those results. In the case of chatbots like ChatGPT, generating results simply means responding to user queries. In general, more and better data implies better results, since the system is better trained. One method of obtaining such data is text and data mining (“TDM”), which involves the systematic copying and analysis of data to gain insights.

Consequently, issues arise between generative AI applications and copyright protection at three stages: (i) the training of the LLM on copyrighted data, (ii) the copyrightability of the work or output generated by AI, and (iii) potential copyright infringement by such output. In the ongoing proceedings before the Delhi High Court, ANI’s challenge to OpenAI appears to strike at stages (i) and (iii).

ANI has alleged that OpenAI violates copyright law by exploiting its archive of articles, interviews, exclusive statements and other material built over five decades. As the owner of the copyright in these works, ANI has the exclusive right to reproduce, exploit and store them under Section 14 of the Copyright Act, 1957 (“Act”). ANI claims that OpenAI has infringed its copyright by training its LLM on ANI’s works and reproducing them in response to user queries. The High Court is now seized of the matter and has framed some preliminary issues for determination.

The primary issue to be determined by the High Court is whether using copyrighted material to train AI models and generate responses to user queries is copyright infringement. This requires an assessment of whether the training, storage and reproduction of copyrighted material for user benefit constitutes copyright infringement or can be considered fair use.

Use of copyrighted data by AI: Is it ‘fair use’?

‘Fair use’ or ‘fair dealing’ is a legal principle that allows limited use of copyrighted material without permission from the owner of such material.

Under Section 52(1)(a) of the Act, certain uses of copyrighted material do not amount to copyright infringement, such as private or personal use (including research), criticism or review, and the reporting of current affairs.

However, certain issues may arise in applying the fair use principle to generative AI technologies. In this case, OpenAI’s use of data may not qualify as ‘personal’ or ‘private’ use. OpenAI also provides a premium paid version of ChatGPT called ChatGPT Plus, and the commercial nature of that product complicates the claim of fair use.

This is also where TDM comes into the picture. Unlike some other jurisdictions, India does not have any specific law dealing with TDM. The UK, for instance, has an exception to copyright that allows TDM for non-commercial research, provided there is lawful access to the work concerned. Lawful access means obtaining the copyrighted material through legitimate or authorised methods.

Hence, it will be interesting to see how this case shapes the discussion on TDM in India from a copyright law perspective.

In general, the more transformative a use is, the more likely it is to be considered fair use. Copyright protects the expression of ideas, not the facts or ideas themselves. In this context, the question arises whether restating information from news reports amounts merely to the reproduction of factual information.

Further, using copyrighted material for educational purposes is considered fair use. Thus, the present dispute may also require an examination of whether generating responses to user queries using copyrighted material can be considered an educational purpose.

What is the opt-out model?

An opt-out is a mechanism that allows content creators to prevent their data from being used for training AI by signalling their withdrawal.

The High Court took notice of OpenAI’s submission that it has blocklisted ANI’s domain ‘www.aninews.in’, which will exclude it from the future training of OpenAI’s models. OpenAI also highlighted that this opt-out model is available to everyone.

The European Union has clear guidelines on this issue to protect content creators: TDM is not allowed if rightholders explicitly reserve their rights. For instance, they can use machine-readable means to block their online content from AI training. Simply put, TDM is not allowed if the content owners have ‘opted out’, i.e., said “no” to the extraction of their data by AI tools.
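One common machine-readable means of reserving rights is a website's robots.txt file. The sketch below is only an illustration, not a description of how OpenAI or any court treats opt-outs: it uses Python's standard `urllib.robotparser` to check whether a hypothetical publisher's robots.txt tells a given crawler to stay away. The domain `www.example.com`, the sample rules and the helper name `is_opted_out` are assumptions made for this example; `GPTBot` is the user-agent name OpenAI has published for its web crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher (not ANI's actual file).
# It blocks the crawler named "GPTBot" and allows every other crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_opted_out(robots_txt: str, crawler: str, url: str) -> bool:
    """Return True if the robots.txt rules forbid `crawler` from fetching `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's contents as a list of lines.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(crawler, url)


if __name__ == "__main__":
    story_url = "https://www.example.com/news/some-story"
    for agent in ("GPTBot", "SomeOtherCrawler"):
        print(agent, "opted out:", is_opted_out(SAMPLE_ROBOTS_TXT, agent, story_url))
```

A signal of this kind only works if the crawler chooses to honour it, which is one reason the question of who bears the burden of preventing infringement matters.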

The opt-out model raises two questions. First, whose burden is it to prevent copyright infringement through TDM: the AI company mining the information, or the owner of the copyright whose work is being mined?

Secondly, even with the opt-out option, the data already used for training the LLM remains on the servers and cannot be removed. Hence, the question arises whether the permanent and indefinite storage of such data is permissible under the law.

Unjust enrichment and unfair competition

ANI has also made claims under tort law for unjust enrichment and unfair competition, alleging that OpenAI has used its content to publish timely reports on current affairs. According to ANI, this not only draws away its readers but also affects its business model and operations: OpenAI uses for free the content that ANI has expended significant labour in producing, and it would be unjust for OpenAI to retain the resulting benefits.

This is where the question of whether ANI’s content constitutes ‘facts’ comes into the picture. There can be no copyright in facts, which suggests that some level of use is permitted, but applying that principle would mean drawing a distinction between different kinds of works, such as articles, news reports and interviews.

Other relevant factors for consideration

Another concern flagged by ANI is that ChatGPT falsely attributes certain news updates to ANI, which risks spreading fake news and damaging ANI’s reputation as a news publisher. This may be a relevant factor in determining the harm caused by the alleged copyright infringement.

Alongside the push for AI innovation, one must bear in mind that the rights available under the various exceptions to copyright infringement must be considered on an equal footing with the rights of the copyright owner.

Possible remedies and their implications

Other developers, such as Adobe with its Firefly model, have opted to train their generative AI models on licensed content. Further, a similar suit initiated by The New York Times against OpenAI and Microsoft in the USA in December 2023 was preceded by negotiations on licensing agreements for accessing such content. Such agreements may help to balance the concerns of content owners with the development of generative AI by providing a means to secure remuneration for access to their content. However, it remains to be seen how effective such negotiations are in practice.

At present, there are no specific Indian laws that govern AI or its use. This case underscores the critical need for industry-wide standards and practices to effectively govern the use, development and deployment of generative AI technologies.

As in parallel actions initiated in other jurisdictions, the High Court has not granted any injunction at this stage to restrain OpenAI from continuing its activities. This highlights the novel and complex nature of the dispute, which warrants detailed scrutiny from the outset.

Taken as a whole, this dispute pits the public interest in AI development against private rights and is likely to set a major precedent governing content use by AI companies.

(The authors, Shreya Kapoor and Abir Roy, are a part of Sarvada Legal.)

Views are personal, and do not represent the stand of this publication.
