Earlier this year, India opened discussion on a contentious policy issue: how to treat copyrighted material used to train artificial intelligence systems.
Now, a recently released working paper from the Department for Promotion of Industry and Internal Trade (DPIIT) has triggered wide debate among creators, publishers and global technology companies.
Here's a closer look at what the policy paper proposes and what its critics are saying.
Why is this debate happening now?
Artificial intelligence models learn by processing enormous amounts of data, including books, music, articles, images, code, videos, news reports and social media posts. Much of this material is copyrighted.
Right now, India has no dedicated policy on whether AI developers can use copyrighted works for training.
This has often led creators to complain that their work is being scraped by AI models without their consent or payment.
This is the problem the committee was asked to solve.
What is the proposed model?
The working paper proposes a "hybrid model" -- a mix of statutory licensing, a blanket permission structure and centralised royalty collection.
At its core, the model has three pillars --
One blanket licence for AI developers: AI companies would get a single, nationwide licence allowing them to use all "lawfully accessed" copyrighted material for training their models.
This means:
A new body to collect and distribute royalties: Royalties would be paid into a new entity called the Copyright Remuneration Collective for AI Training (CRCAT).
CRCAT would receive payments from AI firms and distribute them to:
Payments trigger only when an AI tool is commercialised: The government stresses that AI developers do not pay upfront. Payments begin only when the resulting AI product starts earning money.
The paper also recommends a retroactive obligation, meaning that even if a model was trained earlier, compensation would kick in once it is commercialised after the law takes effect.
How will revenue sharing and compensation actually work?
Step 1: AI developers file a basic disclosure: AI developers will have to submit a short form to CRCAT listing the broad categories of material used (text, images, audio, video) and their rough proportions.
Step 2: Revenue share is fixed by a government-appointed committee: The exact percentage has not been decided yet, but the rate will be --
Step 3: CRCAT allocates royalties to copyright societies and CMOs: If 30 per cent of the model’s training data was text, that portion of revenue goes to the literary CMO or society, and so on.
Step 4: CMOs distribute royalties to individual creators: They can do this through methods such as pro-rata distribution or analogy-based "value assessments".
Creators who are not members of any society can register specifically to receive their share. CRCAT will hold undistributed royalties for three years. If no CMO or claimant emerges, the funds will move to a welfare pool for that sector.
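To make the arithmetic in Steps 1 to 3 concrete, here is a minimal sketch in Python. It assumes the royalty pool is simply revenue multiplied by a committee-set rate and then split across categories by their disclosed share of the training data. The revenue figure, the 5 per cent rate and the category mix are hypothetical placeholders (the working paper has not fixed a rate), and the function name is illustrative rather than anything defined in the proposal.

```python
from decimal import Decimal


def allocate_royalties(revenue, royalty_rate, category_shares):
    """Split a royalty pool across content categories by training-data share.

    revenue         -- revenue earned by the commercialised AI product
    royalty_rate    -- HYPOTHETICAL rate; the committee has not set one yet
    category_shares -- proportions disclosed to CRCAT, which must sum to 1
    """
    if sum(category_shares.values()) != Decimal("1"):
        raise ValueError("category shares must sum to 1")
    pool = revenue * royalty_rate
    # Each category's CMO or society receives its proportional slice of the pool.
    return {name: pool * share for name, share in category_shares.items()}


# Hypothetical disclosure: 30 per cent text, as in the article's example.
disclosure = {
    "text": Decimal("0.30"),
    "images": Decimal("0.40"),
    "audio": Decimal("0.20"),
    "video": Decimal("0.10"),
}

allocation = allocate_royalties(Decimal("1000000"), Decimal("0.05"), disclosure)
for category, amount in allocation.items():
    print(f"{category}: {amount:.2f}")
```

With these placeholder numbers, the pool is 50,000 and the literary CMO's 30 per cent share comes to 15,000. How each CMO then divides its slice among individual creators (Step 4) is left out, since the paper describes more than one possible method.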
Why are many technology firms opposing this?
While creators and some large publishers support the idea of guaranteed compensation, nearly every major global tech company has flagged serious concerns.
A major global technology company warned in its analysis that the hybrid model could "kill India’s AI start-up ecosystem", arguing that developers may struggle with revenue-sharing obligations, audits and retroactive liabilities.
Companies say the government is underestimating:
Many countries allow text and data mining (TDM) exceptions, permitting AI training without licences as long as the data is lawfully accessed, these companies argue.
NASSCOM, in its dissent note within the working paper itself, stressed that the Indian IT industry "needs a TDM exception" to remain competitive.
Business Software Alliance (representing Adobe, AWS, IBM, Microsoft, Oracle and others) has also urged India to adopt such an exception, saying AI training extracts "non-copyrightable information — probabilities, relationships, and patterns."
The working paper states that creators cannot withhold their works from AI training once they are lawfully accessed.
Tech companies argue this:
Under the proposal, if a creator claims their work was used in training, the AI developer must prove it was not.
Tech firms say this is technically impossible because:
The working paper is out for public consultation.
A second paper will follow in two months, examining whether AI-generated content should itself receive copyright protection and who should be considered its author.
The proposed model will eventually require amendments to India’s Copyright Act, meaning Parliament will ultimately decide whether India follows this hybrid path or moves toward a lighter-touch regime like those of Japan and Singapore.