Earlier this year, India opened a discussion on a contentious policy issue: how to treat copyrighted material used to train artificial intelligence systems.
Now, a recently released working paper from the Department for Promotion of Industry and Internal Trade (DPIIT) has triggered wide debate among creators, publishers and global technology companies.
Here's a close look at what the policy paper says and what its critics are concerned about.
Why is this debate happening now?
Artificial intelligence models learn by processing enormous amounts of data, including books, music, articles, images, code, videos, news reports and social media posts. Much of this material is copyrighted.
Right now, India has no dedicated policy on whether AI developers can use copyrighted works for training.
This has often led creators to complain that their work is being scraped for AI models without their consent or payment.
This is the problem the committee behind the working paper was asked to solve.
What is the proposed model?
The working paper proposes a "hybrid model" -- a mix of statutory licensing, a blanket permission structure and centralised royalty collection.
At its core, the model has three pillars --
One blanket licence for AI developers: AI companies would get a single, nationwide licence allowing them to use all "lawfully accessed" copyrighted material for training their models.
This means:
- If content is free on the open web, it counts as lawful access.
- If content sits behind paywalls or under technological protection measures (TPMs), developers can use it only if they have paid for access or followed the required permissions.
CRCAT would receive payments from AI firms and distribute them to:
- Existing copyright societies
- New collective management organisations (CMOs) that creators may form
- Individual creators who are neither, as long as they register for royalty receipt
The paper also recommends a retroactive obligation, meaning that even if a model was trained earlier, compensation would kick in once the model is commercialised after the law is implemented.
How will revenue sharing and compensation actually work?
Step 1: AI developers file a basic disclosure: AI developers will have to submit a short form to CRCAT listing the broad categories of material used (text, images, audio, video) and their rough proportions.
Step 2: Revenue share is fixed by a government-appointed committee: The exact percentage has not been decided yet, but the rate will be:
- Based on global revenue
- Open to judicial review
- Common for all
Step 4: CMOs distribute royalties to individual creators: They can do this through methods such as pro-rata distribution and analogy-based "value assessments".
Creators who are not members of any society can register specifically to receive their share. CRCAT will hold undistributed royalties for three years. If no CMO or claimant emerges, the funds will move to a welfare pool for that sector.
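To make the arithmetic concrete, here is a minimal Python sketch of how such a scheme could work in principle. It is an illustration only, not the working paper's method: the 2 per cent rate is a placeholder (the actual rate has not been decided), the idea of splitting the pool by the proportions disclosed in Step 1 is an assumption, and all function names are hypothetical.

```python
from datetime import date

HOLD_YEARS = 3  # per the paper, CRCAT holds undistributed royalties for three years


def royalty_pool(global_revenue: float, rate: float) -> float:
    """Royalty owed for a period: a flat share of global revenue.

    The rate is a placeholder; the paper leaves it to a committee.
    """
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    return global_revenue * rate


def split_pro_rata(pool: float, proportions: dict[str, float]) -> dict[str, float]:
    """Split a pool across categories in proportion to their weights.

    Assumption: weights are the rough proportions a developer discloses.
    """
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("proportions must sum to a positive number")
    return {name: pool * weight / total for name, weight in proportions.items()}


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def unclaimed_status(received: date, today: date, claimed: bool) -> str:
    """Where a royalty amount sits, given the three-year holding rule."""
    if claimed:
        return "distributed"
    if today < add_years(received, HOLD_YEARS):
        return "held by CRCAT"
    return "sector welfare pool"


if __name__ == "__main__":
    pool = royalty_pool(global_revenue=1_000_000, rate=0.02)  # hypothetical figures
    print(split_pro_rata(pool, {"text": 50, "images": 30, "audio": 20}))
    print(unclaimed_status(date(2026, 1, 1), date(2029, 1, 1), claimed=False))
```

The sketch treats the end of the three-year window as the point at which unclaimed money moves to the welfare pool; the paper's exact cut-off rules may differ.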
Why are many technology firms opposing this?
While creators and some large publishers support the idea of guaranteed compensation, nearly every major global tech company has flagged serious concerns.
A major global technology company warned in its analysis that the hybrid model could "kill India’s AI start-up ecosystem", arguing that developers may struggle with revenue-sharing obligations, audits and retroactive liabilities.
Companies say the government is underestimating:
- Compliance costs
- Legal exposure
- The complexity of tracking claims across millions of copyrighted works
These companies argue that many countries allow text and data mining (TDM) exceptions, which permit AI training without licences as long as the data is lawfully accessed.
NASSCOM, in its dissent note within the working paper itself, stressed that the Indian IT industry "needs a TDM exception" to remain competitive.
Business Software Alliance (representing Adobe, AWS, IBM, Microsoft, Oracle and others) has also urged India to adopt such an exception, saying AI training extracts "non-copyrightable information — probabilities, relationships, and patterns."
The working paper states that creators cannot withhold their works from AI training once they are lawfully accessed.
Tech companies argue this:
- Removes creators’ freedom to negotiate
- Forces uniform licensing even when licensors and licensees prefer bespoke deals
- Contradicts global voluntary licensing approaches
Under the proposal, if a creator claims their work was used in training, the AI developer must prove it was not.
Tech firms say this is technically impossible because:
- AI models convert data into mathematical patterns
- Raw training data cannot always be reconstructed
- Open-source models often mix millions of sources
The working paper is out for public consultation.
A second paper will follow in two months, examining whether AI-generated content should itself receive copyright and who should be considered its author.
The proposed model will eventually require amendments to India’s Copyright Act, meaning Parliament will ultimately decide whether India follows this hybrid path or moves toward a lighter-touch regime like those of Japan and Singapore.