DeepSeek introduces sparse attention model to slash API costs

DeepSeek has launched the V3.2-Exp model, introducing DeepSeek Sparse Attention to cut inference costs in long-context tasks by nearly half, with open weights on Hugging Face for testing.

September 30, 2025 / 14:52 IST
DeepSeek has unveiled an experimental model named V3.2-Exp, designed to significantly lower inference costs in long-context operations. The company announced the release on Hugging Face and published an accompanying research paper on GitHub.

At the core of the new model is DeepSeek Sparse Attention, a mechanism built around two key systems. First, a “lightning indexer” identifies relevant excerpts from the broader context window. Then, a “fine-grained token selection system” narrows down tokens within those excerpts to fit into the limited attention window. This combination allows the model to process long stretches of text with reduced computational load.
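The two-stage idea described above can be illustrated with a small sketch. This is not DeepSeek's implementation: the function names, the block-mean scoring used for the coarse stage, and the parameters block_size, top_blocks and top_tokens are all assumptions chosen for illustration, and the code uses plain Python lists rather than a tensor library.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_tokens(query, keys, block_size, top_blocks, top_tokens):
    """Pick token indices in two stages (illustrative assumption only)."""
    n, dim = len(keys), len(query)
    # Stage 1: a cheap coarse pass standing in for the "lightning indexer".
    # Each block (excerpt) is scored by the query's dot product with its mean key.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    def block_score(block):
        mean = [sum(keys[i][d] for i in block) / len(block) for d in range(dim)]
        return dot(query, mean)

    best_blocks = sorted(blocks, key=block_score, reverse=True)[:top_blocks]
    # Stage 2: fine-grained selection of individual tokens inside those blocks.
    candidates = [i for block in best_blocks for i in block]
    chosen = sorted(candidates, key=lambda i: dot(query, keys[i]), reverse=True)
    return sorted(chosen[:top_tokens])

def sparse_attention(query, keys, values, block_size, top_blocks, top_tokens):
    """Scaled dot-product attention restricted to the selected tokens."""
    idx = select_tokens(query, keys, block_size, top_blocks, top_tokens)
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) / scale for i in idx])
    out_dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) for d in range(out_dim)]
    return idx, out

# Toy example: 8 tokens, 2-dimensional keys, 1-dimensional values.
keys = [[0, 1], [0, 1], [5, 0], [4, 0], [0, -1], [0, 0], [-1, 0], [-2, 0]]
values = [[float(i)] for i in range(8)]
idx, out = sparse_attention([1, 0], keys, values,
                            block_size=2, top_blocks=1, top_tokens=1)
```

In this toy run the attention step touches only the tokens that survive both stages, instead of all eight, which is where the reduced computational load comes from.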

Early tests suggest that the approach can cut the price of API calls by nearly half when handling long-context tasks. While more independent evaluations will be needed, the open-weight release on Hugging Face means researchers and developers will quickly put it to the test.
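To see why savings of this kind show up mainly in long contexts, it helps to count how many query-key pairs the attention step has to score. The sketch below is back-of-the-envelope arithmetic for a generic causal model, not a reproduction of DeepSeek's figures: the budget k is a hypothetical parameter, the cost of the selection step itself is ignored, and pair counts do not translate directly into API prices.

```python
def dense_pairs(n):
    # Causal dense attention: the token at position t scores tokens 1..t.
    return n * (n + 1) // 2

def sparse_pairs(n, k):
    # Sparse attention with a fixed budget: each token scores at most k tokens.
    return sum(min(t, k) for t in range(1, n + 1))

# Short contexts see no difference; long contexts see a growing gap.
short_ratio = sparse_pairs(1_000, 2_000) / dense_pairs(1_000)
long_ratio = sparse_pairs(100_000, 2_000) / dense_pairs(100_000)
```

With a fixed budget, the dense count grows quadratically with context length while the sparse count grows roughly linearly, so the ratio shrinks as the context gets longer.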

This development comes as part of a wider effort to address inference costs, which are distinct from training expenses and relate to the server resources needed to run a pre-trained model. DeepSeek’s work shows that improvements to the transformer architecture are still possible, even in areas many thought had plateaued.

The China-based company has been an unconventional player in the AI race, previously making headlines with its R1 model trained primarily with reinforcement learning at a fraction of the cost of U.S. counterparts. However, R1 did not lead to the sweeping changes some expected, and DeepSeek has kept a lower profile since.

While the new sparse attention method may not spark the same debate, it highlights practical ways to make AI models more efficient and could influence how American providers approach inference cost reduction.

 


Ayush Mukherjee
first published: Sep 30, 2025 02:51 pm
