DeepSeek introduces sparse attention model to slash API costs

DeepSeek has launched the V3.2-Exp model, introducing DeepSeek Sparse Attention to cut inference costs in long-context tasks by nearly half, with open weights available on Hugging Face for testing.

Ayush Mukherjee

September 30, 2025 / 14:52 IST


DeepSeek has unveiled an experimental model named V3.2-Exp, designed to significantly lower inference costs in long-context operations. The company announced the release on Hugging Face and published an accompanying research paper on GitHub.

At the core of the new model is DeepSeek Sparse Attention, a mechanism built around two key systems. First, a “lightning indexer” identifies relevant excerpts from the broader context window. Then, a “fine-grained token selection system” narrows down tokens within those excerpts to fit into the limited attention window. This combination allows the model to process long stretches of text with reduced computational load.
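To make the idea concrete, here is a minimal Python sketch of indexer-guided sparse attention for a single query. It is an illustration under stated assumptions, not DeepSeek's implementation: the function names (`select_top_k`, `sparse_attention`), the separate low-dimensional `index_keys`, and the collapse of the two stages into a single top-k token selection are all hypothetical simplifications. The point it shows is that full attention is computed only over the `k` tokens the cheap indexer ranks highest, rather than over the whole context.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(scores, k):
    """Return the positions of the k highest scores, in document order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, index_query, index_keys, k):
    """Attend only to the k tokens picked by a cheap indexer (toy version).

    index_query / index_keys are hypothetical low-dimensional vectors that
    stand in for the article's "lightning indexer"; the top-k step stands in
    for its "fine-grained token selection".
    """
    # Stage 1: cheap relevance score for every token in the context.
    relevance = [dot(index_query, ik) for ik in index_keys]
    # Stage 2: keep only the k most relevant token positions.
    selected = select_top_k(relevance, k)
    # Standard scaled dot-product attention, restricted to selected tokens.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in selected])
    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]
    return output, selected
```

In this sketch the expensive attention step touches `k` tokens per query instead of the full context length, which is where the computational savings would come from as the context grows. Setting `k` to the full context length reduces it to ordinary dense attention.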

Early tests suggest that the approach can cut the price of API calls by nearly half when handling long-context tasks. Independent evaluation is still needed to confirm those figures, but the open-weight release on Hugging Face means researchers and developers can quickly put the claim to the test.

This development comes as part of a wider effort to address inference costs, which are distinct from training expenses and relate to the server resources needed to run a pre-trained model. DeepSeek’s work shows that improvements to the transformer architecture are still possible, even in areas many thought had plateaued.

The China-based company has been an unconventional player in the AI race, previously making headlines with its R1 model trained primarily with reinforcement learning at a fraction of the cost of U.S. counterparts. However, R1 did not lead to the sweeping changes some expected, and DeepSeek has kept a lower profile since.

While the new sparse attention method may not spark the same debate, it highlights practical ways to make AI models more efficient and could influence how American providers approach inference cost reduction.



first published: Sep 30, 2025 02:51 pm
