DeepSeek has unveiled an experimental model named V3.2-exp, designed to lower inference costs significantly in long-context operations. The company announced the release on Hugging Face and published a linked research paper on GitHub.
At the core of the new model is DeepSeek Sparse Attention, a mechanism built around two key systems. First, a “lightning indexer” identifies relevant excerpts from the broader context window. Then, a “fine-grained token selection system” narrows down tokens within those excerpts to fit into the limited attention window. This combination allows the model to process long stretches of text with reduced computational load.
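To make the two-stage idea concrete, here is a minimal, illustrative sketch of block-level indexing followed by fine-grained token selection. It is an assumption-laden toy, not DeepSeek's implementation: the function names, the mean-pooled block scoring, and parameters such as `block_size`, `top_blocks`, and `top_tokens` are hypothetical choices made only for illustration.

```python
# Toy sketch of two-stage sparse attention (NumPy only).
# All names and scoring choices here are hypothetical, not DeepSeek's API.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, block_size=16, top_blocks=2, top_tokens=24):
    """Attend each query only to tokens that survive two pruning stages.

    Stage 1 (indexer): cheaply score whole blocks of the context and keep
    the top_blocks highest-scoring blocks per query.
    Stage 2 (fine-grained selection): within those blocks, keep the
    top_tokens individual keys and run ordinary attention over them.
    """
    n_q, d = q.shape
    n_k = k.shape[0]
    n_blocks = (n_k + block_size - 1) // block_size

    # Stage 1: block-level relevance via mean-pooled keys (a stand-in for
    # whatever lightweight scoring the real indexer uses).
    block_keys = np.stack([
        k[i * block_size:(i + 1) * block_size].mean(axis=0)
        for i in range(n_blocks)
    ])                                            # (n_blocks, d)
    block_scores = q @ block_keys.T               # (n_q, n_blocks)
    kept_blocks = np.argsort(-block_scores, axis=1)[:, :top_blocks]

    out = np.zeros_like(q)
    for i in range(n_q):
        # Token indices belonging to this query's selected blocks.
        idx = np.concatenate([
            np.arange(b * block_size, min((b + 1) * block_size, n_k))
            for b in kept_blocks[i]
        ])
        # Stage 2: keep only the top-scoring tokens inside those blocks.
        token_scores = q[i] @ k[idx].T / np.sqrt(d)
        keep = idx[np.argsort(-token_scores)[:top_tokens]]
        # Ordinary scaled dot-product attention over the surviving tokens.
        w = softmax(q[i] @ k[keep].T / np.sqrt(d))
        out[i] = w @ v[keep]
    return out

# Usage: 4 queries attending into a 128-token context of width 32.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 32))
k = rng.standard_normal((128, 32))
v = rng.standard_normal((128, 32))
print(sparse_attention(q, k, v).shape)  # (4, 32)
```

The point of the sketch is the cost structure: attention is computed only over the small set of tokens that both stages keep, which is what lets the model handle long contexts with a reduced computational load.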
Early tests suggest that the approach can cut the price of API calls by nearly half on long-context tasks. While more independent evaluations are needed, the open-weight release on Hugging Face means researchers and developers can quickly put those claims to the test.
This development comes as part of a wider effort to address inference costs, which are distinct from training expenses and relate to the server resources needed to run a pre-trained model. DeepSeek’s work shows that improvements to the transformer architecture are still possible, even in areas many thought had plateaued.
The China-based company has been an unconventional player in the AI race, previously making headlines with its R1 model trained primarily with reinforcement learning at a fraction of the cost of U.S. counterparts. However, R1 did not lead to the sweeping changes some expected, and DeepSeek has kept a lower profile since.
While the new sparse attention method may not spark the same debate, it highlights practical ways to make AI models more efficient and could influence how American providers approach inference cost reduction.