Apple’s latest study introduces a fresh take on how generative AI can process and produce language. Traditionally, large language models (LLMs) like ChatGPT rely on an autoregressive approach — they generate content one token at a time, using both the input prompt and the sequence of previously produced words to decide what comes next. While this method ensures coherence, it’s inherently slow because it processes text sequentially.
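To make the contrast concrete, here is a minimal, purely illustrative Python sketch of an autoregressive loop. The toy `next_token_logits` function stands in for a real model's forward pass; none of this is Apple's or any production code:

```python
import math
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def next_token_logits(context):
    """Stand-in for an LLM forward pass: one logit per vocabulary entry."""
    rng = random.Random(len(context))          # deterministic toy behavior
    return [rng.gauss(0.0, 1.0) for _ in VOCAB]

def sample(logits):
    """Softmax the logits, then draw one token index."""
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    r, acc = random.random() * sum(weights), 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

tokens = ["the"]                               # the prompt
while tokens[-1] != "<eos>" and len(tokens) < 12:
    # Each new token requires a fresh pass over everything generated so far,
    # which is why autoregressive decoding is inherently sequential.
    tokens.append(VOCAB[sample(next_token_logits(tokens))])
print(" ".join(tokens))
```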
In contrast, diffusion models take a parallel approach. Instead of predicting one word at a time, they generate multiple tokens simultaneously and then refine them across several steps, gradually transforming random noise into coherent text. Flow-matching models, a closely related technique, streamline this further by learning a direct path from noise to the final output, replacing thousands of refinement iterations with far fewer.
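The parallel-refinement idea can be sketched the same way, again with toy stand-ins rather than any real model: every position is filled at once, and each pass re-draws the positions a placeholder confidence score likes least:

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]
LENGTH, STEPS = 8, 4

def confidence(seq, i):
    """Stand-in for model confidence at position i (a real model uses logits)."""
    return random.Random(hash((tuple(seq), i))).random()

# Start from pure noise: every position gets a random token immediately.
seq = [random.choice(VOCAB) for _ in range(LENGTH)]
for step in range(STEPS):
    # Re-sample the least confident third of positions, all in one pass.
    worst = sorted(range(LENGTH), key=lambda i: confidence(seq, i))[: LENGTH // 3]
    for i in worst:
        seq[i] = random.choice(VOCAB)
    print(f"step {step}: {' '.join(seq)}")
```

Unlike the autoregressive loop above, the whole sequence exists from step one; each iteration only polishes it, which is what makes the process parallelizable.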
Apple’s new approach, FS-DFM, is designed to combine the best of both worlds: high output quality with significantly faster generation speed. In their paper titled “FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models,” the researchers demonstrate that FS-DFM can create complete passages in just eight refinement rounds while matching the quality of diffusion models that typically require over a thousand steps.
To achieve this, the researchers implemented a three-stage training process. First, they trained FS-DFM to handle different levels of refinement efficiently. Then, a “teacher” model was used to guide the process, helping the system make more precise and stable updates with each iteration. Finally, they fine-tuned how each iteration behaves to reduce overshooting and accelerate convergence toward the final text.
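The paper's exact training recipe isn't reproduced here, but the core distillation idea can be illustrated with a deliberately simplified scalar example: a "teacher" refines a value with a thousand tiny steps, and a "student" learns a single calibrated step size that lands in the same place without overshooting. Every name and number below is an illustrative assumption, not FS-DFM's actual code:

```python
# Toy distillation sketch (illustrative assumptions throughout, not FS-DFM):
# the teacher denoises a value toward 0 in many tiny steps; the student
# learns one scaled jump that matches the teacher's endpoint.
def teacher(x, steps=1000):
    for _ in range(steps):
        x -= 0.01 * x            # a thousand small refinements
    return x

x0 = 5.0
target = teacher(x0)             # where the slow, many-step process ends up

scale = 0.5                      # the student's learnable step size
lr = 0.01
for _ in range(200):
    pred = x0 - scale * x0       # one big step instead of a thousand
    grad = 2.0 * (pred - target) * (-x0)   # d/d(scale) of (pred - target)**2
    scale -= lr * grad           # calibrate the step to avoid over/undershoot

print(f"teacher endpoint: {target:.5f}")
print(f"student one-step: {x0 - scale * x0:.5f} (learned step size {scale:.5f})")
```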
When evaluated against larger diffusion models such as Dream (7 billion parameters) and LLaDA (8 billion parameters), FS-DFM, even in smaller variants with 1.7, 1.3, and 0.17 billion parameters, consistently delivered superior results. It achieved lower perplexity scores, meaning the generated text was more coherent and natural, and maintained stable entropy values, indicating balanced confidence in word choice without veering into repetition or randomness.
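For readers unfamiliar with the metric: perplexity is the exponential of the average negative log-likelihood a model assigns to the true tokens, so lower values mean the model finds the text less "surprising." The probabilities below are made up purely to show the arithmetic:

```python
import math

# Hypothetical per-token probabilities the model assigned to the correct words.
token_probs = [0.4, 0.25, 0.6, 0.1, 0.3]

avg_nll = sum(-math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(avg_nll)
print(f"perplexity: {perplexity:.2f}")   # ~3.54: like choosing among ~3.5 options per word
```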
The researchers plan to release the FS-DFM code and model checkpoints publicly to encourage further study and reproducibility. For those interested in a deeper technical dive, the full paper is available on arXiv, complete with visual examples showing how the model refines text over successive iterations.