Apple’s work on lightweight AI continues to impress. The company’s FastVLM (Vision Language Model), first announced a few months ago, can now be tried directly in the browser via Hugging Face. The model was originally available only through GitHub and designed to run on Apple Silicon Macs, so the browser demo makes it easier than ever to see it in action.
FastVLM was built on MLX, Apple’s in-house machine learning framework for Apple Silicon. The model stands out for its efficiency: Apple reports up to 85 times faster time-to-first-token (the delay before a caption starts appearing) and a vision encoder more than three times smaller than comparable models. The browser demo uses the lighter FastVLM-0.5B variant, which keeps hardware demands modest enough for ordinary machines.
Loading the model can take a few minutes depending on your device, but once running, it generates accurate, real-time captions. It can describe facial expressions, background details, objects in view, and even respond to tailored prompts like “Describe what you see in one sentence” or “What is the color of my shirt?”
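The browser demo needs no setup, but the same idea maps onto a very short script for anyone who wants to drive the small checkpoint locally on an Apple Silicon Mac. The sketch below is a rough illustration under stated assumptions: it uses the community mlx-vlm package rather than Apple’s own tooling, and the model identifier, argument names, and return type are assumptions that may not match the current release.

```python
# Illustrative sketch only: assumes the community mlx-vlm package and the
# checkpoint name "apple/FastVLM-0.5B", neither of which is confirmed by the
# article; exact call signatures may differ between mlx-vlm releases.
from mlx_vlm import load, generate

MODEL_ID = "apple/FastVLM-0.5B"  # assumed Hugging Face identifier

# load() fetches the weights and the matching processor for the checkpoint.
model, processor = load(MODEL_ID)

# The same style of tailored prompt the browser demo accepts.
prompt = "Describe what you see in one sentence."

# Run one image plus one text prompt through the model; the decoded caption
# is returned (as a plain string in older mlx-vlm releases).
caption = generate(
    model,
    processor,
    prompt,
    image=["photo_from_webcam.jpg"],  # placeholder path to a local frame
    max_tokens=64,
    verbose=False,
)
print(caption)
```

Any of the tailored prompts mentioned above, such as “What is the color of my shirt?”, could be substituted for the prompt string.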
Because it runs locally in the browser, the demo keeps all data on-device and can even function offline, an approach with strong potential for wearables and assistive tech, where speed and privacy are critical.
While the demo showcases the smaller 0.5B model, Apple has also released larger FastVLM variants with 1.5B and 7B parameters. These could deliver even better performance, though running them entirely in-browser would be less practical.