Moneycontrol

Apple’s FastVLM video captioning model now runs directly in your browser

Apple’s FastVLM video captioning model can now be tested in-browser via Hugging Face. The lighter 0.5B version runs locally, processes video in real time, and highlights Apple’s push for fast, private, on-device AI.

September 02, 2025 / 14:44 IST

Apple’s work on lightweight AI continues to impress. The company’s FastVLM (Visual Language Model), first announced a few months ago, can now be tried directly in the browser via Hugging Face. Originally available only through GitHub and designed to run on Apple Silicon Macs, the demo makes it easier than ever to see Apple’s model in action.

FastVLM is built on MLX, Apple’s in-house machine learning framework for Apple Silicon. The model stands out for its efficiency: Apple reports it is up to 85 times faster at video captioning and more than three times smaller than comparable models. The browser demo uses the lighter FastVLM-0.5B version, which makes it possible to test the model without heavy hardware demands.


Loading the model can take a few minutes depending on your device, but once running, it generates accurate captions in real time. It can describe facial expressions, background details, and objects in view, and it responds to tailored prompts such as “Describe what you see in one sentence” or “What is the color of my shirt?”

Because it runs locally in the browser, the demo keeps all data on-device and can even function offline. That approach has strong potential for wearables and assistive technology, where speed and privacy are critical.