What is Visual ChatGPT, and what does it do?

Microsoft Research's Visual ChatGPT uses different Visual Foundation Models to let users interact with ChatGPT. This connection enables users to send messages through chat and receive images during the chat. It also allows them to edit the images by adding a series of visual model prompts.

March 12, 2023 / 11:50 IST
Microsoft's latest AI chatbot version can generate videos from basic text prompts. (Illustration by Suneesh K.)

Microsoft has just introduced a new system named Visual ChatGPT, which combines ChatGPT with visual foundation models (VFMs) such as Transformers, ControlNet, and Stable Diffusion. The system extends interaction with ChatGPT beyond language alone.

How does it work?

ChatGPT draws interdisciplinary interest because it provides a language interface with extraordinary conversational competence and reasoning abilities across various fields. However, because it was trained on text alone, ChatGPT currently cannot process or produce images. Visual foundation models such as Visual Transformers or Stable Diffusion, on the other hand, demonstrate excellent visual comprehension and generation capabilities, but they are adept only at specialised tasks with one-round fixed inputs and outputs.

To this end, Microsoft researchers have developed a system called Visual ChatGPT, which incorporates many visual foundation models and enables users to interact with ChatGPT using graphical user interfaces. It is capable of: