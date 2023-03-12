
What is Visual ChatGPT, and what does it do?

Nivash Jeevanandam
Mar 12, 2023 / 11:50 AM IST

Microsoft Research's Visual ChatGPT uses different Visual Foundation Models to let users interact with ChatGPT. This connection enables users to send messages through chat and receive images during the chat. It also allows them to edit the images by adding a series of visual model prompts.

Microsoft's latest AI chatbot version can generate videos from basic text prompts. (Illustration by Suneesh K.)

Microsoft has just introduced a new system named Visual ChatGPT, which combines ChatGPT with visual foundation models (VFMs) such as Visual Transformers, ControlNet, and Stable Diffusion. As a result, users can interact with ChatGPT beyond language alone.

How does it work?

ChatGPT draws interdisciplinary interest because it provides a language interface with extraordinary conversational competence and reasoning abilities across various fields. However, because it is trained only on language, ChatGPT currently cannot process or produce images. Visual foundation models such as Visual Transformers or Stable Diffusion, on the other hand, demonstrate excellent visual comprehension and generation capabilities, but they are only adept at specialised tasks with one-round fixed inputs and outputs.
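
To make the division of labour concrete, here is a minimal Python sketch of the general idea: a conversational controller that keeps track of a multi-round chat and hands individual requests to single-purpose visual tools. It is an illustration only, not Microsoft's implementation. The class `ToyVisualChat`, the functions `generate_image` and `edit_image`, and the keyword rules are all hypothetical; a real system would rely on the language model to choose a tool and on actual visual models to produce images.

```python
from typing import Callable, Dict, List, Optional, Tuple


def generate_image(message: str, last_image: Optional[str]) -> str:
    # Hypothetical stand-in for a text-to-image model: returns a file name only.
    return "generated.png"


def edit_image(message: str, last_image: Optional[str]) -> str:
    # Hypothetical stand-in for an image-editing model: derives a new file name.
    assert last_image is not None, "editing needs an earlier image"
    return last_image.replace(".png", "_edited.png")


# Each tool handles one fixed, one-round task, as the visual models do.
TOOLS: Dict[str, Callable[[str, Optional[str]], str]] = {
    "generate_image": generate_image,
    "edit_image": edit_image,
}


class ToyVisualChat:
    """Keeps the conversation state and routes each message to a tool."""

    def __init__(self, tools: Dict[str, Callable[[str, Optional[str]], str]]):
        self.tools = tools
        self.last_image: Optional[str] = None
        self.history: List[Tuple[str, str]] = []

    def choose_tool(self, message: str) -> Optional[str]:
        # Keyword rules stand in for the language model's decision (assumption).
        text = message.lower()
        if self.last_image and any(w in text for w in ("edit", "change", "replace")):
            return "edit_image"
        if any(w in text for w in ("draw", "generate", "picture")):
            return "generate_image"
        return None

    def chat(self, message: str) -> str:
        name = self.choose_tool(message)
        if name is None:
            reply = "(text-only reply)"
        else:
            # The tool output becomes the image that later rounds can refer to.
            self.last_image = self.tools[name](message, self.last_image)
            reply = self.last_image
        self.history.append((message, reply))
        return reply


if __name__ == "__main__":
    bot = ToyVisualChat(TOOLS)
    for msg in ("Hello", "Please draw a cat", "Now change the colour"):
        print(msg, "->", bot.chat(msg))
```

The point of the sketch is the shape of the system: the visual tools stay single-purpose, while the controller remembers the last image, so a follow-up request such as "Now change the colour" can build on an earlier result.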

To this end, Microsoft researchers have developed a system called Visual ChatGPT, which incorporates many visual foundation models and enables users to interact with ChatGPT using graphical user interfaces. It is capable of: