It has already been half a year since notable figures such as Elon Musk, Steve Wozniak, and Yoshua Bengio signed an open letter urging tech companies to pause the development of AI language models more powerful than OpenAI's GPT-4.
Of course, that didn't happen.
Instead, the Sam Altman-led company launched several new features that are making waves in the AI space and widening the gap between OpenAI and its competitors.
ChatGPT, OpenAI's chatbot built on its family of pre-trained transformer models, now has vision capabilities through GPT-4V (GPT-4 Vision). It can analyse images and other visual content, and it also supports speech input.
OpenAI also announced DALL-E 3, the third version of its generative AI visual art platform, which now lets users use ChatGPT to create prompts and includes more safety features.
Let's take a look at what these updates are, how users are using them and the issues around them.
What is GPT-4V?
GPT-4V, a new multimodal model from OpenAI, allows users to ask questions about an image and receive text-based answers. This Visual Question Answering (VQA) feature began rolling out on September 24 and is available to ChatGPT Plus subscribers on both the iOS app and the web interface.
How to use GPT-4V?
Using GPT-4V requires a ChatGPT Plus subscription ($20/month); subscribers can upload images via the website or the mobile app.
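For developers, image questions of this kind can also be framed programmatically. The sketch below assembles the kind of chat-completions payload OpenAI uses for vision inputs, pairing a text question with an image URL; the model name and field layout are assumptions based on OpenAI's documentation at the time, so check the current API reference before relying on them. The payload is only constructed here, not sent.

```python
# Sketch: framing an image question for a GPT-4V-style chat API.
# Model name and payload shape are assumptions; verify against
# OpenAI's current API documentation.

def build_vision_request(question: str, image_url: str,
                         model: str = "gpt-4-vision-preview") -> dict:
    """Assemble a chat-completions payload that pairs a text
    question with an image URL (the shape used for vision input)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

payload = build_vision_request(
    "What is written on this whiteboard?",
    "https://example.com/whiteboard.jpg",
)
```

In practice this dictionary would be passed to the chat-completions endpoint with an API key; the answer comes back as ordinary text, just as in the ChatGPT interface.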
Use cases
OpenAI's GPT-4 with Vision is being hailed by some as a critical frontier in AI research and development.
As more users gain access to the new feature, they are sharing examples of how GPT-4 with Vision works. For example, the model can:
Analyse handwriting: GPT-4 with Vision can accurately transcribe handwritten text, even if it is messy or difficult to read. This could be useful for a variety of tasks, such as digitising historical documents.
Create code with a drawing: GPT-4 with Vision can take a napkin drawing of a website design and generate code to implement the design.
Building on the concept of AutoGPT, Matt Shumer, CEO of AI startup HyperWrite, has developed a new system that uses GPT-4V to continually improve code on its own.
The system works by using the output of one run as the prompt for the next, allowing it to refine and iterate on the code until it reaches a satisfactory level.
Teaching assistant: Users can engage in conversations with the chatbot to gain insights into a wide array of subjects. As demonstrated by Mckay Wrigley, GPT-4V can decipher intricate infographics, such as the one illustrating the components of a human cell.
As his example illustrates, it can provide a concise explanation of the cell's structure that a ninth-grade student can understand.
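The self-improving loop described above can be sketched in a few lines. This is a minimal illustration of the feed-output-back-in idea, not Shumer's actual system: `call_model` is a hypothetical stand-in for a real GPT-4V request, replaced here with a placeholder so the loop runs without an API key.

```python
# Minimal sketch of an iterative self-refinement loop: the output of
# one model run becomes the prompt for the next.

def call_model(prompt: str) -> str:
    # Placeholder for a real GPT-4V API call. Here it just appends a
    # marker so the loop is runnable and observable without a key.
    return prompt + " [refined]"

def refine(initial_code: str, rounds: int = 3) -> str:
    """Feed each run's output back in as the next prompt, stopping
    after a fixed number of rounds (a real system would instead stop
    when the result is judged satisfactory)."""
    code = initial_code
    for _ in range(rounds):
        code = call_model(code)
    return code

result = refine("def hello(): pass")
```

A production version would also need a stopping criterion, such as the model itself judging the code satisfactory, rather than a fixed round count.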
AI art gets a makeover
OpenAI also unveiled Dall-E 3, the latest iteration of its text-to-image model, which boasts improved accuracy compared to its predecessor, Dall-E 2.
Dall-E 3 is capable of understanding nuances and details, making it easier for users to translate their ideas into images.
Dall-E 3 is in the research preview stage and will become accessible to ChatGPT Plus and Enterprise customers via the API starting in early October.
However, Dall-E 3 is accessible for free through Microsoft Bing, powering the Bing Image Creator tool. With Bing Image Creator, users can describe an image they have in mind, provide additional context such as location or activity, and specify an art style. The tool then generates the image based on these inputs.
Limitations and safeguards
The Bing Image Creator operates under similar limitations as Dall-E 3, which means it cannot generate explicit or violent content. Additionally, requests for images of public figures by name or images in the style of living artists will be declined by Dall-E 3.
“DALL-E 3 has mitigations to decline requests that ask for a public figure by name. We improved safety performance in risk areas like the generation of public figures and harmful biases related to visual over/under-representation in partnership with red teamers—domain experts who stress-test the model—to help inform our risk assessment and mitigation efforts in areas like propaganda and misinformation,” OpenAI said.
It's worth noting that all images generated by Bing Image Creator now come with an embedded digital watermark following the Coalition for Content Provenance and Authenticity (C2PA) specification. This watermark contains information about the image's creation time and date and serves to verify that the image was generated by an AI system.
What are the concerns?
OpenAI has identified several potential risks associated with the use of GPT-4V, including:
Privacy risks: GPT-4V can identify people in images and determine their location, which could have implications for companies' data practices and compliance.
Bias: GPT-4V's image analysis and interpretation could be biased against certain demographic groups.
Safety risks: GPT-4V could provide inaccurate or unreliable medical advice, specific directions for dangerous tasks, or hateful/violent content.
Cybersecurity vulnerabilities: GPT-4V could be used to solve CAPTCHAs or perform multimodal jailbreaks.
