Is Anthropic’s Claude 3 better than ChatGPT and Gemini? All you need to know

The Google- and Amazon-backed startup claims its models can beat OpenAI’s GPT-4 and Google’s Gemini, the best in the business till now, on a number of benchmarks, including undergraduate-level knowledge and graduate-level reasoning.

Reshab Shaw

March 05, 2024 / 16:54 IST

Fight between Claude 3, GPT and Gemini (AI)

The company claims that all its Gen AI models have enhanced capabilities in analysis and forecasting, nuanced content creation, code generation, and conversing in non-English languages.

The race among artificial intelligence (AI) companies is intensifying, with Anthropic, a startup founded by former OpenAI employees, unveiling its latest and most powerful generative AI (Gen AI) model yet – Claude 3.

The startup, backed by Google and Amazon, claims that its models can outperform OpenAI's GPT-4 and Google's Gemini, the best in the business so far, on several benchmarks, including graduate-level reasoning and math problem-solving.

In LLMs, parameters are the internal weights a model learns during training. Models with more parameters often, though not always, make more accurate predictions. The comparisons Anthropic published, however, are scores on evaluation benchmarks, not parameter counts.

Claude 3, an improved version of its predecessor Claude 2.1, will be available in three sizes: Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku. These are multimodal Gen AI models, capable of extracting and processing information from photos, graphs, charts, PDFs, and slides.

The smallest of the three models, Haiku, is the “fastest” and most “cost-effective” compared with its competitors, the company claimed. It can read dense information, including charts and graphs, in less than three seconds.

Anthropic highlights 10 benchmarks on which Claude 3 outperforms OpenAI’s GPT-4 and GPT-3.5 and Google’s Gemini 1.0 Ultra and Pro, covering undergraduate-level knowledge, graduate-level reasoning, math problem-solving, grade-school math, multilingual math, coding, reasoning over text, and mixed evaluations.

Anthropic conducted several benchmark tests comparing Claude 3 with GPT-4 and Gemini. Claude 3 Opus scored 86.8 percent on the Massive Multitask Language Understanding (MMLU) test, slightly outperforming GPT-4 (86.4 percent) and Gemini Ultra (83.7 percent). MMLU uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.

Notably, Claude 3 Opus outperformed GPT-4 and Gemini 1.0 Ultra on the multilingual math benchmark (MGSM) even though it was tested 0-shot, that is, without any worked examples in the prompt, while the OpenAI and Google models were evaluated 8-shot, with eight worked examples.

Claude 3 Opus scored 90.7 percent 0-shot, while GPT-4 and Gemini 1.0 Ultra scored 74.5 percent and 79 percent, respectively, 8-shot.
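To make the 0-shot versus 8-shot distinction concrete, the Python sketch below shows one common way a k-shot prompt is assembled: the model sees k solved examples before the test question, while a 0-shot prompt contains the question alone. The `build_prompt` function, its prompt format and the toy examples are hypothetical illustrations, not the evaluation harness Anthropic, OpenAI or Google actually used.

```python
def build_prompt(question: str, examples: list[tuple[str, str]], k: int) -> str:
    """Assemble a k-shot prompt (hypothetical format): k solved examples, then the question."""
    if k < 0 or k > len(examples):
        raise ValueError("k must be between 0 and the number of available examples")
    parts = []
    # Each worked example shows the model a question together with its answer
    for q, a in examples[:k]:
        parts.append(f"Question: {q}\nAnswer: {a}\n")
    # The test question is left unanswered for the model to complete
    parts.append(f"Question: {question}\nAnswer:")
    return "\n".join(parts)


# Toy worked examples, made up for illustration (MGSM itself is multilingual)
examples = [("2 + 3?", "5"), ("10 - 4?", "6")]

print(build_prompt("7 + 8?", examples, k=0))  # 0-shot: the question alone
print(build_prompt("7 + 8?", examples, k=2))  # 2-shot: two solved examples first
```

An 8-shot evaluation follows the same pattern with eight solved examples. Because the examples show the model the expected format and reasoning, a 0-shot setting usually gives the model less help.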

Claude 3 benchmarks

How does it score on hallucinations?

Anthropic said Opus roughly doubles the accuracy (the share of correct answers) of the earlier Claude 2.1 on challenging open-ended questions, while also giving fewer incorrect answers.

To assess this, the AI company uses a large set of complex, factual questions that target known weaknesses in current models. It sorts the responses into three groups: correct answers, incorrect answers (also known as hallucinations), and admissions of uncertainty, where the model says it does not know the answer instead of providing incorrect information.
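The three-way scoring described above can be sketched in Python. This assumes each answer has already been graded into one of three hypothetical labels (`correct`, `incorrect` for a hallucination, `unsure` for an admission of uncertainty); the `category_shares` function and the sample grades are illustrations only, not Anthropic's data or grading method.

```python
from collections import Counter

# Hypothetical labels for the three response categories
CATEGORIES = ("correct", "incorrect", "unsure")


def category_shares(labels: list[str]) -> dict[str, float]:
    """Return the fraction of graded responses that fall into each category."""
    if not labels:
        raise ValueError("no graded responses")
    unknown = set(labels) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    counts = Counter(labels)
    total = len(labels)
    return {c: counts[c] / total for c in CATEGORIES}


# Made-up grades for ten answers, for illustration only
graded = ["correct"] * 6 + ["incorrect"] * 2 + ["unsure"] * 2
print(category_shares(graded))
```

Keeping `unsure` as its own bucket means a model that declines to answer is not counted the same way as one that answers wrongly, which is the distinction this categorisation is meant to capture.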

“In addition to producing more trustworthy responses, we will soon enable citations in our Claude 3 models so they can point to precise sentences in reference material to verify their answers,” Anthropic said in its blog post on March 4.

Incorrect refusals (Source: Anthropic)

What are the use-cases?

Its most intelligent model, Claude 3 Opus, can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding, the company claims.

Its potential uses include task automation, research and development, and strategy work such as advanced analysis of charts and graphs, financials and market trends, and forecasting.

The mid-tier model, Claude 3 Sonnet, is aimed at enterprise workloads at a lower cost than its peers. Its use cases include data processing, sales, and time-saving tasks such as code generation and quality control.

Anthropic said Sonnet is more affordable than other models with similar intelligence.

Haiku, the smallest of the three, can be used for customer interactions, content moderation, and cost-saving tasks such as inventory management.


Reshab Shaw covers IT and AI.

first published: Mar 5, 2024 01:30 pm
