The race among artificial intelligence (AI) companies is intensifying, with Anthropic, a startup founded by former OpenAI employees, unveiling its latest and most powerful generative AI (Gen AI) model yet – Claude 3.
The startup, backed by Google and Amazon, claims that its models can outperform OpenAI's GPT-4 and Google's Gemini, the best in the business so far, on several benchmarks, including graduate-level reasoning and math problem-solving.
These benchmarks are distinct from an LLM's parameters, which are the numerical weights the model learns during its training phase. More parameters generally give a model greater capacity to capture patterns in its training data, though a higher count alone does not guarantee more accurate predictions.
Claude 3, an improved version of its predecessor Claude 2.1, will be available in three sizes: Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku. These are multimodal Gen AI models, capable of extracting and processing information from photos, graphs, charts, PDFs, and slides.
Haiku, the lower-end model of the three, is the “fastest” and most “cost-effective” model compared with its competitors, the company claimed. It can read information-dense material, including charts and graphs, in less than three seconds.
The company claims that all its Gen AI models have enhanced capabilities in analysis and forecasting, nuanced content creation, code generation, and conversing in non-English languages such as Spanish, Japanese, and French.
Beyond quality, Anthropic claims that even the pricing is unmatched. Opus and Sonnet are available on claude.ai and through the Claude API in 159 countries, while Haiku will be available soon, the company said.
Claude 3 Opus is priced at $15 per million input tokens. In contrast, GPT-4 costs $30 per million tokens.
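As a rough illustration of what per-million-token pricing means in practice, the short Python sketch below works out the input cost of a workload. The dictionary keys are illustrative labels rather than official API model identifiers, the 250,000-token workload is an assumed figure, and only the input prices quoted in this article are used; output-token pricing is not covered.

```python
# Input prices in US dollars per million tokens, as quoted in this article.
# The keys are illustrative labels, not official API model identifiers.
PRICE_PER_MILLION_TOKENS = {
    "claude-3-opus": 15.00,
    "gpt-4": 30.00,
}


def input_cost(model: str, tokens: int) -> float:
    """Return the input cost in dollars for sending `tokens` tokens to `model`."""
    return PRICE_PER_MILLION_TOKENS[model] * tokens / 1_000_000


if __name__ == "__main__":
    workload = 250_000  # assumed number of input tokens, for illustration only
    for model in PRICE_PER_MILLION_TOKENS:
        print(f"{model}: ${input_cost(model, workload):.2f}")
```

At these quoted rates, the cost scales linearly with the number of tokens, so halving the per-million price halves the bill for the same workload.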
So, how does it fare against the competition?
Anthropic highlights 10 benchmarks on which Claude 3 surpasses OpenAI’s GPT-4 and GPT-3.5, and Google’s Gemini Ultra and Pro versions, including undergraduate and graduate-level understanding, math problem-solving, elementary school knowledge, multilingual math, coding, reasoning over text, and mixed evaluations.
Anthropic conducted several benchmark tests comparing Claude 3 with GPT-4 and Gemini. Claude 3 Opus scored 86.8 percent on the Massive Multitask Language Understanding (MMLU) test, slightly outperforming GPT-4 (86.4 percent) and Gemini Ultra (83.7 percent). MMLU uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.
It's noteworthy that Claude 3 Opus outperformed GPT-4 and Gemini 1.0 Ultra on the Multilingual Math benchmark, even though it was tested 0-shot (with no worked examples in the prompt), as opposed to the 8-shot evaluation (eight worked examples) used for the OpenAI and Google models.
Claude 3 Opus achieved a score of 90.7 percent on 0-shot, while GPT-4 and Gemini 1.0 Ultra scored 74.5 percent and 79 percent, respectively, on 8-shot.
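For readers unfamiliar with the terms, a 0-shot test gives the model only the question, while an 8-shot test first shows it eight worked examples in the same prompt. The Python sketch below shows the difference under simple assumptions: the `build_prompt` helper, the "Q:/A:" layout, and the arithmetic examples are all hypothetical and are not taken from the actual benchmark.

```python
def build_prompt(question: str, examples: list[tuple[str, str]], shots: int) -> str:
    """Build a prompt that shows `shots` worked examples before the question.

    With shots=0 the model sees only the question (0-shot);
    with shots=8 it first sees eight solved examples (8-shot).
    """
    parts = [f"Q: {q}\nA: {a}" for q, a in examples[:shots]]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Hypothetical worked examples, for illustration only.
EXAMPLES = [(f"What is {n} + {n}?", str(2 * n)) for n in range(1, 9)]

if __name__ == "__main__":
    question = "What is 12 + 12?"
    print(build_prompt(question, EXAMPLES, shots=0))
    print("-----")
    print(build_prompt(question, EXAMPLES, shots=8))
```

Because the 8-shot prompt hands the model a pattern to imitate, a higher score in the 0-shot setting is generally considered the harder result to achieve.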

How does it score on hallucinations?
Anthropic said Opus demonstrates a twofold improvement in accuracy, measured as the share of correct answers, on challenging open-ended questions when compared to the earlier Claude 2.1. It has also shown reduced levels of incorrect answers.
To assess this, the AI company uses a large set of complex, factual questions that target known weaknesses in current models. It categorises the responses into correct answers, incorrect answers (also known as hallucinations), and admissions of uncertainty, where the model says it does not know the answer instead of providing incorrect information.
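The three-way categorisation described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not Anthropic's evaluation code: the label names, the `tally` helper, and the sample data are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical label names for the three categories described above.
LABELS = ("correct", "incorrect", "unsure")


def tally(graded: list[str]) -> dict[str, float]:
    """Return the share of responses that fall into each category."""
    if not graded:
        raise ValueError("need at least one graded response")
    counts = Counter(graded)
    return {label: counts[label] / len(graded) for label in LABELS}


if __name__ == "__main__":
    # Made-up grades for illustration; these are not Anthropic's results.
    sample = ["correct", "correct", "incorrect", "unsure"]
    for label, share in tally(sample).items():
        print(f"{label}: {share:.0%}")
```

Tracking "unsure" separately matters because a model that admits it does not know lowers its share of incorrect answers without raising its share of correct ones.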
“In addition to producing more trustworthy responses, we will soon enable citations in our Claude 3 models so they can point to precise sentences in reference material to verify their answers,” Anthropic said in its blog post on March 4.

What are the use cases?
Its most intelligent model, Claude 3 Opus, can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding, the company claims.
The potential uses include task automation, research and development, and strategy work such as advanced analysis of charts and graphs, financials and market trends, and forecasting.
The next best alternative, Claude 3 Sonnet, can be used for enterprise work at a lower cost compared to its peers. Its use cases include data processing, sales, and time-saving tasks such as code generation and quality control.
Anthropic said Sonnet is more affordable than other models with similar intelligence.
The last one, Haiku, can be used for customer interactions, content moderation, and cost-saving tasks such as inventory management.