Beyond Q-Star: PPO, the hidden gem in OpenAI's arsenal

Proximal Policy Optimization, or PPO, helps train computer models to make decisions in complicated or simulated situations.


The recent boardroom upheaval at ChatGPT creator OpenAI, which sent ripples through the tech industry and beyond, seems to have stemmed from a letter sent by some researchers before the shock ouster of co-founder and CEO Sam Altman.

The letter raised concerns about a potential AI breakthrough that could pose a risk to humanity. The decision to remove Altman, who returned as CEO within five days of his sacking, was reportedly influenced by this letter.


OpenAI is said to be working on a project called Q* (pronounced Q-Star), which is reportedly able to solve unfamiliar math problems. It is believed that Q* could mark a significant advance in OpenAI's pursuit of artificial general intelligence (AGI), which the company defines as autonomous systems that outperform humans at most economically valuable work.

Given vast computing resources, Q* was reportedly able to solve certain math problems.



As the world speculates on the impact of this new model, which is possibly based on Q-learning, Peter Welinder, OpenAI's vice president of product, hinted at another technique the company has been working with since 2017.


In a recent post on X, Welinder said, "While everyone is diving into Q-learning, just wait until they learn about PPO."


What is PPO?

Proximal Policy Optimization, or PPO, is a method in the world of AI that helps train computer models to make decisions in complicated or simulated situations.

"PPO has become the default reinforcement learning algorithm at OpenAI because of its ease of use and good performance," according to a research paper by the AI research lab.

PPO is like a personal trainer for computers, especially when they're learning by trial and error. Picture it as a helpful tool to teach machines how to produce text similar to human language.

The term "Proximal" refers to keeping each updated version of the model close to the previous one, and "Policy Optimization" is about finding better strategies. Because each update stays close to the model's earlier behaviour, the improvements made by the programme are steady and reliable.
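
For readers who want the underlying math, the idea of staying close while improving is usually written as the "clipped" objective from the original 2017 PPO paper. The notation below follows that paper and is not taken from this article: r_t is the ratio between the new and old probability of the action taken, Â_t estimates how much better than expected that action turned out, and ε is a small constant (0.2 in the paper's experiments) that limits how far the ratio may move.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)},
\qquad
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[
  \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
\right]
```

In plain terms, once the new behaviour drifts more than ε away from the old one, the model stops being rewarded for drifting further.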

How does it work?

Imagine you're helping a virtual student to write essays. PPO is like the coach that helps this virtual student get better at essay writing step by step.

Instead of making huge changes in one go, PPO encourages small and gradual improvements. This ensures that the virtual student's writing style doesn't drastically change from one essay to the next. It's like refining their skills little by little, without completely changing their writing style.
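
The small-steps idea can be sketched in a few lines of Python. This is a minimal illustration of the clipped objective described in the 2017 PPO paper, not OpenAI's production code. The function names `clipped_surrogate` and `mean_clipped_objective`, and the sample numbers, are made up for this example.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Score one action under a PPO-style clipped objective.

    ratio:     new probability of the action divided by its old probability
    advantage: how much better (positive) or worse (negative) than expected
               the action turned out
    epsilon:   how far the ratio may move from 1.0 before clipping applies
               (0.2 is an illustrative default)
    """
    # Keep the ratio inside the band [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Take the more pessimistic of the clipped and unclipped scores.
    return min(ratio * advantage, clipped_ratio * advantage)


def mean_clipped_objective(new_probs, old_probs, advantages, epsilon=0.2):
    """Average the clipped score over a batch of sampled actions."""
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        total += clipped_surrogate(p_new / p_old, adv, epsilon)
    return total / len(advantages)


if __name__ == "__main__":
    # Hypothetical batch of three actions.
    old = [0.20, 0.50, 0.30]
    new = [0.30, 0.55, 0.15]
    adv = [1.0, 1.0, -1.0]
    print(mean_clipped_objective(new, old, adv))
```

In the first sample action the new model is 1.5 times as likely to pick a good action as the old one, but the score only gives credit up to a ratio of 1.2. Pushing the model further in one step earns nothing extra, which is what keeps each update gradual.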

OpenAI uses PPO in different situations, like teaching computer programmes in simulated environments or getting better at challenging games.

PPO is great at these tasks because it's flexible and can handle situations where a programme needs to learn a series of actions to reach a goal. This makes it useful in areas like robotics and algorithmic trading.

Copyright © Network18 Media & Investments Limited. All rights reserved. Reproduction of news articles, photos, videos or any other content in whole or in part in any form or medium without express written permission of moneycontrol.com is prohibited.