
A new study has raised concerns about the reliability of OpenAI’s health-focused chatbot, finding that it sometimes underestimates the severity of medical emergencies.
The research, published last week in the journal Nature Medicine, evaluated the triage abilities of ChatGPT Health, a specialised version of the popular AI chatbot developed by OpenAI. Triage refers to assessing how urgent a medical condition is and deciding what kind of care a patient should seek.
In the study, researchers presented the chatbot with 60 real-world medical scenarios and compared its responses with those of three physicians who assessed the same cases using established medical guidelines.
The results showed that the chatbot often failed to recognise emergencies. According to the researchers, ChatGPT Health "under-triaged" 51.6% of emergency cases. Instead of advising patients to go to the emergency room, the bot frequently suggested seeing a doctor within the next 24 to 48 hours.
Some of the examples included serious conditions such as diabetic ketoacidosis—a potentially life-threatening complication of diabetes—and cases of impending respiratory failure. In both situations, immediate medical attention is typically required.
“Any doctor would say that the patient needs to go to the emergency department,” said lead study author Ashwin Ramaswamy, who works at Mount Sinai Hospital in New York. He added that the chatbot often appeared to wait until symptoms became unmistakably severe before recommending emergency care.
At the same time, the bot sometimes showed the opposite problem. The study found that it overestimated the urgency of non-serious cases 64.8% of the time, suggesting a doctor's visit for conditions that could be managed at home, such as a mild sore throat lasting a few days.
Researchers also tested whether the chatbot’s recommendations changed based on patient demographics by altering factors like race and gender in the scenarios. The study found no significant differences in the results based on these variations.
ChatGPT Health is separate from the standard ChatGPT chatbot and currently has limited access with a waitlist for users. OpenAI says the platform is designed to allow people to upload medical documents securely but emphasises that it is not intended for diagnosis or treatment.
Responding to the study, an OpenAI spokesperson said the findings may not fully reflect how the tool is meant to be used. The company noted that the chatbot is designed for ongoing conversations where users can provide additional context through follow-up questions.
Experts say the findings highlight the need for more rigorous testing before AI tools are used for medical decision-making. John Mafi from UCLA Health, who was not involved in the research, said AI systems should be carefully evaluated in controlled trials before being deployed widely in healthcare.
While chatbots can help answer health-related questions, researchers caution that they should not replace professional medical advice—especially during emergencies.