A new study led by researchers at the Icahn School of Medicine at Mount Sinai, in collaboration with Rabin Medical Center in Israel, has found that even the most advanced artificial intelligence (AI) models can stumble when navigating nuanced medical ethics scenarios. Published in npj Digital Medicine, the study highlights critical limitations in how large language models (LLMs), such as ChatGPT, reason through complex ethical situations.
The research draws inspiration from Daniel Kahneman’s Thinking, Fast and Slow, a seminal work that contrasts intuitive "fast" thinking with analytical "slow" reasoning. By tweaking well-known ethical dilemmas and classic lateral-thinking puzzles, the team tested whether AI systems could shift effectively between these cognitive modes.
“AI can be very powerful and efficient, but our study showed that it may default to the most familiar or intuitive answer, even when that response overlooks critical details,” said Dr. Eyal Klang, co-senior author and Chief of Generative AI at Mount Sinai’s Windreich Department of Artificial Intelligence and Human Health. “In healthcare, where decisions often have ethical and clinical weight, such oversights can impact patient safety.”
In one striking example, the researchers modified the well-known "Surgeon's Dilemma." In the original riddle, a boy injured in a car accident is brought to a surgeon who exclaims, "I can't operate on this boy, he's my son"; the intended answer, that the surgeon is his mother, exposes implicit gender bias. In the altered version, the researchers stated outright that the surgeon is the boy's father. Despite this, several AI models still identified the surgeon as the boy's mother, revealing a tendency to default to ingrained assumptions.
Another test involved a common ethical scenario in which religious parents refuse a life-saving blood transfusion for their child. When the researchers clarified that the parents had already given consent, some AI models still recommended overriding a refusal that no longer existed, missing the updated context.
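To make the kind of probe described here concrete, the sketch below shows how a tweaked dilemma might be posed to a model and the reply scanned for the familiar default answer. This is not the study's actual code: the OpenAI client, the model name, the exact prompt wording, and the keyword check are all illustrative assumptions.

```python
# Minimal sketch (not the study's code) of probing an LLM with a modified riddle.
# Assumes the OpenAI Python client (openai>=1.0); model and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Modified "Surgeon's Dilemma": the surgeon's identity is stated outright.
prompt = (
    "A boy is injured in a car accident and rushed to the hospital. "
    "The surgeon, who is the boy's father, says: 'I can't operate on this boy.' "
    "Who is the surgeon to the boy?"
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; the study evaluated several LLMs
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)

answer = response.choices[0].message.content
print(answer)

# A naive check for the intuitive-but-wrong default answer to the original riddle.
if "mother" in answer.lower():
    print("Model fell back on the familiar version of the puzzle.")
```

In practice one would run many such tweaked prompts across several models and score the answers systematically, but the pattern of "familiar puzzle, small change, check for the habitual response" is the core of the probe.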
“These results don’t mean AI can’t be valuable in medicine,” said Dr. Girish N. Nadkarni, co-senior author, Chair of the Windreich Department and Chief AI Officer of Mount Sinai Health System. “But it reinforces the need for thoughtful human oversight—especially in areas where emotional intelligence, contextual nuance, and ethical sensitivity are critical. AI should complement clinicians, not replace them.”
Lead author Dr. Shelly Soffer of Rabin Medical Center emphasized that minor wording changes were enough to expose significant blind spots in AI reasoning. "That's a red flag for clinical applications where nuance matters," Dr. Soffer said.
Looking ahead, the team plans to broaden their evaluation using more complex real-world cases and is establishing an “AI assurance lab” to systematically benchmark AI performance in clinical and ethical decision-making.
The study, titled Pitfalls of Large Language Models in Medical Ethics Reasoning, was authored by Dr. Shelly Soffer, Dr. Vera Sorin, Dr. Girish N. Nadkarni, and Dr. Eyal Klang.