
Earlier this month, ChatGPT Health launched with support for Apple Health and other data sources, promising users deeper insights into their long-term health data. But an early real-world test suggests the tool may not yet be ready for that level of responsibility. A technology columnist from The Washington Post shared his experience after granting ChatGPT Health access to years of Apple Watch data, and the results were troubling.
The columnist, Geoffrey A. Fowler, has worn an Apple Watch daily for nearly a decade. Curious about what years of activity and heart data might reveal, he joined the wait list for ChatGPT Health and allowed it to analyse his Apple Health records. That dataset included roughly 29 million steps and around 6 million heart rate measurements. Fowler then asked ChatGPT to assess his cardiac health.
The verdict was alarming. ChatGPT Health graded his heart health as an F. Rattled by the result, Fowler immediately changed his behaviour and went for a run. He then shared ChatGPT’s analysis with his doctor, whose response was blunt: Fowler is at extremely low risk of a heart attack, so low that his insurance would likely refuse to pay for additional cardiac testing. In short, the AI assessment was wrong.
Digging deeper, Fowler found that ChatGPT’s conclusions rested on flawed interpretations of the underlying data. A major factor in the failing grade appeared to be VO2 max, a measure of cardiovascular fitness that Apple Watch estimates from activity and heart rate data. However, Apple clearly states that VO2 max values from the watch are estimates meant for tracking trends, not for making definitive clinical judgments; accurate VO2 max measurement requires specialised lab equipment. ChatGPT Health did not appear to account for that distinction.
Another issue involved changes in resting heart rate data. Fowler noticed that shifts in his historical readings coincided with upgrading to a newer Apple Watch. These were not genuine physiological changes but the result of improved sensors and updated measurement techniques. ChatGPT Health treated these shifts as meaningful health signals, failing to factor in hardware and software changes over time.
Perhaps more concerning was the lack of consistency. When Fowler asked the same question again, ChatGPT Health revised its assessment from an F to a C. Repeating the question multiple times produced wildly different results, with grades swinging between an F and a B. Such variability undermines confidence in a tool that is positioning itself as a serious health companion.
Across multiple conversations, ChatGPT also struggled to retain key personal details. Fowler reported that the system repeatedly forgot his age, gender, and recent vital signs. Even though ChatGPT Health had access to his recent blood test results, those data points were not always included in its analysis. This selective and inconsistent use of available information further distorted the health assessments.