Harvard study: AI large language models get up to 80% of initial medical diagnoses wrong

Mars Finance News, April 15: A research team from Harvard Medical School evaluated the diagnostic capabilities of more than 20 advanced AI large language models (LLMs), including ChatGPT, DeepSeek, Gemini, and Claude. When the models had only a patient's initial symptoms and signs to work from, the error rate for “differential diagnosis” (listing the possible diseases) was as high as 80%. Once patients supplied additional test results, the models' failure rate for the “final diagnosis” fell to around 40%. According to the researchers, this means AI chatbots need comprehensive patient information to diagnose more accurately; when patients cannot provide complete test data, the AI's results are unreliable. The researchers also emphasized, “Artificial intelligence has not yet reached a level where it can make diagnostic decisions for patients without medical professional intervention.” (Yicai)
