AI Chatbots Show Mixed Results in Medical Applications, Studies Find
Recent studies reveal that AI chatbots face significant challenges in medical applications. In one study, ChatGPT Health under-triaged 51.6% of emergency cases. In another, cancer patients using an AI chatbot had a 22% withdrawal rate after struggling to use it, while a survey of medical researchers found cautious adoption, with 40.3% reporting AI chatbot use in their research.
AI-powered chatbots are showing both promise and significant limitations in medical applications, according to recent research examining their use in cancer care, emergency triage, and medical research. Early findings from multiple studies reveal usability challenges, accuracy concerns, and inconsistent performance that raise questions about their readiness for widespread clinical implementation.
In the CAM 2.0 study involving 73 cancer patients undergoing chemoradiotherapy, researchers tested whether digitally enabled continuous activity monitoring combined with AI could streamline symptom monitoring. Patients were randomly assigned to receive either a commercially available activity tracker or the same tracker combined with an AI-powered chatbot called "Penny" that provided support via text messages. Early findings indicate that patients in the intervention group had difficulty using the chatbot, leading to a withdrawal rate of 22%. Some patients requested direct contact with a member of their care team even when their concerns had already been addressed through the digital triage process. These challenges appear to have affected clinical workflows by introducing additional, unexpected tasks, such as helping patients navigate the chatbot and verifying the accuracy of flagged alerts.
A separate study published in Nature Medicine tested ChatGPT Health's ability to triage medical cases based on real-life scenarios. Researchers fed 60 medical scenarios to ChatGPT Health and compared its responses with those of three physicians who reviewed the same scenarios. The researchers found that ChatGPT Health "under-triaged" 51.6% of emergency cases: instead of directing the patient to the emergency room, the bot recommended seeing a doctor within 24 to 48 hours. The missed emergencies included a patient with diabetic ketoacidosis, a life-threatening complication of diabetes, and a patient going into respiratory failure. In cases like impending respiratory failure, the bot seemed to be "waiting for the emergency to become undeniable" before recommending the ER. Emergencies with unmistakable symptoms, such as stroke, were triaged correctly 100% of the time.
Compared with the doctors in the study, the bot also over-triaged 64.8% of nonurgent cases, recommending a doctor's appointment when none was needed. For example, it told a patient with a three-day sore throat to see a doctor within 24 to 48 hours, when at-home care was sufficient. In scenarios involving suicidal ideation or self-harm, the bot's responses were inconsistent. When a user expresses suicidal intent, ChatGPT is supposed to refer the user to 988, the Suicide & Crisis Lifeline. In the study, however, ChatGPT Health referred users to 988 when they didn't need it and failed to refer them when they did.
An international cross-sectional survey published in January 2026 in Cureus assessed the use and perceptions of AI chatbots among 434 medical researchers. Of the participants, 175 (40.3%) reported using AI chatbots in their research. Use varied by country (from 32.8% to 45.9%), but neither gender nor country was significantly associated with use. Older age and more senior roles were associated with lower odds of use, with odds ratios of 0.32 for researchers aged 41-50 years, 0.31 for residents, and 0.17 for consultants. Awareness strongly predicted use (odds ratio 15.53), as did awareness of guidelines (odds ratio 2.47).
The survey concluded that medical researchers have a positive attitude toward AI chatbots but that ethical and accuracy concerns call for further interventions to establish systematic, unified rules. Guidelines for AI chatbot use in research do exist, but acceptance varies among publishers: Springer Nature and Science reject ChatGPT as a coauthor, whereas many Elsevier journals permit its use if disclosed. Studies have shown that ChatGPT produces coherent writing with little plagiarism but struggles with accuracy, fabricates references, and raises ethical concerns.
A spokesperson for OpenAI said the company welcomed research on the use of AI in health care but said the Nature Medicine triage study didn't reflect how ChatGPT Health is typically used or how it is designed to function. According to the spokesperson, the chatbot is designed for people to ask follow-up questions and provide more context about their medical situation, rather than to give a single response to a medical scenario. ChatGPT Health is available to only a limited number of users, and OpenAI is still working to improve the model's safety and reliability before making the chatbot more widely available.
Digital tools offer new opportunities for the early detection and management of treatment-related toxicities during systemic cancer therapy, potentially contributing to improved physical functioning, better quality of life, and fewer hospitalizations. While these technologies can support patients throughout their journey and help clinicians fine-tune care, their integration into real-world settings presents several challenges, including digital inequities and added burden on healthcare professionals.