The state has launched a pilot program with health-tech startup Doctronic that allows an AI system to handle routine prescription renewals for patients with chronic conditions.
I’m generally more positive about AI/ML use cases than some on these forums, but this seems like a very bad use case for LLMs, with significant unanswered questions about accountability. The current approach (in the UK at least) is not perfect, but this seems like the wrong area in which to be using this technology.
Toward the Autonomous AI Doctor: Quantitative Benchmarking of an Autonomous Agentic AI Versus Board-Certified Clinicians in a Real World Setting
Hayat, Hashim; Kudrautsau, Maksim; Makarov, Evgeniy; Melnichenko, Vlad; Tsykunou, Tim; Varaksin, Piotr; Pavelle, Matt; Oskowitz, Adam Z.
Abstract
Background
Globally we face a projected shortage of 11 million healthcare practitioners by 2030, and administrative burden consumes 50% of clinical time. Artificial intelligence (AI) has the potential to help alleviate these problems. However, no end-to-end autonomous large language model (LLM)-based AI system has been rigorously evaluated in real-world clinical practice. In this study, we evaluated whether a multi-agent LLM-based AI framework can function autonomously as an AI doctor in a virtual urgent care setting.
Methods
We retrospectively compared the performance of the multi-agent AI system Doctronic and board-certified clinicians across 500 consecutive urgent-care telehealth encounters. The primary end points (diagnostic concordance, treatment plan consistency, and safety metrics) were assessed by blinded LLM-based adjudication and expert human review.
Results
The top diagnosis of Doctronic and clinician matched in 81% of cases, and the treatment plan aligned in 99.2% of cases. No clinical hallucinations occurred (e.g., diagnosis or treatment not supported by clinical findings). In an expert review of discordant cases, AI performance was superior in 36.1%, and human performance was superior in 9.3%; the diagnoses were equivalent in the remaining cases.
Conclusions
In this first large-scale validation of an autonomous AI doctor, we demonstrated strong diagnostic and treatment plan concordance with human clinicians. These findings indicate that multi-agent AI systems can achieve comparable clinical decision-making to human providers and offer a potential solution to healthcare workforce shortages.
They claim good performance, and if these results are confirmed (there has been no peer review yet), the performance does look good. But prescription renewals are one thing; this reads like a trial run, and the goal is clearly much wider adoption of the approach and their technology.
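For a sense of what the headline concordance figures actually measure, here is a rough sketch of how top-diagnosis match rate and treatment-plan alignment would be tabulated from paired encounter records. This is my own illustration, not the authors' pipeline (their adjudication was blinded LLM review, not string matching), and the field names are made up:

```python
# Hypothetical sketch of tabulating the paper's concordance metrics.
# Field names (ai_top_dx, clinician_top_dx, plans_match) are invented for
# illustration; the study itself used blinded LLM-based adjudication rather
# than exact string comparison.

from dataclasses import dataclass

@dataclass
class Encounter:
    ai_top_dx: str          # AI system's top diagnosis
    clinician_top_dx: str   # clinician's top diagnosis
    plans_match: bool       # adjudicated treatment-plan agreement

def concordance(encounters: list[Encounter]) -> dict[str, float]:
    """Fraction of encounters with matching top diagnosis and aligned plans."""
    n = len(encounters)
    dx_matches = sum(e.ai_top_dx.lower() == e.clinician_top_dx.lower() for e in encounters)
    plan_matches = sum(e.plans_match for e in encounters)
    return {
        "top_dx_concordance": dx_matches / n,
        "treatment_plan_alignment": plan_matches / n,
    }

# Toy example with the paper's proportions: 500 encounters,
# 405 matching top diagnoses (81%), 496 aligned plans (99.2%).
sample = (
    [Encounter("sinusitis", "sinusitis", True)] * 405
    + [Encounter("viral URI", "sinusitis", True)] * 91
    + [Encounter("viral URI", "sinusitis", False)] * 4
)
print(concordance(sample))
# {'top_dx_concordance': 0.81, 'treatment_plan_alignment': 0.992}
```

Note that a metric like this says nothing about *which* cases disagree, which is exactly where the corner-case and accountability questions below come in.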
My concern is about the corner cases and, most importantly, accountability. When something does go wrong (and it will; it always does with any system), there need to be clear lines of responsibility, with accountability and transparency. That is not possible with an LLM. Regardless of performance, this will create a negative backlash.
These tools should be implemented where they are best suited, where risks are low, and where people are comfortable or can be made comfortable, not wherever the opportunity to make some money happens to be.
All automated tools like this should also be tested on cases where humans are notoriously biased: how does the system handle refilling pain meds for chronic pain patients?