Artificial intelligence in medicine and science

Influence of believed AI involvement on the perception of digital medical advice (2024)
Reis, Moritz; Reis, Florian; Kunde, Wilfried

Large language models offer novel opportunities to seek digital medical advice. While previous research primarily addressed the performance of such artificial intelligence (AI)-based tools, public perception of these advancements received little attention.

In two preregistered studies (n = 2,280), we presented participants with scenarios of patients obtaining medical advice. All participants received identical information, but we manipulated the putative source of this advice (‘AI’, ‘human physician’, ‘human + AI’). ‘AI’- and ‘human + AI’-labeled advice was evaluated as significantly less reliable and less empathetic compared with ‘human’-labeled advice. Moreover, participants indicated lower willingness to follow the advice when AI was believed to be involved in advice generation.

Our findings point toward an anti-AI bias when receiving digital medical advice, even when AI is supposedly supervised by physicians. Given the tremendous potential of AI for medicine, elucidating ways to counteract this bias should be an important objective of future research.

Link | PDF (Nature Medicine) [Open Access]
 
These results truly don't reflect my experience; the last thing I'd say is that human doctors showed more empathy than the simulated empathy of large language models. I also don't think they reflect the experience shared on patient groups by most ME/POTS/Lyme/MCAS etc. sufferers. The problem with this study, I believe, is that the participating doctors were aware they were part of a study and were being monitored, so their behavior doesn't reflect the typical behavior of a medical practitioner working with the absolute lack of accountability of everyday practice. I'm also pretty sure that neglected diseases weren't included as possible diagnoses.
 
Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said - Associated Press

Tech behemoth OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near “human level robustness and accuracy.”

But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.

Experts said that such fabrications are problematic because Whisper is being used in a slew of industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos.

More concerning, they said, is a rush by medical centers to utilize Whisper-based tools to transcribe patients’ consultations with doctors, despite OpenAI’s warnings that the tool should not be used in “high-risk domains.”

https://apnews.com/article/ai-artif...lth-business-90020cdf5fa16c79ca2e5b6c4c9bbb14
 
:cry: it truly learned from the professionals. It's so beautiful.

Joking aside, yikes on going ahead with it before it's ready. It will be ready soon. Jumping the gun here is a good way to turn the work culture against something that will soon be superior in all cases. Especially if they explicitly say not to use it in such cases.

Health care systems have been essentially hostile towards telemedicine and any means to provide better access. In most cases they don't even have patient portals with ticketing and case management, even though all of that is mature and tried-and-tested in many other industries. But they go right ahead with technology that hardly anyone else uses yet. Very odd people making bizarre choices.
 
Health care systems have been essentially hostile towards telemedicine and any means to provide better access.
I need a retinal exam, and have to travel 300 km for that (and 300 back). I read an article about how teleophthalmology is working so well in my town. It wasn't offered to me, and they say their camera isn't good enough, and referral time is really long. So, despite the glowing reviews, it's really not available. Even with a lesser quality camera, it should be good enough for an expert to judge whether a more detailed exam is necessary. The government would probably save money by setting up some regional adequate-quality cameras, rather than paying for specialists' offices and staff. Yes I'm annoyed about it.
 
As this is the last day of the year, I wanted to share the yearly update for 2024 regarding a software framework I have been using to identify biological targets of high predictive value. The framework uses machine learning, natural language processing and network analysis algorithms to identify hotspots of research related to ME/CFS. Disclaimer: I own a patent for its methodology.

To date, no patient organisation has decided to fund it and no researchers have decided to use it. I sent the first email in 2015 to a number of researchers, mentioning the liver, endoplasmic reticulum stress and bile acid metabolism disruption (hint: bile acids are required for lipid absorption), all of which have been identified years later. If @Chris Ponting 's paper is accepted, then "Liver Disease" will be yet another row in the table I am sharing below.
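
For anyone curious what "network analysis to identify hotspots" can look like in practice, here is a minimal, purely illustrative sketch in plain Python; it is emphatically not the patented framework, and the abstracts, term list and scoring below are all made up. The idea is simply to count how often terms of interest co-occur in the literature and rank them by that.

```python
from collections import Counter
from itertools import combinations

# Hypothetical abstract snippets (placeholders, not real citations).
abstracts = [
    "endoplasmic reticulum stress and bile acid metabolism in liver disease",
    "bile acid signalling, lipid absorption and the liver in ME/CFS",
    "endoplasmic reticulum stress markers in ME/CFS patients",
]

# Illustrative list of terms of interest (a real system would extract these
# with NLP-based entity recognition, not a hand-written list).
terms = ["endoplasmic reticulum stress", "bile acid", "liver",
         "lipid absorption", "ME/CFS"]

# Build a term co-occurrence network: an edge is counted whenever two terms
# appear in the same abstract.
edges = Counter()
for text in abstracts:
    present = [t for t in terms if t in text]
    for a, b in combinations(sorted(present), 2):
        edges[(a, b)] += 1

# Rank terms by weighted degree (sum of co-occurrence counts) as a crude
# "hotspot" score.
score = Counter()
for (a, b), weight in edges.items():
    score[a] += weight
    score[b] += weight

for term, s in score.most_common():
    print(f"{term}: {s}")
```

In a real pipeline the extraction, weighting and ranking steps would of course be far more sophisticated; this only illustrates the general shape of the approach.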

Some questions

1) Is this "cherry picking" or not? The people to ask -and I am grateful for your scepticism- would be @Simon M, @Murph, @forestglip, @Yann04 and, of course, @ME/CFS Skeptic. How can we evaluate whether this system has indeed been outperforming human researchers?

2) Why has no one decided to evaluate and subsequently use the system, despite the fact that targets flagged repeatedly -ever since 2015- have later been identified?


The table looks as follows. It shows that the framework has been able to identify targets earlier by a median of 6 years. If Liver Disease is added, the median value of 6 years is not affected. Of course, I can provide references for all the information shown below.



Wishing you all a Happy New Year:

[Table screenshot: Screenshot 2024-12-31 at 13.40.10.png]
 
Is this "cherry picking" or not ? The people to ask -and I am grateful for your scepticism- would be @Simon M , @Murph @forestglip @Yann04 and -of course- @ME/CFS Skeptic .
I'm not quite sure how your system works, but regarding cherry picking it depends on how many findings it predicts, for example whether there are predictions that were not confirmed or were perhaps contradicted by subsequent research. I suppose you would have to publish all the predictions beforehand somewhere much like a pre-registration to make it verifiable.

I also think that many findings highlighted in ME/CFS research are false positives, either from random sampling error or from selection bias in the study.
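
To put a rough number on that point, here is a back-of-the-envelope calculation with made-up inputs (none of these figures come from any ME/CFS study): even before selection bias, a low prior probability of true hypotheses combined with modest power means a large share of "significant" findings will be false positives.

```python
# Hypothetical numbers, purely for illustration.
prior_true = 0.10   # fraction of tested hypotheses that are actually true
power = 0.50        # chance a true effect reaches significance
alpha = 0.05        # chance a null effect reaches significance anyway

true_positives = prior_true * power
false_positives = (1 - prior_true) * alpha
ppv = true_positives / (true_positives + false_positives)

print(f"Share of 'significant' findings that are real: {ppv:.0%}")
# -> roughly 53%, i.e. almost half of the highlighted findings would be
#    false positives, before even considering selection bias.
```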
 
I suppose you would have to publish all the predictions beforehand somewhere much like a pre-registration to make it verifiable.

I was also going to say this.

And also, now that the predictions have "come true", maybe present the data in a format that includes all the details: how many targets were predicted in total, the details of how your system found each one, and the details of the studies that confirmed it, instead of a simple list. Maybe write something up in a research paper format.
 
Study Warns of Risks from Medical Misinformation in Large Language Models

A study published in Nature is drawing attention to the risks of large language models (LLMs) accidentally spreading medical misinformation. Researchers found that even small amounts of false information in training datasets could lead to harmful outputs that are nearly impossible to distinguish from accurate ones during standard testing.

To address this issue, the research team proposed using biomedical knowledge graphs to verify and flag problematic outputs, emphasizing the need for transparency and oversight in LLM development—particularly for healthcare applications.
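
The paper should be consulted for the actual method; purely as a sketch of the general idea, and with entirely hypothetical data, the screening could look like checking each relation claimed in a model's output against a set of curated triples and flagging anything unsupported:

```python
# Minimal sketch of knowledge-graph screening of LLM output (hypothetical data).
# A curated graph of accepted biomedical relations, stored as triples.
knowledge_graph = {
    ("metformin", "treats", "type 2 diabetes"),
    ("aspirin", "increases_risk_of", "gastrointestinal bleeding"),
}

# Relations extracted from a model's answer (in practice this extraction step
# would itself be an NLP component; here they are hard-coded for illustration).
claimed_relations = [
    ("metformin", "treats", "type 2 diabetes"),
    ("vitamin C", "cures", "sepsis"),   # not supported by the graph
]

for triple in claimed_relations:
    status = "supported" if triple in knowledge_graph else "FLAG: not in knowledge graph"
    print(triple, "->", status)
```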

The Problem with LLM Training Data
LLMs, like GPT-4 and LLaMA, are trained on massive datasets sourced from the open Internet, where information quality varies widely. While automated filters can catch overtly offensive content, more subtle misinformation often slips through, especially when it appears credible. This makes these models vulnerable to "data poisoning," a tactic where bad actors intentionally introduce false information into training data.
 
trained on massive datasets sourced from the open Internet, where information quality varies widely
This makes these models vulnerable to "data poisoning," a tactic where bad actors intentionally introduce false information into training data.
If only this problem were unique to datasets from the open Internet...

Because the psychobehavioral garbage that is so beloved in medicine is just as much misinformation as the junk out in the fringes of the conspiracy crowds, and just as detached from reality.
 
Could forward this to your favorite researcher.

Announcing Trusted Tester access to the AI co-scientist system

We are excited by the early promise of the AI co-scientist system and believe it is important to evaluate its strengths and limitations in science and biomedicine more broadly. To facilitate this responsibly we will be enabling access to the system for research organizations through a Trusted Tester Program. We encourage interested research organizations around the world to consider joining this program here.
 
This is an important moment, I believe. This is something I have been working on, and it involves the use of Ensemble Learning for Large Language Models (LLMs), which we can call "EnsembleLLMs".

What does this mean? Essentially, a single prompt is passed to many LLMs -for example, to find causal factors for ME/CFS- and each LLM then responds according to its own reasoning process.

The next step involves finding where the given answers agree and disagree; novelty detection is then applied, which can find interesting details that are relevant to the given question (= prompt). In the following snapshot, Grok 3 mentioned something that no other LLM mentioned regarding 5 alpha reductase activity. This is considered novelty:

[Screenshot: grok35ar.jpeg]


In the next step, each LLM is asked to perform a reality check on the answers given by the other LLMs. In the example shown above, all the other LLMs will be asked whether the metabolites mentioned are indeed 5 alpha reductase metabolites.
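
As a concrete, heavily simplified sketch of the pipeline described above (the answers are hard-coded placeholders rather than real API calls, and the "concept extraction" is deliberately naive), the fan-out and novelty-detection steps could look something like this:

```python
from collections import Counter

# Placeholder answers standing in for real LLM API calls (hypothetical content).
answers = {
    "LLM_A": {"cortisol", "5 alpha reductase", "bile acids"},
    "LLM_B": {"cortisol", "bile acids"},
    "LLM_C": {"cortisol", "oxidative stress"},
    "LLM_D": {"cortisol", "bile acids", "oxidative stress"},
}

# Count how many models mention each concept.
mention_counts = Counter(c for concepts in answers.values() for c in concepts)

consensus = [c for c, n in mention_counts.items() if n == len(answers)]
novelty = [c for c, n in mention_counts.items() if n == 1]

print("Consensus concepts:", consensus)   # mentioned by every model
print("Novelty candidates:", novelty)     # mentioned by exactly one model
# The novelty candidate (here '5 alpha reductase') is what gets sent back to
# the other models for a reality check in the next step.
```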

Needless to say, I strongly believe that very soon we will have the answers we have been waiting on for decades.
 
In the next step, each LLM is asked to perform a reality check on the answers given by the other LLMs. In the example shown above, all the other LLMs will be asked whether the metabolites mentioned are indeed 5 alpha reductase metabolites.
How do you check the validity of the origins of the findings? How can we know that it isn’t just making up relationships that don’t exist?
 
How do you check the validity of the origins of the findings? How can we know that it isn’t just making up relationships that don’t exist?

Good question. Let's assume that four LLMs, named A, B, C and D, are given the same prompt. It basically has to do with the rate of agreement: B, C and D will evaluate the facts given by A (e.g. that metabolite 1 and metabolite 2 are both 5 alpha reductase metabolites). If all three LLMs agree that this is indeed true, then we have 100% agreement. We can then filter responses using an agreement-rate cutoff (all concepts discussed with less than 100% agreement should be evaluated by humans). I am not suggesting that humans need not evaluate responses with a 100% agreement rate... for the time being, that is.
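
A minimal sketch of that filtering step, with hypothetical verdicts standing in for real model calls:

```python
# Claim made by LLM A, and the verdicts of the other models when asked to verify
# it (True = "yes, these are indeed 5 alpha reductase metabolites"). All hypothetical.
claim = "metabolite 1 and metabolite 2 are 5 alpha reductase metabolites"
verdicts = {"LLM_B": True, "LLM_C": True, "LLM_D": False}

agreement_rate = sum(verdicts.values()) / len(verdicts)

AGREEMENT_CUTOFF = 1.0  # anything below 100% goes to a human reviewer
if agreement_rate < AGREEMENT_CUTOFF:
    print(f"'{claim}': {agreement_rate:.0%} agreement -> route to human evaluation")
else:
    print(f"'{claim}': unanimous agreement -> keep (still worth human review for now)")
```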

These models will get better and better. No turning back.
 
@mariovitali thank you for the example.

I understand the point about comparing responses and responses to responses in order to do some kind of quality assurance. But I’ve got two questions:
  1. How does inter-LLM agreement signify truth? How can it ever signify truth? If my understanding is correct, LLMs simply give you the answer they believe 'sounds' right; they have no way of assessing whether it is right.
  2. How do we know that what it's talking about is relevant? Saying that X=Y does not mean that either has to be relevant for Z. Where did X and Y even come from?
 