A new "reasoning" framework for ME/CFS research

Discussion in 'General ME/CFS news' started by mariovitali, May 13, 2025.

  1. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,622
    The problem, which I think @Utsikt at least once also hinted at, is that the issues you are talking about have not been found in any reliable way. That is, "In ME/CFS and Long-Covid research we have found issues with Fibrinogen, P-Selectin and Collagen" is probably not a meaningful statement, unless you just mean that studies exist that have found abnormal levels of something in a certain setup, without assessing the quality and methodology of those studies and the differences between their setups. Since there have been several thousand studies on ME/CFS and Long-Covid, almost everything that can be found has been found, but pretty much none of it has been found reliably! If the findings on Fibrinogen, P-Selectin and Collagen were reliable, they would have been discussed, and you can be certain that someone would have connected the dots. Of course it's possible that nobody managed to find the needle in the haystack, but the opposite is just as plausible. Now there are other situations where the data can be considered more reliable (say DecodeME) but the abundance of data makes it hard or impossible to connect the dots; there such approaches almost instantly make sense (though of course so do other classical approaches that don't rely on LLMs).

    If we don't account for the quality of the data, nobody has to do any of this in the first place, because the problem was already claimed to have been solved decades ago: "It's in your head".

    Now one may argue that one picks out the things that are found to differ more often than others and looks for a connection there, but it's not hard to see why that needn't be a valuable approach either (for instance, if they are affected by deconditioning, if they are more variable by nature, if they are examined more often, if such findings have a higher publication bias, etc.).

    Could you point towards the exact studies on Fibrinogen, P-Selectin and Collagen you are citing here, if that is possible, and say how the negative results of other studies would be explained?
     
    RedFox, Kitty, hotblack and 2 others like this.
  2. Utsikt

    Utsikt Senior Member (Voting Rights)

    Messages:
    3,052
    Location:
    Norway
    I didn’t intend to hint at this so you’re giving me credit I don’t deserve, but I agree with the point regardless!

    The results of the model are only as good as the data it’s built on.
     
    Kitty, Trish and Peter Trewhitt like this.
  3. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    774
    Location:
    USA
    Unfortunately I'd say the opposite. You're right that there is tons of data, probably even good data that points in the right direction. But the thing about biology is that it is so incredibly context specific. You may find that high levels of protein A suppress protein B in 90% of tissues, but then this relationship goes out the window in one particular tissue of interest in a disease context. We probably know tons about whatever system is driving ME/CFS. But that doesn't mean we understand anything about the specific disease mechanism. That still requires an enormous amount of new research to pinpoint what types of cells or tissue we're talking about, what particular biological processes are involved, and what exactly is going wrong that differentiates it from other diseases.

    Frankly what you describe is what I end up doing with my human brain all the time. A little detail from someone on S4ME, a finding in a mouse study from a few decades ago, a methodological detail from another paper that didn't fit with other findings in the field.

    There's a lot of data, but the problem isn't the same as aggregated mass data from other contexts. It's not that it's too much for one brain to handle, it's that the particular relevant circumstances and connections are simply not hashed out yet in experimental data. Seeing terms like fibrinogen and collagen and linking them to glycosylation is actually the easy part, I do it every day. Every new paper I read prompts me to remember little details from dozens of old papers and view them in a new light. The hard part is mechanistic understanding of how the pieces fit together--something current AI can only vaguely gesture towards based on what humans have already suggested in the discussion sections of their manuscripts.

    There's definitely a lot of bias in science, I would never argue otherwise. And AI has been shown to be incredibly useful in very specific contexts for specific reasons. But I've never seen anything from AI for this particular task that I couldn't have put together with a couple of google searches. Most of the time it's way less fruitful than my google searches, because the AI summaries tend to gloss over all the actually interesting details in the methodology and results.

    Sure, there's always a possibility that it could point to something new I hadn't considered--I'd love to have AI challenge me to think in a different way. In practice, it's only ever given me superficial connections between things that were already obvious as someone who knows basic biology. Either that, or it just recapitulated the same threads that are already hashed out over and over in every introductory section of an ME/CFS paper.
     
  4. Creekside

    Creekside Senior Member (Voting Rights)

    Messages:
    1,567
    Yes, one of the problems, if ME's mechanism involves neural dysfunction, is that neural activity is not simple and convenient to collect data from. Taking blood samples is easy. So is taking tissue samples. Measuring signals deep inside a living brain is much more difficult, so there's much less of that data out there. Measuring vesicle contents deep inside a human brain is even more difficult. What about structural changes in brain cells? Glial cells send out micro- (nano-?) scopic processes, I think on timescales of microseconds. What if small changes in those are a component of ME? AIs can't look for connections in that data if it doesn't exist yet.
     
    RedFox, Sean, Kitty and 1 other person like this.
  5. Creekside

    Creekside Senior Member (Voting Rights)

    Messages:
    1,567
    Can the wrong information blind the AI to connections that might exist? Let's say there's a connection between fibrinogen and collagen in ME, showing up in 2 well-done studies, but the AI accesses 5 poorly-done studies that contradict that connection; might it then stamp that connection as invalid? Hopefully we'd train AIs to judge studies better than humans do.
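
    To make that concrete, here's a toy illustration (all the numbers and quality scores below are made up): a naive vote count lets the five weak studies win, while weighting by an assessed study quality lets the two strong ones win.

    ```python
    # Toy illustration (hypothetical numbers): naive vote-counting vs.
    # quality-weighted aggregation of contradictory studies.

    # Each study: (supports_connection, assessed quality score in [0, 1])
    studies = [
        (True, 0.90),   # well-done study supporting the link
        (True, 0.85),   # well-done study supporting the link
        (False, 0.30),  # five poorly-done studies contradicting it
        (False, 0.25),
        (False, 0.30),
        (False, 0.20),
        (False, 0.35),
    ]

    # Naive vote: each study counts equally, so the weak studies win.
    votes_for = sum(1 for supports, _ in studies if supports)
    votes_against = len(studies) - votes_for
    print(f"Naive vote: {votes_for} for vs {votes_against} against")  # 2 vs 5

    # Quality-weighted vote: weak studies contribute less, flipping the result.
    weight_for = sum(q for supports, q in studies if supports)
    weight_against = sum(q for supports, q in studies if not supports)
    print(f"Weighted: {weight_for:.2f} for vs {weight_against:.2f} against")  # 1.75 vs 1.40
    ```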
     
    RedFox, Sean, Kitty and 5 others like this.
  6. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    774
    Location:
    USA
    I think so. The main issue is that it's not only a matter of training the AI to understand what makes a “good” study (i.e. ranking diagnostic criteria or checking for double blinding in a clinical trial). For the search for a biological mechanism, it would also have to be aware of all the potential ways that each experimental method skews results.

    That’s not going to be cut and dried—there may be a reason why a certain finding shows up in some studies of muscle cells but not in PBMCs, or why it might show up in studies of PBMCs only when samples were prepared in a specific way that would allow you to detect XYZ. That’s not going to be a matter of good vs bad studies, but simply a matter of context-dependent nuance—with way too few examples of those specific contexts for AI to be able to brute-force pattern recognition (or even “reason”) from them.

    AI is very good at pattern recognition. And it can recognize patterns that aren’t obvious to humans. But in my experience digging into literature, the written details of a study only tell you so much. The rest needs to be inferred by actual deep understanding of the system and of those specific methods.

    That would require orders of magnitude more capacity than any current AI model has. And people like to say that AI ability is increasing fast, which is true. But I’d still give it decades before it can even begin to match the human brain’s ability to do what I’m describing. And even then, it would still be working off the same limited literature as everyone else.

    I think AI will greatly accelerate drug discovery, image-based diagnosis of cancers, etc. But for the problem of ME/CFS, I’d bet my money on human researchers getting there at the same rate without the help of AI. If I’m wrong, I’ll be pleasantly surprised.
     
    RedFox, Sean, Kitty and 3 others like this.
  7. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    809
    Location:
    UK
    Sorry if this misses some recent posts; it's something I wrote before catching up with the thread...

    Maybe I can provide another perspective or perhaps some new data to this conversation. I wonder, can we express new ideas if all we are using is the same words?

    ML/AI is often just showing patterns and relationships in data, yes. A neural network at its most basic represents a relationship between its training inputs and outputs.

    If you have just an input layer and an output layer, you are representing a fairly simple, direct or linear relationship. The hidden layers, that is, the multiple layers in between, often represent more complex and non-obvious relationships, with each layer building upon what came out of the layer before.

    So the deeper the layers, the more complex and abstract the relationships. And what are these if not new data? They represent, or discover, new relationships between data and information.
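
    As a minimal sketch of what I mean (the weights here are random placeholders, not a trained model): each layer is just a function of the previous layer's output, so deeper layers can only build increasingly abstract combinations of what came before.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def layer(x, n_out):
        """One fully-connected layer with a nonlinearity (random placeholder weights)."""
        W = rng.normal(size=(x.shape[-1], n_out))
        b = np.zeros(n_out)
        return np.tanh(x @ W + b)

    x = rng.normal(size=4)   # input features
    h1 = layer(x, 8)         # first hidden layer: simple combinations of inputs
    h2 = layer(h1, 8)        # second hidden layer: combinations of combinations
    out = layer(h2, 1)       # output: built on increasingly abstract features
    print(out)
    ```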

    Is not much of what we do in scientific discovery just discovering patterns and relationships already present in the world around us?

    I seem to remember some on this forum saying they think we may have already done the experiment which shows the secrets of ME/CFS, but we just didn’t realise it. That is often the way with science and with discoveries: some things only make sense retrospectively, once you understand. And yes, sometimes that takes more data, to help you look at things in a different way, to help reveal what is happening.

    I’m interested in hearing more about what @mariovitali has been doing, both in terms of the approach taken but also the results. The more different ways we look at this problem the better imho.
     
    RedFox, Kitty, wigglethemouse and 4 others like this.
  8. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    809
    Location:
    UK
    I’ve caught up now and agree with a lot of what @jnmaciuch has said, so I don’t want my post to be seen as a response to that. Perhaps all I’d add is that where I see some value is not in any AI tool giving us an answer, but in them being partners, tools for researchers. Instead they often seem to be sold as magic boxes, which they are not, and that drives a lot of problems and misunderstanding.
     
    ahimsa, RedFox, SNT Gatchaman and 5 others like this.
  9. Adrian

    Adrian Administrator Staff Member

    Messages:
    7,031
    Location:
    UK
    I think with transformers (which are at the heart of LLMs) the situation is slightly different. You have a mapping from input to output through multiple layers, but the input and hidden layer sizes vary with the input. Thus if you have a prompt with, say, 100 tokens, you have an input of 100 × the embedding size, which effectively maps through hidden layers of similar size until you get a token at the output. This enables relationships between the input tokens to be learned as they propagate through. Then, as tokens are generated, they are fed back into the input space, and their relationships with the previous tokens are represented in each of these hidden spaces, etc. But essentially the semantics of the tokens (converted into embeddings, i.e. vectors) and their relationships can be learned.
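
    For anyone curious, here's a minimal sketch of one attention step (single head, random placeholder weights; real LLMs stack many of these with learned weights). Each token's vector is updated with information drawn from the other tokens, which is how those relationships propagate through the layers.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_tokens, d = 5, 16                  # e.g. a 5-token prompt, embedding size 16

    X = rng.normal(size=(n_tokens, d))   # token embeddings (input grows with prompt length)
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

    Q, K, V = X @ Wq, X @ Wk, X @ Wv     # queries, keys, values
    scores = Q @ K.T / np.sqrt(d)        # how much each token attends to every other token
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
    out = weights @ V                    # each token's new vector mixes in the others
    print(out.shape)                     # (5, 16): one updated vector per input token
    ```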

    We need to be aware of what models can and can't do, and some of the metrics associated with models help here. They can read and summarize documents (but care needs to be taken over the instructions about what the summary should contain). Some models (such as DeepSeek or Phi-4-reasoning) are trained to do reasoning tasks and, with large enough models, seem to do an OK job.

    Agentic AI becomes really interesting, as the orchestration layer gets the model to interact with various tools and data sources: the agent starts by planning how to do a task and can then execute the plan with the tools (which provide additional data or ways to process the data).
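
    In pseudocode terms, the loop looks something like this (the llm() function and the tools here are hypothetical placeholders, not a real framework):

    ```python
    # Hypothetical sketch of an agentic loop: plan first, then execute with tools.

    def search_papers(query): ...   # tool: query a literature database (placeholder)
    def summarize(text): ...        # tool: condense a document (placeholder)
    TOOLS = {"search_papers": search_papers, "summarize": summarize}

    def llm(prompt: str) -> str:
        """Placeholder for a call to some language model."""
        raise NotImplementedError

    def run_agent(task: str) -> str:
        # 1. Planning step: ask the model to break the task into tool calls.
        plan = llm(f"Break this task into steps using tools {list(TOOLS)}: {task}")
        context = []
        # 2. Execution step: run each planned tool call and collect the results.
        for step in plan.splitlines():
            tool_name, _, arg = step.partition(":")
            if tool_name.strip() in TOOLS:
                context.append(TOOLS[tool_name.strip()](arg.strip()))
        # 3. Grounded answer: tool results are fed back in, reducing made-up output.
        return llm(f"Task: {task}\nTool results: {context}\nAnswer using only these results.")
    ```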

    Hence I can see AI tools being good at walking through large amounts of knowledge to try to find interesting things. Unfortunately they have a habit of making stuff up (but agentic AI does help here by interacting with the data sources). But, I assume, in the medical world the basic knowledge needs to be created through experimentation.

    So I can see AI systems being useful at identifying possible directions, by reading more papers than a person could and trying to gain useful insights, which is where the Google co-scientist sits. But I don't see it as creating new knowledge.
     
    RedFox, Kitty, Peter Trewhitt and 3 others like this.
  10. mariovitali

    mariovitali Senior Member (Voting Rights)

    Messages:
    575
    Some comments regarding previous posts :

    1. I believe that AI can understand and identify whether a study's design is a proper one, and identify any wrongdoings, quite easily now. So we can train an LLM to spot these wrongdoings and have it report them to us. Example from this forum: https://www.s4me.info/threads/disse...genome-analysis-2025-zhang.43705/#post-602594

    2. Some users asked which studies point to Fibrinogen, P-Selectin, etc. There are actually more concepts pointing to glycoproteins. See the following thread: https://twitter.com/user/status/1863564681169838363



    3. @jnmaciuch please read the thread I posted in (2) and let me know whether, given what holds in biology, glycosylation could be a research target worth looking at. If it is, please give me a reference in which a researcher or group of researchers has previously identified this potential link. If what I posted is not scientifically correct, please let us know (recall that I am not a biologist or otherwise connected to medical science).

    EDIT: I still do not understand what is meant by saying "AI will not create new knowledge". If AI were able to identify a connection which no human ever made, isn't this considered knowledge?
     
    Last edited: May 15, 2025
  11. Utsikt

    Utsikt Senior Member (Voting Rights)

    Messages:
    3,052
    Location:
    Norway
    It is not new data, because the data was already there! It has just been rearranged into a new configuration.
    The question is whether the data has already been captured in our data sets or not.

    AI can’t use data that isn’t in the data it has access to, and it can’t solve problems that require data it doesn’t have. This is why a self-driving car fails when it encounters a situation that wasn’t included in the training set. If the data didn’t include plastic bags, it has no concept of a plastic bag, and therefore can’t know what to do when it encounters one.
     
  12. Utsikt

    Utsikt Senior Member (Voting Rights)

    Messages:
    3,052
    Location:
    Norway
    This is intended as constructive feedback:

    I suspect you make it prohibitively difficult to engage with the predictions from your models when you only post screenshots of the texts. Would you be able to post the text sections as plain text? That would make it easier to read and multiquote.

    Twitter threads are also not very accessible. Many members do not have accounts, and it requires additional effort to switch back and forth.

    Reducing the barriers to entry might make it more likely that the people you intend to engage with actually do so.
     
    Kitty, Peter Trewhitt and Trish like this.
  13. Adrian

    Adrian Administrator Staff Member

    Messages:
    7,031
    Location:
    UK
    Maybe we need to be more precise. As I see it, there are experimental results (i.e. stuff obtained from the physical world), interpretations of results (say, papers and commentary), and meta-analysis (looking over interpretations of the results, or combining the raw experimental results). Along with each of these, I feel we should probably say there is metadata, which would include things like the reliability of results and interpretations, relating to, say, an analysis of the methodology used to get the results (sampling methods, sample sizes, sensor accuracy, mitigations for bias, etc.). As we talk about interpretations of results, or meta-analysis, we need some notion of judgement (what makes an analysis interesting, trustworthy, etc.).
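
    To make the distinction concrete, here's a sketch of how these could be kept as separate, linked records (the fields are illustrative, not a proposed standard), so that judgements about reliability stay attached to the claims they qualify:

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class Methodology:            # metadata: how the result was obtained
        sample_size: int
        blinded: bool
        selection_criteria: str
        known_biases: list[str] = field(default_factory=list)

    @dataclass
    class ExperimentalResult:     # stuff obtained from the physical world
        measurement: str
        value: float
        methodology: Methodology  # reliability context travels with the result

    @dataclass
    class Interpretation:         # a paper's claim, linked to the results it rests on
        claim: str
        results: list[ExperimentalResult]
        caveats: list[str] = field(default_factory=list)
    ```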

    So when you say "identify connections", I see this as sitting around the meta-analysis, where AI can do that and potentially judge the interestingness of results. And AI can be useful because it can cover more data sources than a person typically would be able to, although perhaps with less insight into what is meaningful, given a lack of understanding of biological systems.

    What AI won't do is create the raw experimental data (although it could help advise on what may be good targets given meta analysis).

    To me, when looking at ME, there are lots of published results, but I would worry about how good they are: too many small trials, etc. I think this makes it hard to know whether the connections made are in any way reliable.
     
  14. Adrian

    Adrian Administrator Staff Member

    Messages:
    7,031
    Location:
    UK
    So how about a simple experiment: get an LLM (with appropriate prompting) to read a few study designs and suggest potential issues (and good design elements)? I'm not convinced that a foundation model would do a good job (perhaps with fine-tuning, though I'm not sure what the dataset would be).
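
    As a rough sketch of what that could look like (using the OpenAI Python client as one example; the prompt wording and model choice are placeholders):

    ```python
    # Sketch of the proposed experiment: ask an LLM to critique a study design.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PROMPT = """You are reviewing the methods section of an ME/CFS study.
    List (1) potential methodological problems (selection criteria, blinding,
    sample size, outcome measures, statistical approach) and (2) design elements
    that were done well. Be specific and quote the text you are referring to."""

    def critique_study(methods_text: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder; a fine-tuned model might do better
            messages=[
                {"role": "system", "content": PROMPT},
                {"role": "user", "content": methods_text},
            ],
        )
        return response.choices[0].message.content
    ```

    Comparing its output against critiques already posted on this forum would be one cheap way to benchmark it.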
     
    RedFox, Kitty, Peter Trewhitt and 2 others like this.
  15. Utsikt

    Utsikt Senior Member (Voting Rights)

    Messages:
    3,052
    Location:
    Norway
    See my previous comments where I said that we should not get stuck on the semantics of «knowledge», but rather think of the fundamental nature of «data».

    If an AI model can only paint with black and white, it will never be able to paint red. If you need red to solve a problem, the AI model in this instance will not be able to solve it, no matter how many ways it can rearrange black and white.
     
    RedFox, Kitty and Peter Trewhitt like this.
  16. mariovitali

    mariovitali Senior Member (Voting Rights)

    Messages:
    575
    Sorry, but I do not understand why this is so. Practically speaking, if a number of analytical methods can better analyse and interpret existing knowledge, isn't this information that can potentially help us improve patients' lives?

    AI is now being used to optimise the function of ER departments, and it does so based on existing data. No human can possibly put together all of the factors, but AI can, and this will save people's lives. It is that simple.
     
    Kitty, Peter Trewhitt and hotblack like this.
  17. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    809
    Location:
    UK
    All good points. I suppose I was thinking more widely than LLMs.

    I’m not sure the transformer architecture itself precludes usefulness. An attention mechanism introduces a more relational aspect, and allowing information from the input to be incorporated at inference time could be useful, specifically with some data types (language, obviously). And off-the-shelf LLMs can be used in interesting ways, as a basis for further fine-tuning or in conjunction with other models.

    That said, I don’t see much value in just asking an LLM for new knowledge or answers, as they’re sometimes portrayed as delivering out of the box. I’ve not been convinced that patterns in language alone (or even multimodal transformer models) carry the information needed for proper intelligence to emerge in the way some have suggested.
     
    Kitty and Peter Trewhitt like this.
  18. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,622
    Of course it is able to generate new information and data by making inferences from things that already exist. That alone doesn't require a complex architecture; a single hidden layer can suffice, as has been proven since the 80s.
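
    As a concrete version of the one-hidden-layer point (with hand-picked weights for illustration): XOR is not linearly separable, so no network without a hidden layer can compute it, yet a single hidden layer of two units is enough.

    ```python
    import numpy as np

    # XOR with one hidden layer and hand-picked weights (illustrative).
    def step(x):
        return (x > 0).astype(int)

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

    W1 = np.array([[1, 1], [1, 1]])   # hidden units compute OR-like and AND-like features
    b1 = np.array([-0.5, -1.5])       # unit 1 fires for OR, unit 2 for AND
    H = step(X @ W1 + b1)

    W2 = np.array([1, -1])            # output: OR and not AND, i.e. XOR
    b2 = -0.5
    print(step(H @ W2 + b2))          # [0 1 1 0]
    ```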

    There is no question that it gains insights in ways that are methodologically completely different from humans', but whether that is fundamentally different from how humans gain knowledge is a whole different and highly debated question. I think that debate is also irrelevant to whether AI can make contributions to ME/CFS research. That humans are currently needed for experimentation, if just as a vessel to create raw experimental data, is probably not the point of @mariovitali's argument, who uses existing data, nor of @jnmaciuch's, who argues that the specifics of said data matter more than its existence itself. Besides that, my last blood draw was not taken by a human but by a robot.

    However, much like @Adrian, I see the reliability of data as the largest problem, and as such I currently have no hope of any positive findings from the above methods being of relevance. A vague reference to "glycoproteins" seems meaningless when we are not looking for general references but for very specific contexts. In most instances ME/CFS is the worst situation for current AIs to work on (of course there are other valuable applications in ME/CFS, such as in imaging, where the use is already common, but that's not the reference point here). However, what I do think exists is an abundance of reliable negative results. I wouldn't trust an AI to filter through those, precisely because of the complexity of studies where, as @jnmaciuch mentions, details and context matter, which are often neither part of the papers nor necessarily even known to the authors of the paper, and often not known to the authors of most other papers (so not part of the data!), but I do think S4ME could be helpful in filtering those. Have you ever tried to approach the problem from this opposite angle, @mariovitali, nurturing human knowledge as well, to combine classical and AI methods (much as state-of-the-art applications often rely on a synergy of both) in very specific contexts rather than in a very general framework?

    If we want to avoid a vague reference to "glycoproteins", could you provide some details on what exact study, including the protocol, your AI would recommend conducting in ME/CFS, or are we not there yet?
     
    Last edited: May 15, 2025
  19. Adrian

    Adrian Administrator Staff Member

    Messages:
    7,031
    Location:
    UK
    Potentially a complex argument. I think LLMs do exhibit some aspects of intelligence, but they are essentially running a mapping process over weights. So the notion of memory, and of reasoning about things that have happened or been observed, is not there. That is where the agentic ideas help, as memories of previous queries and answers, and the ability to find/assess more knowledge, come in. For notions of awareness or consciousness, such short-term memory is necessary.
    However, large contexts (input sizes) are computationally challenging (the cost of self-attention grows roughly quadratically with context length), so I think we will see agentic AI systems crafted to help with particular tasks emerging, rather than general AI.
     
  20. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    809
    Location:
    UK
    It is, very! I was thinking along the lines of what people like Paul Cisek have argued, if you’re familiar with his work. If not, I’ll dig out some presentations of his for you. All very interesting stuff.
    And yes, agree with your other points. I think there’s a few reasons for the push towards agentic tools, both practical and business.
     
    Kitty and Peter Trewhitt like this.
