Preprint Comparing DNA Methylation Landscapes in Peripheral Blood from [ME/CFS] and Long COVID Patients, 2025, Peppercorn et al

Discussion in 'ME/CFS research' started by Nightsong, May 20, 2025 at 9:01 PM.

  1. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,621
    I haven't had a closer look yet, but Tate had previously published a similar study. I wouldn't be surprised if there is some overlap.
     
    Kitty, Ravn and Peter Trewhitt like this.
  2. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,523
    Location:
    Aotearoa New Zealand
    Thanks for the description of the PCA of the 70k fragments, jnmaciuch.
    That's good news and will go some way to making things clearer, although the inclusion of that PCA is still a problem. Many of us went 'wow! that looks amazing' when we looked at the chart - and not everyone has the interest or time to think 'it's too good to be true' and look more closely at things.

    I think the people who are saying 'oh, it's what is always done' are missing the extreme level of selection that happened here. If you only use 4.6% of 70k data points for the PCA, specifically the ones that separate out your pre-defined groups, ... the pre-defined groups will be separated. It's not really more complicated than that. I want the ME/CFS researchers that we support and are relying on to find answers to do better than that.

    I don't think PCA is a useful tool for this particular analysis. It doesn't need to be in the paper, it's circular and is a distraction. The manuscript would be better if it just acknowledged the very small sample sizes and gave more space to the consideration of whether the identified DMFs, especially those ones common to both disease groups, might tell us something about ME/CFS and ME/CFS-like LC.
     
  3. Nightsong

    Nightsong Senior Member (Voting Rights)

    Messages:
    1,165
    A previous study by this group was Changes in DNA methylation profiles of myalgic encephalomyelitis/chronic fatigue syndrome patients reflect systemic dysfunctions (Helliwell et al., 2020). From a quick glance: very similar methodology (although DMAP plus single-CpG MethylKit there, versus DMAP2 in the more recent one); n=20 (10 pwME, 10 controls); RRBS (146,575 fragments); DMAP found 76 DMFs (52% hypomethylated) and MethylKit 349 DMCs (56% hypomethylated; the abstract states 394, but the results section states 349), with the highest representation in intergenic (40%) and intronic (25%) regions; fragment-level statistics were reported without FDR correction.

    This group's RRBS pipeline was documented in Chapter 9 of the new Springer Protocol (link).
     
    Kitty, Ravn, jnmaciuch and 4 others like this.
  4. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    771
    Location:
    USA
    For what it’s worth, the full PCA does actually tell an interesting story along PC2. The fact that there are distinct “Neapolitan stripes” between the three groups, despite some messy outliers, is impressive, since it actually takes into consideration all 70K sites (even though the 5/5 selection down to 70K will introduce some skewing to begin with; whether you can pre-select sites/peaks based on any a priori knowledge has been a very lively debate in epigenomic sequencing for years).

    In my opinion, the story that the full PC2 tells is that people with longer disease duration are trending back towards healthy controls in terms of DNA methylation. If it were simply a case of “LC or ME/CFS is different from everything else,” you wouldn’t see Neapolitan stripes, you’d see one stripe amidst a soup of the other two groups.

    I’m certainly not going to call that a definitive finding on the basis of n=15, but it could absolutely be the basis of a future study comparing people who got LC at the beginning of the pandemic to those recently afflicted, to assess the impact of disease duration with the same trigger.

    So I’m absolutely in agreement with you @Hutan that, as it is used, the PCA ends up being circular. Which is a shame in my opinion, since there is an interesting story here despite the tiny sample size. They do go into it in the later analysis, but it almost loses its punch.

    I do understand why they didn’t show it like that, though. If you don’t know to look along each axis separately, it all just looks like soup. But I think there were ways to finesse this: using only the PC2 score as its own latent variable and showing it as a box plot, for example.
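    A rough sketch of what that could look like, with made-up data (the matrix `meth`, the group labels and the sample sizes below are placeholders, not anything from the paper): fit the PCA on the full fragment matrix, pull out each sample's PC2 score, and show it per group.

```python
# Sketch only: treat the PC2 score as a per-sample latent variable and show
# it as a box plot per group. All data below are random placeholders.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
meth = rng.beta(2, 2, size=(15, 70_000))          # 15 samples x ~70k fragments
groups = np.repeat(["HC", "ME/CFS", "LC"], 5)     # 5 participants per group

pca = PCA(n_components=2)
scores = pca.fit_transform(meth)                  # PCA centres the data itself
pc2 = scores[:, 1]

fig, ax = plt.subplots()
ax.boxplot([pc2[groups == g] for g in ("HC", "ME/CFS", "LC")],
           labels=["HC", "ME/CFS", "LC"])
ax.set_ylabel(f"PC2 score ({pca.explained_variance_ratio_[1]:.1%} of variance)")
plt.show()
```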
     
    Last edited: May 23, 2025 at 12:13 AM
    Kitty, geminiqry, Ravn and 4 others like this.
  5. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    771
    Location:
    USA
    I’m also realizing just now that I have a tendency to use food metaphors when discussing science close to dinner time. I’m a caricature of myself
     
    Kitty, chillier, Ravn and 4 others like this.
  6. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,523
    Location:
    Aotearoa New Zealand
    Hope you had a good dinner, jnmaciuch. :)

    What was the percentage variation explained by the PC1 and PC2 axes?

    Yes, the 5/5 selection is worth noting here. Only fragments that were present in every participant were included, so fragments that were only present in men, or that weren't present at all in the healthy controls, and so on, were excluded. That's interesting information to explore when there are bigger cohorts. It also sounds as though there was an element of subjectivity in what presence requirement was used; I imagine the difficulties that missing data presents were a factor in the decision to only use fragments with 100% presence.
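    For anyone wondering what that presence requirement amounts to in practice, a hypothetical illustration (this is not the DMAP2 pipeline; the coverage table and the cut-off of 2 reads are invented): only fragments with coverage in every sample are kept.

```python
# Hypothetical 100%-presence filter, not the authors' code: keep only
# fragments that have coverage in all 15 samples.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
coverage = pd.DataFrame(rng.poisson(8, size=(1_000, 15)).astype(float))  # fragments x samples
coverage[coverage < 2] = np.nan   # treat very low coverage as "not present" (invented cut-off)

present_in_all = coverage.notna().all(axis=1)
kept = coverage.loc[present_in_all]
print(f"kept {present_in_all.sum()} of {len(coverage)} fragments")
```

    Relaxing `.all(axis=1)` to, say, a "present in at least 80% of samples" rule would keep more fragments, at the cost of having to handle missing values downstream.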

    Sure. There might be other sources of variation between cohorts too, such as whether a sample was taken in the afternoon or morning or how the specimen was stored.

    I think it's reasonable to assume that readers of a paper like this will understand how to read a PCA chart, and, in any case, it's easy enough for the text to explain that it's PC2 that separates the groups, so that people without prior knowledge will understand. But I don't think it's reasonable to expect a PCA to separate out people on the basis of disease-specific DNA methylation when you only have 5 people per group and 70k fragments resulting from all sorts of biological processes.

    Yes, definitely, there are better statistical analysis and presentation tools, and I don't think they have to be related to PCA at all. What we want to know from a study like this is 'what associated genes were found to be differentially methylated between the groups?', so that information can give us ideas about the disease mechanism.
     
    Last edited: May 23, 2025 at 2:36 AM
  7. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    17,422
    Location:
    London, UK
    I would agree with Utsikt. Peer review is a lottery and generally far less fair and rigorous than the attention of members here. If the analysis seems misleading that needs to be understood by all.
     
    Peter Trewhitt, Kitty, Utsikt and 4 others like this.
  8. chillier

    chillier Senior Member (Voting Rights)

    Messages:
    261
    I think you're right, but it's not just the PCA; it's also the fact that they don't correct for multiple testing and find basically exactly the number of positive results you'd expect by chance. It's cool that there are genes that could make sense, but as it stands I don't feel I can trust any of it.

    However, if what @jnmaciuch says about the PCA done on all 70k+ fragments is true, then that's a different story. If the groups really look like they're separating out on that PCA, then that would be encouraging: unlike with the PCA on only the significant fragments, I don't think you would expect to see group differences there based on noise alone.
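    To make the multiple-testing point concrete, here is a hedged sketch of the correction step that appears to be missing, using Benjamini-Hochberg FDR from statsmodels. The uniform p-values are stand-ins for the per-fragment results, not numbers from the paper.

```python
# Sketch of the missing step: Benjamini-Hochberg FDR correction applied to
# per-fragment p-values. With ~72k fragments of pure noise, roughly 5% pass
# p < 0.05 uncorrected, but essentially none survive FDR < 0.05.
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
pvals = rng.uniform(size=72_000)           # stand-in for per-fragment p-values

uncorrected_hits = (pvals < 0.05).sum()    # ~3,600 expected by chance
rejected, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(uncorrected_hits, rejected.sum())    # e.g. ~3600 vs 0
```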
     
    Last edited: May 23, 2025 at 10:31 AM
  9. chillier

    chillier Senior Member (Voting Rights)

    Messages:
    261
    Here is how it looks with simulated random data. From 72k features across three groups of 5 individuals each, sure enough you get ~3.6k (3,575) positive hits with ANOVAs, and a PCA on the significant fragments that looks very similar to theirs.
    [Image: PCA of the simulated data, using only the 'significant' fragments]

    Note that doing the PCA on all of the simulated features doesn't separate the groups; if Peppercorn/Tate's does, that would be of interest. I hope we can get to see that PCA soon.
    [Image: PCA of the simulated data, using all 72k features]
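    For anyone who wants to play with this themselves, a minimal sketch of the kind of null simulation described above. This is not chillier's actual code; the normal noise model and the seed are arbitrary.

```python
# Null simulation sketch: 72k random features, 3 groups of 5 samples,
# per-feature one-way ANOVA, then PCA on (a) only the "significant"
# features and (b) all features.
import numpy as np
from scipy.stats import f_oneway
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
n_per_group, n_feat = 5, 72_000
data = rng.normal(size=(3 * n_per_group, n_feat))   # pure noise, no group effect
groups = np.repeat([0, 1, 2], n_per_group)

# per-feature ANOVA across the three groups
_, pvals = f_oneway(data[groups == 0], data[groups == 1], data[groups == 2])
sig = pvals < 0.05
print(f"{sig.sum()} 'significant' features (~{0.05 * n_feat:.0f} expected by chance)")

pca_sig = PCA(n_components=2).fit_transform(data[:, sig])   # separates the groups
pca_all = PCA(n_components=2).fit_transform(data)           # does not
```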

     
  10. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,621
  11. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,523
    Location:
    Aotearoa New Zealand
    Thank you @chillier! Yes, those charts are exactly what I was expecting.

    @jnmaciuch described the PCA on the 72k data as Neapolitan layers with messy outliers on the PC2 axis. That chart from random variables is not far off that - in fact I can see a bit of separation of the "groups" on both axes. With so few data points, it isn't hard to find a pattern of some sort.
     
    Peter Trewhitt, Kitty and EndME like this.
  12. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    771
    Location:
    USA
    A little over and a little under 10%, respectively. So there's a lot of noise even when selecting down to 70K, which is probably to be expected.
     
    Peter Trewhitt, Kitty and Hutan like this.
  13. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    771
    Location:
    USA
    [edit: sorry, hit post too soon] Yes, if you pulled more of the red dots into the middle and pushed the green and blue further apart, it wouldn't look too far off. It's a weak signal, so it would probably come down to an ANOVA between groups on the PC2 scores. I'd definitely like to see replication in a larger cohort.
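    A small sketch of what that test could look like, using scipy's one-way ANOVA. The PC2 scores below are invented placeholders, not values from the paper.

```python
# Sketch of the test mentioned above: one-way ANOVA on per-sample PC2 scores
# between the three groups (placeholder values only).
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(4)
pc2 = {g: rng.normal(loc=m, scale=1.0, size=5)   # 5 samples per group
       for g, m in [("HC", 0.0), ("ME/CFS", 0.6), ("LC", 0.3)]}

stat, p = f_oneway(pc2["HC"], pc2["ME/CFS"], pc2["LC"])
print(f"F = {stat:.2f}, p = {p:.3f}")
```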
     
    Kitty and Peter Trewhitt like this.
