Too much focus on your health might be bad for your health: Reddit user’s communication style predicts their Long COVID likelihood, 2024, Segneri+

SNT Gatchaman

Senior Member (Voting Rights)
Staff member
Too much focus on your health might be bad for your health: Reddit user’s communication style predicts their Long COVID likelihood
Ludovica Segneri; Nandor Babina; Teresa Hammerschmidt; Andrea Fronzetti Colladon; Peter A. Gloor

Long Covid is a chronic disease that affects more than 65 million people worldwide, characterized by a wide range of persistent symptoms following a Covid-19 infection. Previous studies have investigated potential risk factors contributing to elevated vulnerability to Long Covid. However, research on the social traits associated with affected patients is scarce. This study introduces an innovative methodological approach that allows us to extract valuable insights directly from patients’ voices. By analyzing written texts shared on social media platforms, we aim to collect information on the psychological aspects of people who report experiencing Long Covid. In particular, we collect texts that patients wrote BEFORE they were afflicted with Long Covid.

We examined the differences in communication style, sentiment, language complexity, and psychological factors of natural language use among the profiles of 6,107 Reddit users, distinguishing between those who claim they have never contracted Covid-19, those who claim to have had it, and those who claim to have experienced Long Covid symptoms.

Our findings reveal that people in the Long Covid group frequently discussed health-related topics before the pandemic, indicating a greater focus on health-related concerns. Furthermore, they exhibited a more limited network of connections, lower linguistic complexity, and a greater propensity to employ emotionally charged expressions than the other groups. Using social media data, we can provide a unique opportunity to explore potential risk factors associated with Long Covid, starting from the patient’s perspective.

Link | PDF (PLOS ONE) [Open Access]
 
excessive use of social media for information and communication about health-related topics, such as the coronavirus pandemic, can impact perceived disease symptoms. This is also known as hypochondriacal beliefs affecting disease progression. Mahat-Shamir et al. confirmed the mediation effects of hypochondriasis symptoms of social media users in the context of the COVID-19 pandemic. Hence, we assume similar effects for Covid long haulers

Lastly, Health includes words related to physical and mental health, such as “doctor”, “sick”, “pain”, and “therapy”; it focuses on a user’s perception of their physical well-being. Brown et al., Ferguson et al., and Pauli & Alpers demonstrated that individuals with hypochondriacal beliefs tend to process health-related information more extensively than those without such beliefs. Using this dimension, we want to investigate whether individuals who have experienced Long Covid exhibited more significant health concerns before the pandemic; this could imply greater control and thus a greater likelihood of discovering that the preexisting symptoms of Covid-19 are associated with Long Covid.

it is plausible to suggest that Long Covid users are more susceptible to stress. In contrast, in the content posted before the pandemic, subjects in the Long Covid class were already mainly talking about health. This finding implies that health was already a significant concern or interest for these individuals even before their experience with Long Covid. It suggests that individuals who develop Long Covid symptoms may have a greater tendency towards hypochondria. Research conducted by Brown et al., Ferguson et al., and Pauli & Alpers have shown that individuals with hypochondriacal beliefs tend to process health-related information more extensively than those without such beliefs.

our study helps to create psychological profiles related to Long Covid patients, highlighting that people who developed Long Covid often discussed health-related topics before the pandemic. Hence, it suggests that those users may have a greater tendency towards hypochondria, which aligns with previous research on how hypochondriacal belief impacts disease progression.
 
(It continues to astonish me just how large a body of literature will have to be ejected to make any progress. Coming from a highly technical area of medicine with a literature base that is often very complex (physics, maths and now machine learning), I just can't even with this stuff.)

They almost develop some insight in their limitations section, mentioning comorbidities, but immediately backtrack to "health anxiety".

Additionally, our reliance on the Reddit dataset limits our ability to include information about pre-existing health conditions that may increase susceptibility to developing Long Covid. As Jacobs et al. suggested, it is crucial to consider comorbid conditions such as asthma, chronic constipation, reflux, seasonal allergies, rheumatoid arthritis, depression/anxiety, in addition to age, gender, race, and smoking. These factors proved to be significantly associated with the development of Long Covid.

Therefore, future research should explore incorporating additional variables beyond those proposed in our current study. One potential approach is to complement our social media analytics with interviews and questionnaires targeting individuals who have experienced COVID-19 symptoms and Long Covid. An ideal strategy would involve mapping their ego networks while comprehensively analyzing the discrepancies between online social networks and real-life social networks. Another interesting analysis might also consider different categories of people with Long Covid, based on their symptoms. It could be that our predictors may more accurately anticipate psychological aspects related to the disease, such as depression or feeling tired, rather than more physiological aspects, such as a taste or hearing disorder.

Finally, although this study focused on factors predicting the likelihood of Long Covid, those factors may also be appropriate to predict general health anxiety, especially considering the heterogeneity of Long Covid symptoms. As illustrated in the introduction, individuals affected by Long Covid often exhibit symptoms similar to those of depression or migraine, including headache and anxiety, and thus, similar predictors. This overlap could suggest that the same factors predicting Long Covid might also influence broader health diseases. Therefore, future research should aim to provide more detailed insights into the specific symptoms and predictors differentiating Long Covid from other illnesses, employing an approach that prioritizes depth over breadth.
 
After spending roughly two minutes skimming this paper:

I think the study is unethical: although the Reddit forum is public, no-one consented to the scraping of their health-related data. Reddit users are not representative of the population (they are probably younger and more technically literate, amongst other factors), and users of support groups tend not to be representative of people with their illnesses. It doesn't address potential confounding variables such as age, gender, socioeconomic status and pre-existing health conditions, all of which can influence communication style and the likelihood of support-group use, as well as the likelihood of developing Long COVID. There appear to be a lot of low McFadden's R² values, suggesting that many of their models explain only a small proportion of the variance in the data. They also jump to hypochondriasis with no consideration of alternative explanations: surely people more interested in health-related topics would be more likely to participate in an online support group for the illness, or to discuss their personal experiences online?

Won't waste further time on this but would be a good one for a journal club to tear apart.
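For readers unfamiliar with the statistic mentioned above, here is a minimal sketch (not taken from the paper) of how McFadden's pseudo-R² is computed for a likelihood-based model such as logistic regression: R² = 1 - ln L(model) / ln L(null), where the null model contains only an intercept. It assumes, for simplicity, a binary outcome (e.g. Long Covid vs not), whereas the paper compares three groups. The function names `mcfadden_r2` and `null_log_likelihood` and all numbers are hypothetical. Because it compares log-likelihoods rather than variances, it is only loosely analogous to the "proportion of variance explained" of an ordinary regression R².

```python
import math


def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo-R^2 = 1 - lnL(model) / lnL(null).

    ll_model: log-likelihood of the fitted model
    ll_null:  log-likelihood of the intercept-only (null) model
    For a binary outcome both are negative, and ll_model >= ll_null.
    """
    if ll_null >= 0:
        raise ValueError("null log-likelihood must be negative")
    return 1.0 - ll_model / ll_null


def null_log_likelihood(n_positive: int, n_total: int) -> float:
    """Log-likelihood of an intercept-only logistic model, which simply
    predicts the observed base rate p for every user."""
    p = n_positive / n_total
    return n_positive * math.log(p) + (n_total - n_positive) * math.log(1 - p)


# Hypothetical numbers, NOT from the paper: 300 of 1,000 users in the positive group.
ll_null = null_log_likelihood(n_positive=300, n_total=1000)
# A hypothetical model that improves the log-likelihood by 25 units.
ll_model = ll_null + 25.0
print(round(mcfadden_r2(ll_model, ll_null), 3))  # prints 0.041
```

With these made-up numbers the model does better than the null model, yet its pseudo-R² is only about 0.04, i.e. it adds little over simply predicting the base rate.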
 
Could it be something as simple as those who go to Reddit to discuss their psychological and health issues will be open about their long covid, whereas those who opt not to discuss such personal health issues on Reddit will continue not doing so when they get long covid?

I agree this research is both unethical and worse than useless.
 
This makes me highly uncomfortable.

I have some pretty vulnerable posts I made on the specific forum they analysed at the beginning of my illness.

But what makes me uncomfortable is that they then went on to analyse users' post history outside the forum.

I doubt they asked the moderators if they could use the data either. Disgusting…

And this doesn’t even account for the fact that a lot of accounts on the forum are fake accounts with fake “illness stories” to try and advertise treatments.
 
Also, the study seems to be ignoring that people who had pre-existing health conditions were more likely to develop long COVID?

Also, people who used Reddit to discuss chronic health issues pre-pandemic will be more likely to use it to discuss their long covid than those who used Reddit without discussing health issues. (Not sure if the coronavirus infection group counts, because talking about being infected with COVID, “oh cool, I have the new virus? any tips”, is not the same as talking about chronic health issues.)
 
If my data had been scraped I think I'd make ethics complaints to the institutions of everyone involved, to discourage the others ("pour décourager les autres").

Something that does concern me is that if "research" like this is being carried out on unsuspecting Reddit support-group users, then it could also be carried out against us by scraping S4ME. Perhaps it would be useful to obfuscate usernames from non-logged-in users, implement rate limiting and browser-behaviour checking, and/or make more content members-only?
 
We are aware of this issue. When we updated our forum rules recently we added a rule that permission will not be given to use any part of the forum as source material for research. We also made a copy of the rules public, so there is no excuse for researchers not being aware of this rule.

https://www.s4me.info/threads/welcome.38181/#post-527624
 
If my data had been scraped I think I'd make ethics complaints to the institutions of everyone involved, to discourage the others ("pour décourager les autres").

If I had the energy, I would. I hope someone does.
 
This "study" is a good example of the harm of bias in medical research. It starts off with a sound premise and choices, but veers off completely into a pre-fabricated, appallingly biased conclusion that makes an explicit decision about a preferred direction of causality out of nothing but correlation. So, in other words: typical non-biomedical research, which is usually a proxy study of the researchers' biases and outcome-seeking, using cherry-picked data to justify themselves.

I have no issue with their choice to use those subreddits, which are public forums, or to check users' comments in other subreddits. But Reddit is not a place where people talk about everything in their lives; it's a limited subset, and a lot of people who discuss many other things elsewhere only go on Reddit for some topics, like people who used it sporadically before but heavily once they developed LC. Might as well record people's conversations to study what they focus on in their daily lives, but only at grocery stores and clinics, and conclude that they only talk about food and health problems.

It's actually odd that they chose to do only text pattern matching. Given all the AI hype, they chose not to use it where it would have made far more sense. They do broach the issue of how the patterns had to be refined, but that's simply not sufficient. They make arbitrary decisions about the so-called traits, not too different from how phrenology was built on explicitly European cranial features as the morphological traits of civilized intelligence.

The importance they give to first-person narrative is plain weird. Health problems are personal in ways that most other topics are not. Of course discussions of them will contain first-person pronouns more often than, say, discussions about hardware or video games.

They do notice and discuss the fact that prior health problems have been shown to be a significant risk factor, but choose to disregard this in order to produce a distinctly clickbait title that will be widely shared by those who have already made up their minds.

Their claims also have to be accepted at face value. There is no verifiable data supporting their claim that users talked more about health, just the final output of their arbitrary choices. I don't think it would hold up to much scrutiny.

However, a major factor here, which would create a trend, is that the chronic illness community is heavily present in those forums, precisely because the medical profession has failed them all. I don't know if it's because they are ignorant of that fact. I guess they must be. Nevertheless, this would be like finding that a population of refugees fleeing a war-torn, impoverished region sure has an unusual number of attorneys, medics and other professionals with advanced degrees, because the count also included the staff from international NGOs who make up 10-20% of the people there.

Again and again you find this; it's not even a pattern, it's an obsession: bias, bias everywhere. Even by the standards of the social sciences (and this here is social science), it's extreme in medical research. It's about always making the same decision about which way causality must run, based on what they prefer to see in correlational data.

I have no doubt this will be shared gleefully by self-important, smug people everywhere for a few weeks, then be totally forgotten. It's just sad seeing how confused and dysfunctional this profession is. The best they can do is borderline magical, but the worst that they do is atrocious at a level that would be unacceptable in any other profession. High ceiling, very low floor, in the parlance of sports.
 
There's policy-based evidence making. Maybe there's also ego-driven evidence-making?

Driven by the need to avoid the painful admission that one is not perfect and is disappointing patients that are seeking help.

If it's not a "medical problem" then it's not the responsibility of the doctor but someone else's.
 
Hypochondriacs posting in an online LC group doesn't make them LC patients. The title of the paper should be changed to "Reddit user’s communication style predicts their self-declared Long COVID likelihood". And of course, hypochondriacs are more likely to claim that they have LC, just as they do with cancer, black mold allergy, EM sensitivity, MCS, MS, Lyme disease, ... and ME/CFS. That doesn't mean they have them until they are diagnosed. Nor does it mean you are a hypochondriac if you are diagnosed with LC. Since neither implies the other, it's a useless hypothesis.
 