Cochrane Review: 'Exercise therapy for chronic fatigue syndrome', Larun et al. - New version October 2019 and new date December 2024

Esther12 · Oct 14, 2019

Michiel Tack said:
So Wise & Brown (2005) say that "The minimal clinically important difference (MCID) for the 6MWT is conservatively estimated to be 54-80 meters." The difference between the improvements in meters walked in the GET group (67 meters) and SMC (22 meters) was 45 meters, so less than 54-80 meters.

The data for the 6-minute walking test is also no longer statistically significant if the data from the PACE trial is pooled with the data from Jason et al. 2007 (if my attempt at a meta-analysis is correct).

So I don't think there's a case for arguing that GET increases the walking ability of CFS patients.

Would/could they exclude Jason for baseline differences (are those differences apparent for 6mwt data?)?

ME/CFS Science Blog · Oct 14, 2019

Esther12 said:
Would/could they exclude Jason for baseline differences (are those differences apparent for 6mwt data?)?

Don't think so.

Mean baseline score and standard deviation for the exercise group: 1335.27 (280.99)

Mean baseline score and standard deviation for the control group: 1317.78 (296.55)

They have excluded the results of Jason 2007 for physical function because of a baseline difference (a correct decision, I think: the mean (SD) for the exercise and control groups at baseline were 39.17 (15.65) and 53.77 (26.66) respectively), but not for other outcomes. So I assume they would have included it for the 6-minute walking test if they had analyzed this outcome.
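To put the two baseline gaps on a common footing, here is a minimal sketch that expresses each as a standardized mean difference (the gap divided by a pooled SD). The post gives no group sizes, so this assumes equal groups and simply averages the two variances. It is an illustration only, not the method used in the review.

```python
import math

def standardized_difference(mean_a, sd_a, mean_b, sd_b):
    """Difference in means divided by a pooled SD.

    ASSUMPTION: equal group sizes, so the pooled variance is just the
    average of the two variances (group sizes are not given in the post).
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Jason et al. 2007 baseline values quoted above: exercise vs control
walk_d = standardized_difference(1335.27, 280.99, 1317.78, 296.55)
pf_d = standardized_difference(39.17, 15.65, 53.77, 26.66)

print(f"walking test baseline gap:      {walk_d:+.2f} SD")
print(f"physical function baseline gap: {pf_d:+.2f} SD")
```

With these inputs the walking-test gap comes out at roughly 0.06 SD and the physical function gap at roughly 0.67 SD, which fits the view that only the latter is a meaningful baseline imbalance.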

Jonathan Edwards · Oct 15, 2019

Dolphin said:
For what it is worth:

https://www.ncbi.nlm.nih.gov/pubmed/17136972

That abstract sounds to me like a misunderstanding of statistics. I do not see what it means to be statistically confident of a change in an individual. If there was a change, however small, there was a change. I am not sure that the variance in others is relevant.

Barry · Oct 15, 2019

Michiel Tack said:
So Wise & Brown (2005) say that "The minimal clinically important difference (MCID) for the 6MWT is conservatively estimated to be 54-80 meters." The difference between the improvements in meters walked in the GET group (67 meters) and SMC (22 meters) was 45 meters, so less than 54-80 meters.

The data for the 6-minute walking test is also no longer statistically significant if the data from the PACE trial is pooled with the data from Jason et al. 2007 (if my attempt at a meta-analysis is correct).

So I don't think there's a case for arguing that GET increases the walking ability of CFS patients.

And I'm sure I recall a PACE participant saying they did their 6MWT by 'exchanging' it for some of their otherwise normal activities. Even apparently objective measures are not really that objective for ME, unless you reliably track energy usage continuously over time. A key facet of ME is that people can, to some degree, conserve a bit of energy in case they really need it a bit later. I'd love to know whether participants in the GET arm were given any hints to make sure they took things easy and rested up before coming in for their 6MWT. I may well be wrong on this, but I'm a skeptical old beggar.

ME/CFS Science Blog · Oct 15, 2019

9) Minimally important differences
I’ve been reading up on the issue of minimally important differences (MID), the smallest difference that patients are likely to consider important. The authors of the Cochrane review have used MIDs to suggest that the treatment effects they found are clinically relevant.

There are basically three methods to estimate MIDs. There’s the distribution method, which estimates MIDs from easily available figures such as the standard deviation. There’s the anchoring method, where scores on a questionnaire are compared to an ‘anchor’, another measure whose score is easily interpretable. Thirdly, there’s the qualitative method, where patients, clinicians or experts are asked directly what they think the MID is.

Fatigue
For the 33-point Chalder Fatigue Scale (CFQ), Larun et al. refer to a lupus study (Goligher et al. 2008) that found a MID of 2.3 points. It used an anchoring method along these lines. First, lupus patients filled in the CFQ. Afterwards they had a 10-minute discussion with another patient about their fatigue. They then had to rate how the other patient’s fatigue related to their own on a 7-point response scale ranging from “Much more fatigue,” “Somewhat more fatigue,” “A little bit more fatigue,” “About the same fatigue,” “A little bit less fatigue,” “Somewhat less fatigue,” to “Much less fatigue.” This way the authors could estimate which CFQ score differences correspond to the responses given on the 7-point scale. They then estimated the MID by looking at the difference in CFQ scores between “About the same fatigue” and the other responses, using regression analysis. Two other studies have used the same approach to estimate the MID for the CFQ, one also in lupus patients, the other in patients with rheumatoid arthritis. Their estimates are similar, in the range 1.4-3.3 (it’s difficult to extract the right data from the papers, but I think these are the relevant figures).
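The regression step can be sketched roughly like this. It is a simplified stand-in, not the exact model Goligher et al. fitted: each pair's CFQ difference is regressed on the anchor rating, coded from -3 (“Much less fatigue”) to +3 (“Much more fatigue”), and the slope is read as the CFQ change per one-step difference on the anchor. The coding scheme and all the data below are hypothetical, invented only to show the mechanics.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Anchor rating each rater gave the other patient's fatigue relative to
# their own, coded -3 ("much less") .. 0 ("about the same") .. +3 ("much more").
# HYPOTHETICAL data for illustration only.
anchor = [-3, -2, -1, 0, 0, 1, 2, 3]
# Other patient's CFQ minus the rater's own CFQ for the same pairs (hypothetical).
cfq_diff = [-7.0, -4.5, -2.5, 0.5, -0.5, 2.0, 4.5, 7.5]

# Slope = estimated CFQ points per one-step move on the anchor scale,
# which this simplified approach treats as the MID.
mid_estimate = ols_slope(anchor, cfq_diff)
print(f"CFQ points per one-step anchor difference: {mid_estimate:.2f}")
```

If a questionnaire barely separates the response categories, the CFQ differences cluster near zero, the slope shrinks and so does the MID, which is the concern about insensitive questionnaires raised in the following paragraph.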

There are, however, some problems with this approach. If a questionnaire is insensitive to the severity of fatigue (for example, if it does not produce a strong difference in scores between someone who is mildly fatigued and someone who is severely fatigued), then the score difference between the responses “About the same fatigue” and “A little bit less fatigue” will look small. Consequently, the MID will also be small. I think that might be what happened with the CFQ, because all three studies calculated MIDs for multiple questionnaires, and in each case the standardized MID (the MID divided by the standard deviation) was lowest for the CFQ. If one looks at the graphs that give the mean CFQ scores for the 7 responses, one can see quite a lot of overlap. Sometimes the mean score difference for “About the same fatigue” is larger than for “A little bit less fatigue”. So the method is certainly not perfect (I suspect the data from these studies would interest anyone wanting to analyse the CFQ).

I think Nøstdahl et al. 2018 used a better approach. They handed out the CFQ to people who were about to have surgery and had them fill in the questionnaire at different time points after the surgery until they recovered. Each time, the researchers added the question: “Given your current description of fatigue; would you say it has been of considerable significance to you?; Yes/No”. This way they could estimate which CFQ score corresponds to a yes to this question. On average this was a score of 15.1. Because the baseline score was 11.7, I think the study indicates that an increase of 3.4 points was the minimal significant difference.

Then there’s the letter by Ridsdale et al. 2001, where they proposed a MID of 4 points. They used this threshold in a 2012 economic analysis of a study of GET and counseling for patients with chronic fatigue. Although Trudie Chalder and Simon Wessely, the creators of the CFQ, were not listed as authors on the letter, they were part of the study that Ridsdale et al. were defending. So I think it’s reasonable to assume that they were included in the consensus view Ridsdale et al. referred to. The full quote is as follows: “The researchers in this trial include several of those involved in developing and testing the instrument. Our consensus view was that a difference of less than four, using a Likert scale, is not important.”

Finally there’s the PACE trial (of which Chalder is an author), where a distribution approach was used. They took half the standard deviation and came up with a MID of 2. There were some problems with this approach, though. The CFQ was used in the selection criteria for the sample, so the SD was probably reduced. I think the standard deviation of the 33-point CFQ is usually somewhere between 5 and 6 points, so a distribution estimate for the CFQ would come up with a MID of 2.5-3 if done correctly. It also seems that the MID used in the PACE trial was post hoc. Originally the authors were going to use the 11-point version of the CFQ and regard a 50% reduction or a score lower than 3 as a ‘positive outcome’. The baseline score on the 33-point version was approximately 28 points. Even if we take 11 points as the bottom of the scale, a 50% reduction would still mean a reduction of 8.5 points on the CFQ. Even though they planned to use the 11-point version, I think it’s clear they initially had a larger point reduction in mind to measure improvement.
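The arithmetic in the paragraph above can be checked with a few lines. This is only a sketch of the half-SD rule and of the planned 50% reduction, using the figures given in the post (SD of 5-6 points, baseline of about 28, floor of 11).

```python
def half_sd_mid(sd):
    """Distribution-based MID: half a standard deviation."""
    return sd / 2

# Typical SD range for the 33-point CFQ suggested above (5 to 6 points)
for sd in (5, 6):
    print(f"SD {sd}: half-SD MID = {half_sd_mid(sd):.1f} points")

# Originally planned threshold: a 50% reduction from a baseline of about 28.
# A floor of 11 corresponds to answering "no more than usual" (1 point)
# on all 11 items, the post's conservative choice of "bottom of the scale".
baseline, floor = 28, 11
planned_reduction = (baseline - floor) * 0.5
print(f"50% reduction above a floor of {floor}: {planned_reduction} points")
```

Working backwards, PACE’s half-SD MID of 2 implies an SD of roughly 4 points in their selected sample, below the 5-6 points suggested as typical.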

In conclusion: the MID estimate used by Larun et al. is not far off other estimates, although none of these have asked patients what they would consider a MID. The 3.4-point difference found in the meta-analysis is very small: close to the MID and even lower than some estimates.

Physical functioning

Edit: some of the MIDs cited below are normalized values, which have a compressed scale. The recalculated original values are substantially higher. See the posts and updates below.

I think the conclusion is similar for the SF-36 physical function scale (0-100), where the authors have used a MID of 7 points. It’s a bit on the low side but perhaps not unreasonable. The PACE trial used a MID of 8 points for SF-36 physical function, which was half the standard deviation (SD) in their sample. But as they used the SF-36 as a selection criterion, their SD was lower than in an unselected sample. In the observational study by Crawley et al. the SD for physical function at baseline was 22.7, which would result in a MID of 11.35. The GETSET protocol also used a MID of 8 points.
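The point that using the outcome as an entry criterion shrinks the SD (range restriction), and with it any half-SD MID, can be illustrated with a small simulation. Everything here is hypothetical: the population mean, SD and entry cut-off are made-up values chosen only to show the direction of the effect, not the PACE or Crawley et al. figures.

```python
import random
from statistics import pstdev

rng = random.Random(42)  # fixed seed so the run is reproducible

# HYPOTHETICAL unselected population of SF-36 PF scores (scale 0-100),
# drawn from a normal distribution and clipped to the scale limits.
population = [min(100.0, max(0.0, rng.gauss(50, 23))) for _ in range(10_000)]

# HYPOTHETICAL entry rule: only patients scoring at or below a cut-off enter.
CUTOFF = 60
selected = [score for score in population if score <= CUTOFF]

sd_all = pstdev(population)
sd_selected = pstdev(selected)
print(f"SD unselected: {sd_all:.1f} -> half-SD MID {sd_all / 2:.1f}")
print(f"SD selected:   {sd_selected:.1f} -> half-SD MID {sd_selected / 2:.1f}")
```

The selected group always shows a smaller SD than the population it was drawn from, so a half-SD MID computed in a trial sample will tend to understate the MID that the same rule would give in an unselected sample.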

Crawley and colleagues did a study on the MID for the SF-36 in adolescent CFS patients just last year (Brigden et al. 2018). They came up with a MID of 10 points. This actually looks like a decent study, as they used all three methods: distribution, anchoring and qualitative. They conducted qualitative interviews with 21 young CFS patients and their parents to see what they would consider a MID. A MID of 10 was used in the FITNET-NHS protocol and the Lightning Process economic analysis.

Estimates from other patient groups are in the same range. Wyrwich et al. 2005 used a Delphi technique, a process where experts come together to reach a consensus. The result was a MID of 10 points on the SF-36 physical function scale for patients with heart or lung disease. But two years later Wyrwich et al. used the anchoring method with patients and GPs and came up with lower estimates. That is one of the papers Larun et al. refer to for using a MID of 7. The other reference they list is a 2014 study (Ward et al. 2014) on patients with rheumatoid arthritis that used the anchoring method to arrive at a MID of 7.1 points. I’ve found a study on idiopathic pulmonary fibrosis (Witt et al. 2019) that found higher estimates (range 10.1-22.2), but another study on idiopathic pulmonary fibrosis came up with a much lower MID of only 3 points (Swigris et al. 2010). A study on prostate cancer survivors (Jayadevappa et al. 2012) came up with a MID of 7 points.

Trish · Oct 15, 2019

That's all really interesting @Michiel Tack. Thank you for doing all that research and putting it together.

I am left, however, with my own experience of trying to fill in the CFQ and SF-36 physical functioning questionnaires, and seeing that I can, on the same day, and at the same time, get scores with a range way larger than the so-called minimal clinically significant changes found with all these supposedly scientific methods, just by taking a slightly different mind set to the questions.

Right now, today, I could score myself anywhere from 23 to 30 on the CFQ, and those same scores applied when my ME was mild. In other words, it would be completely useless for gauging the severity of my ME: it simply counts how many of a random list of fatigue-related statements apply to my ME.

And currently, today, I score myself from 0 to 15 on the SF-36PF, and I would have scored from 25 to 50 on a good day when my ME was mild, depending on whether I was feeling positive or negative about my limitations, and what I was comparing myself with.

All those studies you cite required people to answer subjective questions in a context where they were trying to please a therapist. That is not real life.

@Graham has illustrated this brilliantly with this video:

Discussed here:
https://www.s4me.info/threads/me-analysis-the-3-pace-videos-factsheet.6106/

Daisymay · Oct 15, 2019

Trish said:
@Graham has illustrated this brilliantly with this video:

Discussed here:
https://www.s4me.info/threads/me-analysis-the-3-pace-videos-factsheet.6106/

That video is SO good, Graham explains things so clearly.

Graham · Oct 15, 2019

Actually, I tried an experiment on my local ME support group a while back, while explaining the problems with PACE. I asked them to fill out the SF-36 form by themselves with no discussion, and if they had time they could fill out the Chalder Fatigue Scale as well.

Then I told them how to score them: they were astounded just how limited the questions were. "Is that it? Is that all? They based their conclusions just on that?"...

Next I explained to them that these questionnaires were supposed to be appropriate for the full health range, and to think just how limited the bed-bound people with ME are: to think of all of those who would struggle or even find it impossible to crawl from the bedroom to the bathroom. I told them to think carefully about whether their gradings were appropriate: in that context, did they really have a lot of difficulties? I asked them to go through the questions again and score it again.

Guess what - according to PACE rules, my little chat had improved their ME by a significant amount, and some of them were close to being cured.

I'm thinking of advertising my techniques, but I have yet to come up with a suitable label – Graham's Exhortatory Threats?

Thanks for the kind words about the video. Next time I'll get my hair styled properly.

Have you seen the singing sunflowers, @Daisymay ? It's video 6, The Recovery Song. It may bring a smile.

NelliePledge · Oct 15, 2019

@Graham aka the Exhortatist.

Dolphin · Oct 15, 2019

Michiel Tack said:

Physical functioning


In conclusion: the post-treatment difference found for SF-36 physical functioning in the Cochrane meta-analysis was 13.1 points. But I think that’s a distorted figure because of the outlier Powell et al., which reported an implausible difference of 31.75 points. If this study is excluded, the mean difference drops to 7.37, which is close to the MID used and even lower than some MID estimates.

References

Goligher et al. (2008). Minimal clinically important difference for 7 measures of fatigue in patients with systemic lupus erythematosus.

Pouchot et al. (2008). Determination of the minimal clinically important difference for seven fatigue measures in rheumatoid arthritis.

Pettersson et al. (2015). Determination of the minimal clinically important difference for seven measures of fatigue in Swedish patients with systemic lupus erythematosus.

Nøstdahl et al. (2018). Defining the cut-off point of clinically significant postoperative fatigue in three common fatigue scales.

Ridsdale et al. (2001). Chronic fatigue in general practice: authors' reply.

Brigden et al. (2018). Defining the minimally clinically important difference of the SF-36 physical function subscale for paediatric CFS/ME: triangulation using three different methods.

Wyrwich et al. (2005). A comparison of clinically important differences in health-related quality of life for patients with chronic lung disease, asthma, or heart disease.

Wyrwich et al. (2007). A comparison of clinically important differences in health-related quality of life for patients with chronic lung disease, asthma, or heart disease.

Ward et al. (2014). Clinically important changes in short form 36 health survey scales for use in rheumatoid arthritis clinical trials: the impact of low responsiveness.

Witt et al. (2019). Psychometric properties and minimal important differences of SF-36 in Idiopathic Pulmonary Fibrosis.

Swigris et al. (2010). The SF-36 and SGRQ: validity and first look at minimum important differences in IPF.

Jayadevappa et al. (2012). Comparison of distribution- and anchor-based approaches to infer changes in health-related quality of life of prostate cancer survivors.

Regarding the SF-36: it is sometimes reported as normalized scores, which have a compressed range. I suspect the MID of 3 points may come from such a scoring method, and other figures may too. I recall normalized scoring was used in a rituximab study from around 2011, though this wasn’t clear from the paper (the authors confirmed it after a question).

Dolphin · Oct 15, 2019

This study showed that the Chalder Fatigue Scale was not good at differentiating between different levels of severity of ME.

Fatigue in Myalgic Encephalomyelitis

Ellen M Goudsmit, Bart Stouten, Sandra Howes
http://iacfsme.org/ME-CFS-Primer-Education/Bulletins/2008/Fatigue-in-Myalgic-Encephalomyelitis.aspx

rvallee · Oct 15, 2019

Michiel Tack said:
So Wise & Brown (2005) say that "The minimal clinically important difference (MCID) for the 6MWT is conservatively estimated to be 54-80 meters." The difference between the improvements in meters walked in the GET group (67 meters) and SMC (22 meters) was 45 meters, so less than 54-80 meters.

The data for the 6-minute walking test is also no longer statistically significant if the data from the PACE trial is pooled with the data from Jason et al. 2007 (if my attempt at a meta-analysis is correct).

So I don't think there's a case for arguing that GET increases the walking ability of CFS patients.

This seems to be based on the same idea used in disability assessments, where being able to walk 200 m once, regardless of the consequences after the test, is taken to mean no disability whatsoever, which is absurd. They evaluate the ability to walk for a few minutes and declare that GET has a significant impact because the researchers have decided, using mathemagic, that the change is significant, even though it does not translate into actual usefulness. It's similar to asking paraplegics to climb stairs on their butts and concluding that, if they can, they should be able to do it in everyday life.

So as long as those ideas remain in place, and we know they are applied to other chronic diseases, it can be argued that it is a relevant evaluation because it is commonplace. The entire field appears to be poisoned with artificial clinical significance that does not translate into real life ability. There doesn't seem to be any concept of a difference between doing something once or twice and doing it repeatedly for months or years, an extension of the thinking that acute symptoms and chronic symptoms are completely different.

What a horrible mess. Completely inhumane thinking.

Graham · Oct 15, 2019

I carried out a survey among people I knew, friends of friends, and other contacts who had ME and asked them to fill in the Chalder Fatigue Questionnaire, also using Goudsmit's mild/moderate/severe category. Here's a graph of the results. It takes a bit of thinking, but there are three different types of dots for each severity, and the dot is placed according to its bimodal and its Likert score.

For example, you will see one mild case scoring 11 on bimodal and 26 on Likert (Continuous), or you will see one severe scoring 7 bimodal, 22 Likert. (You may need to zoom in, especially if your eyesight is fading with age!)

You can see that there is no consistency.

The white and yellow rectangles show all the possible scores for each bimodal score: the white rectangles indicate that the patient MUST have marked one of the answers as "better than when I was well".

The black vertical line between 4 and 3 represents the original target for "recovery", the black horizontal line the later amended target for recovery, and the green vertical line represents the score above which patients were declared as no longer having CFS even if the doctor diagnosed them as still having it.
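The shape of those rectangles follows from how the two scorings relate. On the 11-item CFQ each answer scores 0, 1, 2 or 3 under Likert scoring and 0, 0, 1 or 1 under bimodal scoring, so a bimodal total only loosely constrains the Likert total. The sketch below (the function name is just for illustration) works out, for each bimodal score, the full possible Likert range and the lowest Likert total reachable without any 0 answer ("less than usual", i.e. better than when well); totals below that are the region that forces at least one such answer.

```python
N_ITEMS = 11  # the CFQ has 11 items

def likert_range(bimodal):
    """Possible Likert totals (0-33) for a given bimodal total (0-11).

    Likert points per answer: 0, 1, 2, 3. Bimodal points: 0, 0, 1, 1.
    Items counted in the bimodal score contribute 2 or 3 Likert points;
    the remaining items contribute 0 or 1.
    """
    other = N_ITEMS - bimodal
    lowest = 2 * bimodal                       # all other items answered 0
    highest = 3 * bimodal + other              # endorsed items 3, others 1
    lowest_without_zero = 2 * bimodal + other  # no "less than usual" answers
    return lowest, highest, lowest_without_zero

for b in range(N_ITEMS + 1):
    lo, hi, no_zero = likert_range(b)
    print(f"bimodal {b:2d}: Likert {lo:2d}-{hi:2d}; totals below {no_zero} "
          f"need at least one 'less than usual' answer")
```

Graham's two examples both sit inside their ranges: a bimodal 11 allows Likert 22-33 (26 is inside), and a bimodal 7 allows 14-25 (22 is inside and above 18, so no "less than usual" answer is forced).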

I made that years ago, when I was young, fresh and full of energy. Well, it's all comparative.

Barry · Oct 15, 2019

Trish said:
I am left, however, with my own experience of trying to fill in the CFQ and SF-36 physical functioning questionnaires, and seeing that I can, on the same day, and at the same time, get scores with a range way larger than the so called minimal clinically significant changes found with all these supposedly scientific methods, just by taking a slightly different mind set to the questions.

Basically the readings are susceptible to a lot of 'noise', and the notional minimal clinical difference values are completely lost within the noise, so they are meaningless. So much so as to be completely misleading.
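A rough way to put numbers on this uses Trish's example of scoring anywhere from 23 to 30 on the CFQ on the same day, and the 2.3-point lupus MID cited earlier in the thread. The sketch assumes, purely for illustration, that each completion is equally likely to land on any whole score in that range, independently of the other; real response behaviour is unlikely to be that simple.

```python
from fractions import Fraction
from itertools import product

MID = 2.3               # Goligher et al. 2008 CFQ MID quoted in the thread
scores = range(23, 31)  # Trish's same-day CFQ range, 23-30 inclusive

# ASSUMPTION: two completions for an unchanged person, each equally likely
# to be any whole score in the range, independent of each other.
pairs = list(product(scores, repeat=2))
spurious = sum(1 for a, b in pairs if abs(a - b) > MID)
share = Fraction(spurious, len(pairs))

print(f"{spurious}/{len(pairs)} pairs differ by more than the MID "
      f"({float(share):.0%}) with no real change")
```

Under these assumptions nearly half of all pairs of readings would show a "change" larger than the MID even though nothing has changed.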

rvallee · Oct 15, 2019

All this relative improvement does not mean much without an absolute target. "Improving" from 15 to 40 on the SF-36 would seem highly significant, and yet 40 is a severe level of disability. It's like having $5 deposited in a bank account that has a balance of $10. A 50% improvement here still won't feed anyone for the month.

This is all so amateurish, as if common sense had been thrown out the window so it could not spoil the illusion. As if they assume the "improvement" is a trajectory, something that will keep going, rather than an illusion of improvement that is not sustainable because they focus on fatigue rather than on what actually matters: fatigability.

If physicists were allowed to use such nonsensical evidence there would be claims of perpetual motion machines published multiple times a day. It's perpetual if you stop at some arbitrary point and discount the need for fuel. And if you don't actually measure the power output and simply use a guesstimate scale of less power, some power, more power. Complete nonsense.

rvallee · Oct 15, 2019

Graham said:
I carried out a survey among people I knew, friends of friends, and other contacts who had ME and asked them to fill in the Chalder Fatigue Questionnaire, also using Goudsmit's mild/moderate/severe category. Here's a graph of the results. It takes a bit of thinking, but there are three different types of dots for each severity, and the dot is placed according to its bimodal and its Likert score.

Ah but you are missing the magical ingredient: did you spend several months trying to convince them to think of themselves as better than they are and telling them not improving is their own fault? That's how the pros do it. Common mistake.

Dolphin · Oct 15, 2019

Michiel Tack said:
9) Minimally important differences
I’ve been reading up on the issue of minimally important differences (MID), the smallest difference that patients are likely to consider important. The authors of the Cochrane review have used MID to suggest that the treatments effects they found are clinically relevant.

There are basically three methods to estimate MIDs. There’s the distribution method which estimates MIDS based on easily available figures such as the standard deviation. There’s the anchoring method where scores on a questionnaire are compared to an ‘anchor’, another measure whose score is easily interpretable. Thirdly, there’s a quantitative measure where patients, clinicians or experts are interviewed directly on what they think the MID is.

Fatigue
For the 33-point Chalder Fatigue Scale (CFQ), Larun et al. refer to a lupus study (Goligher et al. 2008) that found a MID of 2.3 points. It used an anchoring method along these lines. First, lupus patients filled in the CFQ. Afterwards they had a 10-minute discussion with another patient about their fatigue. They then had to rate how the other patient’s fatigue compared to theirs on a 7-point response scale: “Much more fatigue,” “Somewhat more fatigue,” “A little bit more fatigue,” “About the same fatigue,” “A little bit less fatigue,” “Somewhat less fatigue,” and “Much less fatigue.” This way the authors could estimate which CFQ scores correspond to the responses given on the 7-point scale. They then estimated the MID by looking at the difference in CFQ scores between “About the same fatigue” and the other responses, using regression analysis. Two other studies have used the same approach to estimate the MID for the CFQ, one also in lupus patients, the other in patients with rheumatoid arthritis. Their estimates are similar, in the range 1.4-3.3 (it’s difficult to extract the right data from the papers, but I think these are the relevant figures).
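A minimal Python sketch of the anchor logic, using made-up numbers. It's my assumption that the analysis boils down to something like this; the actual papers used their own regression models, so treat it as an illustration of the idea rather than their method. Two simple ways of reading a MID off the anchor are shown: the regression slope (CFQ points per one-step change on the 7-point scale) and the difference between the "a little bit less" and "about the same" groups.

```python
def mean(xs):
    return sum(xs) / len(xs)

def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical pairs, NOT data from Goligher et al.
# anchor: how the *other* patient's fatigue compares to the rater's,
#         -3 = "much more fatigue", 0 = "about the same", +3 = "much less fatigue".
# diff:   rater's CFQ score minus the other patient's CFQ score.
anchor = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
diff = [-5, -2, -3, 0, 1, -1, 2, 3, 5]

# Approach 1: CFQ points per one-step change on the anchor scale.
slope = regression_slope(anchor, diff)

# Approach 2: mean difference for "a little bit less" minus mean for "about the same".
little_less = mean([d for a, d in zip(anchor, diff) if a == 1])
same = mean([d for a, d in zip(anchor, diff) if a == 0])
group_mid = little_less - same

print(slope, group_mid)  # 2.5 2.5
```

If a questionnaire barely separates the response groups, both numbers shrink, which is the problem with this approach.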

There are, however, some problems with this approach. If a questionnaire is insensitive to the severity of fatigue (for example, if it does not produce a strong difference in score between someone who is mildly fatigued and someone who is severely fatigued), then the score difference corresponding to the responses “About the same fatigue” and “A little bit less fatigue” will look small. Consequently, the MID will also be small. I think that might be what happened with the CFQ, because in all three studies MIDs were calculated for multiple questionnaires, and in each case the standardized MID (the MID divided by the standard deviation) was the lowest for the CFQ. If one looks at the graphs giving the mean CFQ scores for the 7 responses, there is quite a lot of overlap. Sometimes the score difference for “About the same fatigue” is higher than for “A little bit less fatigue”. So the method is certainly not perfect (I suspect the data from these studies would interest anyone wanting to analyse the CFQ).
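The standardized MID is simply MID / SD, which puts questionnaires with different ranges on a common footing. A tiny Python sketch with invented (MID, SD) pairs for three hypothetical scales shows how the comparison works; the numbers are not taken from any of the three studies.

```python
def standardized_mid(mid, sd):
    """MID expressed in standard-deviation units."""
    return mid / sd

# Invented (MID, SD) pairs for three hypothetical questionnaires, not study data.
scales = {"Scale A": (2.3, 6.0), "Scale B": (5.0, 10.0), "Scale C": (8.0, 18.0)}

standardized = {name: standardized_mid(mid, sd) for name, (mid, sd) in scales.items()}
for name, value in sorted(standardized.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:.2f} SD")
```

A scale whose MID is small relative to its spread, like Scale A here, comes out lowest.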

I think Nøstdahl et al. 2018 used a better approach. They gave the CFQ to people who were about to have surgery and had them fill in the questionnaire at different time points after the surgery until they recovered. Each time, the researchers added the question: “Given your current description of fatigue; would you say it has been of considerable significance to you?; Yes/No”. This way they could estimate which CFQ score corresponds to a yes to this question. On average this was a score of 15.1. Because the baseline score was 11.7, I think the study indicates that an increase of 3.4 points was the minimal significant difference.
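I don't know exactly how Nøstdahl et al. derived their cut-off. One common way to turn yes/no answers into a score threshold is to pick the cut-off that maximises Youden's index (sensitivity + specificity - 1), so here is a minimal Python sketch of that technique, swapped in purely as an illustration, with invented scores and answers. The last line repeats the arithmetic from the post.

```python
def youden_cutoff(scores, answers):
    """Score cut-off maximising Youden's J (sensitivity + specificity - 1).

    answers[i] is True if person i said their fatigue was of considerable significance.
    """
    positives = sum(answers)
    negatives = len(answers) - positives
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        tp = sum(1 for s, a in zip(scores, answers) if s >= cut and a)
        tn = sum(1 for s, a in zip(scores, answers) if s < cut and not a)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut

# Invented CFQ scores and yes/no answers, NOT Nøstdahl et al.'s data.
scores = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
answers = [False] * 5 + [True] * 5
print(youden_cutoff(scores, answers))  # 15

# The post's arithmetic: cut-off of 15.1 minus a baseline of 11.7.
print(round(15.1 - 11.7, 1))  # 3.4
```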

Then there’s the letter by Ridsdale et al. 2001, where they proposed a MID of 4 points. They used this threshold in a 2012 economic analysis of a study of GET and counseling for patients with chronic fatigue. Although Trudie Chalder and Simon Wessely, the creators of the CFQ, were not listed as authors on the letter, they were part of the study that Ridsdale et al. were defending. So I think it’s reasonable to assume that they were included in the consensus view Ridsdale et al. referred to. The full quote is as follows: “The researchers in this trial include several of those involved in developing and testing the instrument. Our consensus view was that a difference of less than four, using a Likert scale, is not important.”

Finally there’s the PACE trial (of which Chalder is an author), where a distribution approach was used. They took half the standard deviation and came up with a MID of 2. There were some problems with this approach, though. The CFQ was used in the selection criteria for the sample, so the SD was probably reduced. I think the standard deviation of the 33-point CFQ is usually somewhere between 5 and 6 points, so a distribution estimate done correctly would come up with a MID of 2.5-3. It also seems that the MID used in the PACE trial was a post-hoc measure. Originally the authors were going to use the 11-point version of the CFQ and regard a 50% reduction or a score lower than 3 as a ‘positive outcome’. The baseline score on the 33-point version was approximately 28 points. Even if we take 11 points as the bottom of the scale, a 50% reduction would still mean a reduction of 8.5 points on the 33-point CFQ. Even though they planned to use the 11-point version, I think it’s clear they initially had a larger point reduction in mind to measure improvement.
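A small Python sketch of the selection point, using invented data: if a sample is selected on the same questionnaire, its SD shrinks, so a half-SD MID from that sample comes out smaller. The population mean and SD (22 and 6) and the entry cut-off of 25 are assumptions for illustration, not PACE's figures. The last lines just redo the arithmetic from the post.

```python
import random
import statistics

def half_sd_mid(scores):
    """Distribution-based MID: half the sample standard deviation."""
    return 0.5 * statistics.stdev(scores)

# Hypothetical unselected population of 33-point CFQ scores (invented, clipped to 0-33).
rng = random.Random(42)
population = [min(33, max(0, round(rng.gauss(22, 6)))) for _ in range(10_000)]

# Hypothetical entry criterion: only high scorers are enrolled.
enrolled = [s for s in population if s >= 25]

print(f"half-SD MID, whole population: {half_sd_mid(population):.2f}")
print(f"half-SD MID, enrolled sample:  {half_sd_mid(enrolled):.2f}")

# Arithmetic from the post: half of an SD of 5-6 points.
print([0.5 * sd for sd in (5, 6)])  # [2.5, 3.0]

# A 50% reduction from a baseline of about 28, taking 11 as the bottom of the scale.
baseline, floor = 28, 11
print((baseline - floor) / 2)  # 8.5
```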

In conclusion: the MID estimate used by Larun et al. is not way off from other estimates, although none of these have asked patients what they would consider a MID. The 3.4 point difference found in the meta-analysis is very small, close to the MID and even lower than some estimates.
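To make the comparison concrete, here is a quick Python check of the 3.4-point difference against the CFQ MID estimates mentioned above. The numbers are the ones quoted in this post; "exceeds" is a plain comparison of point estimates and ignores the confidence interval around the 3.4.

```python
# MID estimates for the 33-point CFQ quoted in this post (points).
mid_estimates = {
    "Goligher et al. 2008 (used by Larun et al.)": 2.3,
    "Anchor studies, low end": 1.4,
    "Anchor studies, high end": 3.3,
    "Nøstdahl et al. 2018": 3.4,
    "Ridsdale et al. 2001": 4.0,
    "PACE trial (half SD)": 2.0,
    "Half SD if SD is 5": 2.5,
    "Half SD if SD is 6": 3.0,
}

effect = 3.4  # CFQ difference from the meta-analysis, as quoted above

for name, mid in sorted(mid_estimates.items(), key=lambda kv: kv[1]):
    verdict = "effect exceeds MID" if effect > mid else "effect does not exceed MID"
    print(f"{name:45s} {mid:4.1f}  {verdict}")

not_exceeded = [name for name, mid in mid_estimates.items() if effect <= mid]
print(not_exceeded)
```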


Physical functioning

I think the conclusion is similar for SF-36 physical function (scale 0-100), where the authors used a MID of 7 points. It’s a bit on the low side but perhaps not unreasonable. The PACE trial used a MID of 8 points for SF-36 physical function, which was half the standard deviation (SD) in their sample. But as they used the SF-36 as a selection criterion, their SD was lower than in an unselected sample. In the observational study by Crawley et al. the SD for physical function at baseline was 22.7, which would result in a MID of 11.35. The GETSET protocol also used a MID of 8 points.
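The half-SD arithmetic for SF-36 physical function, as a minimal Python sketch. The only inputs are figures from the post: Crawley et al.'s baseline SD of 22.7, and PACE's MID of 8, which (being half an SD) implies a sample SD of about 16.

```python
def half_sd(sd):
    """Distribution-based MID: half a standard deviation."""
    return sd / 2

def implied_sd(mid):
    """SD implied by a MID that was defined as half an SD."""
    return mid * 2

print(half_sd(22.7))   # 11.35, from Crawley et al.'s baseline SD
print(implied_sd(8))   # 16, the sample SD implied by PACE's MID of 8
```

The gap between 16 and 22.7 is one rough way to see how much the selection criterion narrowed PACE's sample.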

Crawley did a study on the MID for the SF-36 in adolescent CFS patients just last year (Brigden et al. 2018). They came up with a MID of 10 points. This actually looks like a decent study, as they used all three methods: distribution, anchoring and qualitative. They conducted qualitative interviews with 21 young CFS patients and their parents to see what they would consider a MID. A MID of 10 was used in the FITNET-NHS protocol and the Lightning Process economic analysis.

Estimates from other patient groups are in the same range. Wyrwich et al. 2005 used a Delphi technique, a process where experts come together to reach a consensus. The result was a MID of 10 points on the SF-36 physical function scale for patients with heart or lung disease. But two years later Wyrwich et al. used the anchoring method with patients and GPs and came up with lower estimates. That’s one of the papers Larun et al. refer to for using a MID of 7. The other reference they list is a study (Ward et al. 2014) on patients with rheumatoid arthritis that used the anchoring method to arrive at a MID of 7.1 points. I’ve found a study on idiopathic pulmonary fibrosis (Witt et al. 2019) that found higher estimates (range 10.1-22.2), but another study on idiopathic pulmonary fibrosis came up with a much lower MID of only 3 points (Swigris et al. 2010). A study on prostate cancer survivors (Jayadevappa et al. 2012) came up with a MID of 7 points.


In conclusion: the post-treatment difference for SF-36 physical functioning in the Cochrane meta-analysis was 13.1 points. But I think that’s a distorted figure because of the outlier Powell et al., which reported an implausible difference of 31.75 points. If this study is excluded, the mean difference drops to 7.37, which is close to the MID used and even lower than some MID estimates.
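For anyone wanting to check how much one outlier can pull a pooled estimate, here is a minimal Python sketch of fixed-effect inverse-variance pooling with a leave-one-out check. The study values are invented (they are not the trials in the review), and the fixed-effect model is my simplification; it is not necessarily the model Larun et al. used.

```python
def pooled_mean_difference(studies):
    """Fixed-effect inverse-variance pooled mean difference.

    studies: list of (name, mean_difference, standard_error) tuples.
    """
    weights = [1 / se ** 2 for _, _, se in studies]
    weighted = sum(w * md for w, (_, md, _) in zip(weights, studies))
    return weighted / sum(weights)

def leave_one_out(studies):
    """Pooled estimate with each study dropped in turn."""
    return {
        name: pooled_mean_difference([s for s in studies if s[0] != name])
        for name, _, _ in studies
    }

# Invented SF-36 physical function differences (points) and standard errors.
studies = [("Study A", 8.0, 3.0), ("Study B", 6.0, 4.0), ("Outlier", 30.0, 5.0)]

print(f"all studies: {pooled_mean_difference(studies):.2f}")
for dropped, estimate in leave_one_out(studies).items():
    print(f"without {dropped}: {estimate:.2f}")
```

Even with the least weight of the three, the invented outlier lifts the pooled estimate from about 7.3 to about 11.5 points.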

References

Goligher et al. (2008). Minimal clinically important difference for 7 measures of fatigue in patients with systemic lupus erythematosus.

Pouchot et al. (2008). Determination of the minimal clinically important difference for seven fatigue measures in rheumatoid arthritis.

Pettersson et al. (2015). Determination of the minimal clinically important difference for seven measures of fatigue in Swedish patients with systemic lupus erythematosus.

Nøstdahl et al. (2018). Defining the cut-off point of clinically significant postoperative fatigue in three common fatigue scales.

Ridsdale et al. (2001). Chronic fatigue in general practice: authors' reply.

Brigden et al. (2018). Defining the minimally clinically important difference of the SF-36 physical function subscale for paediatric CFS/ME: triangulation using three different methods.

Wyrwich et al. (2005). A comparison of clinically important differences in health-related quality of life for patients with chronic lung disease, asthma, or heart disease.

Wyrwich et al. (2007). A comparison of clinically important differences in health-related quality of life for patients with chronic lung disease, asthma, or heart disease.

Ward et al. (2014). Clinically important changes in short form 36 health survey scales for use in rheumatoid arthritis clinical trials: the impact of low responsiveness.

Witt et al. (2019). Psychometric properties and minimal important differences of SF-36 in Idiopathic Pulmonary Fibrosis.

Swigris et al. (2010). The SF-36 and SGRQ: validity and first look at minimum important differences in IPF.

Jayadevappa et al. (2012). Comparison of distribution- and anchor-based approaches to infer changes in health-related quality of life of prostate cancer survivors.

I’m not sure whether you or anyone has said it already or not but an issue can be that the range of possible scores of fatigue for example could be larger with ME or CFS than with some other conditions from which thresholds are extracted. So illness A may only produce fatigue of the range 12-20 (say). Step changes in level of illness/fatigue for this illness could be smaller than ME or CFS where a much greater range of fatigue could be possible (say 16-33). I am being a bit lazy and haven’t looked at data from other illnesses before posting this.

Same issue could arise with SF-36 PF where low scores e.g. under 40 or maybe under 50 or 60 might not be common with many other conditions.

ME/CFS Science Blog · Oct 15, 2019

Dolphin said:
Regarding the SF-36, it is sometimes scored using a normalized scale, with a lower range of scores. I suspect the score of 3 may come from such a scoring method. It is possible other scores did too. I recall normalized scoring was used in a Rituximab study from around 2011, though it wasn’t clear from the paper (the authors confirmed it following a question).

Thanks for pointing this out! Yes, it was a normalized score. I didn't realize this could make a big difference to points on the scale.

The Ward et al. (2014) study that Larun et al. cite was also a normalized score. It would be interesting to recalculate it to the original scores.

I haven't found the exact instructions on how to do this. A Spanish study explained:

The SF-36 scores range from 0 to 100, with a higher score indicating better health status. In addition, normalized values can be estimated so that it can provide a reference value from the general population. To do so, each SF-36 component score first was standardized using the mean and standard deviations (SD) obtained from a Spanish population older than 45 years and then transformed to norm-based (mean = 50, SD = 10) scoring, as suggested by the authors of the questionnaire.

The Ward et al. (2014) study did the same but with the US population. Does anyone know how to recalculate the scores back to the original value?

I just noticed that the study that came up with a MID of 3 (normalized value) said that its result was identical to that of Kosinski et al. (2000) when recalculated as a normalized value. Kosinski reported a (non-normalized) MID of 7.7. So it does seem to make a significant difference: 7.7 versus 3 points.
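On converting back: going by the description quoted above, norm-based scoring is a linear transformation, T = 50 + 10 × (raw - population mean) / population SD, so it can be inverted if you know the reference population's mean and SD for that scale (I haven't filled in the published values here). For a MID, which is a difference, only the SD matters because the mean cancels out. A minimal Python sketch; the function names and the example mean and SD are placeholders, not real norms.

```python
def to_norm_based(raw, pop_mean, pop_sd):
    """Raw 0-100 SF-36 scale score -> norm-based score (population mean 50, SD 10)."""
    return 50 + 10 * (raw - pop_mean) / pop_sd

def to_raw(norm, pop_mean, pop_sd):
    """Inverse of to_norm_based: norm-based score -> raw 0-100 score."""
    return pop_mean + pop_sd * (norm - 50) / 10

def norm_difference_to_raw(norm_diff, pop_sd):
    """Convert a difference (e.g. a MID); the population mean cancels out."""
    return norm_diff * pop_sd / 10

# Placeholder reference values, NOT the published US or Spanish norms.
POP_MEAN, POP_SD = 80.0, 25.0

t = to_norm_based(60, POP_MEAN, POP_SD)
print(t, to_raw(t, POP_MEAN, POP_SD))     # 42.0 60.0
print(norm_difference_to_raw(3, POP_SD))  # 7.5

# Back-of-the-envelope: the SD at which 3 norm-based points equal 7.7 raw points.
print(round(7.7 * 10 / 3, 1))  # 25.7
```

That last figure is only an inference from the two MIDs above, not a published value, but it suggests the reference SD for physical function would need to be around 25-26 points for the two results to match.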

ME/CFS Science Blog · Oct 15, 2019

Dolphin said:
I’m not sure whether you or anyone has said it already or not but an issue can be that the range of possible scores of fatigue for example could be larger with ME or CFS than with some other conditions from which thresholds are extracted. So illness A may only produce fatigue of the range 12-20 (say). Step changes in level of illness/fatigue for this illness could be smaller than ME or CFS where a much greater range of fatigue could be possible (say 16-33). I am being a bit lazy and haven’t looked at data from other illnesses before posting this.

Wouldn't that be visible as a smaller standard deviation (SD)? In the three anchoring studies the SD was 6.6, 5.2 and 5.8 so not particularly lower than in CFS samples, I guess.

To be honest, I also had problems interpreting the data in the three studies that used the anchoring method for the CFQ (Goligher et al. 2008, Pouchot et al. 2008, Pettersson et al. 2015). It's a bit complex. I would encourage others to take a look and not trust my judgement too much.

Dolphin · Oct 15, 2019

Michiel Tack said:
Wouldn't that be visible as a smaller standard deviation (SD)? In the three anchoring studies the SD was 6.6, 5.2 and 5.8 so not particularly lower than in CFS samples, I guess.

Okay, when I was thinking about that, I had missed that the anchoring studies also had that data and didn't rely solely on the responses to "change questions".
