My feedback on content of CDEs (Fatigue) - Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) CDE Public Review

Tom Kindlon · Jan 21, 2018

I just submitted the following comments.

Thanks to @Sasha and @Graham for giving me feedback on an earlier draft.

I am aware of another draft submission that is being made on this instrument (I was too late to the discussion to give my comments) so decided to restrict myself to a few comments rather than giving a thorough review.

I hope other people will submit comments on particular categories or questionnaires. All the information is here: https://commondataelements.ninds.nih.gov/MECFS.aspx#tab=Data_Standards

I don't believe it is necessary to cite papers to make the majority of points; in this case I just happened to be aware of data. In many other cases there may not be appropriate data. I also don't think you need to be attached to any one group to make a submission.

I'm not sure if I will be making any other submissions due to my workload and health.

---------

Name of Reviewer/Institution: Tom Kindlon, Irish ME/CFS Association

CDE, Case Report Form or Measure: Chalder fatigue questionnaire (Fatigue)

Suggested Change: Don't use the Chalder fatigue questionnaire

Rationale:

The Chalder Fatigue questionnaire has two separate scoring systems, bimodal (0-11) and Likert (0-33) [1]. Some of the issues raised below are more significant with one system rather than the other.

(i) Doubts about the validity of two of the items in the questionnaire as means to measure fatigue:

The item “Do you have problems starting things” seems as though it could relate more to motivation or some other issue rather than fatigue specifically.

The item “Do you feel sleepy or drowsy” relates more to sleepiness than fatigue. Sleepiness and fatigue are not necessarily the same thing [2].

Most studies that used the Chalder fatigue scale do not give details of scores on individual items but one study [3] reported the following in participants with ME: “Focusing on the individual items revealed that 86.8% of the questions making up the physical fatigue subscale received near maximal or maximum scores. The items which received the greatest number of low scores were question 3 (‘do you feel sleepy or drowsy’) and question 4 (‘do you have problems starting things’).”

(ii) Ceiling effects are a significant issue when the Chalder fatigue questionnaire is used with patients with ME and CFS, particularly with bimodal scoring:

A study of those with ME [3] found that “Fifty per cent of the patients recorded the maximum score using the bimodal method and 77% recorded the two highest scores [i.e. either 10 or 11].” In the FINE and PACE trials, 76% (147/193) and 65% (417/640) respectively of CFS participants reported the highest score (i.e. 11) at baseline using bimodal scoring [4,5].

With regard to Likert scoring, a study of those with ME found some evidence of a ceiling effect in those who were severely affected (further details were not reported, but the average score for those severely affected was 30.55 (SD: 2.66)). In the FINE and PACE trials, 29.1% (57/196) and 14.5% (93/640) respectively of the participants with CFS recorded the maximum score of 33 at baseline.
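For anyone who wants to check the baseline ceiling percentages quoted above, here is a minimal sketch in Python. The counts are the ones given in the text; the dictionary keys and the helper name `pct` are just labels I have made up for illustration.

```python
# Counts of participants at the maximum score at baseline, as quoted in the text.
# Keys are illustrative labels only: (trial, scoring method).
at_ceiling = {
    ("FINE", "bimodal"): (147, 193),
    ("PACE", "bimodal"): (417, 640),
    ("FINE", "Likert"): (57, 196),
    ("PACE", "Likert"): (93, 640),
}


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


for (trial, scoring), (n_max, n_total) in at_ceiling.items():
    print(f"{trial} {scoring}: {n_max}/{n_total} = {pct(n_max, n_total)}%")
```

Rounded to whole numbers, the bimodal figures come out at the 76% and 65% quoted above.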

There is also a 14-item version of the instrument with three extra items. A study of 136 individuals with CFS looking at Likert scoring found there was near-maximal scoring on 6 of the 8 physical fatigue items [6].

The authors of the ME study [3] noted with regards to bimodal scoring that there was a “marked overlap between those who rated themselves as moderately or severely ill. These findings are indications of a low ceiling.” This could lead to the questionnaire failing to detect patients moving from being severely to moderately affected and vice versa.

Furthermore, if patients are already at a ceiling score at the start of the intervention, the questionnaire cannot detect their getting worse. This could mean that evidence of harm would not be recorded. Also, this phenomenon could affect measures of efficacy: if a certain percentage of patients improved and the same percentage worsened to a similar level, this could show up as an average improvement because the scores for those who got worse would not change if they were already at the ceiling level.

This could also make interventions that caused a significant number of deteriorations seem better than those that caused fewer. For example, consider a scenario in which one intervention caused a certain percentage of patients to improve while the same percentage, who began at the maximum score, worsened by the same amount. If another intervention caused half the number of patients to both improve and worsen, the average numerical improvement for the first intervention would be twice that of the second, even though rationally the scores should be the same.
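The scenario above can be sketched with made-up numbers. This is only an illustration under stated assumptions, not data from any trial: 100 patients per intervention, a change of 5 points, and a starting score of 28 for those not at the ceiling are all hypothetical values, and `mean_changes` is a helper name invented for the example. Higher scores mean more fatigue, so a negative change looks like improvement.

```python
MAX_SCORE = 33  # top of the Likert range (0-33); higher means more fatigue


def mean_changes(n_total, n_each, delta, start=28):
    """Return (true_mean_change, recorded_mean_change).

    n_each patients improve by `delta` points from `start` (hypothetical value);
    another n_each patients begin at MAX_SCORE and truly worsen by `delta`;
    everyone else is unchanged. Recorded scores are capped at MAX_SCORE.
    """
    n_rest = n_total - 2 * n_each
    baseline = [start] * n_each + [MAX_SCORE] * n_each + [start] * n_rest
    true_change = [-delta] * n_each + [delta] * n_each + [0] * n_rest

    recorded_change = []
    for base, change in zip(baseline, true_change):
        follow_up = min(MAX_SCORE, base + change)  # the scale cannot go above 33
        recorded_change.append(follow_up - base)

    return sum(true_change) / n_total, sum(recorded_change) / n_total


first = mean_changes(n_total=100, n_each=20, delta=5)   # 20 improve, 20 worsen
second = mean_changes(n_total=100, n_each=10, delta=5)  # 10 improve, 10 worsen
print(first)   # (0.0, -1.0)
print(second)  # (0.0, -0.5)
```

In both cases the true average change is zero, but the recorded average shows an improvement, and the intervention with twice as many deteriorations appears to do twice as well.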

(iii) Discussion of the ability of respondents to mark symptoms as occurring “less than usual”:

The fact that participants can rate their fatigue symptoms as occurring “less than usual” can lead to some odd results with Likert scoring of the Chalder scale (it is not an issue with its bimodal scoring). People who have no fatigue problems should generally score 11/33, indicating that they had problems ‘no more than usual’. And, indeed, a study in Norway found that those in the category “No disease/current health problem” had a mean score of 11.2 [7].
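To make the difference between the two scoring systems concrete, here is a minimal sketch in Python. It assumes the usual convention for the 11-item version: each item has four response options coded 0 to 3 for Likert scoring, and bimodal scoring counts the two lower options as 0 and the two higher options as 1. The function names are my own, and the option labels in the comments are generic; the exact wording on the questionnaire may differ from item to item.

```python
N_ITEMS = 11

# Response codes per item (labels are generic, not the exact questionnaire wording):
# 0 = less than usual, 1 = no more than usual,
# 2 = more than usual, 3 = much more than usual


def likert_score(responses):
    """Sum of the 0-3 codes, giving a total in the range 0-33."""
    assert len(responses) == N_ITEMS
    return sum(responses)


def bimodal_score(responses):
    """Count of items answered 2 or 3, giving a total in the range 0-11."""
    assert len(responses) == N_ITEMS
    return sum(1 for r in responses if r >= 2)


no_more_than_usual = [1] * N_ITEMS
less_than_usual = [0] * N_ITEMS

print(likert_score(no_more_than_usual), bimodal_score(no_more_than_usual))  # 11 0
print(likert_score(less_than_usual), bimodal_score(less_than_usual))        # 0 0
```

Under these assumptions, "less than usual" and "no more than usual" are indistinguishable with bimodal scoring, but on the Likert total they differ by up to 11 points, which is where scores below 11 come from.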

However, a study found that people with "multiple sclerosis fatigue" after an intervention reported an average fatigue score of 7.80 – that is, lower than 11; this score also showed lower fatigue than that of a healthy, nonfatigued comparison group in the study [8]. It is very unlikely to be true that patients with multiple sclerosis fatigue at baseline ended the study with lower fatigue than healthy people. Scores of less than 11 were also reported by those with CFS in the FINE and PACE trials [4,5].

I will explore further now how pooling the scores of people who give scores of less than 11 with other scores can give odd results. Say 75% of participants gave a Likert score of 4 and 25% gave a score of 24. This would be an average score of 9, which is a better score than the score of 11 that healthy people report. However, it is likely that people who scored 4 on the scale were confused by the peculiar option on the Chalder questionnaire that allows them to rate themselves as having fewer problems with fatigue than when they were last well (choosing that option is the only way to get a score below 11). If they really meant to say that they had no more fatigue than when they were last well, then their score should really have been similar to that of the average healthy person, at 11.2. Substituting this score instead of 4 in this example would give an average score for the group of 14.4, a worse score than what healthy people score. The latter is, I believe, a better representation of what the average fatigue score for the group would be: that is, if a significant percentage still had significant fatigue, then the overall fatigue level should be worse on average than a healthy group, not better. This shows that the ability to have better scores than healthy people doesn’t just affect the validity of individual scores; it also affects the validity of overall mean scores.
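The arithmetic in this example can be checked with a short sketch in Python. The percentages and scores are the hypothetical ones used above, and `weighted_mean` is simply a helper name chosen for the illustration.

```python
def weighted_mean(groups):
    """Mean score for a list of (percentage_of_participants, score) pairs."""
    total_weight = sum(weight for weight, _ in groups)
    return sum(weight * score for weight, score in groups) / total_weight


# 75% score 4 (below the healthy score of 11) and 25% score 24.
as_reported = weighted_mean([(75, 4), (25, 24)])

# Same group, but the 75% are given the healthy mean of 11.2 instead of 4.
adjusted = weighted_mean([(75, 11.2), (25, 24)])

print(round(as_reported, 1))  # 9.0
print(round(adjusted, 1))     # 14.4
```

The only thing that changes between the two calculations is how the "less than usual" responders are scored, yet the group mean moves from better than healthy to worse than healthy.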

References:

1. Chalder T, Berelowitz G, Pawlikowska T, Watts L, Wessely S, Wright D, Wallace EP: Development of a Fatigue Scale. J Psychosom Res 1993, 37:147-153.

2. Neu D, Hoffmann G, Moutrier R, Verbanck P, Linkowski P, Le Bon O. Are patients with chronic fatigue syndrome just 'tired' or also 'sleepy'? J Sleep Res. 2008 Dec;17(4):427-31. doi: 10.1111/j.1365-2869.2008.00679.x.

3. Goudsmit, EM., Stouten, B and Howes, S. Fatigue in myalgic encephalomyelitis. Bulletin of the IACFS/ME, 2008, 16, 3, 3-10. https://web.archive.org/web/20140719090603/http://www.iacfsme.org/BULLETINFALL2008/Fall08GoudsmitFatigueinMyalgicEnceph/tabid/292/Default.aspx

4. Goldsmith LP, Dunn G, Bentall RP, Lewis SW, Wearden AJ. Correction: Therapist Effects and the Impact of Early Therapeutic Alliance on Symptomatic Outcome in Chronic Fatigue Syndrome. PLoS One. 2016 Jun 1;11(6):e0157199. doi: 10.1371/journal.pone.0157199. eCollection 2016. https://doi.org/10.1371/journal.pone.0157199.s001 (CSV form: https://www.mediafire.com/file/rvh3brmgoaznude/Goldsmith+2015+FINE+data+journal.csv )

5. A dataset from the PACE trial. Released following a freedom of information request. https://sites.google.com/site/pacefoir/pace-ipd_foia-qmul-2014-f73.xlsx Readme file: https://sites.google.com/site/pacefoir/pace-ipd-readme.txt

6. Morriss RK, Wearden AJ, Mullis R. Exploring the validity of the Chalder Fatigue scale in chronic fatigue syndrome. J Psychosom Res. 1998 Nov;45(5):411-7.

7. Loge JH, Ekeberg O, Kaasa S. Fatigue in the general Norwegian population: normative data and associations. J Psychosom Res 1998; 45: 53-65.

8. Van Kessel K, Moss-Morris R, Willoughby, Chalder T, Johnson MH, Robinson E, A randomized controlled trial of cognitive behavior therapy for multiple sclerosis fatigue, Psychosom. Med. 2008; 70:205-213.

Esther12 · Jan 21, 2018

Thanks a lot for doing this, and to Sasha and Graham for helping you with it.

guest001 · Jan 21, 2018

Excellent points. As someone who has filled out the Chalder fatigue scale and SF36 Physical Function Scale purely for my own interest as a background to considering the use of either or both of these methods of measuring 'fatigue' / physical function and its improvement or otherwise, I was obviously struck by my bimodal score on the Chalder Fatigue Scale as being 11/11 and therefore had nowhere to move had my situation become worse. Actually my situation since perusing this scale has become worse by virtue of my downward trend (nothing to do with any interventions per se) but of course there is no way to record this. As Tom notes, patients with this 'couldn't be any worse score' might also show up very differently if measured in other ways. My SF36 PFS was 20/100. Obviously that isn't great either but there is 'room' for it to change downward as well as upward, BUT 'moderates' and 'severe' could well be overlapping using Chalder. I'm not as ill as some but also much more ill than others. But the Chalder Fatigue Scale is unable to reflect those nuances in many respondents. Thus it clearly isn't fit for purpose.

Alice · Jan 22, 2018

I don't find fatigue an effective way of measuring my health; other symptoms are much easier to rate and monitor, as is my morning resting heart rate.

Esther12 · Jan 22, 2018

I wonder if it would be good having other people write to say that they have seen Kindlon's submission, and fully support his concerns about the Chalder fatigue scale?

It looks like one just needs to click on the 'feedback' tab here: https://commondataelements.ninds.nih.gov/MECFS.aspx#tab=Feedback_and_Suggestions

guest001 · Jan 22, 2018

Alice said:
I don't find fatigue an effective way of measuring my health; other symptoms are much easier to rate and monitor, as is my morning resting heart rate.

I agree to a point..whereas my debility within the construct of what is laughingly described as 'fatigue' has increased substantially with time, it is one of the least pressing of my symptoms which, had I a faerie wand and limited 'wishes', I would magic away in the first instance. Mind you how 'measurable', other than on a subjective scale of 0 to 10, some of the worst of my symptoms would be I'm not sure. My pain/allergy mix (as described by Anne Ortegren who recently took her own life) can't be contained within such a flimsy construct as a 'score' anyway..in fact until I read Anne's description I had thought that maybe no one else on this planet had experienced the same thing. The trouble with this disease is that it is so grotesque in often un-measurable ways that any blunt attempt to quantify it is beyond the wit of mere mortal calibration.

Invisible Woman · Jan 22, 2018

Alice said:
I don't find fatigue an effective way of measuring my health; other symptoms are much easier to rate and monitor, as is my morning resting heart rate.

I wouldn't rate fatigue as a primary symptom anyway. I feel fatigued when I need to up my thyroid meds or when I'm anaemic. Otherwise I feel like I've been poisoned, not fatigue. There are other symptoms that are more significant to me.

Personally, while I find HR monitoring is useful, I would be reluctant to use it without caveats. There may be other factors affecting HR - undiagnosed or inadequately controlled thyroid issues or certain medications such as beta blockers for example.

lansbergen · Jan 23, 2018

Lilpink said:
The trouble with this disease is that it is so grotesque in often un-measurable ways that any blunt attempt to quantify it is beyond the wit of mere mortal calibration.

I agree
