Preprint: Development and psychometric evaluation of The Index of Myalgic Encephalomyelitis Symptoms (TIMES) Part I…, 2026, Horton, Tyson, Fleming, Gladwell

So the MEA has already spent incredibly limited funds, raised by patients, on a long, impractical questionnaire that used cherry-picked feedback, doesn't allow comparison between patients or between a patient's own responses, doesn't have a ceiling or a floor, merges mild with moderate, and allows clinics to keep offering the same advice?

Yes.

And, I'm not a super fan of the DSQ (DePaul Symptom Questionnaire), but it's difficult to see this as a big improvement on it. Here, the authors of TIMES explain why they thought the DSQ was not good enough:

They note that a lot of diagnostic criteria haven't been psychometrically validated. And then go on to note that the DSQ has been.

An exception is the DePaul Symptom Questionnaire (DSQ), a self-report measure of ME/CFS symptoms. The original version has 99 questions covering personal characteristics, history and symptoms11. Fifty-four symptoms are included which focus on the domains specified in one of the most widely used diagnostic criteria, the Clinical Canadian Criteria (CCC)12. The user is asked about the frequency and severity of each symptom over the previous six months on 5-point (0-4) Likert scales. These scores are then multiplied by 25 to create a 100-point ‘intensity’ scale for each symptom. The ‘intensity’ is then averaged to create a composite symptom score. The original questionnaire has been modified over time such that original, expanded, brief, and paediatric versions are now available13. Furthermore, a subset of questions regarding post-exertional malaise (PEM, a cardinal symptom of ME/CFS) have been extracted to produce five and ten item PEM specific questionnaires14,15.

As part of a project to co-produce a clinical assessment toolkit for ME/CFS with people with ME/CFS and clinicians working in NHS specialist ME/CFS services, we reviewed the DSQ for inclusion in the toolkit as a measure of symptomology. Several limitations were identified by our ME/CFS and clinicians’ advisory groups. They noted overlap/repetition between some items; use of medical jargon and ‘American English’ that was difficult to understand in some instances. They also found references to exercise and exertion problematic (as their condition meant they could not exercise or exert themselves), and having questions about both severity and frequency of symptoms did not make sense in all items and made the questionnaire very lengthy. Finally, the timescale over which symptoms were assessed (the previous six months) did not reflect the diagnostic criteria used in the UK (previous three months)16.
Thus, we developed and evaluated a new assessment of ME/CFS symptoms focussing on the information needed for clinical assessment rather than diagnosis. This new assessment was called The Index of ME Symptoms (TIMES).
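For anyone who wants the DSQ arithmetic spelled out, here's a minimal sketch in Python of the scoring the quoted passage describes. How the frequency and severity scores are combined per symptom is my guess (the passage doesn't say), so treat it as illustrative only:

```python
# A minimal sketch of the DSQ arithmetic described in the quoted passage.
# How the per-symptom frequency and severity scores combine into one
# 'intensity' is my assumption; the DSQ's exact procedure may differ.

def dsq_composite(responses):
    """responses: list of (frequency, severity) pairs, each on a 0-4 Likert scale."""
    intensities = []
    for frequency, severity in responses:
        # Each 0-4 score is multiplied by 25 to put it on a 100-point scale.
        # Assumed: the per-symptom 'intensity' is the mean of the two.
        intensities.append((frequency * 25 + severity * 25) / 2)
    # The composite symptom score is the average intensity across all symptoms.
    return sum(intensities) / len(intensities)

# Three symptoms: frequent and severe, occasional and mild, absent.
print(dsq_composite([(4, 3), (2, 1), (0, 0)]))  # ~41.7
```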


1. overlap/repetition
The DSQ covers 54 symptoms and, while it doesn't characterise PEM very well, it does try very hard. TIMES has 58 symptoms and I don't know what they did with PEM.

2. medical jargon and American English
There seems to be a lot of medical jargon in the scale names of the TIMES. In any case, it is not impossible to adjust the wording in tools to suit the audience - lots of scales are completely translated. This hardly seems like a big problem.

3. references to exercise and exertion problematic 'as their condition meant they could not exercise or exert themselves'
This is ridiculous. If you want to determine if people are doing less than they were (or are back to healthy levels of activity), it's completely valid to ask about exercise and exertion. And, for goodness sake, everyone who is living exerts in some way.

4. questions about severity and frequency did not make sense in all the items and made the questionnaire lengthy
The TIMES authors may have a point there; I'm not sure. They say that severity and frequency seemed to track together, and so they just chose one word, usually 'severity'. I don't think ticking both a severity box and a frequency box would take much longer than just making the one assessment about a symptom, and having both does remove a bit of the ambiguity: e.g. my PEM might be utterly awful, but perhaps I only had it for one day last month.

5. DSQ asks about previous 6 months.
The TIMES authors wanted something shorter than that, for use in the clinical setting where presumably people aren't in the clinics very long. Also because the NICE Guideline says symptoms only have to be present for 3 months for diagnosis. But, again, that is a tweak. It hardly necessitates a big process to make a new list of symptoms.

All up, and given that there are variations of the DSQ out there, including a brief version, I can't see why a big process was really needed to replace it. And, why not use FUNCAP to assess function rather than symptoms in a clinical setting? Or just use the 'so how have you been' question?
 
why not use FUNCAP
Exactly what I was thinking as I read your message.

Also, what's wrong with Visible? (Which includes FUNCAP.)

You can use a free version and personalise the symptoms you track, so it's exactly relevant to you over time.

The NHS have no issue with recommending external sources when it suits them: they regularly give me long pages of web links, and I have been recommended mental health websites and apps, and pain management ones too.

This seems wholly unnecessary.

I know, I'm not saying anything that hasn't been said already; I'm just frustrated.
 
The other thing to keep in mind is that this is just one of, I think, five questionnaires this expensive project has created, the others intended to assess PEM and/or symptoms after exertion; functional capacity; usefulness of the clinic; and I've forgotten what else.

It's a massive set of data they expect clinics to waste pwME's energy filling in, all of it specifying that you should answer each time based on how you have felt over the last month, with no severity benchmarks.

It's a total con, clearly intended as a way for clinics to go on faking efficacy. It's so easily subject to ceiling effects and therapist effects.

The whole package is called a clinical assessment toolkit. That tells us it's about adding up numbers to create scores, not about helping pwME.

If it was intended as a useful way for pwME to communicate their problems to clinicians it would be much briefer and not have any scoring system.

If it's for trial outcomes it needs to use clear benchmarks and incorporate objective data.
 
Surely the overall score will depend too much on the choice of which symptoms to list, for example if they were to list 6 separate sleep symptoms and only one OI symptom, then those with mildly disturbed sleep of various sorts will seem much sicker than those with OI that is totally disabling.
Yes, and out of 58 questions only 1 asks about PEM. In comparison, there are 9 questions about cognitive problems, 7 about pain, 6 about sleep, etc. So this does not seem in proportion.

It also only takes 1 symptom to make someone completely disabled. So counting how many symptoms a person has, based on an arbitrary list of 58 questions, seems like a bad way to "summarise a person's overall level of symptom burden".
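To make that concrete, here's a toy sketch of the weighting problem from the quoted post above - six sleep items versus one OI item. All numbers are invented; they are not TIMES items or scores:

```python
# Toy illustration of how the choice of item list weights a summed score.
# Items and ratings are invented; they are not from TIMES.

def total_score(responses):
    """responses: dict of item -> 0-4 rating; the 'burden' is a plain sum."""
    return sum(responses.values())

mildly_disturbed_sleep = {f"sleep_item_{i}": 2 for i in range(1, 7)}  # six mild items
mildly_disturbed_sleep["orthostatic_intolerance"] = 0

totally_disabling_oi = {f"sleep_item_{i}": 0 for i in range(1, 7)}
totally_disabling_oi["orthostatic_intolerance"] = 4  # one item, maximal

print(total_score(mildly_disturbed_sleep))  # 12 -> scored as much 'sicker'
print(total_score(totally_disabling_oi))    # 4  -> scored as comparatively well
```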

The questionnaire no longer asks about the severity of symptoms, just how often patients have a particular symptom.
 
Do they explain this in the paper? Could you or someone else quote the passage? I can't seem to find it.
Oh, sorry, PEM is assessed by the separate PASS (Post Activity Symptom Scale) questionnaire created as part of this multi-PROM project. I wouldn't necessarily expect that to be mentioned in this paper, which seems to be all about the statistical shenanigans that allegedly make this questionnaire valid.
 
All up, and given that there are variations of the DSQ out there, including a brief version, I can't see why a big process was really needed to replace it. And, why not use FUNCAP to assess function rather than symptoms in a clinical setting? Or just use the 'so how have you been' question?
follow the money - this is for app development
 
All up, and given that there are variations of the DSQ out there, including a brief version, I can't see why a big process was really needed to replace it. And, why not use FUNCAP to assess function rather than symptoms in a clinical setting? Or just use the 'so how have you been' question?
Going by how most people talk about it in the wild, the most common personal rating I have seen is a simple 'recovered %', and I can't see how any of those questionnaires add anything to it. Everything else is mostly noise, and every time I see it described it always relates to PEM and overall ability to function. The limiting factor is symptoms, but the important rating is the ability to function.

It suffers from the same ceiling problem, and from the fact that it's not a linear scale: 20% functioning does not mean being able to do 20% of activities of daily living, but something closer to 5%. My 50% is not my old 50%, or even my 80-year-old father's, in the same way that what counts as a stable budget to a poor person would be borderline bankruptcy to an upper-middle-class professional. This is a common problem in economics, and economists do a far better job with it despite struggling with it immensely.
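Purely to illustrate that nonlinearity (the curve is invented to match the '20% reported, ~5% actual' intuition, nothing more):

```python
# Invented convex mapping between a self-reported 'functioning %' and the
# share of daily-living activities actually possible. The exponent was
# picked so that a reported 20% comes out near 5% actual; it is not data.

def actual_capacity(reported_percent, exponent=1.86):
    return 100 * (reported_percent / 100) ** exponent

for reported in (20, 50, 80):
    print(f"{reported}% reported -> {actual_capacity(reported):.1f}% actual")
```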

But if we're going with how most people I have seen report their progress, it's never more elaborate than a simple "I am % recovered", and all the research I have seen over the years makes it clear that it's about 90% as useful as any of the 'psychometric' questionnaires used clinically or for research. FUNCAP is by far the best, but it still adds very little compared to me guesstimating that I am 50% recovered, and it's not as if anyone, clinician or researcher, knows what to do with the number of symptoms that I have or their severity.

So, ironically, an absolute CFQ (Chalder Fatigue Questionnaire), rather than one relative to recent weeks, is probably the most useful tool. But it has to be absolute - or rather, it has to be relative to prior 100% health, rather than compared to a more recent but sick value. It's making it relative to a recent ill comparison that makes the CFQ worthless.

All of this is a good example of how more is not always better. All of this could be useful, and in research we definitely need to know about symptoms and their progression, but clinically, there isn't a single questionnaire that does any better than a simple 0-100% rating, same as there is not a single pain rating assessment that does any better than the old 0-10.

All this fake methodological process of 'validating' psychometrically is entirely worthless. It only ever creates the illusion of validity where there isn't any, all because the proper paint color code was applied within the right outlines. We have pretty much all the same criticisms of the DSQ, so this has nothing to do with camps or anything like that - though you can bet your ass that we will be accused of being anti-psychology, or whatever. Good grief, this is ridiculous.
 
It also only takes 1 symptom to make someone completely disabled. So counting how many symptoms a person has, based on an arbitrary list of 58 questions, seems like a bad way to "summarise a person's overall level of symptom burden".
Yeah, this is really important and it screws everything up. Having 100 mild symptoms is usually fine. Having 1 severe symptom is a huge problem. This is all like adding pennies when there are million-dollar bills on the line. There is a complete disconnect with reality here.
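Put another way, it's an aggregation choice. A plain sum ranks 100 mild symptoms as far worse than one disabling symptom; a max-style rule says the opposite. A toy sketch with invented numbers:

```python
# Invented ratings on a 0-4 scale: compare two ways of summarising burden.
hundred_mild = [1] * 100  # a hundred mild symptoms
one_severe = [4]          # a single, disabling symptom

for name, ratings in (("100 mild", hundred_mild), ("1 severe", one_severe)):
    print(f"{name}: sum = {sum(ratings)}, max = {max(ratings)}")
# sum calls the 100-mild respondent 25x worse; max says the reverse.
```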
 
One of the things that annoys me about this is that veneer of medical terms, presumably all contributing to the therapist presenting themselves as someone who knows how to fix things.


Another is that there is no PEM scale. Presumably you have to go into one of the scales to find, what, a single question about PEM? What has it been lumped together with? The paper notes that other symptom measures, e.g. the DSQ, treated PEM as important to characterise, but none of TIMES's nine scales seems to cover it.


And my other annoyance is one that people have commented on already. It's useless for comparing between people. Someone's severe is someone else's mild. If you have to lie down after a meal and have bad stomach pains you might regard your problems with eating as severe. But that ignores the higher levels of hell that you have never encountered - e.g. things being so bad you have to be tube-fed.

And it's pretty hopeless for comparing symptoms at different times for the same person, because people's definitions of symptom severity change over time due to habituation. What was shocking to start with becomes normal. There's a ceiling effect, because, compared to being healthy, even mild CFS is severe. But what do you do if you get worse and you have already ticked the 'severe' box?
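The ceiling problem is mechanical: once you've ticked the top box, any further deterioration produces no change in the score. A minimal sketch (category labels invented):

```python
# Minimal sketch of a ceiling effect on an ordinal item (labels invented).
CATEGORIES = ["none", "mild", "moderate", "severe"]  # coded 0-3

def recorded(true_severity):
    # Anything beyond the top category is clipped to 'severe' (code 3).
    return min(true_severity, len(CATEGORIES) - 1)

print(recorded(3))  # 3 ('severe')
print(recorded(5))  # still 3: a genuine worsening is invisible in the data
```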



So, if it's no use for rating your clientele by severity and it's no use for tracking changes in an individual over time, what exactly is it useful for?



Hmm. Perhaps it's mildly interesting to note if there is some change in the presence and absence of symptoms over the course of the illness? But it will create opportunities for researchers to trawl through the data to determine e.g. that women with a tendency to perfectionism are more likely to score highly on the sleep scale. ... and so therefore women with a tendency to perfectionism should be given instructions on sleep hygiene, and people who score highly on the sleep scale should be given CBT to combat perfectionism?

Surely, the presence or absence of PEM would be one of the key things to identify about an individual before thinking about a clinical response - and yet that doesn't warrant a separate scale. It is not at all clear what scale you would have to look in to find some information about PEM.

Honestly, I think my main problem is with the concept of these clinics, when there is no useful treatment and yet the therapists are so convinced that there is, and are so convinced that they have valuable knowledge to impart. So, I probably would suspect virtually any data collection tool of being put to some use that is not in the interests of the people contributing their data. There's nothing about this paper that makes me think that the people who decide to use this tool should be trusted with patient data.
It feels like all it can do is count symptoms and flag when a new one has started, not whether anything gets worse: even if all the symptoms go from 'now and then' to 'half the time' - which can mean becoming significantly disabled, if it means 'you can't do x' - they've merged those categories. And that's even aside from habituation, and from not having anchored each rating to something concrete in an attempt to get more objective severity.

It feels like, by focusing on 'which symptoms' to 'create phenotypes' rather than on whether anything gets worse or better, it is to be used to triage people towards whichever items on the menu of untested therapies get thrown at them, or whichever label gets given - but not to test whether said advice or therapies make them worse, which surely should be the one point of doing all this.

It isn't 'measuring'; it's now claiming to be 'typing', by the looks of it. And/or potentially adding on new labels or aspects if you start getting gastro issues, or sleep problems. Yet if this were a medical clinic, it would be the detail of those problems that made a treatment decision safe rather than harmful - e.g. whether you give someone anti-diarrhoea or anti-constipation medication - and this isn't capturing that detail, other than the supposed free text, if anyone has energy left for it after 58 questions of either answering or marking.

And if it's therapists, they can't prescribe, and I have that horrible feeling it'll be 'for the CBT' that's being delivered, against the guidelines, 'for symptoms' - which CBT was never intended to be, as it was only supposed to be for e.g. dealing with grief, not the nonsense idea that thinking differently cures IBD.

And yes, on this basis, as people get more ill and are faced with 58 questions that no one is going to do anything real about, they'll get more blasé with each visit about filling them all out - and the claim will be 'improvement', not 'people stopped ticking everything because no one reads it anyway, so it's a waste of their energy'.

This whole phenomenon could even be tested in a properly controlled psychology experiment on well people: have them go in, get gaslighted, and have their answers go unread several times in a row, and see how seriously they keep taking it - even with their endless energy - when filling out 58 questions no one will read. It's not even ME/CFS-specific, but a response provoked by the design and by the attitude therapists are taught; it will particularly harm a group of patients who are exhausted, have no spare energy, and are going through all sorts.
 
Supplementary file 4 is the TIMES final version. Attaching it here so that people can more easily access it.
I was wondering where that was.

I had a look. It made me realise that none of the authors of this questionnaire are medical. It shows. Half of the symptoms are not even recognised features of ME/CFS. The whole thing, with the questions just about frequency and the pseudo-arithmetic, is complete nonsense.

It reminds me of the International Consensus Criteria, dreamt up by a group of people in cloud cuckoo land, with no basis in real life. The ungrounded beliefs here are quite explicit.

The absurdity of this has got to a pathological level.

We had something a bit like this in the 1980s for lupus, although at least it was based on documented features of the disease. Over a period of ten years a committee knocked it into a different shape and other committees knocked the same stuff into other shapes. The end result was that treatment trials in lupus almost always came out with null results because the assessment system was so crazy. Things have probably improved but the same obsession with pseudo-arithmetical indices is still around.
 
I would also imagine that, for most people, being faced with a set of questions like this would induce both misery and horror at the implication that there are another 55 symptoms of their illness that they will have tomorrow, even if they have not had them so far.

In medical history-taking you ask loads of questions, but the patient realises that you do that for everyone. If this set of questions is designed specifically for people with ME/CFS, then anyone answering it will go home terrified. There seems to be no appreciation of how much harm health professionals can do by interfering for no good reason.
 
One of the things that annoys me about this is that veneer of medical terms, presumably all contributing to the therapist presenting themselves as someone who knows how to fix things.
Agree, it seems ironic when they’ve highlighted the “use of medical jargon and American English that was difficult to understand” as one of the problems with existing questionnaires. They don’t seem to have considered that producing another questionnaire with lots of jargon to support it would not solve the problem.

There is no mention of FUNCAP in the entire paper as far as I can see. Did they consider its suitability? Are they not aware of it?

It’s so telling that you get papers on complex biology from teams like Chris Ponting’s group that are hugely readable, even if we may not all understand all the details. Then you get others which seem to deliberately use specialist terms and language not to help explain but to show how they are very clever people and know more than you.

Imagine if the same time and effort was put into measuring the performance of services rather than patients. No doubt they’ll try and claim this will help do that but it’s so cumbersome and flawed from a patient perspective I cannot see how.

As an aside, I've never seen DecodeME capitalised like this: "DeCodeME". Their attempt to throw shade is also quite something:

The number of people with severe/very severe ME/CFS recruited is also a strength. It is estimated that 25% of people with Me/CFS are severely or very severely affected. This is reflected in the TIMES cohorts and is higher than other large studies using similar recruitment methods, such as DeCodeME

The strengths of this study lie in the large, representative sample, the robust co- production with people with ME/CFS and clinicians working in NHS specialist ME/CFS services.

Well, many here found their co-production flawed and self-selecting. It was a long, long way from the high bar DecodeME set and from what we see in other studies.

Half of their limitations section is like asking someone what their weaknesses are and them replying ‘well I’m somewhat of a high achiever’.
 
I know, I'm not saying anything that hasn't been said already; I'm just frustrated
Catching up with posts I think mine is the same! So apologies everyone.

It seems like a real missed opportunity and it’s been disappointing to see how feedback on these various projects was handled. The results do not look promising and yet are being presented as if they are something new, positive and supported by the community. A real case of marking your own homework, after funding its production. I wish they’d been more open to working with more people but seem to have been set on a way of doing things.
 
There is no mention of FUNCAP in the entire paper as far as I can see. Did they consider its suitability? Are they not aware of it?
We asked about that in the early stages of development of another of their questionnaires that tried to ask about function. They insisted their idea was better as it was asking about what people have actually done, rather than what we think we can do, or something. The resulting questionnaire is dreadful.
 
The questionnaire no longer asks about the severity of symptoms, just how often patients have a particular symptom.
The whole thing, with the questions just about frequency and the pseudo-arithmetic, is complete nonsense.
Actually, it's just the fatigue and cognitive dysfunction sections that ask about frequency (those two sections happen to be the first two). The other sections ask about severity. Whether respondents will notice the change in the measurement, I have my doubts.

[Two screenshots of TIMES question sections]

As I mentioned upthread, it is possible that this is a reasonable thing to do. The authors report finding that frequency and severity assessments tracked together, and so they didn't get much extra information from asking about both. But I doubt that the time saved in answering questions was worth the loss of a consistent approach to enquiring about severity and frequency.
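For what it's worth, 'tracked together' is a checkable claim: per item, you'd look at the correlation between the frequency and severity responses. A minimal sketch with made-up data (not the TIMES dataset):

```python
# Made-up responses for one item: do frequency and severity track together?
from statistics import correlation  # Python 3.10+

frequency = [0, 1, 2, 2, 3, 4, 4, 3, 1, 0]
severity  = [0, 1, 1, 2, 3, 3, 4, 3, 2, 0]

r = correlation(frequency, severity)
print(f"Pearson r = {r:.2f}")  # a high r was the rationale for dropping one scale
```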
 
We asked about that in the early stages of development of another of their questionnaires that tried to ask about function. They insisted their idea was better as it was asking about what people have actually done, rather than what we think we can do, or something. The resulting questionnaire is dreadful.
It was disappointing to see so much hard work wasted in those discussions and to have good faith feedback rejected in the way it was.
I'm just surprised they haven't even mentioned one of the more modern, well-known and well-regarded questionnaires in the field. It seems a striking omission.
 