More PACE trial data released

I haven't followed all this discussion, and I am hugely grateful to @JohnTheJack and Alem Matthees for their fantastic efforts in getting some of the data.
Would it make sense to simply ask for all the data apart from anything that could be seen as an individual identifier, such as age and location? The entire data set must surely be stored in a single spreadsheet or similar.

I believe that would be rejected as taking too long to provide. There is a limit of 18 (?) hours' work.
 
That assumes the data are kept in a disorderly way and would require collating from different and incompatible files. If they are kept in a single file, it would be easier to send the lot with just personal identifiers deleted than to pick out a few requested subsections and create a new file. But maybe that's all too sensible and logical for the likes of Peter White. Maybe there is indeed a disordered mess of data.
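For what it's worth, here's a minimal sketch of what "send the lot with identifiers deleted" amounts to, assuming the data really do live in one flat file (the file and column names are invented for the example):

```python
import pandas as pd

# Hypothetical single-file dataset; file and column names are invented.
df = pd.read_csv("pace_master.csv")

# Dropping the identifying columns and exporting everything else is
# far less work than picking out subsections into a new file.
df.drop(columns=["age", "location"]).to_csv("release_all.csv", index=False)
```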
 
I think #99 above addresses that better than I can. I just know, from what they have been claiming in their arguments, that they would claim a section 12 exemption and would likely succeed in making it stick.
 
Another very quick thought: we may have to be prepared for the possibility that QMUL may not actually be able to guarantee a particular sort order! Someone who understands databases will know this better than I do, but I'm guessing that when data is queried out of a db there is a default sort applied, which likely depends on the parameters chosen and their values. Different parameters/values might well lead to a different sort order, with the person doing the querying not knowing (or caring) how things got sorted.
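This matches how SQL actually behaves: without an explicit ORDER BY, the engine is free to return rows in whatever order is convenient. A toy sketch (the table and column names are invented):

```python
import sqlite3

# Toy in-memory table; names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outcomes (participant TEXT, sf36 INTEGER)")
conn.executemany("INSERT INTO outcomes VALUES (?, ?)",
                 [("P03", 60), ("P01", 45), ("P02", 70)])

# No ORDER BY: the engine may return rows in any order it likes, and
# that order can differ with the query, the filters, or the software.
print(conn.execute("SELECT * FROM outcomes").fetchall())

# Only an explicit ORDER BY guarantees a stable, reproducible order.
print(conn.execute("SELECT * FROM outcomes ORDER BY participant").fetchall())
```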
Sort is irrelevant, you can't have a coherent database without IDs cross-referencing entries between tables. If IDs were excluded it was deliberate. It'd be like trying to organize a library of books with no covers or titles.

I did a lot of DB development, architecture and optimization. This is 101.
 
I've also done my share of DB development. But nobody uses database software to analyse or store data from a scientific study. It would be a huge amount of overkill, and your statistical software wouldn't be able to read it. It would be like hiring a librarian to organise one Billy bookcase. Every study dataset I have ever seen is basically a spreadsheet, with records in the rows and variables in the columns. Indeed, a surprising number of studies are analysed in Excel. PACE was analysed with Stata, SAS, and SPSS, all of which assume the data is in a simple row/column form. The principle of using a key (such as a participant ID) to tie records together if they happen to get split across multiple files is of course the same as in database design, but this ought to be much simpler (not least because there shouldn't be multiple files!).
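To illustrate the point about keys, here's a minimal sketch (column names invented) of how a shared participant ID ties two row/column subsets together regardless of row order:

```python
import pandas as pd

# Two hypothetical data subsets, as might arrive in separate releases.
baseline = pd.DataFrame({"id": ["P01", "P02", "P03"],
                         "cfq_baseline": [33, 28, 30]})
followup = pd.DataFrame({"id": ["P03", "P01", "P02"],  # different row order
                         "cfq_52wk": [20, 25, 27]})

# With a shared participant ID the merge lines records up by key,
# so the sort order of either file is irrelevant.
combined = baseline.merge(followup, on="id")
print(combined)
```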

So if their data & data management processes are a mess - ie not up to basic standards - surely that further undermines the whole original trial as unreliable.
You could certainly make a good case for that. If the stories that they expect people to believe are true, then they are massively incompetent, to the point where they shouldn't be entrusted with anybody's data, let alone that of vulnerable patients. If not, then they are being wilfully obstructive. This whole story about "It would take a qualified statistician dozens of hours to identify the appropriate elements of the data" is almost certainly bollocks, relying on the judge not understanding the (very modest) technicalities of the process.
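To put some flesh on "very modest": assuming the dataset is the usual flat file, extracting a requested subset of variables is a couple of lines (the file and variable names are invented for the sketch):

```python
import pandas as pd

# Hypothetical file and variable names, purely for illustration.
df = pd.read_csv("pace_trial_data.csv")

# Selecting the requested variables and writing them out takes minutes,
# not dozens of hours of a qualified statistician's time.
df[["group", "cfq_52wk", "sf36_52wk"]].to_csv("foi_release.csv", index=False)
```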

Also remember that PLoS ONE issued an Expression of Concern about the PACE "cost effectiveness" paper. Journals don't like doing that sort of thing, so they must have been unconvinced by the excuses of the authors. (They should, of course, have retracted the paper, in line with their stated policy of mandatory data sharing, but even the slap on the wrist in the form of an EoC won't have been done without a fair amount of failed negotiations with the authors.)

PS: As people here may or may not know, I'm not an ME patient, nor do I know (in real life) anyone who is. Over the past few years I've developed an interest in detecting and discussing (not to say "exposing") bad science, but I only got interested in this topic because for a while Jim Coyne was my PhD advisor and so PACE was often going past on my radar. Up to now I was always a little skeptical about some of the claims being made about the PACE researchers because of the obvious possibility for motivated reasoning, but the evidence that the researchers have at least something to hide certainly seems to be accumulating.
 
Sort is irrelevant, you can't have a coherent database without IDs cross-referencing entries between tables. If IDs were excluded it was deliberate. It'd be like trying to organize a library of books with no covers or titles.
Not what I was saying. I'm well aware that sort order is inapplicable to a relational database, else it would not be relational. But once you've extracted that data, or a subset of it, into a spreadsheet format, then sort order does matter if you want to correlate different spreadsheets you've been provided with and be confident the participants (=> rows) match. Participant IDs would be best.

Remember, participant IDs were never provided with the original 2016 dataset, so even if we had them this time around we would still be no better off. It's only once you need to marry up two or more data subsets that the omission becomes an issue.

I think this means that in future we need to ensure that for any FOI request (including the 'first' one), we explicitly request unique (anonymised) participant IDs, guaranteed to be the same per participant for any further data subsets extracted.
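If we did get consistent IDs, checking them would be easy enough. A minimal sketch, assuming two releases that are supposed to share one set of anonymised IDs (file and column names are invented):

```python
import pandas as pd

# Hypothetical releases, each assumed to carry an "id" column.
first = pd.read_csv("release_one.csv")
second = pd.read_csv("release_two.csv")

# If the ID sets differ, the provider has re-keyed or dropped
# participants, and the releases cannot be safely married up.
missing = set(first["id"]) - set(second["id"])
fresh = set(second["id"]) - set(first["id"])
if missing or fresh:
    print(f"ID mismatch: {len(missing)} missing, {len(fresh)} new")
else:
    combined = first.merge(second, on="id")  # safe to combine
```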
 
I've also done my share of DB development. But nobody uses database software to analyse or store data from a scientific study. It would be a huge amount of overkill, and your statistical software wouldn't be able to read it. It would be like hiring a librarian to organise one Billy bookcase. Every study dataset I have ever seen is basically a spreadsheet, with records in the rows and variables in the columns. Indeed, a surprising number of studies are analysed in Excel. PACE was analysed with Stata, SAS, and SPSS, all of which assume the data is in a simple row/column form. The principle of using a key (such as a participant ID) to tie records together if they happen to get split across multiple files is of course the same as in database design, but this ought to be much simpler (not least because there shouldn't be multiple files!).
That's interesting to know. In which case I can see where you and others are coming from. I think in future, for our own peace of mind, it would always be good to get unique patient IDs with every dataset requested; then we never have to trust the data providers to have got it right, because we can actively check.
 
QMUL should be asked for a complete consistent set of all the data that has been ordered to be released. If they refuse and it gets escalated through the FOI and appeal process I can imagine a judge not looking favorably on their deliberate attempt to avoid complying with their FOI obligations.
 
That's interesting to know. In which case I can see where you and others are coming from. I think in future, for our own peace of mind, it would always be good to get unique patient IDs with every dataset requested; then we never have to trust the data providers to have got it right, because we can actively check.
What's to stop them creating new identifiers for each set? What should be asked for is a set containing all the previously released data plus the newly requested data.
 
Which is why I said this ...
I think this means that in future we need to ensure that for any FOI request (including the 'first' one), we explicitly request unique (anonymised) participant IDs, guaranteed to be the same per participant for any further data subsets extracted.
... they would be flagrantly breaching the FOI requests if they did as you suggest.

I do agree with you in principle, but worry it would potentially give them wriggle room to turn FOI requests down more easily if we ask for cumulative datasets that are potentially requested by different people, not necessarily all collaborating with each other. I think it might get messier than it appears. Their own data simply must have a unique participant identifier code of some kind, and it is that which we need to acquire each time.
 
I do agree with you in principle, but worry it would potentially give them wriggle room to turn FOI requests down more easily if we ask for cumulative datasets that are potentially requested by different people, not necessarily all collaborating with each other.
I admit to knowing nothing about how FOI works beyond what I gained by reading the ruling on Alem's case. But I would have thought that once data had been released to one person, the precedent would be set, so asking for the already released data plus something new would just involve consideration of the new data, plus whether the combination makes identification more likely. Whether you ask for a complete set or new data plus an identifier that allows linking to the old set should make no difference, but a complete set would avoid all confusion.
Their own data simply must have a unique participant identifier code of some kind, and it is that which we need to acquire each time.
But since the original Matthees dataset didn't include that, a new FOI request would be needed for that anyway.
 
It is quite deliberate that they've made it so the data sets cannot be combined. I knew they had some sort of trick up their sleeves; I guess this is it.
The fact that the sort order in the new files is not the same as the first file is either down to incompetence or malice. I always prefer the first of these explanations, but the level of incompetence (probably by multiple people) required in order for the newly-released variables not to be sorted correctly makes me wonder here.
Yeah, excuse me if my cynicism is running sky-high on this. They must have had all this data aligned at some point in order to do their own analysis in the first place.

My money is on this being deliberate obfuscation, possibly with the aim of delaying any publication of results by others using this data until after the NICE review has reported back. Wouldn't that be very convenient for them? :grumpy:

PS: As people here may or may not know, I'm not an ME patient, nor do I know (in real life) anyone who is. Over the past few years I've developed an interest in detecting and discussing (not to say "exposing") bad science, but I only got interested in this topic because for a while Jim Coyne was my PhD advisor and so PACE was often going past on my radar. Up to now I was always a little skeptical about some of the claims being made about the PACE researchers because of the obvious possibility for motivated reasoning, but the evidence that the researchers have at least something to hide certainly seems to be accumulating.
Very much obliged for your interest in this. People like you are exactly the sort we need to look at this mess. Patients cannot do this on our own.

:thumbup: :hug:
 
Up to now I was always a little skeptical about some of the claims being made about the PACE researchers because of the obvious possibility for motivated reasoning, but the evidence that the researchers have at least something to hide certainly seems to be accumulating.

"Something to hide" is a bold claim, but what is abundantly clear is that they are not acting in good faith and seem to have deep contempt for anyone who asks difficult questions.

I mean, patients demonstrated to them (in The Lancet) that their criteria for "normal range" were fundamentally flawed; in response they provided excuses and misdirection (in The Lancet), then proceeded to use those same criteria to mean "recovery" in Psychological Medicine. That was 2013 and things have not improved since then.
As an aside, I am somewhat curious as to where such lack of good faith towards the concerns of patients has existed in the history of medical research...
 
But since the original Matthees dataset didn't include that, a new FOI request would be needed for that anyway.
Unless a new FOI request included enough of the previously released data to reliably marry up new with old anyway, in which case it would make sense to just ask for a cumulative dataset, as well as unique participant IDs for good measure.
 
So if their data & data management processes are a mess - ie not up to basic standards - surely that further undermines the whole original trial as unreliable.
There is an interesting nugget - I can't remember where it was mentioned, but I think it was from Sharpe - stating that one of the issues they faced at one point was that a single participant retracted their consent, and this delayed the analysis by weeks.

If that's true, it's massively incompetent. Removing a single participant from the data should be trivial, a matter of minutes.
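Indeed, assuming the usual flat-file dataset, withdrawing one participant is a one-line filter (the file name and ID are invented for the sketch):

```python
import pandas as pd

# Hypothetical dataset and participant ID, for illustration only.
df = pd.read_csv("trial_data.csv")

# Dropping one participant's records and re-saving is a matter of minutes.
df[df["participant_id"] != "P123"].to_csv("trial_data.csv", index=False)
```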

So there could be something to the massively incompetent possibility. That's simply not normal with the budget they were operating on.
 