Batch correction of genomic data in chronic fatigue syndrome using CMA-ES, 2020, Lopez Rincon et al

Dolphin

FREE ACCESS https://dl.acm.org/doi/pdf/10.1145/3377929.3389947

GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion

Pages 277–278

ABSTRACT

Modern genomic sequencing machines can measure thousands of probes from different specimens.

Nevertheless, theoretically comparable datasets can show considerably distinguishable properties, depending on both platform and specimen, a phenomenon known as batch effect.

Batch correction is the technique aiming at removing this effect from the data.

A possible approach to batch correction is to find a transformation function between different datasets, but optimizing the weights of such a function is not trivial: as there is no explicit gradient to follow, traditional optimization techniques would fail.

In this work, we propose to use a state-of-the-art evolutionary algorithm, Covariance Matrix Adaptation Evolution Strategy, to optimize the weights of a transformation function for batch correction.

The fitness function is driven by the classification accuracy of an ensemble of algorithms on the transformed data.

The case study selected to test the proposed approach is mRNA gene expression data of Chronic Fatigue Syndrome, a disease for which there is currently no established diagnostic test.

The transformation function obtained from three datasets, produced from different specimens, remarkably improves the performance of classifiers on the task of diagnosing Chronic Fatigue.

The presented results are an important steppingstone towards a reliable diagnostic test for this syndrome.
 
The paper opens with: "Chronic Fatigue Syndrome (CFS) is a rare condition, with a worldwide prevalence of approximately 0.76 to 3.28%".

That's a pretty odd start: the prevalence range is higher than I would bet on, and even at those figures they call it a rare condition? Perhaps compared to obesity or heart disease...

But it's nice to see a paper treat 'CFS' as just another disease, one with the problems of having no diagnostic test and of gene expression (mRNA) studies being small and drawn from a range of sample types (e.g. whole blood; PBMCs).

I suspect the real answers are replication, larger sample sizes, and lab technique improvements rather than transformation functions. But still, it's nice to see scientists using CFS datasets in this way, trying to make the existing data more useful.
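
For anyone wondering what "optimizing the weights of a transformation function" with an accuracy-driven fitness might look like in practice, here is a toy sketch. It is not the authors' method: the paper uses CMA-ES and an ensemble of classifiers on real mRNA data, whereas this swaps in a simple elitist evolution strategy (no covariance matrix adaptation), a single nearest-centroid classifier, and synthetic data. The per-feature scale-and-shift form of the transformation and all function names are my own assumptions for illustration.

```python
import random


def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean vector per class label."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to sample x."""
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=dist2)


def transform(X, w):
    """Per-feature affine map x_j -> scale_j * x_j + shift_j.
    w holds the d scales followed by the d shifts (an assumed form)."""
    d = len(w) // 2
    return [[w[j] * x[j] + w[d + j] for j in range(d)] for x in X]


def accuracy(w, centroids, X, y):
    """Fitness: fraction of transformed samples classified correctly."""
    Xt = transform(X, w)
    return sum(predict(centroids, x) == t for x, t in zip(Xt, y)) / len(y)


def make_batch(rng, n, d, offset):
    """Synthetic 'expression' data: two classes plus a batch offset."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 0.5) + offset for _ in range(d)])
        y.append(label)
    return X, y


def evolve(fit, w0, rng, iterations=800, sigma=0.5):
    """Simple elitist evolution strategy with a decaying step size.
    Not CMA-ES: there is no covariance matrix adaptation here."""
    best_w, best_f = list(w0), fit(w0)
    for _ in range(iterations):
        cand = [v + rng.gauss(0.0, sigma) for v in best_w]
        f = fit(cand)
        if f >= best_f:  # accept ties so the search can drift across plateaus
            best_w, best_f = cand, f
        sigma *= 0.998
    return best_w, best_f


rng = random.Random(0)
d = 3
X_ref, y_ref = make_batch(rng, 40, d, offset=0.0)  # reference batch
X_new, y_new = make_batch(rng, 40, d, offset=5.0)  # batch with a shift
centroids = fit_centroids(X_ref, y_ref)

identity = [1.0] * d + [0.0] * d  # start from "no correction"
fit = lambda w: accuracy(w, centroids, X_new, y_new)
before = fit(identity)
best_w, after = evolve(fit, identity, rng)
print("accuracy before:", before, "after:", after)
```

Accuracy only changes when a sample crosses a decision boundary, so it is flat almost everywhere, which is the "no explicit gradient" problem the abstract mentions. Note that this sketch scores the transformation on the same samples it was tuned on, so a real evaluation would need held-out data.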
 