Evaluation of Stereopsis Performance, Gaze Direction and Pupil Diameter in Post-COVID Syndrome Using Machine Learning, 2025, Knauer et al.

SNT Gatchaman · Nov 27, 2025

Evaluation of Stereopsis Performance, Gaze Direction and Pupil Diameter in Post-COVID Syndrome Using Machine Learning

Knauer, Thomas S; Mardin, Christian Y; Rech, Jürgen; Michelson, Georg; Stog, Andreas; Zott, Julia; Steußloff, Fritz; Güttes, Moritz; Sarmiento, Helena; Ilgner, Miriam; Jakobi, Marie; Hohberger, Bettina; Schottenhamml, Julia

BACKGROUND/OBJECTIVES
Post-COVID syndrome (PCS) encompasses symptoms that persist for at least 12 weeks after the onset of a COVID-19 infection and cannot be explained by other causes. The most common symptoms are fatigue, cognitive impairments, and physical limitations. The objective diagnosis of PCS is still challenging, as specific biomarkers are lacking. One possibility to measure cognitive impairment is the virtual-reality-oculomotor-test-system (VR-OTS, Talkingeyes & More, Germany). It shows stereoscopic stimuli in a VR-environment to the test person. While working on the visual tasks, many features are recorded. These features can be categorized into three groups: stereopsis performance, gaze direction, and pupil diameter. The aim of this study was to investigate which of these three feature groups is best to distinguish patients with PCS from a healthy control group.

METHODS
In total, 429 patients with PCS were recruited within the disCOVer 1.0 and disCOVer 2.0 study at the Department of Ophthalmology, Universitätsklinikum (Erlangen, Germany). All patients received VR-OTS measurements. From these measurements, a total of 95 features were extracted, which can be categorized into three groups: gaze direction, pupil diameter, and stereopsis performance. In the first step, support vector machines (SVMs) were trained on these different feature sets and evaluated using the area under receiver operating characteristic (AUROC) as the evaluation metric. In the second step, the same procedure was repeated with each feature independently to investigate which were the most predictive per group.

RESULTS
The SVM using the pupil diameter features yielded an AUROC of 0.73, the one using the gaze direction features resulted in an AUROC of 0.68, and the stereopsis performance features produced an AUROC of 0.66. The SVM using all VR-OTS data showed an AUROC of 0.68. For the single features, the index of pupillary activity (IPA) showed the best discrimination. Moreover, all features that were evaluated at different difficulties showed the same pattern: the more difficult test proved to be more predictive.

CONCLUSIONS
The study showed that VR-OTS can distinguish between patients with PCS and healthy control probands. Since different features showed a better performance than others, it makes sense for further studies to use a subset of the available features for further analysis.

Web | DOI | PDF | Biomedicines | Open Access

Hutan · Nov 29, 2025

Reaction time

The stereopsis feature group showed the lowest AUROC of 0.66. For the single features of this group, the minimum reaction time at the hardest difficulty level (disparity 275) yielded the best AUROC of 0.68.

Mehringer et al. [13] showed that the mean and median are increased in patients with PCS. In a further study of Güttes et al. [14], the reaction time of patients with PCS was significantly prolonged, while the p-value decreased inverse to the difficulty [14]. Prolonged reaction times in PCS patients were also found in multiple other studies [15,33,34]. An increase in differences with more complex tasks was also reported by Santoyo-Mora et al. [34]. The reason for this is presumably cognitive deficiencies. Santoyo-Mora et al. [34] attribute this to a higher cognitive performance load, which causes patients with PCS to require more time. This is consistent with our observation that the AUROC values increase along with the difficulty, particularly regarding the minimum reaction time at our highest difficulty level.

A replication of the finding that minimum reaction time differs in ME/CFS-type illness

Gaze direction

The gaze direction feature group achieved an AUROC of 0.68. The best feature in this group was the fixation duration, which had an AUROC of 0.70.

How does a single feature get a particular AUROC, but a set of features that includes the single feature gets a lower AUROC? Shouldn't more information at worst make no difference and at best allow a better prediction? What am I missing there?

Pupil diameter

Pupil diameter was the best performing group of parameters, with an AUROC of 0.73. The values for mean IPA (IPA 275: 0.70; IPA 550: 0.71; IPA 1100: 0.66) and mean LHIPA (LHIPA 275: 0.67; LHIPA 550: 0.67; LHIPA 1100: 0.65) of the individual difficulties performed best. The minimum of the pupil diameter at the hardest difficulty yielded an AUROC of 0.65.

the mean of the index of pupillary activity (IPA) and the mean of the low/high index of pupillary activity (LHIPA) are computed across the entire pupillary signal. IPA [20] and LHIPA [21] are indices to measure cognitive load based on the oscillation and variation in pupil diameter signal.

The pupil diameter feature group showed the best mean AUROC-values in our study and performed better than in the earlier study by Mehringer et al. [13]. This may be again due to the small sample size used in their study. Multiple other studies also observed differences in the pupil behavior in patients with PCS compared to healthy controls [15–17]. The AUROC of the pupil diameter feature achieved a value of 0.73, indicating a moderate to good ability to discriminate between healthy patients and patients with PCS [36].

Consequently, patients with PCS could show smaller pupil sizes due to their increased cognitive load.

Comment on the presentation of the paper
It's a brief paper; most of the data is in an appendix. It would have been good to see some charts, or at least for the authors to indicate the directions of the observed differences. I got through Results and am still not sure if the mean of the index of pupillary activity was higher or lower in people with post-Covid syndrome.

(Actually I've got through the discussion and I don't think they have actually said how IPA values compared between the PCS and control group. Sometimes authors are vague about what they found because it doesn't match what others have found or the story they want to push. I don't get the sense these authors are deliberately obfuscating, but we really should be told the results for the features and not just how good differences in the features are at predicting cohort membership.)

Limitations
Among the limitations noted are:

Moreover, the PCS group contained more patients than the control group, in particular more female patients.

They note that examining the differences in the vision-related features with things like symptom presence and severity would be useful.

Lou B Lou · Nov 29, 2025

What is ... 'the area under receiver operating characteristic (AUROC)' ?

Sean · Nov 29, 2025

Exactly the sort of stuff that needs a lot more attention by researchers, IMHO.

I think there is a whole zoo of basic sensory-motor-cognitive clues hiding in plain sight, just waiting to be objectively measured, and they can be, with current technology and understanding.

Hutan · Nov 29, 2025

Lou B Lou said:
What is ... 'the area under receiver operating characteristic (AUROC)' ?

My basic understanding is that it is a measure of the effectiveness of an algorithm that includes one or more features used for distinguishing two groups.

So, say you had a group of dogs and a group of cats, then a model that included 'presence of retractable claws' would have an AUROC of essentially 1, because it would reliably identify all of the cats as cats (true positive), without identifying any of the dogs as cats (false positive).

But, the features of 'fur' or 'four legs' or 'two pointy ears' wouldn't make for a very good result. And, adding those features to 'retractable claws' wouldn't make the result better.

1 is perfect; 0.5 is random guessing
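
For anyone who wants to see the arithmetic, here is a minimal sketch of one common way to compute an AUROC: take every (cat, dog) pair and count how often the cat gets the higher score, with ties counting half. The function name `auroc` and all of the numbers are made up for illustration; this is not the paper's SVM pipeline or its data.

```python
from itertools import product

def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive
    scores higher; a tie counts as half a win."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical toy data: 1 = feature present, 0 = feature absent.
cats_claws, dogs_claws = [1, 1, 1, 1], [0, 0, 0, 0]  # retractable claws
cats_fur, dogs_fur = [1, 1, 1, 1], [1, 1, 1, 1]      # fur: everyone has it

claws_auc = auroc(cats_claws, dogs_claws)  # every cat outranks every dog
fur_auc = auroc(cats_fur, dogs_fur)        # every pair is a tie

# A partly overlapping score, closer to what a real feature looks like.
overlap_auc = auroc([0.9, 0.7, 0.4], [0.8, 0.3, 0.2])

print(claws_auc, fur_auc, round(overlap_auc, 2))
```

With these toy numbers, 'retractable claws' comes out at 1.0, 'fur' at 0.5, and the overlapping score at about 0.78 (7 of the 9 pairs are ranked correctly).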

Wikipedia detail
