Automatic Detection of Cheating Behavior on Remotely Conducted Word List Learning Tests Using Speech Analysis

February 29, 2024

Elisa Mallick, Nicklas Linz, Felix Menne, Alexandra König, and Johannes Tröger

* Poster presented at the ISCTM’s 20th Annual Scientific Meeting, Washington DC, USA

Abstract

What is the Methodological Question Being Addressed?

We built an automatic algorithm for detecting cheating behavior on remotely conducted word list learning tests (WLTs), such as the Rey Auditory Verbal Learning Test (RAVLT), and evaluated its feasibility. The algorithm is intended to alert study teams to artificially and suspiciously inflated WLT performances when the test is used as a remote cognitive screening or monitoring tool in Alzheimer's disease (AD) trials or in dementia trials in general.

Introduction

WLTs are among the most commonly used neuropsychological testing paradigms in cognitive screening, specifically in AD but also across all causes of dementia and in old age psychiatry in general. However, during remote telehealth-based application, operational issues around artificially inflated word list learning performances in clinical studies have been reported. The most parsimonious explanation is that individuals “cheat” during unmonitored assessments. With the advent of remote clinical trials and fully remote cognitive assessment solutions, detecting and preventing cheating in remote WLT assessment is a critical topic for future clinical AD and dementia trials.

Methods

Based on the remote Speech Biomarker for Cognition (SB-C) procedure, we implemented cheating detection for the RAVLT assessment; however, the results also transfer to most of the common word list learning tasks used in AD research. Based on the literature and on experience from multiple previous studies, the cheating detection algorithm focuses on the first two immediate recall trials of the RAVLT, as memory performance is typically lowest at the beginning and cheating should therefore be most obvious there. In addition, specific qualitative item-level recall behavior in those first two trials is assessed: the number of correctly recalled words from the so-called midlist (the middle of the presented RAVLT word list), the number of words recalled in a serial cluster (in exactly the same order as presented in the learning list), and subjective clustering behavior between trials (recalling words in exactly the same order from one trial to the next). Performance on these qualitative features is compared against the SB-C norm population, which is based on the Swedish H70 epidemiological cohort (Rydberg Sterner et al., 2019), using z-scores. If the z-score of any of these cheating features is significantly higher than the norm values, the assessment is flagged for suspected cheating.
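The three item-level features described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SB-C implementation: the function names, the definition of the midlist as everything except the first and last `edge` items, the rule that a serial cluster is a run of at least two words in presentation order, and the use of repeated adjacent word pairs as a proxy for between-trial ordering are all hypothetical choices. The word list consists of placeholders rather than real test items.

```python
from typing import Dict, Sequence


def midlist_count(recalled: Sequence[str], presented: Sequence[str], edge: int = 3) -> int:
    """Count distinct recalled words that come from the middle of the presented list.

    `edge` is an assumed parameter: the number of items at each end of the
    list that are treated as primacy or recency positions.
    """
    midlist = set(presented[edge:len(presented) - edge])
    return len(midlist & set(recalled))


def serial_cluster_count(recalled: Sequence[str], presented: Sequence[str]) -> int:
    """Count recalled words that sit in a run of >= 2 words in presentation order."""
    position = {word: i for i, word in enumerate(presented)}
    # Intrusions (words that are not on the list) are dropped before looking for runs.
    idx = [position[w] for w in recalled if w in position]
    in_cluster = [False] * len(idx)
    for k in range(1, len(idx)):
        if idx[k] == idx[k - 1] + 1:  # immediate successor in the learning list
            in_cluster[k - 1] = in_cluster[k] = True
    return sum(in_cluster)


def repeated_pair_count(trial_a: Sequence[str], trial_b: Sequence[str]) -> int:
    """Count adjacent word pairs that appear in the same order in both trials."""
    pairs_a = set(zip(trial_a, trial_a[1:]))
    pairs_b = set(zip(trial_b, trial_b[1:]))
    return len(pairs_a & pairs_b)


def extract_features(trial_1: Sequence[str], trial_2: Sequence[str],
                     presented: Sequence[str]) -> Dict[str, int]:
    """Collect the item-level features for the first two immediate recall trials."""
    return {
        "midlist_t1": midlist_count(trial_1, presented),
        "serial_t1": serial_cluster_count(trial_1, presented),
        "midlist_t2": midlist_count(trial_2, presented),
        "serial_t2": serial_cluster_count(trial_2, presented),
        "repeated_pairs_t1_t2": repeated_pair_count(trial_1, trial_2),
    }


# Placeholder 15-item list ("w1" ... "w15"); not the real test words.
presented = [f"w{i}" for i in range(1, 16)]
trial_1 = ["w1", "w2", "w3", "w7", "w9", "w8", "w15"]
trial_2 = ["w1", "w2", "w3", "w7", "w15", "w4"]
features = extract_features(trial_1, trial_2, presented)
print(features)
```

Each feature would then be converted to a z-score against the norm population, so that a recall containing long runs in presentation order or many midlist words stands out against typical early-trial behavior.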

For an internal feasibility evaluation, 18 healthy participants (7 female) performed the first four immediate recall trials of the RAVLT using the Mili phone remote speech assessment solution. The RAVLT is also part of the SB-C remote screening protocol, which has previously been used for screening in a clinical AD trial (2) and evaluated for usability and acceptability (3). Participants were explicitly instructed to cheat on the RAVLT task and were informed that a cheating detection mechanism was in place, which they should try to beat. After the procedure, participants were asked to describe the cheating strategies they had applied.

Subsequently, WLT performance was determined automatically by applying speech analysis and feature extraction, and cheating detection was run on all files.
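The abstract does not describe how the speech analysis scores a recording. As a rough illustration of the scoring step only, the sketch below assumes that a text transcript of one recall trial is already available and matches its tokens against the presented list. The function name `score_recall`, the filler-word set, and the example words are hypothetical.

```python
import re
from typing import Dict, List, Sequence

# Assumed set of filler words that should not be counted as intrusions.
FILLERS = {"um", "uh", "and", "the"}


def score_recall(transcript: str, presented: Sequence[str]) -> Dict[str, object]:
    """Score one recall trial from a transcript (illustrative only)."""
    targets = {word.lower() for word in presented}
    # Split the transcript into lowercase alphabetic tokens.
    tokens = re.findall(r"[^\W\d_]+", transcript.lower())
    correct: List[str] = []     # list words, in the order first recalled
    intrusions: List[str] = []  # non-list, non-filler words
    repetitions = 0             # list words said more than once
    for token in tokens:
        if token in targets:
            if token in correct:
                repetitions += 1
            else:
                correct.append(token)
        elif token not in FILLERS:
            intrusions.append(token)
    return {"correct": correct, "repetitions": repetitions, "intrusions": intrusions}


result = score_recall("Apple, um, river... apple and chair", ["apple", "river", "stone"])
print(result)
```

Keeping the order of first recall in `correct` matters here, because the cheating features rely on the sequence in which words are produced and not only on how many are correct.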

Results

The cheating detection procedure was applied to all 18 participants, 14 of whom (78%) were flagged for suspected cheating. The most informative indicators of cheating were qualitative features describing the serial position of recalled words in the first immediate recall trials, e.g., the number of words in trial 1 recalled in exactly the order in which they had been presented [participants with cheating detected: m=10.00 (sd=4.87); norm population: m=0.96 (sd=0.89)] and the number of midlist items recalled in trial 1 [participants with cheating detected: m=6.43 (sd=1.87); norm population: m=1.21 (sd=1.08)]. While all participants wrote down the list of words they had to remember, their reproduction strategies differed. The four participants whose cheating remained undetected applied a sophisticated model of learning and memory processes in the task: they started at a relatively high level, increased their performance only slowly across trials, and showed recall behavior consistent with primacy and recency effects.
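To give a sense of scale, the reported group means can be expressed as z-scores relative to the norm population. The sketch below uses only the means and standard deviations quoted above. The flagging threshold of 2.0 is a hypothetical value, since the abstract does not state the cutoff that was used, and the actual procedure scores each participant individually, not the group mean.

```python
def z_score(value: float, norm_mean: float, norm_sd: float) -> float:
    """Standardize a feature value against the norm population."""
    return (value - norm_mean) / norm_sd


# (mean of flagged participants, norm mean, norm sd) as reported for trial 1.
reported = {
    "serial_order_trial1": (10.00, 0.96, 0.89),
    "midlist_trial1": (6.43, 1.21, 1.08),
}

THRESHOLD = 2.0  # hypothetical cutoff, not taken from the abstract

z_scores = {name: z_score(*values) for name, values in reported.items()}
flagged = any(z > THRESHOLD for z in z_scores.values())

for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
print("flagged:", flagged)
```

The flagged group's mean lies roughly 10.2 norm standard deviations above the norm mean for words recalled in presentation order and roughly 4.8 for midlist items, which fits the observation that these two features were the most informative.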

Conclusion

Overall, we show how an automatic algorithm could potentially detect cheating behavior on remotely conducted word list learning tests (WLTs) such as the Rey Auditory Verbal Learning Test (RAVLT). The cheating detection algorithm could alert study teams to artificially and suspiciously inflated WLT performances when the test is used as a remote cognitive screening or monitoring tool in AD trials or in dementia trials in general. Future research is needed to validate the mechanism at scale and in a realistic clinical study scenario.

Disclosure

The authors thank all volunteers who participated in the internal evaluation. EM, NL, FM, AK and JT are employed by the speech biomarker company ki:elements. JT and NL hold shares in the speech biomarker company ki:elements.

References

1. Rydberg Sterner, T., Ahlner, F., Blennow, K., Dahlin-Ivanoff, S., Falk, H., Havstam Johansson, L., Hoff, M., Holm, M., Hörder, H., Jacobsson, T., Johansson, B., Johansson, L., Kern, J., Kern, S., Machado, A., Mellqvist Fässberg, M., Nilsson, J., Ribbe, M., Rothenberg, E., … Skoog, I. (2019). The Gothenburg H70 Birth cohort study 2014–16: Design, methods and study population. European Journal of Epidemiology, 34(2), 191–209. https://doi.org/10.1007/s10654-018-0459-8
2. Ruhmel, S., Tröger, J., Linz, N., Hermann, J., Quiceno, M., & Langel, K. (2022). Pre-Screening Prodromal AD Trial Populations over the Telephone Using a Speech Biomarker for Cognition — Preliminary Results from AUTONOMY Phase 2 AD Trial Recruitment. 15th Conference Clinical Trials Alzheimer’s Disease, San Francisco, CA, USA.
3. Gregory, S., Harrison, J., Herrmann, J., Hunter, M., Jenkins, N., König, A., … & Tröger, J. (2023). Remote data collection speech analysis in people at risk for Alzheimer’s disease dementia: usability and acceptability results. Frontiers in Dementia, 2.
