Language impairment in Alzheimer’s disease—robust and explainable evidence for AD-related deterioration of spontaneous speech through multilingual machine learning

Abstract

Alzheimer’s disease (AD) is a pervasive neurodegenerative disease that affects millions worldwide and is most prominently associated with broad cognitive decline, including language impairment. Picture description tasks are routinely used to monitor language impairment in AD. Because an in-depth analysis of the spontaneous speech these tasks elicit requires substantial manual effort, advanced natural language processing (NLP) combined with machine learning (ML) represents a promising opportunity. In this applied research field, however, NLP and ML methodology does not by itself ensure robust, clinically actionable insights into cognitive language impairment in AD, and additional precautions must be taken to ensure the clinical validity and generalizability of results. In this study, we add generalizability through multilingual feature statistics to computational approaches for the detection of language impairment in AD. We included 154 participants (78 healthy subjects, 76 patients with AD) from two different languages (106 English-speaking and 47 French-speaking). Each participant completed a picture description task in addition to a battery of neuropsychological tests. Each response was recorded and manually transcribed. From these responses, we extracted task-specific, semantic, syntactic and paralinguistic features using NLP resources. Using inferential statistics, we determined the language features, excluding task-specific features, that are significant in both languages and therefore represent “generalizable” signs of cognitive language impairment in AD. In a second step, we evaluated all features as well as the generalizable ones for English, French and both languages combined in a binary discrimination ML scenario (AD vs. healthy) using a variety of classifiers. The generalizable language feature set outperformed the full language feature set in the English, French and multilingual scenarios. Semantic features were the most generalizable, while paralinguistic features showed no overlap between languages. The multilingual model showed an equal distribution of error across English and French. By combining multilingual statistics with a theory-driven approach, we identify AD-related language impairment that generalizes beyond a single corpus or language, which allows language impairment to be modeled as a clinically relevant cognitive symptom. We find a primary impairment in semantics in addition to a mild syntactic impairment, possibly confounded by other impaired cognitive functions.
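The selection of “generalizable” features described in the abstract can be illustrated with a minimal sketch. The abstract does not name the statistical test used, so the permutation test below is an assumption chosen only because it needs nothing beyond the Python standard library. The function names, the feature names, the 0.05 threshold and all numeric values are hypothetical toy choices, not data or results from the study.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of group means.

    Assumption: the study's actual inferential test is not stated in the
    abstract; this test is a stand-in for illustration.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


def generalizable_features(data, alpha=0.05):
    """Return features that separate AD from healthy speakers in every language.

    `data` maps language -> feature name -> {"AD": [...], "HC": [...]}.
    """
    per_language = []
    for features in data.values():
        significant = {
            name
            for name, groups in features.items()
            if permutation_p_value(groups["AD"], groups["HC"]) < alpha
        }
        per_language.append(significant)
    if not per_language:
        return []
    # Keep only features that are significant in all languages.
    return sorted(set.intersection(*per_language))


# Hypothetical toy values, invented purely to exercise the code.
toy_data = {
    "english": {
        "semantic_density": {"AD": list(range(1, 9)), "HC": list(range(11, 19))},
        "pause_rate": {"AD": list(range(20, 28)), "HC": list(range(1, 9))},
    },
    "french": {
        "semantic_density": {"AD": list(range(2, 10)), "HC": list(range(12, 20))},
        # Identical groups: no difference to detect in this language.
        "pause_rate": {"AD": list(range(3, 11)), "HC": list(range(3, 11))},
    },
}

selected = generalizable_features(toy_data)
print(selected)
```

In this toy setup, a feature is retained only if it passes the test in both languages, so a feature that discriminates in a single language is dropped. The retained set could then be passed to any binary classifier for the AD vs. healthy comparison; a real analysis would also need to handle multiple-comparison correction, which this sketch omits.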

ki:elements Detects Alzheimer’s Pathology via Automated Phone Call: Study Validates Speech Biomarker Across Five European Cohorts

SAARBRÜCKEN, Germany–(BUSINESS WIRE)–New peer-reviewed research published by ki:elements and the PROSPECT-AD consortium demonstrates that the company’s Speech Biomarker for Cognition (SB-C) can reliably detect cognitive impairment and

Speech-based digital cognitive assessment for clinical trials: Detecting cognitive impairment stages and AD biomarker relations across European cohorts

König et al., 2026.

Enhancing Recruitment in Alzheimer’s Trials Using Speech-Based Cognitive Biomarkers: Preliminary Findings from the RETAIN Study

Kyani et al., 2026.
