ki:elements

Automated Speech Analysis for Monitoring Symptom Progression and Treatment Response in Depression and Schizophrenia

Felix Menne, Felix Dörr, Elisa Mallick, Johannes Tröger, Alexandra König, Diana Immel, Simon Barton, René Hurlemann

Presented at ECNP 2025

Background: Psychiatric disorders such as schizophrenia (SZ) and major depressive disorder (MDD) involve symptom fluctuations over time, making it challenging to objectively assess changes in treatment response. Traditional assessments rely on subjective clinical scales that may fail to detect subtle variations in affective and cognitive symptoms. Automated speech analysis has emerged as a promising tool for objective, continuous monitoring: it captures linguistic and acoustic features that reflect underlying psychiatric states and symptom dynamics.

Aims: We aimed to investigate the potential of automated speech analysis for monitoring symptom progression and treatment response in SZ and MDD. Specifically, we explored whether speech-derived features change over time and whether they offer insight into cognitive and affective symptom trajectories beyond what traditional clinical assessments provide.

Methods: A total of 66 participants (SZ, n = 22; MDD, n = 22; healthy controls [HC], n = 22) were recruited from the Dept. of Psychiatry, University of Oldenburg, Germany. Diagnoses were made according to DSM-5 criteria. Clinical assessments included the Positive and Negative Syndrome Scale (PANSS) and the Montgomery-Åsberg Depression Rating Scale (MADRS). Participants completed three narrative speech tasks (positive, negative, neutral storytelling) and the Boston Cookie Theft picture description task. All tasks were recorded at two time points (T1 and T2), 14 days apart. During this interval, participants received standard in-clinic treatment, resulting in improved clinical scores.

Audio recordings were processed to extract a range of speech and language features. Group comparisons were conducted to identify differences in speech features over time. Machine learning models were trained to evaluate whether speech-derived features could differentiate between groups. Model performance was compared against baseline models incorporating age, education, sex, and the Beck Depression Inventory (BDI), as well as a BDI-only model. Performance was assessed using the area under the receiver operating characteristic curve (AUC).
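As an illustration of the evaluation metric only, the AUC can be read as the probability that a randomly chosen case from one group receives a higher model score than a randomly chosen case from the other group. The sketch below computes it by pairwise comparison. The function name, the labels (1 = SZ, 0 = MDD), and both score lists are hypothetical toy values chosen for illustration; they are not study data and do not reproduce the models used here.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (NOT study data): 1 = SZ, 0 = MDD.
labels = [1, 1, 1, 0, 0, 0]
speech_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]    # assumed speech-model outputs
baseline_scores = [0.6, 0.4, 0.5, 0.5, 0.6, 0.3]  # assumed baseline-model outputs

print("speech model AUC:  ", round(roc_auc(labels, speech_scores), 2))
print("baseline model AUC:", round(roc_auc(labels, baseline_scores), 2))
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, which is the scale on which the speech and baseline models are compared.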

Results: In the picture description task, significant group differences were observed in features such as the number of pauses, utterance duration, word count, and other characteristics (p < 0.05). No significant differences were found during the positive storytelling task at T1, but at T2, speech features effectively differentiated SZ from MDD (Table 1).

In classification analyses, speech signals derived from the picture description task yielded the strongest differentiation between SZ and MDD, with the speech model achieving an AUC of 0.90. Other models based on demographic data and BDI showed AUC values below 0.64. For the positive storytelling task, the speech model outperformed the BDI-based model at T1 (AUC = 0.78 vs. 0.62). At T2, the BDI model performed slightly better (AUC = 0.79 vs. 0.76).

Conclusion: The differentiation between MDD and SZ at T2, coinciding with clinical improvement, suggests that automated speech analysis has potential for tracking symptom change over time. As symptoms improve, distinctions between cognitive (SZ) and affective (MDD) domains may become more pronounced in speech. Thus, speech analysis could support the differentiation of these conditions, particularly in cases with overlapping symptoms. By complementing traditional assessments such as PANSS and MADRS, speech analysis may provide real-time, objective data that enhance understanding of symptom progression and treatment response and thereby help to optimize clinical strategies.
