Predicting antidepressant response using artificial intelligence


Antidepressants are a commonly used treatment for a range of mental health conditions, including depression and anxiety. Despite how frequently they are used (an estimated 8.6 million people in England were prescribed antidepressants in 2022/2023 [NHSBSA, 2023]), challenges remain around understanding who will benefit from antidepressant treatment. It is estimated that two thirds of people with Major Depressive Disorder (MDD) will not achieve remission after first-line antidepressant treatment (Keks, Hope, & Keogh, 2016; Ionescu, Rosenbaum & Alpert, 2015), and there are additional concerns around the impact of side-effects and medication withdrawal, especially when taking medications long-term.

As the population continues to deal with the mental health crisis that followed the COVID-19 pandemic (ONS, 2021), we are seeing mental health service provision stretched, with need far outweighing resource in many sectors (Mind). As we try to tackle this problem, novel and exciting avenues of research are being explored in data science and machine learning, and the transformative potential of ‘data-driven psychiatry’ is often described as imminent.

Machine learning (ML) can be simply defined as computers learning from data and making decisions or predictions without being specifically programmed to do so (datacamp, 2023). ML models are able to gain insights into the complex relationships between variables and outcomes without the researcher specifying a hypothesis first – this differs from traditional statistical approaches that are typically hypothesis-driven. There are multiple types of ML models that can be used for different research approaches, and many models are used to inform decision making or to make predictions.

In this paper, the authors (a group of researchers mostly from The Netherlands and Norway) evaluate a handful of ML models aimed at predicting patient response to the antidepressant sertraline in early psychiatric treatment stages, using data from a randomised controlled trial (RCT). They show that clinical data and a specific type of neuroimaging data are particularly useful for model prediction and suggest that these data could be used for treatment planning in psychiatric care.

Approximately two thirds of antidepressant users don’t respond to initial treatment. Machine learning models may help clinicians identify who those patients are likely to be at an early stage.

Methods

This paper uses XGBoost, an ML algorithm that combines many simple models called decision trees. The trees are built one after another, and each new tree is trained to correct the prediction mistakes of the trees that came before it; this is the ‘boosting’. The authors built and trained a prediction algorithm using data from the EMBARC clinical trial, a multisite trial initiated to discover potential biomarkers of antidepressant treatment outcomes across a range of domains, including genetic and environmental domains (Trivedi et al., 2016). They investigated whether response to sertraline, a selective serotonin reuptake inhibitor (SSRI), could be predicted at both the pre-treatment and early-treatment stages (i.e., one week after treatment initiation) in patients with depression.
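To make the idea of boosting more concrete, here is a minimal sketch in plain Python. It is not the XGBoost library and not the authors' pipeline: it uses one-split trees ("stumps"), a single made-up feature (a hypothetical week-1 symptom reduction in percent), and invented outcomes. All function names are illustrative assumptions.

```python
# Toy boosting with decision stumps. Hypothetical data; NOT the XGBoost
# library and NOT the method or data used in the paper.

def fit_stump(xs, residuals):
    """Pick the single threshold that best splits the residuals (least squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the mistakes (residuals) left by earlier rounds."""
    base = sum(ys) / len(ys)           # start from the average outcome
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Hypothetical patients: week-1 symptom reduction (%) and week-8 response (1 = responder)
reduction = [0, 5, 10, 15, 30, 35, 40, 50]
responded = [0, 0, 0, 0, 1, 1, 1, 1]

model = boost(reduction, responded)
low_score = predict(model, 3)    # close to 0: predicted non-responder
high_score = predict(model, 45)  # close to 1: predicted responder
```

The real algorithm uses deeper trees, many features, regularisation, and a loss function suited to classification, but the core loop is the same: predict, measure what is still wrong, and fit the next tree to that remaining error.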

The EMBARC trial recruited 296 patients and randomised them into one of two study conditions:

  1. Those who would receive sertraline treatment
  2. Those who would receive a placebo treatment.

The study consisted of two 8-week phases. In their analysis, the authors used three population subgroups:

  1. Those treated with sertraline (n=109)
  2. Those treated with placebo (n=120)
  3. Those who switched to sertraline in phase two of the study (n=58).

To evaluate model performance, one of the metrics the authors used was balanced accuracy. This is the average of the model’s sensitivity (i.e., its ability to correctly detect a positive case, here a responder) and its specificity (i.e., its ability to correctly detect a negative case, here a non-responder). The authors compare the model’s accuracy with the likelihood of these outcomes occurring purely by chance, defined here as the ‘a priori response rate’.
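As a small worked example of the metric, the sketch below computes balanced accuracy for ten invented patients. The labels and predictions are hypothetical and are not taken from the paper; the function name is an illustrative choice.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = responder, 0 = non-responder)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # responders correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # responders missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-responders correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-responders wrongly flagged
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical outcomes for 10 patients (4 responders, 6 non-responders)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

score = balanced_accuracy(y_true, y_pred)  # sensitivity 3/4, specificity 4/6
```

Because the two rates are averaged with equal weight, the metric does not reward a model for simply predicting the more common outcome.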

Results

A total of 229 patients were included in the analysis after exclusion due to missing data (mean age was 38.1 years, 65.9% female). The authors were able to predict sertraline response at week 8 from measurements taken in early treatment (week 1) with a balanced accuracy of 68% (AUROC=0.73, sensitivity=0.7, specificity=0.7). This means that instead of the clinician and patient having to wait 8 weeks to see if sertraline treatment has been effective, they have increased insight from the early-treatment stages. This could be particularly useful for people who experience side-effects early on, who will want to minimise the time spent on medication as much as possible if there is a low likelihood of it benefiting them.

Models trained on predictors which had the strongest scientific evidence backing them (e.g., Tier 1 predictors including age, hippocampal volume, symptom reduction) achieved the best performance compared to models trained on predictors with weaker scientific evidence (e.g., Tier 2 and 3 predictors including volumes of other brain areas, severity of depression, cerebrospinal fluid, education). The best model performance was achieved using data from early treatment as opposed to pre-treatment, but the authors note that all the models performed better than chance with the exception of one model trained on Tier 2 predictors. This is useful to know because it gives future researchers guidance on what types of information to include in similar prediction models, and reduces the time spent experimenting to see which types of data might be most predictive.

The most important pre-treatment predictors were features derived from arterial spin labelling (ASL), a neuroimaging technique that measures tissue perfusion and cerebral blood flow (CBF) (Clement et al., 2022). The implication of this is that CBF may be related to depression, although it is still unknown whether CBF influences depression symptoms or whether depression symptoms influence CBF (i.e., the possibility of reverse causality).

In the early treatment phase model, the most important predictors were clinical markers, namely the reduction in Hamilton Depression Rating Scale (HAM-D) score, HAM-D score at week 1, and anhedonic depression score (a measure of anhedonia, a symptom of depression characterised by lack of pleasure and enjoyment) on the Mood and Anxiety Symptom Questionnaire at baseline. It is notable that measures of depression symptom reduction were amongst the most important predictors. I would argue that this calls into question what these types of models can actually tell us about the nature of depression. It makes sense that you can predict future symptom change if you have already observed symptom change, especially in the case of symptom improvement. These models are not always used to answer epidemiological research questions when on the hunt for biomarkers or biosignatures of depression (i.e., “can a prediction model tell us anything about what causes depression?”). Ideally, though, a valuable model should contribute a unique insight into a mechanism, pathway, or relationship relevant to the cause of depression that a human being (i.e., a clinician) could not.

The models were good at predicting response to sertraline specifically, but worse at predicting placebo response. ‘Multimodal’ models, defined here as models which integrate a wide range of MRI modalities, also outperformed ‘unimodal’ models which use one domain or type of data. This result in particular has shaped the overall take-home message of this article: that there is value in collecting both clinical and neuroimaging data for antidepressant response prediction.

There was some evidence that machine learning methods could predict sertraline response at week 8 from measurements taken in early treatment at week 1.

Conclusions

The authors concluded that they have:

“show[n] that pretreatment and early-treatment prediction of sertraline treatment response in MDD patients is feasible using brain MRI and clinical data.”

They emphasise that their modelling approach, which includes training the prediction model(s) on MRI data from multiple domains with additional clinical data, outperformed models which used data from single domains. They also show that models trained on data that have the strongest scientific evidence base performed the best and ‘drove’ the model performance. Both clinical data and ASL perfusion data were strong predictors of antidepressant response, suggesting that these data types should be applied in future prediction modelling work in this area.

There is value in collecting both clinical and neuroimaging data for antidepressant response prediction in patients with depression.

Strengths and limitations

When appraising the predictive ability of an ML model, it is important to pay considerable attention to the relationship(s) between predictor variables and target outcomes (i.e., what you are trying to predict). The authors emphasise that clinical data had high predictive ability in the early-treatment prediction of response to sertraline, and they outline that the most important predictors were reduction in HAM-D score, HAM-D score at week 1, and anhedonic depression score on the Mood and Anxiety Symptom Questionnaire at baseline. However, it must be noted that there is overlap between the predictors and the outcome here, as sertraline response is defined as a 50% reduction on the HAM-D scale after 8 weeks and remission is considered to be a score of 7 or lower on the HAM-D scale after 8 weeks. Because the predictors and the outcome are measured on the same scale, a strong relationship between them is to be expected. This doesn’t seem like it should be a problem when models are deployed in context, but when you’re evaluating what a model has learned about the data (in this instance, what it has learned about treatment response), this relationship between predictors and outcome could constitute a form of bias when appraising model performance.

Again, whilst it could be argued that this consideration matters less when the clinical aim is treatment optimisation, it could potentially undermine the value of building models which integrate multiple data types, due to the high performance of clinical data over neuroimaging data. Considering that one of the aims of the study (and of the EMBARC trial overall) was to discover biomarkers that can be used for antidepressant response prediction, the question remains of whether there will ever be a biomarker more predictively powerful than data that is routinely collected in clinical assessment. Considering this alongside the costs of neuroimaging data acquisition – the financial impact of which the authors do acknowledge – the results of this modelling may not support the clinical need to routinely collect neuroimaging data.

On the other hand, the results of the pre-treatment model point to ASL perfusion data as being predictively powerful, an interesting result that has clinical and epidemiological value when exploring the relationship between the brain and SSRIs. However, when the model is given data on symptom reduction on the HAM-D scale, the power of neuroimaging markers decreases, and clinical data becomes the most predictively useful. It is relevant that the inclusion of neuroimaging data boosts performance in general, but clinical data as a single modality significantly outperforms every single neuroimaging modality.

An additional question remains of whether the ‘a priori’ prediction of treatment response, which the authors compare their model performance to, is a fair comparison. ‘A priori’ prediction refers to the trial-and-error clinical approach to antidepressant prescription. This approach has been shown to lead to two-thirds of people not responding to treatment (i.e., the clinician’s ‘model’ which assumes 100% of patients will respond to treatment is 33% accurate). It’s unclear whether the authors consider information on symptom scale reduction in early treatment to be included in the clinician’s assessment, or if the a priori response rate is assumed to be informed by one measurement timepoint only (i.e., the first clinical consultation when antidepressants are prescribed).
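The arithmetic behind the ‘33% accurate’ figure can be sketched as follows. This assumes a hypothetical cohort in which exactly one third of patients respond; the counts and the function name are illustrative and do not come from the paper.

```python
def always_predict_response(n_responders, n_non_responders):
    """Score a rule that labels every patient a responder (a stand-in for trial-and-error prescribing)."""
    tp, fn = n_responders, 0        # every true responder is labelled a responder
    fp, tn = n_non_responders, 0    # every non-responder is also labelled a responder
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    balanced = (sensitivity + specificity) / 2
    return accuracy, balanced


# Hypothetical cohort: 100 responders and 200 non-responders (one third respond)
accuracy, balanced = always_predict_response(100, 200)
# accuracy is 1/3, while balanced accuracy is 0.5
```

The same rule scores one third on plain accuracy but one half on balanced accuracy, so which of the two a model is compared against affects how large its improvement appears.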

The question remains of whether there will ever be a biomarker more predictively powerful than data that is routinely collected in clinical assessment

Implications for practice

The key question here is whether neuroimaging data should be used in clinical assessments in the early stages of treatment planning. Acquiring neuroimaging data is expensive, but the model which used both neuroimaging and clinical data outperformed all others. Whether this financial burden ends up being ‘worth’ the potential benefit of increased predictive ability will be difficult to measure. It would require complex health economics to calculate how model performance improvement leads to overall improvement in patient care, which could potentially justify the financial cost. However, the cost of neuroimaging for each patient would need to be shown to be lower than the overall cost of patients receiving the wrong initial treatment. This is a complex question requiring expertise from medicine, health economics, and data science – no mean feat.
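A deliberately oversimplified sketch of that break-even comparison is shown below. Every number is a placeholder chosen to make the arithmetic easy to follow, not an estimate of any real cost or rate, and the function and parameter names are hypothetical. A real health-economic analysis would need far more than this single inequality.

```python
def expected_saving_per_patient(wrong_rate_without, wrong_rate_with, wrong_treatment_cost):
    """Expected cost avoided per patient if imaging lowers the chance of a wrong first treatment."""
    return (wrong_rate_without - wrong_rate_with) * wrong_treatment_cost


def imaging_breaks_even(scan_cost, wrong_rate_without, wrong_rate_with, wrong_treatment_cost):
    """True when the expected saving per patient exceeds the cost of one scan."""
    saving = expected_saving_per_patient(wrong_rate_without, wrong_rate_with, wrong_treatment_cost)
    return saving > scan_cost


# Placeholder inputs only (NOT real figures):
saving = expected_saving_per_patient(
    wrong_rate_without=0.66,     # share given an ineffective first treatment without imaging
    wrong_rate_with=0.50,        # share given an ineffective first treatment with imaging
    wrong_treatment_cost=2000,   # cost attributed to one ineffective treatment course
)
worth_it = imaging_breaks_even(500, 0.66, 0.50, 2000)  # 320 saved vs 500 spent, so False
```

With these placeholders the scan costs more than it saves. The conclusion flips as soon as the scan is cheaper or the drop in wrong first treatments is larger, which is why the inputs matter far more than the formula.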

Despite this, appraisal of these methods should not be restricted to a commentary about financial burden, financial gain, or other economic metrics of healthcare success. These prediction models have the potential to help real people struggling with their mental health to make more informed treatment decisions. They could help people to look into the future and consider whether a pharmacological approach to their symptom management is the best option for them, or whether they should explore other avenues like talking therapies, lifestyle interventions, and methods to improve social connectedness, purpose, and life satisfaction more generally. But when we are considering the transformative potential of AI for mental health, which requires large swathes of data, the financial backbone of the approach continues to be the first and last hurdle.

How much money does a high performing model save through potential reduction in ineffectual treatments, compared to a lower performing model that is cheaper to deploy?

Statement of interests

None to declare.

Links

Primary paper

Maarten G Poirot, Henricus G Ruhe, Henk-Jan M M Mutsaerts, Ivan I Maximov, Inge R Groote, Atle Bjørnerud, Henk A Marquering, Liesbeth Reneman, Matthan W A Caan. Treatment Response Prediction in Major Depressive Disorder Using Multimodal MRI and Clinical Data: Secondary Analysis of a Randomized Clinical Trial. Am J Psychiatry 181, 223–233 (2024). https://doi.org/10.1176/appi.ajp.20230206

Other references

Medicines Used in Mental Health – England – 2015/16 to 2022/23; NHSBSA (2023).

Keks, N., Hope, J. & Keogh, S. Switching and stopping antidepressants. Aust Prescr 39, 76–83 (2016).

Ionescu, D. F., Rosenbaum, J. F. & Alpert, J. E. Pharmacological approaches to the challenge of treatment-resistant depression. Dialogues Clin Neurosci 17, 111–126 (2015).

Coronavirus and depression in adults, Great Britain: July to August 2021; Office for National Statistics (2021).

Mental health crisis care services ‘under-resourced, understaffed and overstretched’, Mind.

What is Machine Learning? Definition, Types, Tools & More, datacamp (2023).

Trivedi, M. H. et al. Establishing moderators and biosignatures of antidepressant response in clinical care (EMBARC): Rationale and design. J Psychiatr Res 78, 11–23 (2016).

Clement, P. et al. A beginner’s guide to arterial spin labeling (ASL) image processing. Front Radiol (Sec. Neuroradiology) 2, 1–12 (2022).
