This week we will be discussing a newly published article from the Journal of Consulting and Clinical Psychology. The article, titled “Personalized Prognostic Prediction of Treatment Outcome for Depressed Patients in a Naturalistic Psychiatric Hospital Setting: A Comparison of Machine Learning Approaches,” was authored by Webb, Cohen, Beard, and Forgeard.

What did they do?

Participants (n = 484) in this study were recruited from a behavioral health hospital program (i.e., participants were already receiving treatment). To maximize generalizability, no exclusion criteria were implemented. The researchers constructed a range of prognostic machine learning models informed by the existing literature. Overall, the predictive performance of 13 machine learning algorithms was compared, including, but not limited to, elastic net regularization (ENR), random forest (RF), and Bayesian additive regression trees (BART). The best-supported algorithm was then applied to a holdout sample of participants. Additionally, the researchers developed a “treatment outcome prognosis calculator” based on their final model, as a demonstration of how the model could be applied to new patients entering treatment.

Why did they do it?

As many of us are aware, much of the available research on predictors of depression outcomes relies heavily on randomized clinical trials. While these trials are informative, their strict inclusion and exclusion criteria often limit the generalizability of the results. As the authors point out, there is evidence of poorer overall treatment outcomes in naturalistic settings than in randomized controlled trials, which warrants further research to identify pre-treatment patient characteristics that may predict poorer outcomes in these settings.

Gaining knowledge about characteristics that predict poor mental health treatment outcomes can inform care and aid in identifying important individualized treatment targets. In addition, many existing treatment outcome prediction studies test the impact of individual variables on depression outcomes in isolation. This study attempts to fill that gap by examining multivariable machine learning approaches, which allow a combination of predictors to be analyzed together to account for maximal outcome variance. As mentioned above, the researchers also developed a “treatment outcome prognosis calculator” (sketched below, after the list), intended to aid in:

  1. Identifying patients requiring a high level of care
  2. Identifying individuals who are most likely to benefit from outcome monitoring
  3. Highlighting baseline patient characteristics that contribute to poor prognosis
  4. Allocating limited program resources in a more efficient and targeted way
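
To make the calculator idea concrete, here is a minimal, hypothetical sketch of how such a tool might wrap an already-fitted prognostic model. The model type, feature names, and clipping step are our own illustrative assumptions, not the authors’ actual implementation.

```python
# Hypothetical "treatment outcome prognosis calculator": a thin wrapper
# around a fitted regression model that maps one new patient's baseline
# measures to a predicted post-treatment PHQ-9 score (0-27).
# Feature names below are illustrative placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet

BASELINE_FEATURES = ["phq9_baseline", "anxiety_severity", "bpd_symptoms",
                     "age_of_onset", "prior_hospitalization"]

def predict_post_treatment_phq9(fitted_model: ElasticNet, patient: dict) -> float:
    """Return a predicted post-treatment PHQ-9 score for a single new patient."""
    x = pd.DataFrame([patient], columns=BASELINE_FEATURES)
    prediction = float(fitted_model.predict(x)[0])
    # Clip to the valid PHQ-9 range so the output reads as a plausible score.
    return float(np.clip(prediction, 0, 27))
```

In a clinic, a score like this could be used to flag patients whose predicted outcome is poor enough to warrant a higher level of care or closer outcome monitoring.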

How did they do it?

The sole inclusion criterion for participants in this study was a diagnosis of current major depressive disorder. The full sample reported average depression scores on the Patient Health Questionnaire-9 (PHQ-9) in the “moderately severe” range. In addition, 71.1% of the sample reported a comorbid anxiety disorder diagnosis, 26% scored above the cut-off on the McLean Screening Instrument for Borderline Personality Disorder (MSI-BPD), suggesting a borderline personality disorder (BPD) diagnosis, 70% endorsed suicidal ideation, and over half (54.1%) had previously been hospitalized for psychiatric concerns.

The sample of participants included in this study received behavioral health treatment at McLean Hospital, where they attended daily treatment sessions consisting of ~5 hours of clinical services. Most treatment was delivered via CBT groups, but patients also received psychoeducation, attended behavioral activation skills groups, and had the opportunity to attend more in-depth CBT focus groups. Patients also completed a structured diagnostic interview the day following their admission to the program, as well as a battery of self-report questionnaires each morning before treatment. The Mini-International Neuropsychiatric Interview (MINI) was used to diagnose major depressive episodes, manic/hypomanic episodes, panic disorder, obsessive-compulsive disorder, generalized anxiety disorder, social anxiety disorder, PTSD, alcohol abuse and dependence, and psychosis. The PHQ-9 was used to assess depressive symptoms. Treatment outcome predictor variables included clinical measures derived from the PHQ-9 and other symptom measures, demographic characteristics, psychiatric medication use, and physical health variables (e.g., weight, blood pressure, body mass index).

With the goal of applying prognostic models to new patients, the researchers divided their sample into a holdout (test) sample, consisting of the 20% of participants (n = 97) who most recently entered treatment, and a training sample, consisting of the remaining 80% (n = 387). The researchers then implemented and compared a range of machine learning algorithms on the training sample. The best-performing model was then applied to patients in the holdout sample to predict post-treatment PHQ-9 scores.
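
As a rough illustration of this setup, the sketch below sorts a hypothetical dataset by admission date, reserves the most recent 20% of admissions as the holdout sample, and compares two of the candidate algorithms named above via cross-validation on the training sample only. The file name, column names, and hyperparameters are assumptions for illustration, not the authors’ actual pipeline.

```python
# Temporal holdout split plus a simple model comparison on the training sample.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score

df = pd.read_csv("patients.csv").sort_values("admission_date")  # hypothetical data file
split = int(len(df) * 0.8)
train, holdout = df.iloc[:split], df.iloc[split:]  # holdout = most recent admissions

X_train = train.drop(columns=["phq9_post", "admission_date"])
y_train = train["phq9_post"]

candidates = {
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),
    "random_forest": RandomForestRegressor(n_estimators=500, random_state=0),
}
for name, model in candidates.items():
    # Cross-validated mean absolute error, estimated on the training sample only.
    mae = -cross_val_score(model, X_train, y_train,
                           scoring="neg_mean_absolute_error", cv=10).mean()
    print(f"{name}: CV MAE = {mae:.2f}")
```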

What did they find?

Depression outcomes in the full sample varied widely; in other words, there was substantial variance in the dependent variable to be predicted. Across the algorithms compared, ENR was associated with the lowest prediction error. ENR is also one of the less complex approaches and is considered easy to interpret. Of the 51 baseline variables included in the final ENR model, 14 emerged as significant predictors of poor treatment prognosis, including higher depression and anxiety severity; greater fatigue and difficulty concentrating; heightened BPD symptoms; more relationship problems; more pessimistic expectations of symptom improvement; prior treatment at an intensive outpatient program or partial hospital program; identifying as White; earlier age of MDD onset; comorbid diagnoses of OCD, PTSD, or SAD; and a mood stabilizer prescription. When the ENR model was applied to the holdout sample, predicted post-treatment PHQ-9 scores differed from observed values by 2.65 points (on the 27-point scale).
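
Continuing the sketch above, the final step might look something like the following: tune and fit an elastic net on the training sample, inspect which baseline predictors it retains (non-zero coefficients), and measure how far its holdout predictions fall from the observed post-treatment PHQ-9 scores. The variable and column names are assumptions carried over from the earlier sketch, not the authors’ code.

```python
# Fit the chosen elastic net, list retained predictors, and score the holdout sample.
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import mean_absolute_error

X_holdout = holdout.drop(columns=["phq9_post", "admission_date"])
y_holdout = holdout["phq9_post"]

# ElasticNetCV tunes the regularization strength internally via cross-validation.
enr = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=10).fit(X_train, y_train)

retained = [(col, coef) for col, coef in zip(X_train.columns, enr.coef_) if coef != 0]
print("Predictors retained by the elastic net:")
for col, coef in sorted(retained, key=lambda kv: abs(kv[1]), reverse=True):
    print(f"  {col}: {coef:+.2f}")

print("Holdout MAE:", mean_absolute_error(y_holdout, enr.predict(X_holdout)))
```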

What does it all mean (our take)?

We found this article to be absolutely brilliant. The application of more sophisticated statistical approaches in mental health is proving valuable for developing more individualized intervention plans.

But the question is: why isn’t this being done more often? Well, historically we have fallen short when it comes to systematically evaluating and monitoring patients with mental health problems. This shortcoming has limited our ability to apply statistical techniques that require larger samples to train and optimize predictive models (i.e., machine learning techniques). This study developed such models using a relatively small sample of 484 participants. We can’t help but imagine the power of these models if we had thousands, or tens of thousands, of data points to train them...

We feel that as we continue to improve our data collection techniques, our ability to provide highly individualized treatment will improve in parallel. A careful combination of data analytics and provider decision-making will ultimately help us achieve our goal of maximizing quality of care for every patient experiencing mental health difficulties.

Join the Conversation