This post discusses a 2019 study, “Using Machine Learning to Identify Suicide Risk: A Classification Tree Approach to Prospectively Identify Adolescent Suicide Attempters,” recently published by Hill, Oosterhoff, and Do in Archives of Suicide Research. The study aimed to evaluate the utility of classification tree analysis in developing a screen for suicide risk.


What did they do?

The authors of this study used classification tree analysis (a type of machine learning) to analyze data from the National Longitudinal Study of Adolescent to Adult Health and attempt to develop a set of suicide risk/attempt screening questions.


Why did they do it?

Suicide is among the top 10 leading causes of death in the United States, with just over 40,000 deaths by suicide occurring annually. Interestingly, while a vast literature identifies suicide risk factors, much of this information is difficult to implement clinically. Said differently, clinicians often understand the factors that may increase a patient’s risk for suicide, but often lack convenient, efficient, and accurate approaches for screening patients and determining their level of risk.

The authors propose that classification tree analysis can help determine the relevant questions to ask when evaluating patient suicide risk. Classification tree analysis produces a decision tree for clinicians, walking them down a hierarchically ordered set of questions (i.e., nodes in the classification tree), with the patient’s responses at each step indicating their risk of suicidality.


How did they do it?

Before jumping into the specifics of this study, we should quickly explore what classification tree analysis is. In a nutshell, classification tree analysis is a machine learning approach that produces a set of rules (typically “if-then” rules) that can be used to categorize people into different groups - it classifies people, as the name of the method suggests. The decision-making rules are ordered hierarchically, with the strongest predictor coming first and weaker predictors following.
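To make the “if-then” idea concrete, here is a minimal sketch of a classification tree in plain Python. It is not the authors' procedure or software: the splitting rule (Gini impurity), the item names `prior_ideation` and `often_tired`, and the ten toy records are all assumptions made up for illustration only.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the yes/no item whose split lowers weighted impurity the most."""
    best_feature, best_score = None, gini(labels)
    for f in features:
        yes = [y for r, y in zip(rows, labels) if r[f]]
        no = [y for r, y in zip(rows, labels) if not r[f]]
        if not yes or not no:
            continue  # this item does not separate anyone
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if score < best_score:
            best_feature, best_score = f, score
    return best_feature

def build_tree(rows, labels, features, depth=0, max_depth=2):
    """Grow the tree recursively; a leaf is the majority class of its group."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    feature = best_split(rows, labels, features)
    if feature is None:
        return majority
    remaining = [f for f in features if f != feature]
    branches = {}
    for answer in ("yes", "no"):
        keep = [i for i, r in enumerate(rows)
                if bool(r[feature]) == (answer == "yes")]
        branches[answer] = build_tree([rows[i] for i in keep],
                                      [labels[i] for i in keep],
                                      remaining, depth + 1, max_depth)
    return {"question": feature, **branches}

def classify(tree, row):
    """Walk the if-then rules from the top question down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["yes"] if row[tree["question"]] else tree["no"]
    return tree

# Hypothetical toy records: (prior_ideation, often_tired, outcome at follow-up).
toy = [(1, 1, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1), (0, 1, 1),
       (0, 1, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0)]
rows = [{"prior_ideation": p, "often_tired": t} for p, t, _ in toy]
labels = [y for _, _, y in toy]

tree = build_tree(rows, labels, ["prior_ideation", "often_tired"])
print(tree)
```

On this toy data the item that best separates the two groups ends up as the first question, and the remaining item is only asked of people who answered “no” to it, which is the hierarchical ordering described above.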

Data from 4,834 participants (mean age at wave 1 = 16.15 years) who completed 2 waves of at-home interviews were analyzed. In addition to demographic data, interviews assessed suicide, depression, substance use, sexual activity/presence of sexually transmitted disease, engagement in risky behavior, and relationships. As you probably guessed, data from wave 1 were used to predict suicide risk/attempts at wave 2.


What did they find?

Ultimately, the authors produced 26 unique classification trees, each with varying degrees of sensitivity and specificity. For those who may be unaware, sensitivity is the “true positive” classification rate (the share of people with the outcome who are correctly flagged) and specificity is the “true negative” classification rate (the share of people without the outcome who are correctly cleared). Often, an increase in one of these values coincides with a decrease in the other, so both need to be evaluated together.
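For readers who like to see the arithmetic, the sketch below computes sensitivity and specificity and shows the trade-off between them. The outcomes, risk scores, and cutoffs are invented for illustration and have nothing to do with the study's data.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for two lists of 0/1 values."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity

def flag(scores, cutoff):
    """Flag everyone whose score is at or above the cutoff."""
    return [1 if s >= cutoff else 0 for s in scores]

# Hypothetical outcomes (1 = event occurred) and hypothetical risk scores.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.6, 0.3, 0.8, 0.5, 0.4, 0.2, 0.2, 0.1]

strict = sensitivity_specificity(actual, flag(scores, 0.65))
lenient = sensitivity_specificity(actual, flag(scores, 0.25))

print("strict cutoff:  sensitivity=%.2f specificity=%.2f" % strict)
print("lenient cutoff: sensitivity=%.2f specificity=%.2f" % lenient)
```

With the stricter cutoff, half of the true cases are missed but few people are flagged unnecessarily; with the lenient cutoff, every true case is caught at the price of more false alarms.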

The authors identified two specific decision trees that could be useful in identifying patients who are at an elevated risk for engaging in suicidal behavior. The trees differed in their sensitivity/specificity statistics, as well as their overall accuracy rates (ranging from 71.7% accuracy to 85.1% accuracy). 


Tree 1:

This tree was notably less complex and had moderate sensitivity and specificity statistics. It identified the following items - present at wave 1 - as relevant screening items for suicidal ideation at wave 2:

  1. History of suicidal ideation
  2. Frequency of feeling tired for no reason
  3. Perceived chance of getting a sexually transmitted disease
  4. Running away from home
  5. Asian/Pacific Islander ethnicity
  6. Having a friend die by suicide in the past year


Tree 2:

Conversely, this tree was a bit more complex and achieved high sensitivity at the cost of low specificity. It identified the following categories as important screener domains:

  1. Symptoms of depression/physiological symptoms
  2. Familial characteristics
  3. Risky behavior
  4. Sex/sexually transmitted disease-related variables
  5. Substance use
  6. Expected characteristics of a romantic relationship


What does it all mean (our take)?

Suicide and machine learning - two topics that are continuously gaining more traction in the literature. The authors’ application of big data methodology (i.e., machine learning) to try to address a real clinical issue is commendable, and we really enjoyed this article.

The question is - are the results from this article applicable to today’s society? The data were collected nearly 2 decades ago, but despite their age, the variables that the authors examined are certainly still relevant today. If anything, adding currently relevant variables, such as social media use/experiences, may further help in identifying risk for suicide in today’s adolescents. Practically, applying results from this methodology in a clinical setting is highly useful - it provides clinicians with a checklist of critical assessment items.

The ultimate takeaway in our eyes - big data methodology is the future of behavioral healthcare prediction. Taking large datasets and making sense of them using machine learning techniques will ultimately be the key to major breakthroughs in the world of mental health, and we look forward to playing a significant role in that journey.
