Your Case Study Title

Depression is a silent, invisible enemy that has been ignored for so long that it is time we stood together and took notice. With competitive careers and stressful jobs, we often neglect the need to balance our lives and keep the mind in harmony. According to WHO data, depression is a common illness worldwide, affecting more than 264 million people. At its worst, it leads to suicide, which is the second leading cause of death among people aged 15–29.

We are all aware that medical science now has preventive methods for depression, but they help only when we seek help or have the facilities to do so. Like a virus, depression can affect any of us, irrespective of class, colour, gender, or caste. And with the pandemic hitting us hard, we are seeing a steep rise in cases of anxiety and depression. During COVID-19, monitoring mental health is of utmost importance because of the growing number of job losses, salary cuts, and personal and financial losses. Below, we walk through the steps by which mental health can be tracked through social media platforms.

Problem Statement

How can we help detect depression and suicidal tendencies among people from their posts on social media (Twitter, Instagram)?

The goal of this project is twofold:

  1. To show how depression leads to suicidal tendencies

  2. To predict suicidal tendencies among people


Step 1 - Data Collection

The first step of this process involves collecting data from various suicide and depression forums. For our project, we used the data from

This website is mainly a forum with separate depression and suicide threads; it also has advice and instructions. The forum served as a fair source of:

  • Depression Post

  • Suicidal Post

  • Advice/ Normal post

It was easy for us to label the posts manually since the website already separates its threads by type. We scraped the website to collect the data. Various scraping tools are available; for this study we used a Chrome plugin called Web Scraper (

Step 2 - Data Cleaning & Preparation

Once we have the raw data, it is essential to clean it by removing duplicate sentences, URLs, extra whitespace, user names, and stopwords (basically, noise that is not relevant to our study). We also removed brackets, dashes, colons, and any other symbols present in the data. The final dataset has 600 posts: 190 depression posts, 180 suicidal posts, and the remaining 230 posts that are neither suicidal nor about depression.

Apart from cleaning the data, we also labelled each post with one of the three categories (suicidal, depression, or normal) during this stage.
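
As a rough illustration, the cleaning steps described above could be sketched in Python as follows. This is a minimal sketch and not the exact script used in the study: the tiny stopword list, the regular expressions, and the function names clean_post and clean_corpus are all assumptions made for the example.

```python
import re

# Tiny illustrative stopword list (an assumption); a real run would use a fuller list.
STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "is", "it", "i", "am"}

def clean_post(text):
    """Lowercase a post and strip URLs, user names, symbols, and stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)                    # remove @user names
    text = re.sub(r"[^a-z\s]", " ", text)                # remove brackets, dashes, colons, digits
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)                              # also collapses extra whitespace

def clean_corpus(posts):
    """Clean every post and drop empty or duplicate results, keeping first-seen order."""
    seen, cleaned = set(), []
    for post in posts:
        c = clean_post(post)
        if c and c not in seen:
            seen.add(c)
            cleaned.append(c)
    return cleaned
```

Note that this sketch removes duplicates at the level of whole posts, which is a simplification of the sentence-level de-duplication mentioned above.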

Step 3 - Data Exploration & Discovery

Before training a model, we analysed the depression and suicidal posts by looking at their words and topics.

Step 3.1 - WordCloud

A word cloud provides a visual understanding of the data. The word cloud shown here is specific to depression posts. It clearly shows that most depression posts talk about depression, seek help, and ask for advice.
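
A word cloud is essentially a picture of word frequencies, with more frequent words drawn larger. The counts behind such a picture could be computed along these lines. This is only a sketch: the function name word_frequencies is hypothetical, and it assumes the posts have already been cleaned into space-separated words.

```python
from collections import Counter

def word_frequencies(cleaned_posts, top_n=10):
    """Count how often each word occurs across posts and return the top_n words."""
    counts = Counter()
    for post in cleaned_posts:
        counts.update(post.split())
    return counts.most_common(top_n)
```

The resulting (word, count) pairs can then be passed to any word cloud drawing tool.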

Step 3.2 - Topic Modelling

We performed a machine learning analysis called Topic Modelling, which analyses text data to find groups of words and similar expressions. It is a form of text analysis that helps us gain insights by classifying the data into topics. We extracted 3 topics from the depression and suicidal posts.

Topic Modelling gave us a list of words with a score indicating how prominently each word featured in each topic. We identified the top 6 words in each topic and used them to label the topic.

(Kindly note that the labelling of the topics as Care, Thoughts and Reach is completely subjective. The number of topics to be identified can also be varied.)
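
The step of picking the top 6 words per topic can be sketched as below. The sketch assumes the topic model's output is available as a dictionary mapping each topic to its word scores; that data layout and the function name top_words are assumptions for illustration, not the format produced by the tool used in the study.

```python
def top_words(topic_scores, n=6):
    """Return the n highest-scoring words for each topic.

    topic_scores is assumed to look like {topic_id: {word: score}}.
    Ties are broken alphabetically so the result is deterministic.
    """
    return {
        topic: [word for word, _ in
                sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for topic, scores in topic_scores.items()
    }
```

A human then reads each list of top words and chooses a label for the topic, which is where the subjectivity noted above comes in.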

It was interesting to find ‘work’ under two topics: Care and Thoughts. Does that mean work is one of the major causes discussed in posts about depression?

It is evident from the graph that Reach was the most common topic across both suicidal and depression posts. Thus, it is all the more important to listen to and assist people with depression before they cross over to suicidal thoughts. These were interesting observations that need more relevant data and a separate study.

Step 3.3 - Does Depression Lead to Suicide?

One of the goals of this study is to check whether depression really leads to suicidal tendencies. Topic Modelling also gave us an output indicating the extent to which each topic was discussed across all the posts in the threads. When we compared the results for depression posts and suicidal posts, we found that the two were quite similar in terms of the topics discussed.
Using the top words from each topic identified in the previous step as attributes, we ran a similarity analysis across all posts. This showed us how alike the posts about depression and suicide are.
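
One common way to run such a similarity analysis is cosine similarity over word-count vectors. The sketch below assumes that measure; the study does not state which similarity measure was used, and the function names attribute_vector and cosine_similarity are hypothetical.

```python
import math

def attribute_vector(post, attributes):
    """Count how often each attribute word (e.g. a topic's top word) occurs in a post."""
    tokens = post.split()
    return [tokens.count(a) for a in attributes]

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

A value close to 1 means two posts use the attribute words in similar proportions, while a value of 0 means they share none of them.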

On this basis, we conclude that depression can lead to suicidal tendencies.

Step 3.4 - Sentiment Analysis

We all know that the words "suicidal" and "depression" carry negative emotions. To check what the machine makes of a suicidal post, we ran a sentiment analysis. Sentiment analysis interprets the emotion in a text as positive, negative, or neutral. It can be done with scripts or with tools; in our case, we used a tool called MonkeyLearn.

The result clearly shows that the suicidal post has a Negative sentiment, with 98.4% confidence.

Step 4 - Machine Learning Algorithm

Machine learning means programming a computer so that it studies the available data and learns a relationship that can later be used to make predictions. This is the premise behind much of Artificial Intelligence.

Taking the current study as an example, the model studies existing posts and learns a relationship, i.e., which kinds of social media posts suggest that a person is going through depression or might be suicidal. This is called model training.

For this purpose, we labelled the collected posts according to their threads, i.e., depression, suicidal, or normal, and then randomly divided them into a training set and a test set. The training set was used to train a machine learning (ML) model. We started with Logistic Regression because it is simple and easy to interpret. We also tried other models such as SVM (support vector machine), which resulted in better accuracy.
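
The random division into a training set and a test set could look like the sketch below. The 20% test fraction, the fixed seed, and the function name train_test_split are assumptions for illustration; the study does not state the split ratio that was used.

```python
import random

def train_test_split(posts, labels, test_fraction=0.2, seed=0):
    """Randomly split labelled posts into (train, test) lists of (post, label) pairs."""
    indices = list(range(len(posts)))
    random.Random(seed).shuffle(indices)           # seeded so the split is reproducible
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [(posts[i], labels[i]) for i in train_idx]
    test = [(posts[i], labels[i]) for i in test_idx]
    return train, test
```

The model is fitted only on the training pairs, and the held-out test pairs are used to measure how well it generalises to posts it has not seen.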

AUROC (Area Under the ROC Curve) is a popular metric for evaluating such models. A score of 0.5 corresponds to random guessing, so a score above 0.5 indicates that the model already performs better than random prediction, i.e., better than having no machine learning in place.
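
To make the metric concrete, AUROC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way for a binary problem (for instance suicidal versus not suicidal, which is an assumption, since the study has three classes). The function name auroc and the example numbers in the usage note are illustrative, not results from the study.

```python
def auroc(labels, scores):
    """AUROC for binary labels (1 = positive, 0 = negative) and model scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    example is scored higher; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) gives 0.75, because three of the four positive-negative pairs are ranked correctly.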


Now, we have developed a model that can help in predicting suicidal tendencies. How can we make it useful and save the lives of millions of people?

One way to find people who need help is through user-generated content. Millions of people are online today on social media platforms like Twitter, Facebook, Instagram, and Reddit. These platforms let them not only interact with others but also convey their feelings, thoughts, and emotions. If our model can detect tweets that signal depression, maybe we can save a few thousand lives.


We chose Twitter as the platform because people tend to express their feelings there in a straightforward way. We picked 2 well-known profiles:

Chester Bennington: American singer-songwriter who died by suicide in 2017

Deepika Padukone: Indian actress who went through depression

Scraping Twitter Data & Predicting

After selecting the profiles, we scraped the tweets from both and grouped them by year. We ran the model on the tweets in the order of the year in which they were posted, and what we noticed was insightful. The suicidal tendency predicted for Chester Bennington increased over the years until his death by suicide in 2017, whereas the depression level predicted for Deepika Padukone decreased, and it seems she is leading a happy married life.
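
The year-by-year comparison could be sketched as follows. The sketch assumes the tweets are available as (year, text) pairs and that the trained model is wrapped in a scoring function returning a number between 0 and 1; both the data layout and the names yearly_average and score_fn are hypothetical.

```python
from collections import defaultdict

def yearly_average(tweets, score_fn):
    """Average the model's score per year.

    tweets: iterable of (year, text) pairs.
    score_fn: stand-in for the trained model, mapping text to a score in [0, 1].
    Returns {year: mean score}, ordered by year.
    """
    by_year = defaultdict(list)
    for year, text in tweets:
        by_year[year].append(score_fn(text))
    return {year: sum(v) / len(v) for year, v in sorted(by_year.items())}
```

A rising or falling trend in these yearly averages is the kind of pattern described above.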


There are a couple of limitations to this study.

The study used basic machine learning algorithms, i.e., Logistic Regression and SVM, whose accuracy is limited. The accuracy could be further improved with deep learning models such as the Long Short-Term Memory (LSTM) network, which can also be combined with a CNN model.

The data set was very limited, consisting of only 600 posts. Collecting data from other forums and platforms would help to further increase the accuracy of the model.


Our aim is merely to create a prototype to support the premise that newer technologies like machine learning and deep learning can be leveraged to solve social issues, in this case depression and suicide. In most countries, people with depression do not get the right attention, or they avoid reaching out because of the social stigma attached to it. Though it is impossible for us to cover every nook and corner, the intention here was to create a social model based on the user data available.

What are we trying to achieve? We merely wanted to put forward the idea that applications of this kind can be used to predict user behaviour from social media usage: not only the posts users create but also the posts they consume. AI and ML models can be used not only by multinational organizations to grow their business but also for a social cause. The findings can be used to identify potential cases and direct them to specific motivational and inspiring groups or pages and to self-help forums. Therapists and wellness organizations can leverage this opportunity to offer aid through social media applications by reaching the audience that needs help. Creating an API-based solution that can be integrated with various applications in the future would be a step toward ensuring that people have access to information and healthcare.

This study was done as a part of an academic project at SP Jain Institute of Management & Research by Antara Datta, Suraj Chakraborty & Sugat Nayak, under the guidance of our professor Dr. Anitesh Barua.