Student Depression Prediction Using Artificial Neural Networks

Deep learning model predicting mental health risks with comprehensive student profiling and real-time probability assessment

Role: ML Engineer & Data Scientist

Timeline: 2025

Tools: TensorFlow, Keras, Streamlit, Python

57.99%: Depression Probability (sample prediction)
Input Features: Comprehensive profiling
ANN: Deep Learning (TensorFlow/Keras)
Real-time: Predictions (instant assessment)

Project Overview

A machine learning application that predicts student depression likelihood using Artificial Neural Networks. The system analyzes 13 different features including demographics, academic performance, lifestyle factors, and mental health indicators to provide real-time probability assessments through an intuitive Streamlit interface.

Challenge

Mental health issues among students are rising, but early detection remains challenging. Traditional screening methods are time-consuming, subjective, and often inaccessible. There's a critical need for an automated, data-driven approach that can identify at-risk students early using objective indicators like academic pressure, sleep patterns, CGPA, and lifestyle factors.

Student Depression Prediction input interface showing personal information section with gender dropdown set to Male and age slider set to 23 years

Academic and work pressure inputs showing sliders for academic pressure (2/5), work pressure (2/5), CGPA input (3.94), and satisfaction levels including study satisfaction (4/5)

Prediction results showing 57.99% depression probability with alert indicating student is likely to have depression

Input data summary displaying all encoded values used for prediction including gender, age, CGPA, pressure levels, and satisfaction metrics

Streamlit interface walkthrough: The application collects 13 features including personal information (gender, age), academic metrics (CGPA, academic pressure), lifestyle factors (sleep duration, dietary habits), mental health indicators (family history, suicidal thoughts), and satisfaction levels to generate a comprehensive depression risk assessment with real-time probability scores.

Technical Approach

Data Collection & Understanding

Utilized the Student Depression Dataset containing 13 features across multiple dimensions of student life. The dataset includes demographic, academic, lifestyle, and psychological indicators.

Demographics: Gender, Age, Profession
Academic: CGPA, Academic Pressure, Study Satisfaction
Lifestyle: Sleep Duration, Dietary Habits, Work/Study Hours
Mental Health: Family History, Suicidal Thoughts, Financial Stress
Satisfaction: Job Satisfaction, Work Pressure

Data Preprocessing & Feature Engineering

Implemented a preprocessing pipeline that label-encodes the categorical features, maps sleep duration to a numeric scale, and standardizes the resulting feature matrix before training.

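from sklearn.preprocessing import LabelEncoder, StandardScaler

# Label encoding for categorical features: one encoder per column,
# each fitted on its own column and saved for reuse at inference time
label_encoders = {
    'gender': LabelEncoder(),
    'profession': LabelEncoder(),
    'dietary_habits': LabelEncoder(),
    'family_history': LabelEncoder(),
    'suicidal_thoughts': LabelEncoder()
}

# Sleep duration mapping (categorical → numerical),
# applied to the raw column before the feature matrix is assembled
sleep_map = {
    'Less than 5 hours': 0.2,
    '5-6 hours': 0.4,
    '7-8 hours': 0.6,
    'More than 8 hours': 0.8
}

# Feature scaling: X is the fully encoded numeric feature matrix
# (one row per student, one column per feature)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)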
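As a rough illustration of what the scaling step computes, the sketch below standardizes a single numeric column by hand. StandardScaler subtracts the column mean and divides by the population standard deviation. The helper names fit_standardizer and standardize and the sample ages are hypothetical and not taken from the project.

```python
from statistics import fmean, pstdev

def fit_standardizer(values):
    """Return (mean, std) estimated from the training values only."""
    mean = fmean(values)
    std = pstdev(values)  # population std (ddof=0), matching StandardScaler
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    return mean, std if std > 0 else 1.0

def standardize(values, mean, std):
    """Apply z = (x - mean) / std to every value."""
    return [(v - mean) / std for v in values]

train_ages = [18, 21, 23, 30]            # hypothetical training ages
mean, std = fit_standardizer(train_ages)
scaled_train = standardize(train_ages, mean, std)

# At inference time the stored training statistics are reused, never refitted.
scaled_new = standardize([23], mean, std)
print(scaled_new)
```

The same principle is why the fitted scaler is saved alongside the model: a new student's values have to be transformed with the training mean and standard deviation.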

ANN Model Architecture

Designed and trained a deep Artificial Neural Network using TensorFlow/Keras with multiple hidden layers, dropout for regularization, and binary cross-entropy loss for depression classification.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu', input_shape=(13,)),
    Dropout(0.3),
    Dense(64, activation='relu'),
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')  # Binary classification
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy', 'AUC']
)

Input Layer: 13 features (student profile)
Hidden Layers: 128 → 64 → 32 neurons with ReLU activation
Dropout: 30% and 20% to prevent overfitting
Output Layer: Sigmoid activation for probability output
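To give a sense of the model's size, the sketch below counts the trainable parameters implied by the layer widths above. Each Dense layer has one weight per input-unit pair plus one bias per unit, and Dropout layers add none. The helper name dense_params is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input width followed by the width of each Dense layer.
layer_sizes = [13, 128, 64, 32, 1]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)      # [1792, 8256, 2080, 33]
print(total_params)   # 12161
```

The total should match the trainable parameter count that model.summary() reports for this architecture.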

Streamlit Deployment

Built an interactive web application using Streamlit with comprehensive input forms, real-time predictions, and visual feedback for depression risk assessment.

User-friendly input forms with sliders and dropdowns
Real-time model inference with TensorFlow backend
Probability visualization with progress bars
Alert system for high-risk predictions
Input data summary for transparency
Pickle serialization for encoders and scalers
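The inference step behind these features might look roughly like the sketch below. It assumes the feature vector is already encoded in the column order used during training. The function name predict_probability is hypothetical, and the loading code is shown only as comments because the app's actual structure is not given here.

```python
def predict_probability(model, scaler, features):
    """Scale one encoded feature vector and return the sigmoid output as a float.

    model  : object with a Keras-style predict() method
    scaler : fitted object with a scikit-learn-style transform() method
    """
    scaled = scaler.transform([features])   # one row, n_features columns
    output = model.predict(scaled)          # shape (1, 1) for a single sigmoid unit
    return float(output[0][0])

# In the app, the real objects would be loaded once at startup, for example:
#   import pickle
#   from tensorflow.keras.models import load_model
#   model = load_model("model.h5")
#   with open("scaler.pkl", "rb") as f:
#       scaler = pickle.load(f)
#   probability = predict_probability(model, scaler, encoded_features)
```

Keeping the function free of Streamlit calls makes it straightforward to unit-test with stand-in objects before wiring it to the form widgets.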

Feature Engineering & Encoding Strategy

📊 Categorical Encoding

Gender: 0 = Female, 1 = Male

Profession: 0 = Other, 1 = Student

Dietary Habits: 0 = Healthy, 1 = Moderate, 2 = Unhealthy

Family History: 0 = No, 1 = Yes

Suicidal Thoughts: 0 = No, 1 = Yes

⏰ Sleep Duration Mapping

Less than 5 hours: 0.2 (High risk)

5-6 hours: 0.4 (Moderate risk)

7-8 hours: 0.6 (Normal)

More than 8 hours: 0.8 (Optimal)

📏 Numerical Features (Scaled)

Age: 18-60 years (StandardScaler applied)

CGPA: 0.0-10.0 (standardized)

Pressure Scales: 0-5 (0=None, 5=Extreme)

Satisfaction Levels: 0-5 (0=Very Dissatisfied, 5=Very Satisfied)

Work/Study Hours: Continuous variable (scaled)

Financial Stress: 0-5 scale (standardized)
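The categorical mappings above can be expressed as plain lookup tables. The sketch below is a stand-in for the fitted label encoders and sleep map, not the project's actual code. The dictionary keys of the profile and the "No" answers in the example are assumptions made for illustration.

```python
GENDER = {"Female": 0, "Male": 1}
PROFESSION = {"Other": 0, "Student": 1}
DIETARY_HABITS = {"Healthy": 0, "Moderate": 1, "Unhealthy": 2}
YES_NO = {"No": 0, "Yes": 1}
SLEEP_MAP = {
    "Less than 5 hours": 0.2,
    "5-6 hours": 0.4,
    "7-8 hours": 0.6,
    "More than 8 hours": 0.8,
}

def encode_profile(profile):
    """Encode the categorical fields of a raw profile; raises KeyError on unknown values."""
    return {
        "gender": GENDER[profile["gender"]],
        "profession": PROFESSION[profile["profession"]],
        "dietary_habits": DIETARY_HABITS[profile["dietary_habits"]],
        "family_history": YES_NO[profile["family_history"]],
        "suicidal_thoughts": YES_NO[profile["suicidal_thoughts"]],
        "sleep_duration": SLEEP_MAP[profile["sleep_duration"]],
    }

# Hypothetical raw input; the family_history and suicidal_thoughts values are assumed.
example_profile = {
    "gender": "Male",
    "profession": "Student",
    "dietary_habits": "Healthy",
    "family_history": "No",
    "suicidal_thoughts": "No",
    "sleep_duration": "More than 8 hours",
}
encoded = encode_profile(example_profile)
print(encoded)
```

Raising on unknown categories is a deliberate choice in this sketch: a value the encoders never saw during training should be surfaced rather than silently mapped.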

Results & Impact

🎯 Model Performance

Binary classification with sigmoid output
Real-time inference under 100ms
Probability scores for risk assessment
Trained on comprehensive student dataset

🔍 Prediction Capabilities

13-feature comprehensive profiling
Probability output (0-100%)
Binary prediction (Yes/No)
Explainable input summary
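The step from sigmoid output to the displayed percentage and Yes/No label can be sketched as follows. The 0.5 threshold is an assumption, consistent with the 57.99% sample being flagged as likely depression. The function name summarize_prediction is hypothetical.

```python
def summarize_prediction(probability, threshold=0.5):
    """Return (percentage string, 'Yes'/'No' label) for a sigmoid output in [0, 1]."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    label = "Yes" if probability >= threshold else "No"
    return f"{probability * 100:.2f}%", label

print(summarize_prediction(0.5799))   # ('57.99%', 'Yes')
print(summarize_prediction(0.2))      # ('20.00%', 'No')
```

Exposing the threshold as a parameter lets a screening program trade sensitivity against false alarms without retraining the model.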

💡 Use Cases

University counseling centers
Early intervention programs
Student wellness screening
Mental health research

📈 Example Prediction

Input Profile: 23-year-old male student with CGPA 3.94, moderate academic pressure (2/5), moderate work pressure (2/5), high study satisfaction (4/5), healthy diet, 8+ hours sleep

Prediction: 57.99% depression probability → At-risk classification

Despite high academic performance and a healthy lifestyle, the model assigns an elevated risk. Moderate pressure levels and age may contribute, which suggests the prediction reflects several risk factors jointly rather than any single indicator.

Technical Stack

Deep Learning

TensorFlow 2.x
Keras API
Sequential ANN

Data Processing

scikit-learn (encoders, scalers)
Pandas
NumPy

Web Application

Streamlit
Python 3.8+

Model Persistence

Pickle (encoders/scalers)
HDF5 (.h5 model)

Lessons Learned

Key Insights

Feature Engineering is Critical: Proper encoding and scaling of categorical variables (gender, dietary habits) and sleep duration mapping significantly improved model performance and interpretability.
Dropout Prevents Overfitting: Adding 30% and 20% dropout layers was essential for generalization, especially with limited training data in mental health datasets.
Multi-dimensional Risk Factors: Depression risk isn't linear. Students with high CGPA can still be at risk if pressure levels and satisfaction metrics indicate stress, demonstrating the value of holistic profiling.
Probability vs Binary: Providing probability scores (57.99%) rather than just Yes/No predictions gives counselors more nuanced information for intervention prioritization.
Serialization Challenges: Managing multiple serialized artifacts (pickled label encoders and scaler, plus the HDF5 model) requires careful version control to ensure consistency between training and deployment.
Ethical Considerations: Mental health prediction models require careful deployment. Predictions should support, not replace, professional clinical assessment.
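One possible way to keep serialized artifacts consistent between training and deployment is to record a checksum for each file at export time and compare it at startup. The sketch below is an illustration of that idea rather than something the project is known to do. The function name artifact_manifest is hypothetical.

```python
import hashlib
from pathlib import Path

def artifact_manifest(directory, names):
    """Map each artifact file name to the SHA-256 hex digest of its contents."""
    manifest = {}
    for name in names:
        data = (Path(directory) / name).read_bytes()
        manifest[name] = hashlib.sha256(data).hexdigest()
    return manifest

# Example usage after exporting from the training notebook:
#   manifest = artifact_manifest(".", ["model.h5", "scaler.pkl"])
#   The manifest can be committed next to the files and re-checked by the app.
```

A mismatch between the stored and recomputed digests signals that the model and its preprocessing objects came from different training runs.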

How To Run It

Setup Instructions

Create a virtual environment: python -m venv venv
Activate the environment: venv\Scripts\activate (Windows) or source venv/bin/activate (Mac/Linux)
Install dependencies: pip install -r requirements.txt

Ensure model files are present:

model.h5
scaler.pkl
label_encoder_gender.pkl
label_encoder_profession.pkl
label_encoder_dietary_habits.pkl
label_encoder_family_history.pkl
label_encoder_suicidal.pkl
sleep_map.pkl
cgpa.pkl

Run the Streamlit application: streamlit run app.py
Open your browser to http://localhost:8501
Input student features and get real-time depression predictions
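Before launching the app, the file check in the steps above can be automated with a short script such as the sketch below. The file names come from the list above. The function name missing_artifacts and the idea of running it as a standalone script are assumptions.

```python
from pathlib import Path

REQUIRED_ARTIFACTS = [
    "model.h5",
    "scaler.pkl",
    "label_encoder_gender.pkl",
    "label_encoder_profession.pkl",
    "label_encoder_dietary_habits.pkl",
    "label_encoder_family_history.pkl",
    "label_encoder_suicidal.pkl",
    "sleep_map.pkl",
    "cgpa.pkl",
]

def missing_artifacts(directory="."):
    """Return the required file names that are not present in the directory."""
    base = Path(directory)
    return [name for name in REQUIRED_ARTIFACTS if not (base / name).is_file()]

if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model files found.")
```

Running it from the project root before streamlit run app.py gives a clearer message than a load error raised partway through app startup.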

Training Your Own Model

Use the Jupyter notebook ANN_Classification.ipynb to:

Load and explore student_depression_dataset.csv
Perform EDA and feature engineering
Train the ANN model with your own hyperparameters
Export model and preprocessing artifacts

Requirements

Python 3.8 or higher
TensorFlow 2.x (CPU or GPU)
8GB+ RAM recommended
