Student Depression Prediction Using Artificial Neural Networks
Deep learning model predicting mental health risks with comprehensive student profiling and real-time probability assessment
Project Overview
A machine learning application that predicts student depression likelihood using Artificial Neural Networks. The system analyzes 13 different features including demographics, academic performance, lifestyle factors, and mental health indicators to provide real-time probability assessments through an intuitive Streamlit interface.
Challenge
Mental health issues among students are rising, but early detection remains challenging. Traditional screening methods are time-consuming, subjective, and often inaccessible. There's a critical need for an automated, data-driven approach that can identify at-risk students early using objective indicators like academic pressure, sleep patterns, CGPA, and lifestyle factors.
Streamlit interface walkthrough: The application collects 13 features including personal information (gender, age), academic metrics (CGPA, academic pressure), lifestyle factors (sleep duration, dietary habits), mental health indicators (family history, suicidal thoughts), and satisfaction levels to generate a comprehensive depression risk assessment with real-time probability scores.
Technical Approach
Data Collection & Understanding
Utilized the Student Depression Dataset containing 13 features across multiple dimensions of student life. The dataset includes demographic, academic, lifestyle, and psychological indicators.
- Demographics: Gender, Age, Profession
- Academic: CGPA, Academic Pressure, Study Satisfaction
- Lifestyle: Sleep Duration, Dietary Habits, Work/Study Hours
- Mental Health: Family History, Suicidal Thoughts, Financial Stress
- Satisfaction: Job Satisfaction, Work Pressure
Data Preprocessing & Feature Engineering
Implemented a comprehensive data preprocessing pipeline, including encoding, scaling, and transformation of categorical and numerical features, to prepare the data for model training.
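from sklearn.preprocessing import LabelEncoder, StandardScaler

# Label encoding for categorical features (one encoder per column;
# each encoder still has to be fitted on its column before use)
label_encoders = {
    'gender': LabelEncoder(),
    'profession': LabelEncoder(),
    'dietary_habits': LabelEncoder(),
    'family_history': LabelEncoder(),
    'suicidal_thoughts': LabelEncoder()
}
# Sleep duration mapping (categorical → numerical)
sleep_map = {
    'Less than 5 hours': 0.2,
    '5-6 hours': 0.4,
    '7-8 hours': 0.6,
    'More than 8 hours': 0.8
}
# Feature scaling; X is the numeric feature matrix obtained after
# the encoding and sleep mapping steps above
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)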
ANN Model Architecture
Designed and trained a deep Artificial Neural Network using TensorFlow/Keras with multiple hidden layers, dropout for regularization, and binary cross-entropy loss for depression classification.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu', input_shape=(13,)),
    Dropout(0.3),
    Dense(64, activation='relu'),
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')  # Binary classification
])
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy', 'AUC']
)
- Input Layer: 13 features (student profile)
- Hidden Layers: 128 → 64 → 32 neurons with ReLU activation
- Dropout: 30% and 20% to prevent overfitting
- Output Layer: Sigmoid activation for probability output
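As a quick sanity check on the architecture above, the trainable parameter count can be worked out by hand: each Dense layer has inputs × units weights plus one bias per unit, and Dropout layers add none. The sketch below does that arithmetic with the layer sizes from the model definition; the helper name dense_params is only illustrative.

```python
# Layer widths from the model: 13 inputs, then 128 -> 64 -> 32 -> 1 units.
layer_sizes = [13, 128, 64, 32, 1]

def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Pair each layer's input width with its output width.
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [1792, 8256, 2080, 33]
print(total_params)  # 12161
```

The total should match the figure reported by model.summary() for this layer configuration.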
Streamlit Deployment
Built an interactive web application using Streamlit with comprehensive input forms, real-time predictions, and visual feedback for depression risk assessment.
- User-friendly input forms with sliders and dropdowns
- Real-time model inference with TensorFlow backend
- Probability visualization with progress bars
- Alert system for high-risk predictions
- Input data summary for transparency
- Pickle serialization for encoders and scalers
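Before inference, the form values have to be converted into the same numeric representation that was used during training. The following is a minimal sketch of that step, using the encodings documented in this write-up. The function name build_feature_row, the dictionary keys, and the selection and order of fields are assumptions made for illustration; the real app would load the fitted encoders from their pickle files and follow the training column order for all 13 features.

```python
# Encodings copied from this write-up; in the real app they would come
# from the pickled LabelEncoder objects and sleep_map.pkl.
GENDER = {"Female": 0, "Male": 1}
DIET = {"Healthy": 0, "Moderate": 1, "Unhealthy": 2}
SLEEP_MAP = {
    "Less than 5 hours": 0.2,
    "5-6 hours": 0.4,
    "7-8 hours": 0.6,
    "More than 8 hours": 0.8,
}

def build_feature_row(form: dict) -> list:
    """Turn raw form values into a numeric row (hypothetical field order)."""
    try:
        return [
            GENDER[form["gender"]],
            float(form["age"]),
            float(form["cgpa"]),
            DIET[form["dietary_habits"]],
            SLEEP_MAP[form["sleep_duration"]],
        ]
    except KeyError as exc:
        # Unknown category or missing field: fail loudly instead of guessing.
        raise ValueError(f"Unsupported or missing input: {exc}") from exc

row = build_feature_row({
    "gender": "Male", "age": 23, "cgpa": 3.94,
    "dietary_habits": "Healthy", "sleep_duration": "More than 8 hours",
})
print(row)  # [1, 23.0, 3.94, 0, 0.8]
```

The resulting row would still need to pass through the fitted scaler before being handed to the model.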
Feature Engineering & Encoding Strategy
📊 Categorical Encoding
Gender: 0 = Female, 1 = Male
Profession: 0 = Other, 1 = Student
Dietary Habits: 0 = Healthy, 1 = Moderate, 2 = Unhealthy
Family History: 0 = No, 1 = Yes
Suicidal Thoughts: 0 = No, 1 = Yes
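The integer codes above are consistent with how scikit-learn's LabelEncoder works: it assigns codes in sorted order of the distinct labels it sees during fitting. The dependency-free sketch below reproduces that rule; fit_label_codes is an illustrative helper, not part of scikit-learn, and the sample values are made up.

```python
def fit_label_codes(values):
    """Map each distinct label to its index in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(values)))}

gender_codes = fit_label_codes(["Male", "Female", "Male"])
diet_codes = fit_label_codes(["Unhealthy", "Healthy", "Moderate", "Healthy"])

print(gender_codes)  # {'Female': 0, 'Male': 1}
print(diet_codes)    # {'Healthy': 0, 'Moderate': 1, 'Unhealthy': 2}
```

Because the codes depend on the set of labels seen while fitting, the same fitted encoder has to be reused at inference time rather than refitted on new input.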
⏰ Sleep Duration Mapping
Less than 5 hours: 0.2 (High risk)
5-6 hours: 0.4 (Moderate risk)
7-8 hours: 0.6 (Normal)
More than 8 hours: 0.8 (Optimal)
📏 Numerical Features (Scaled)
Age: 18-60 years (StandardScaler applied)
CGPA: 0.0-10.0 (standardized)
Pressure Scales: 0-5 (0=None, 5=Extreme)
Satisfaction Levels: 0-5 (0=Very Dissatisfied, 5=Very Satisfied)
Work/Study Hours: Continuous variable (scaled)
Financial Stress: 0-5 scale (standardized)
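For reference, StandardScaler replaces each value with z = (x − mean) / std, where the mean and the population standard deviation are computed per column on the training data. A small standard-library sketch of that formula follows; the age values are made up for illustration.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores using the population standard deviation (ddof=0)."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]

ages = [18.0, 20.0, 22.0, 24.0, 26.0]  # illustrative values only
z_scores = standardize(ages)

print([round(z, 3) for z in z_scores])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

At inference time the mean and standard deviation learned during training are reused, which is why the fitted scaler is saved alongside the model.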
Results & Impact
🎯 Model Performance
- Binary classification with sigmoid output
- Real-time inference under 100ms
- Probability scores for risk assessment
- Trained on comprehensive student dataset
🔍 Prediction Capabilities
- 13-feature comprehensive profiling
- Probability output (0-100%)
- Binary prediction (Yes/No)
- Explainable input summary
💡 Use Cases
- University counseling centers
- Early intervention programs
- Student wellness screening
- Mental health research
📈 Example Prediction
Input Profile: 23-year-old male student with CGPA 3.94, moderate academic pressure (2/5), moderate work pressure (2/5), high study satisfaction (4/5), healthy diet, 8+ hours sleep
Prediction: 57.99% depression probability → At-risk classification
Despite high academic performance and a healthy lifestyle, the model assigns an elevated risk. The moderate pressure levels and the student's age may be contributing, which illustrates how the model combines multiple risk factors rather than relying on any single indicator.
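The step from probability to class label is a simple threshold on the sigmoid output. The sketch below assumes a cut-off of 0.5, which this write-up does not state explicitly; the function name classify_risk and the label strings are likewise illustrative.

```python
def classify_risk(probability: float, threshold: float = 0.5) -> str:
    """Label a sigmoid output; the 0.5 default is an assumed cut-off."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return "At-risk" if probability >= threshold else "Not at-risk"

label = classify_risk(0.5799)
print(f"{0.5799:.2%} -> {label}")  # 57.99% -> At-risk
```

A screening tool might choose a lower threshold to reduce missed cases, at the cost of more false alarms.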
Technical Stack
Deep Learning
- TensorFlow 2.x
- Keras API
- Sequential ANN
Data Processing
- scikit-learn (encoders, scalers)
- Pandas
- NumPy
Web Application
- Streamlit
- Python 3.8+
Model Persistence
- Pickle (encoders/scalers)
- HDF5 (.h5 model)
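As an illustration of the pickle side of this setup, the sketch below writes the sleep duration mapping to disk and reads it back. It uses a temporary directory so it can run anywhere; in the project the file would sit next to the app as sleep_map.pkl. Fitted encoders and scalers can be stored the same way.

```python
import pickle
import tempfile
from pathlib import Path

sleep_map = {
    "Less than 5 hours": 0.2,
    "5-6 hours": 0.4,
    "7-8 hours": 0.6,
    "More than 8 hours": 0.8,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    artifact_path = Path(tmp_dir) / "sleep_map.pkl"

    # Training side: persist the mapping next to the model.
    with artifact_path.open("wb") as f:
        pickle.dump(sleep_map, f)

    # App side: load the identical mapping before building model inputs.
    with artifact_path.open("rb") as f:
        restored_map = pickle.load(f)

print(restored_map == sleep_map)  # True
```

Pickle files can execute code when loaded, so only artifacts produced by your own training run should be unpickled.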
Lessons Learned
Key Insights
- Feature Engineering is Critical: Proper encoding and scaling of categorical variables (gender, dietary habits) and sleep duration mapping significantly improved model performance and interpretability.
- Dropout Prevents Overfitting: Adding 30% and 20% dropout layers was essential for generalization, especially with limited training data in mental health datasets.
- Multi-dimensional Risk Factors: Depression risk isn't linear—students with high CGPA can still be at risk if pressure levels and satisfaction metrics indicate stress, demonstrating the value of holistic profiling.
- Probability vs Binary: Providing probability scores (57.99%) rather than just Yes/No predictions gives counselors more nuanced information for intervention prioritization.
- Serialization Challenges: Managing multiple serialized artifacts (pickled label encoders and scaler, plus the HDF5 model) requires careful version control to ensure consistency between training and deployment.
- Ethical Considerations: Mental health prediction models require careful deployment—predictions should support, not replace, professional clinical assessment.
How To Run It
Setup Instructions
- Create a virtual environment:
python -m venv venv
- Activate the environment:
venv\Scripts\activate (Windows) or source venv/bin/activate (Mac/Linux)
- Install dependencies:
pip install -r requirements.txt
- Ensure model files are present:
model.h5, scaler.pkl, label_encoder_gender.pkl, label_encoder_profession.pkl, label_encoder_dietary_habits.pkl, label_encoder_family_history.pkl, label_encoder_suicidal.pkl, sleep_map.pkl, cgpa.pkl
- Run the Streamlit application:
streamlit run app.py
- Open your browser to
http://localhost:8501
- Input student features and get real-time depression predictions
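Since the app depends on several artifact files, a small pre-flight check can save a confusing startup error. The script below is a sketch, not part of the project: the file names are taken from the list above, while the function name missing_artifacts and the assumption that all files live in one directory are illustrative.

```python
from pathlib import Path

# File names as listed in the setup instructions.
REQUIRED_ARTIFACTS = [
    "model.h5",
    "scaler.pkl",
    "label_encoder_gender.pkl",
    "label_encoder_profession.pkl",
    "label_encoder_dietary_habits.pkl",
    "label_encoder_family_history.pkl",
    "label_encoder_suicidal.pkl",
    "sleep_map.pkl",
    "cgpa.pkl",
]

def missing_artifacts(directory="."):
    """Return the required files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_ARTIFACTS if not (base / name).is_file()]

if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model artifacts found.")
```

Running it from the project root before streamlit run app.py reports any file that still needs to be exported from the training notebook.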
Training Your Own Model
Use the Jupyter notebook ANN_Classification.ipynb to:
- Load and explore student_depression_dataset.csv
- Perform EDA and feature engineering
- Train the ANN model with your own hyperparameters
- Export model and preprocessing artifacts
Requirements
- Python 3.8 or higher
- TensorFlow 2.x (CPU or GPU)
- 8GB+ RAM recommended