Healthcare Data Warehousing & Business Intelligence
Enterprise-grade healthcare BI solution featuring complete data pipeline: OLTP operational database, Pentaho ETL workflows, dimensional data warehouse, SSAS OLAP cubes, and interactive Power BI dashboards
Project Overview
Designed and implemented a comprehensive healthcare Business Intelligence solution from the ground up, covering the entire data lifecycle: operational database design (OLTP), ETL pipeline development, dimensional data warehouse architecture, OLAP cube creation, and interactive dashboard visualization. This end-to-end project demonstrates proficiency in modern data warehousing principles, dimensional modeling, ETL orchestration, and healthcare analytics.
Business Challenge
Healthcare organizations generate massive amounts of operational data across departments, patient admissions, medications, nursing staff, and appointments. However, this transactional data resides in normalized OLTP systems optimized for day-to-day operations, not for analytical queries or strategic insights. Stakeholders needed a robust BI infrastructure that could:
- Consolidate Data: Integrate patient, department, medication, nurse, and appointment data from multiple operational sources
- Enable Analytics: Transform transactional data into a dimensional model optimized for complex analytical queries and aggregations
- Support Decision-Making: Provide hospital administrators, department heads, and healthcare analysts with actionable insights through interactive dashboards
- Track Performance: Monitor key healthcare KPIs including patient admissions, department performance, nurse workloads, medication usage, and appointment trends
- Ensure Scalability: Build a flexible architecture that can grow with increasing data volumes and evolving analytical requirements
This project addresses these challenges by implementing a complete data warehousing and BI solution following industry best practices and dimensional modeling methodologies.
System Architecture
The solution follows a classic data warehousing architecture with clear separation of concerns across five distinct layers:
1. OLTP Source Database
SQL Server - Healthcare operational database with 8 normalized tables
Patients | Departments | Nurses | Medications | Appointments | Lab Reports
2. ETL Pipeline - Pentaho Data Integration
Extract, Transform, Load workflows with data cleansing and validation
Dimension ETL (5) | Fact Table ETL (5) | Job Orchestration
3. Data Warehouse - Star Schema
SQL Server dimensional model optimized for analytical queries
Dimensions (6) | Fact Tables (5) | Optimized for OLAP
4. OLAP Cubes - SSAS Multidimensional
Pre-aggregated cubes for fast multi-dimensional analysis
4 Cubes | Measures | Hierarchies | Partitions | MDX Queries
5. Presentation Layer
Interactive dashboards and web-based query interface
Power BI Dashboards | Flask Web App | MDX Query Interface
Architecture Highlights
- Layered Architecture: Clean separation between operational, ETL, warehouse, OLAP, and presentation layers
- Star Schema Design: Dimensional modeling with 6 dimension tables and 5 fact tables for optimal query performance
- Automated ETL: Pentaho workflows with error handling, logging, and incremental load capabilities
- OLAP Acceleration: Pre-aggregated cubes with partitions for sub-second query response times
- Multi-Channel Access: Both interactive dashboards (Power BI) and programmatic access (MDX queries via Flask)
Technical Implementation
OLTP Database Design
Created a normalized operational database following 3NF principles to support healthcare operations:
- Departments Table: Hospital departments with bed capacity and descriptions (Primary Key: department_id)
- Care Type Table: Types of medical care offered with average durations (Primary Key: care_type_id, FK: department_id)
- Nurses Dataset: Nursing staff information with joining dates and shift assignments (Primary Key: nurse_id, FK: department_id)
- Nurse Education & Training: Nurse qualifications, certifications, specializations, and experience (Primary Key: training_id, FK: nurse_id, department_id)
- Medications Table: Medication catalog with dosage, frequency, administration routes (Primary Key: medication_id, FK: department_id)
- Patients Dataset: Patient records with demographics, conditions, admission/discharge details (Primary Key: patient_id, FK: care_type_id, medication_id)
- Lab Reports: Laboratory test results linked to patients and nurses (Primary Key: report_id, FK: patient_id, nurse_id)
- Doctor Appointments: Appointment scheduling with status tracking (Primary Key: appointment_id, FK: patient_id, nurse_id)
- Implemented referential integrity constraints, check constraints, and proper indexing for transactional performance
- Designed for ACID compliance ensuring data consistency and reliability
Data Warehouse - Dimensional Modeling
Designed and implemented a star schema data warehouse optimized for analytical queries:
Dimension Tables:
- dim_patient: Patient dimension with slowly changing dimension (SCD) Type 1 for current patient information
- dim_department: Department dimension with hierarchical structure for organizational analysis
- dim_medication: Medication dimension with drug classification and administration details
- dim_nurse: Nurse dimension with department assignments and shift patterns
- dim_care_type: Care type dimension for analyzing different treatment modalities
- dim_date: Date dimension with day, month, year, weekday for comprehensive time-based analysis
Fact Tables:
- fact_patient_admissions: Grain: One row per patient admission. Tracks admission/discharge dates, room assignments, care types, and medications
- fact_medication_usage: Grain: One row per medication administration. Captures medication usage patterns, prescriber information, and administration dates
- fact_nurse_workload: Grain: One row per nurse per day. Monitors nurse workload, patient assignments, and shift patterns for resource optimization
- fact_department_performance: Grain: One row per department per day. Aggregates department metrics including average stay duration and patient counts
- fact_appointments: Grain: One row per appointment. Tracks appointment scheduling, completion status, and patient-nurse interactions
- Implemented surrogate keys for all dimensions to handle slowly changing dimensions
- Created composite indexes on fact table foreign keys for optimal join performance
- Denormalized dimensions where appropriate to reduce query complexity
ETL Pipeline - Pentaho Data Integration
Developed robust ETL workflows using Pentaho Data Integration (Kettle) for automated data movement:
Dimension ETL Transformations:
- CareType.ktr: Extracts care type data, applies business rules, loads dim_care_type with lookup transformations
- Date.ktr: Generates date dimension with calendar calculations, fiscal periods, and holiday flags
- Department.ktr: Loads department dimension with data quality checks and deduplication
- Medication.ktr: Processes medication data with drug classification and standardization logic
- Nurses.ktr: Handles nurse dimension with shift pattern validation and department assignment
- Patients.ktr: Loads patient dimension with data cleansing, age calculation, and demographic validation
Fact ETL Transformations:
- Patients Admission.ktr: Populates fact_patient_admissions with surrogate key lookups and date calculations
- Medication Usuage.ktr: Loads fact_medication_usage with temporal validation and outlier detection
- Nurse Workload.ktr: Calculates nurse workload metrics and loads fact_nurse_workload with aggregations
- Department Performance.ktr: Computes department KPIs including average length of stay and patient throughput
- Appointments.ktr: Processes appointment data with status validation and historical tracking
Job Orchestration:
- Job 1.kjb: Master job orchestrating dimension loads followed by fact table loads with dependency management
- Implemented error handling with email notifications and detailed logging (hospital_etl_log1.txt)
- Created etl.bat batch file for scheduling and automated execution
- Built incremental load logic using watermark tables for efficient updates
- Configured parallel execution for independent transformations to optimize runtime
OLAP Cubes - SSAS Multidimensional
Designed and deployed SSAS (SQL Server Analysis Services) multidimensional cubes for advanced analytics:
Cube Structures:
- Appointments Cube (Appointments.cube):
- Measures: Appointment count, completion rate, cancellation rate, average wait time
- Dimensions: Date, Patient, Nurse, Department
- Hierarchies: Date (Year → Quarter → Month → Day), Department (Hospital → Department)
- Partitions (Appointments.partitions): Monthly partitions for efficient processing and querying
- Department Performance Cube (Department Performance.cube):
- Measures: Patient count, average length of stay, bed occupancy rate, department efficiency
- Dimensions: Date, Department, Care Type
- Calculated Members: Occupancy percentage, patient turnover rate, efficiency score
- Partitions (Department Performance.partitions): Quarterly partitions with aggregation design
- Medication Usage Cube (Medication Usuage.cube):
- Measures: Medication quantity, usage frequency, cost per medication, prescription rate
- Dimensions: Date, Medication, Patient, Department, Nurse
- KPIs: High-utilization medications, medication adherence rate, cost efficiency
- Partitions (Medication Usuage.partitions): Date-based partitions with incremental processing
- Additional Cubes: Patient Admissions cube tracking admission patterns and length of stay analytics
SSAS Project Components:
- Data Source (DW Health Care.ds): Connection to SQL Server data warehouse with integrated security
- Data Source View (DW Health Care.dsv): Logical view defining relationships between dimensions and facts
- Dimension Definitions: Dim Care Type.dim, Dim Date.dim, Dim Department.dim, Dim Medication.dim, Dim Nurse.dim, Dim Patient.dim
- Fact Definitions: Fact Appointments.dim, Fact Medication Usage.dim, Fact Nurse Workload.dim, Fact Patient Admissions.dim
- Database Configuration (HealthCare Cube.database): Processing options, caching strategies, aggregation design
- Implemented attribute relationships for optimal aggregation and query performance
- Created user hierarchies enabling drill-down/drill-up navigation
- Configured partition strategies for large fact tables to enable parallel processing
- Built calculated measures and KPIs using MDX (Multidimensional Expressions)
Presentation Layer
Developed multiple interfaces for data access and visualization:
Power BI Dashboards:
- Dashboard.pbix: Executive dashboard with high-level KPIs and trend analysis
- OLAP Cube.pbix: Power BI connected directly to SSAS cubes for real-time OLAP reporting
- Interactive visualizations: KPI cards, line charts, bar charts, heat maps, geographic maps
- Drill-through pages for detailed analysis from summary views
- Slicers and filters for dynamic data exploration
- Mobile-optimized layouts for on-the-go access
Flask Web Application (Frontend/):
- app.py (639 lines): Flask web server providing multiple interfaces:
- Dashboard page displaying database tables and summary statistics
- Insert forms for data entry into OLTP and DW databases
- View table interface for browsing dimension and fact tables
- Query interface for executing MDX queries against SSAS cubes
- DW Healthcare page for data warehouse schema exploration
- Templates: Jinja2 HTML templates with responsive design
- base.html: Base template with navigation and common layout
- dashboard.html: Main dashboard with table listings
- dw_healthcare.html: Data warehouse schema viewer
- insert_form.html: Dynamic form for data insertion
- insert_hub.html: Central hub for selecting insert operations
- query_interface.html: MDX query editor with syntax highlighting
- view_table.html: Data grid for viewing table contents
- Static Assets: style.css for consistent UI styling and branding
- Integrated pyodbc for SQL Server connectivity and pyadomd for SSAS/MDX queries
- Implemented CRUD operations for both OLTP and data warehouse databases
- Built foreign key dropdown menus for maintaining referential integrity during data entry
Key Features & Capabilities
🏥 Complete Healthcare Analytics
End-to-end BI solution covering patient care, department operations, medication management, nurse workload, and appointment scheduling with full traceability from source to dashboard.
⚡ High-Performance OLAP
Pre-aggregated multidimensional cubes with partitioning and intelligent caching deliver sub-second query response times even for complex analytical queries across millions of records.
🔄 Automated ETL Workflows
Pentaho ETL pipelines with job orchestration, error handling, logging, and incremental loads ensure reliable data movement from operational systems to the warehouse with minimal manual intervention.
📊 Interactive Dashboards
Power BI dashboards connected to SSAS cubes provide real-time insights with drill-down capabilities, interactive filters, and mobile-responsive layouts for decision-makers at all levels.
🌐 Web-Based Interface
Flask web application provides browser-based access to data warehouse operations, MDX query execution, and CRUD operations, enabling users to interact with the BI system without specialized tools.
📐 Dimensional Modeling
Star schema design with 6 dimension tables and 5 fact tables follows Kimball methodology, optimizing query performance while maintaining data integrity and supporting complex analytical scenarios.
Technologies & Tools
Database & Data Warehousing
ETL & Data Integration
Business Intelligence & Analytics
Application Development
Project Outcomes & Impact
Achievements
- Comprehensive BI Infrastructure: Successfully designed and implemented a complete data warehousing solution from source system to dashboards, demonstrating end-to-end BI expertise
- Performance Optimization: OLAP cubes with partitioning deliver sub-second query response for complex analytical queries, enabling real-time decision-making
- Automation: Pentaho ETL workflows automate data movement with error handling and logging, reducing manual effort and ensuring data consistency
- Scalable Architecture: Star schema design and partitioned cubes support future growth in data volume and analytical complexity
- Multi-Channel Access: Both Power BI dashboards and Flask web interface provide flexible access patterns for different user personas
- Healthcare-Specific Analytics: Domain-specific dimensions and measures enable meaningful healthcare KPIs including patient flow, nurse workload, medication usage, and department efficiency
Technical Skills Demonstrated
- Data warehouse design and dimensional modeling (Kimball methodology)
- ETL development and job orchestration using Pentaho Data Integration
- SSAS multidimensional cube design with partitioning and aggregation design
- MDX query development for complex analytical scenarios
- Power BI dashboard design with SSAS connectivity
- SQL Server database administration and T-SQL development
- Python web application development with Flask
- Healthcare domain knowledge and analytical modeling
Project Structure
The complete project is organized into the following components:
DW&BI/ ├── HealthCare_DB_DDL.sql # OLTP database schema (8 tables) ├── HealthCare_DB_Insert.sql # Sample data for OLTP database ├── DW_dim.sql # Data warehouse schema (star schema) │ ├── Pentaho ETL/ # ETL workflows │ ├── Job 1.kjb # Master job orchestration │ ├── CareType.ktr # Care type dimension ETL │ ├── Date.ktr # Date dimension ETL │ ├── Department.ktr # Department dimension ETL │ ├── Medication.ktr # Medication dimension ETL │ ├── Nurses.ktr # Nurse dimension ETL │ ├── Patients.ktr # Patient dimension ETL │ ├── Facts/ │ │ ├── Appointments.ktr # Appointment fact ETL │ │ ├── Department Performance.ktr │ │ ├── Medication Usuage.ktr │ │ ├── Nurse Workload.ktr │ │ └── Patients Admission.ktr │ ├── etl.bat # Batch execution script │ └── hospital_etl_log1.txt # ETL execution log │ ├── HealthCare Cube/ # SSAS multidimensional project │ ├── HealthCare Cube.sln # Visual Studio solution │ ├── HealthCare Cube/ │ │ ├── DW Health Care.ds # Data source │ │ ├── DW Health Care.dsv # Data source view │ │ ├── Dim *.dim # 6 dimension definitions │ │ ├── Fact *.dim # 5 fact definitions │ │ ├── Appointments.cube # Appointments cube │ │ ├── Appointments.partitions │ │ ├── Department Performance.cube │ │ ├── Department Performance.partitions │ │ ├── Medication Usuage.cube │ │ ├── Medication Usuage.partitions │ │ └── HealthCare Cube.database │ ├── Frontend/ # Flask web application │ ├── app.py # Main Flask application (639 lines) │ ├── templates/ │ │ ├── base.html │ │ ├── dashboard.html │ │ ├── dw_healthcare.html │ │ ├── insert_form.html │ │ ├── insert_hub.html │ │ ├── query_interface.html │ │ └── view_table.html │ └── static/ │ └── style.css │ ├── Dashboard.pbix # Power BI dashboard ├── OLAP Cube.pbix # Power BI with SSAS connection └── Project.pdf # Project documentation
Interested in Data Warehousing & BI Solutions?
Let's discuss how I can help design and implement enterprise-grade data warehousing and business intelligence solutions for your organization.