https://healthcare-fraud-detection.pages.dev
Machine Learning
Blockchain
Healthcare

HealthFraudMLChain

Healthcare insurance fraud detection using an Optuna-tuned ML ensemble with SHA-256 blockchain audit trails and ECIES encryption, achieving F1 = 0.7345 on 5,410 Medicare providers.

F1 Score: 0.7345
Providers: 5,410
Blockchain: 110 blocks

Highlights

  • F1 = 0.7345 leakage-free ensemble
  • ECIES encryption (secp256k1 + AES-256-GCM)
  • 110-block SHA-256 blockchain audit trail
  • SHAP + LIME dual explainability

Overview

HealthFraudMLChain is a healthcare insurance fraud detection system built as an M.Sc. dissertation at NIT Patna. It combines a weighted ensemble of five Optuna-tuned classifiers with a SHA-256 blockchain audit trail and ECIES encryption for provider PII protection.
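The weighted ensemble described above can be illustrated with a minimal soft-voting sketch. This is not the project's code: the function name `weighted_vote`, the per-model probabilities, and the weights are all hypothetical placeholders. In the project, the weights come from the AUC-PR optimization and the probabilities from the tuned classifiers. Only the threshold t = 0.444 is taken from the reported results.

```python
def weighted_vote(probs_by_model, weights):
    """Combine per-model fraud probabilities using normalized weights."""
    total = sum(weights[name] for name in probs_by_model)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(next(iter(probs_by_model.values())))
    combined = [0.0] * n
    for name, probs in probs_by_model.items():
        w = weights[name] / total  # normalize so the weights sum to 1
        for i, p in enumerate(probs):
            combined[i] += w * p
    return combined


# Hypothetical fraud probabilities for three providers (illustrative only).
probs = {
    "xgboost":          [0.91, 0.12, 0.48],
    "lightgbm":         [0.88, 0.09, 0.41],
    "catboost":         [0.93, 0.15, 0.52],
    "gradientboosting": [0.80, 0.20, 0.39],
    "randomforest":     [0.75, 0.25, 0.45],
}
# Hypothetical weights; the real ones are not given on this page.
weights = {
    "xgboost": 0.25,
    "lightgbm": 0.25,
    "catboost": 0.30,
    "gradientboosting": 0.10,
    "randomforest": 0.10,
}

THRESHOLD = 0.444  # fixed decision threshold reported in the results
scores = weighted_vote(probs, weights)
flags = [s >= THRESHOLD for s in scores]
```

Normalizing inside the function means the weights do not have to sum to 1 beforehand, which is convenient when they are produced by a search procedure.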

Key Results

  • F1 = 0.7345 (fixed threshold t = 0.444, 10-fold stratified CV, zero data leakage)
  • Precision: 73.7%, Recall: 74.7%, ROC-AUC: 0.9587
  • Friedman test: p = 0.00089, indicating statistically significant performance differences between the models
  • Bootstrap 95% CI: F1 [0.7118, 0.7715]
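As a rough illustration of how an F1 score at a fixed threshold and a bootstrap confidence interval can be computed, here is a minimal sketch. It assumes a percentile bootstrap over providers, which may differ from the resampling scheme actually used in the dissertation. The function names, the resample count, and the toy data are hypothetical.

```python
import random


def f1_at_threshold(y_true, scores, threshold):
    """F1 = 2TP / (2TP + FP + FN) for predictions score >= threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        pred = s >= threshold
        if pred and y == 1:
            tp += 1
        elif pred and y == 0:
            fp += 1
        elif not pred and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(y_true, scores, threshold, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample rows with replacement, recompute F1."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(
            f1_at_threshold([y_true[i] for i in idx],
                            [scores[i] for i in idx],
                            threshold)
        )
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Toy labels and scores (hypothetical, not the Medicare data).
y = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
s = [0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.05, 0.5, 0.6, 0.15]
point = f1_at_threshold(y, s, 0.444)
low, high = bootstrap_f1_ci(y, s, 0.444)
```

Keeping the threshold fixed inside every resample matches the "fixed threshold" wording in the results. Re-tuning the threshold per resample would give an optimistically biased interval.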

Technical Architecture

The system aggregates 558,211 Medicare claims into 5,410 provider-level records with 190 engineered features. Five tree-ensemble classifiers (XGBoost, LightGBM, CatBoost, GradientBoosting, RandomForest) are each tuned with Optuna's TPE sampler (60 trials per model) and combined via weighted voting, with weights optimized for AUC-PR. Predictions are encrypted with ECIES (secp256k1 + HKDF-SHA256 + AES-256-GCM) and recorded on a custom SHA-256 blockchain with Merkle tree integrity verification.
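The audit-trail idea can be sketched with the standard library alone. The following is a minimal hash chain with a per-block Merkle root, not the project's implementation. The block layout, the field names, the rule of duplicating the last node on odd levels, and the all-zero genesis hash are assumptions made for illustration. Records are plain placeholder strings here, whereas in the described system they would be ECIES ciphertexts.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # assumed placeholder for the first block's parent


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaves):
    """Merkle root over hex leaf hashes; odd levels duplicate the last node."""
    if not leaves:
        return sha256_hex(b"")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256_hex((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


def header_hash(index, prev_hash, root):
    header = {"index": index, "prev_hash": prev_hash, "merkle_root": root}
    return sha256_hex(json.dumps(header, sort_keys=True).encode())


def make_block(index, records, prev_hash):
    root = merkle_root([sha256_hex(r.encode()) for r in records])
    return {"index": index, "prev_hash": prev_hash, "merkle_root": root,
            "records": list(records),
            "hash": header_hash(index, prev_hash, root)}


def verify_chain(chain):
    """Recompute every Merkle root and block hash and check the links."""
    prev = GENESIS_PREV
    for block in chain:
        root = merkle_root([sha256_hex(r.encode()) for r in block["records"]])
        if block["prev_hash"] != prev or block["merkle_root"] != root:
            return False
        if block["hash"] != header_hash(block["index"], prev, root):
            return False
        prev = block["hash"]
    return True


# Two blocks of hypothetical prediction records.
b0 = make_block(0, ["provider-001:flagged", "provider-002:clear"], GENESIS_PREV)
b1 = make_block(1, ["provider-003:clear"], b0["hash"])
chain = [b0, b1]
```

Because each block hash covers the Merkle root and the previous block's hash, altering any stored record changes the recomputed root, and `verify_chain` returns `False` for the altered chain.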

Explainability

SHAP (global feature importance) and LIME (local, per-prediction conditional rules) provide dual explainability. The top fraud indicators are deductible payment patterns, maximum reimbursement amounts, and claim duration anomalies.
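To make the attribution idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny scoring function by enumerating every feature coalition. This is not the SHAP library and not the project's model: the scoring function `toy_score`, its coefficients, and the feature values are hypothetical. Replacing "absent" features with a single baseline vector is a simplification of the background expectation that SHAP estimates. Exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a single baseline point."""
    n = len(x)

    def value(subset):
        # Features in the coalition take their real value, others the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z):
    # Hypothetical score over three standardized features:
    # z[0] deductible pattern, z[1] max reimbursement, z[2] claim duration.
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]


x = [1.0, 2.0, 1.5]          # hypothetical provider
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_score, x, baseline)
```

The attributions sum to `toy_score(x) - toy_score(baseline)`, which is the additivity property that lets per-provider SHAP values be averaged in absolute value into a global importance ranking.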
