
HealthFraudMLChain
Healthcare insurance fraud detection using an Optuna-tuned ML ensemble with SHA-256 blockchain audit trails and ECIES encryption, achieving F1 = 0.7345 on 5,410 Medicare providers.
F1 Score: 0.7345
Providers: 5,410
Blockchain: 110 blocks
Highlights
- Leakage-free ensemble with F1 = 0.7345
- ECIES encryption (secp256k1 + AES-256-GCM)
- 110-block SHA-256 blockchain audit trail
- SHAP + LIME dual explainability
Overview
HealthFraudMLChain is a healthcare insurance fraud detection system built as an M.Sc. dissertation at NIT Patna. It combines a weighted ensemble of five Optuna-tuned classifiers with a SHA-256 blockchain audit trail and ECIES encryption for provider PII protection.
Key Results
- F1 = 0.7345 (fixed threshold t = 0.444, 10-fold stratified CV, zero data leakage)
- Precision: 73.7%, Recall: 74.7%, ROC-AUC: 0.9587
- Friedman test: p = 0.00089, indicating statistically significant differences among the models
- Bootstrap 95% CI: F1 [0.7118, 0.7715]
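As a rough illustration of how an F1 score at a fixed threshold and a bootstrap confidence interval like the ones above can be computed, here is a minimal standard-library sketch. It assumes a percentile bootstrap over provider-level predictions, 1,000 resamples, and a `>=` comparison against the threshold; the source does not state which bootstrap variant, resample count, or comparison was used, and the function names are hypothetical.

```python
import random


def f1_at_threshold(y_true, y_prob, threshold=0.444):
    """F1 for binary labels after cutting probabilities at a fixed threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = prob >= threshold  # assumption: ties count as fraud
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(y_true, y_prob, threshold=0.444, n_boot=1000,
                    alpha=0.05, seed=0):
    """Percentile bootstrap CI for F1 (assumed method, not confirmed by the source)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample providers with replacement, keeping label/score pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_at_threshold([y_true[i] for i in idx],
                                      [y_prob[i] for i in idx],
                                      threshold))
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Calling `bootstrap_f1_ci(labels, out_of_fold_probs)` with out-of-fold probabilities from the cross-validation would give an interval comparable in spirit to the one reported, though the exact bounds depend on the resampling details.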
Technical Architecture
The system aggregates 558,211 Medicare claims into 5,410 provider-level records with 190 engineered features. Five tree-based ensemble classifiers (XGBoost, LightGBM, CatBoost, GradientBoosting, RandomForest) are tuned with Optuna TPE (60 trials each) and combined via AUC-PR-optimized weighted voting. Predictions are encrypted with ECIES (secp256k1 + HKDF-SHA256 + AES-256-GCM) and recorded on a custom SHA-256 blockchain with Merkle tree integrity verification.
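The scoring and audit steps can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not the project's implementation: the model weights, probabilities, provider ID, and block layout are hypothetical, weighted voting is taken to mean a weighted average of per-model fraud probabilities, and the ECIES encryption step is omitted, so records are stored here as plain JSON.

```python
import hashlib
import json


def weighted_vote(probs, weights):
    """Weighted average of per-model fraud probabilities (soft voting)."""
    total = sum(weights[name] for name in probs)
    return sum(probs[name] * weights[name] for name in probs) / total


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaves):
    """Merkle root over byte strings; the last node is duplicated on odd levels."""
    if not leaves:
        return sha256_hex(b"")
    level = [sha256_hex(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256_hex((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


def _encode(records):
    # Canonical JSON so the same record always hashes to the same value.
    return [json.dumps(r, sort_keys=True).encode() for r in records]


def make_block(index, records, prev_hash):
    """Build a block whose hash covers its index, parent hash, and Merkle root."""
    header = {"index": index, "prev_hash": prev_hash,
              "merkle_root": merkle_root(_encode(records))}
    block_hash = sha256_hex(json.dumps(header, sort_keys=True).encode())
    return {**header, "records": records, "hash": block_hash}


def verify_chain(chain):
    """Recompute every Merkle root and block hash and check the parent links."""
    prev = "0" * 64
    for block in chain:
        root = merkle_root(_encode(block["records"]))
        header = {"index": block["index"], "prev_hash": block["prev_hash"],
                  "merkle_root": root}
        expected = sha256_hex(json.dumps(header, sort_keys=True).encode())
        if (block["prev_hash"] != prev or block["merkle_root"] != root
                or block["hash"] != expected):
            return False
        prev = block["hash"]
    return True


# Hypothetical weights and per-model probabilities for one provider.
weights = {"xgboost": 0.30, "lightgbm": 0.25, "catboost": 0.25,
           "gradient_boosting": 0.10, "random_forest": 0.10}
probs = {"xgboost": 0.62, "lightgbm": 0.55, "catboost": 0.48,
         "gradient_boosting": 0.40, "random_forest": 0.35}
score = weighted_vote(probs, weights)
record = {"provider": "PRV-EXAMPLE", "score": round(score, 4),
          "flagged": score >= 0.444}
chain = [make_block(0, [record], "0" * 64)]
```

Because each block hash covers the Merkle root of its records and the hash of the previous block, editing any stored prediction changes the recomputed root and causes `verify_chain` to return `False` for the chain.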
Explainability
SHAP (global feature importance) and LIME (local conditional rules) provide dual explainability. Top fraud indicators: deductible payment patterns, maximum reimbursement amounts, and claim duration anomalies.
