Hospital AKI Detection System

Kubernetes-based real-time acute kidney injury detection from live HL7 laboratory streams with pager alerts and fault-tolerant inference.

  • GitHub Link
  • |
  • Python
  • Kubernetes
  • Docker
  • HL7 / Kafka
  • Prometheus / Grafana

A production-grade ML system capable of processing streaming hospital lab results, running inference in real time, and dispatching clinical pager alerts when AKI risk is detected.

Built as a fully fault-tolerant system and validated with chaos-engineering-style failure injection — network partitions, pod evictions, node failures, database restarts, and message backlogs. Sustained 100% uptime over a two-week evaluation and exceeded the NHS baseline for AKI detection with an F3-score of 99.9%.

Key Aspects

  • Real-time streaming — ingested HL7 lab messages via a message broker, running ML inference with sub-second latency and dispatching pager alerts.
  • Fault tolerance — validated via chaos-monkey style failure injection: network partitions, pod evictions, node failures, database restarts, and message backlogs.
  • Observability — Prometheus metrics, Grafana dashboards, and SLO-based alerting with on-call runbooks for operational reliability.
  • Autoscaling & rollouts — horizontal pod autoscaling with graceful rolling deployments and zero-downtime upgrades.
  • Clinical performance — exceeded the NHS baseline for AKI detection, achieving an F3-score of 99.9% over a sustained two-week evaluation with 100% uptime.