Local-First Clinical Text Structuring with Fine-Tuned MedGemma for Readmission Risk Assessment

Date
2026
Publisher
https://zenodo.org
Abstract
Background. Unstructured clinical notes remain a bottleneck for deployable healthcare AI, and cloud-dependent pipelines raise privacy and infrastructure barriers.
Methods. We present MedGemma StructCore, a local-first two-stage extraction pipeline built on compact MedGemma 4B models. Stage 1 applies Schema-Guided Reasoning to summarize notes into structured JSON across nine clinical clusters. Stage 2 projects these summaries into canonical KVT4 (Cluster|Keyword|Value|Timestamp) facts via a LoRA-adapted model. Deterministic normalization, a signal-integrity gate, and offline hybrid regeneration audit and reduce silent loss of objective signal between stages. Prompt KV-cache reuse yields a 10.6% speedup with bit-exact output [Verified].
Results. On MIMIC-IV (N=50,000; patient-level split; Ntest=9,857), the tabular baseline (A4) achieves AUROC 0.685 (95% CI 0.670–0.699) [Verified]. On the same full canonical test split, under a constrained training regime (Ntrain=1,500, Nval=400), A3factlevel achieves AUROC 0.659, AUPRC 0.321, and Brier 0.145. Against a fair tabular refit baseline (LogReg and XGBoost) trained on the same split with the same demographic covariates, A3factlevel improves AUPRC and Brier [Verified], while the AUROC uplift is small and not statistically verified [Preliminary]. Notably, XGBoost does not outperform logistic regression on the same feature set, confirming that downstream gains are attributable to KVT4 features rather than estimator choice. As a post-closure continuation branch, direct typed downstream fusion of four high-signal semantic labels improves on the current Stage 2 baseline on the same canonical split and yields a verified AUPRC gain over the canonical A4 tabular arm [Verified], while remaining near parity with, rather than clearly superior to, A3factlevel. KVT4 format validity is 99.74%. A signal-integrity audit (N=4,000) finds 15.55% doc-level objective loss among admissions with Stage 1 numeric vitals/labs, reduced to 8.48% by offline hybrid regeneration without additional LLM calls. Structured-reference validation now includes a large LABS benchmark on the full canonical test split and a preliminary VITALS benchmark path with chartevents-backed BP/Weight evaluation. A model scaling pilot that replaces Stage 1 with GPT4.1-mini confirms that the moderate LABS micro-F1 (≈0.52 ceiling) reflects reference-alignment mismatch rather than model capacity [Preliminary, N=200].
Conclusion. The primary contribution is reliable, auditable, local-first clinical text structuring infrastructure that runs on consumer hardware. On the canonical test split, factlevel KVT tokenization improves precision–recall and probabilistic accuracy metrics (AUPRC, Brier) over a tabular refit baseline [Verified]; the AUROC uplift is small [Preliminary]. Direct typed downstream fusion now provides the strongest verified continuation path over the current Stage 2 baseline, suggesting that typed semantic signals are a more promising next optimization target than further free-form Stage 2 generator variants. The current revision package therefore supports a conservative conclusion: notes-derived KVT4 facts add useful predictive signal, but stronger extraction-quality and fairness claims still require further validation.
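To make the KVT4 (Cluster|Keyword|Value|Timestamp) layout concrete, the following is a minimal Python sketch of how a pipe-delimited fact line could be parsed and how a format-validity rate could be computed. It is an illustration under stated assumptions, not the authors' implementation: the names `KVT4Fact`, `parse_kvt4`, and `format_validity` are hypothetical, the sample lines are invented, and the abstract does not say which checks the reported 99.74% validity figure actually applies. Here a line counts as valid only if it has exactly four non-empty fields.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class KVT4Fact:
    """One fact in the four-field layout named in the abstract."""
    cluster: str
    keyword: str
    value: str
    timestamp: str


def parse_kvt4(line: str) -> Optional[KVT4Fact]:
    """Parse 'Cluster|Keyword|Value|Timestamp'; return None if malformed.

    Assumption: validity means exactly four non-empty fields. The real
    pipeline may apply stricter checks (e.g. a closed cluster vocabulary).
    """
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 4 or not all(parts):
        return None
    return KVT4Fact(*parts)


def format_validity(lines: Iterable[str]) -> float:
    """Fraction of non-blank lines that parse as KVT4 facts."""
    non_blank = [ln for ln in lines if ln.strip()]
    if not non_blank:
        return 0.0
    valid = sum(parse_kvt4(ln) is not None for ln in non_blank)
    return valid / len(non_blank)


# Hypothetical sample lines, invented for illustration only.
sample = [
    "LABS|Creatinine|1.4|Admission",
    "VITALS|Weight|82 kg",  # malformed: only three fields
]
rate = format_validity(sample)
```

A check of this kind only confirms the shape of each line. It says nothing about whether the extracted value matches the source note, which is the question the signal-integrity audit described above addresses.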
Keywords
TECHNOLOGY, SOCIAL SCIENCES::Statistics, computer and systems science::Informatics, computer and systems science::Information technology, MEDICINE, TECHNOLOGY::Other technology::Medical engineering, MEDICINE::Physiology and pharmacology::Physiology::Medical informatics, MEDICINE::Physiology and pharmacology::Physiology::Medical technology
Citation
Заболотній С.В., Голінько В. Local-First Clinical Text Structuring with Fine-Tuned MedGemma for Readmission Risk Assessment. Zenodo (Preprint). 2026-02-19. https://doi.org/10.5281/zenodo.18701786