Description
Is your feature request related to a problem? Please describe.
Currently, MONAI does not provide built-in support for calibration metrics, such as the Expected Calibration Error, or for auxiliary calibration losses that can be used to improve the calibration of medical image segmentation networks. I have implemented these features using the MONAI framework; they can be found at this repo, with unit tests, and the corresponding publications are here and here.
Describe the solution you’d like
I propose adding my calibration metrics and handlers, as well as auxiliary calibration losses to MONAI.
-
Calibration Metrics
`CalibrationErrorMetric` (a subclass of `CumulativeIterationMetric`) supporting Expected (ECE), Average (ACE), and Maximum (MCE) reductions, with batched, per-class, and background-exclusion settings.
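To illustrate the intended semantics, here is a minimal sketch of the hard-binned Expected Calibration Error on flattened binary foreground probabilities. This is not the proposed MONAI implementation (which would accumulate results via `CumulativeIterationMetric`); the function name and equal-width binning scheme are illustrative assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Hard-binned ECE for 1-D foreground probabilities and binary labels.

    ECE = sum over bins of (bin fraction) * |mean confidence - mean accuracy|.
    ACE would average the per-bin gaps over non-empty bins; MCE takes the max.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # assign each probability to an equal-width bin in [0, 1]
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        conf = sum(p for p, _ in members) / len(members)
        acc = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(conf - acc)
    return ece
```

For example, predictions of 0.9 on positives and 0.1 on negatives give an ECE of 0.1: each occupied bin is off by 0.1 from the diagonal.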
-
Differentiable Calibration Losses
`HardL1ACELoss` and its compound variants (`HardL1ACEandCELoss`, `HardL1ACEandDiceLoss`, `HardL1ACEandDiceCELoss`)
`SoftL1ACELoss` and its compound variants (`SoftL1ACEandCELoss`, `SoftL1ACEandDiceLoss`, `SoftL1ACEandDiceCELoss`)
These losses implement the L1 Average Calibration Error (ACE) in hard- and soft-binned form, with options for background exclusion, one-hot encoding, custom activation, and class weighting.
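To illustrate the soft-binning idea behind the differentiable variant, the sketch below computes an L1 ACE where each probability contributes to neighbouring bins through a triangular membership kernel rather than a hard assignment. In the proposed losses this runs on tensors so gradients flow through the binning; the function name and kernel choice here are illustrative assumptions, not the actual implementation.

```python
def soft_l1_ace(probs, labels, n_bins=10):
    """Soft-binned L1 Average Calibration Error (illustrative sketch).

    Each probability is spread over adjacent bins with triangular weights
    w = max(0, 1 - |p - centre| * n_bins), which is piecewise-linear in p
    and hence differentiable almost everywhere when done on tensors.
    """
    centers = [(k + 0.5) / n_bins for k in range(n_bins)]
    gaps = []
    for c in centers:
        w = [max(0.0, 1.0 - abs(p - c) * n_bins) for p in probs]
        total = sum(w)
        if total == 0.0:
            continue  # empty bin: excluded from the average
        conf = sum(wi * p for wi, p in zip(w, probs)) / total
        acc = sum(wi * y for wi, y in zip(w, labels)) / total
        gaps.append(abs(conf - acc))
    return sum(gaps) / len(gaps)
```

For instance, predicting 0.5 on all-positive labels yields a loss of 0.5 (confidence 0.5 vs. accuracy 1.0), while predicting 0.5 on a half-positive set yields 0.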
-
Ignite Handlers
`CalibrationError`, inheriting from `IgniteMetricHandler`, to attach calibration metrics to training and evaluation engines, with automatic logging and CSV export of per-image details.
Optionally, I also have visualisation methods for reliability diagrams and dataset reliability histograms; however, these may be harder to integrate nicely into the current MONAI framework.
Visualisation Utilities (maybe)
- Plotting functions
`draw_case_reliability_diagrams` and `draw_dataset_reliability_diagrams` for reliability diagrams and histograms, with customisable figure size, colormaps, and annotations, along with the associated `ReliabilityDiagramMetric` and `ReliabilityDiagramHandler`.
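For context, a reliability diagram plots per-bin accuracy against per-bin mean confidence; the diagonal marks perfect calibration. A hypothetical helper that computes those points (plotting itself omitted, and independent of the actual drawing functions above) might look like:

```python
def reliability_points(probs, labels, n_bins=10):
    """Per-bin (mean confidence, accuracy, count) triples for a reliability diagram.

    Empty bins are dropped; the counts can drive the companion histogram.
    """
    # per-bin accumulators: [sum of confidences, sum of labels, count]
    stats = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        stats[idx][0] += p
        stats[idx][1] += y
        stats[idx][2] += 1
    return [(s / c, a / c, c) for s, a, c in stats if c > 0]
```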
All components are already implemented and thoroughly unit-tested in Average-Calibration-Losses, and integrate into MONAI’s bundle-based pipelines out of the box.
Describe alternatives you’ve considered
- Third-party libraries: calibration error metrics do exist in torchmetrics and net:cal, but they are not implemented via the `CumulativeIterationMetric` interface used in MONAI. Differentiable calibration losses for semantic segmentation are not implemented elsewhere.
Integrating these features into MONAI directly will give users a standardised, well-tested interface for calibration, reduce duplication, and promote reproducible, robust model evaluation.
I’m happy to contribute these components as a PR to MONAI. Please let me know which of these features would be useful, how best to align with MONAI’s architecture and naming conventions, and any feedback on API design or testing guidelines!