
Feature: Calibration Error Metrics and Losses #8505

@theo-barfoot

Description


Is your feature request related to a problem? Please describe.

Currently, MONAI does not provide built-in support for calibration metrics, such as Expected Calibration Error, or for auxiliary calibration losses that can be used to improve the calibration of medical image segmentation networks. I have implemented these features using the MONAI framework; the code, with unit tests, can be found at this repo, and the corresponding publications are here and here.


Describe the solution you’d like

I propose adding my calibration metrics and handlers, as well as auxiliary calibration losses to MONAI.

  1. Calibration Metrics

    • CalibrationErrorMetric (subclass of CumulativeIterationMetric) supporting Expected (ECE), Average (ACE), and Maximum (MCE) reductions, with batched, per-class, and background-exclusion settings.
  2. Differentiable Calibration Losses

    • HardL1ACELoss and its compound variants (HardL1ACEandCELoss, HardL1ACEandDiceLoss, HardL1ACEandDiceCELoss)
    • SoftL1ACELoss and its compound variants (SoftL1ACEandCELoss, SoftL1ACEandDiceLoss, SoftL1ACEandDiceCELoss)
      These losses implement the L1 Average Calibration Error (ACE) in hard- and soft-binned form, with options for background exclusion, one-hot encoding, custom activation, and class weighting.
  3. Ignite Handlers

    • CalibrationError inheriting from IgniteMetricHandler, to attach calibration metrics to training and evaluation engines, with automatic logging and CSV export of per-image details.
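To make the metric concrete: the class names above are the proposal's, but the underlying hard-binned calibration error can be sketched generically. The function below (illustrative only, not the proposed MONAI API) bins predicted confidences and accumulates the bin-weighted |accuracy − confidence| gap that defines ECE; ACE and MCE differ only in replacing the weighted sum with an unweighted mean or a maximum over non-empty bins.

```python
import torch

def expected_calibration_error(probs, labels, num_bins=15):
    """Hard-binned ECE: bin-weighted average of |accuracy - confidence|.

    probs:  (N,) predicted probabilities (e.g. per-voxel foreground confidence)
    labels: (N,) binary ground-truth labels (or correctness indicators)
    Illustrative sketch only -- not the API proposed in this issue.
    """
    accuracies = labels.float()
    bin_edges = torch.linspace(0.0, 1.0, num_bins + 1)
    ece = torch.zeros(1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (probs > lo) & (probs <= hi)
        prop = in_bin.float().mean()  # fraction of samples in this bin
        if prop > 0:
            acc = accuracies[in_bin].mean()   # empirical accuracy in bin
            conf = probs[in_bin].mean()       # mean confidence in bin
            ece += prop * (acc - conf).abs()  # ECE term; ACE/MCE vary here
    return ece.item()
```

For example, predictions all at confidence 0.75 with only half correct yield an ECE of 0.25.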

Visualisation Utilities (optional)

I also have visualisation methods for reliability diagrams and reliability dataset histograms; however, these may be harder to integrate cleanly into the current MONAI framework.
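The data behind a reliability diagram is simple to compute even if the plotting utilities themselves prove awkward to integrate: per bin, the mean confidence, empirical accuracy, and sample count. A hedged sketch (function name and signature are illustrative, not from the repo):

```python
import torch

def reliability_bins(probs, labels, num_bins=10):
    """Per-bin mean confidence, empirical accuracy, and counts for a
    reliability diagram. Illustrative sketch; plotting is left to the user."""
    edges = torch.linspace(0.0, 1.0, num_bins + 1)
    # assign each probability to a bin via the interior edges
    idx = torch.bucketize(probs, edges[1:-1], right=False)
    confs, accs, counts = [], [], []
    for b in range(num_bins):
        mask = idx == b
        counts.append(int(mask.sum()))
        if counts[-1]:
            confs.append(probs[mask].mean().item())
            accs.append(labels[mask].float().mean().item())
        else:  # empty bin: no defined confidence/accuracy
            confs.append(float("nan"))
            accs.append(float("nan"))
    return confs, accs, counts
```

A reliability diagram then plots `accs` against `confs`, with the diagonal marking perfect calibration; `counts` gives the companion dataset histogram.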

All components are already implemented and thoroughly unit-tested in Average-Calibration-Losses, and integrate into MONAI’s bundle-based pipelines out of the box.
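To illustrate why a soft-binned variant matters for training: hard binning is non-differentiable in the bin assignments, whereas soft assignments let gradients flow to the predicted probabilities. The sketch below uses a Gaussian-kernel soft membership around bin centres; this is one possible formulation for exposition, not the implementation in Average-Calibration-Losses.

```python
import torch

def soft_l1_ace(probs, labels, num_bins=10, temperature=0.01):
    """Differentiable L1 ACE sketch: soft bin assignments via a Gaussian
    kernel around bin centres, then the mean |accuracy - confidence| over
    bins. Illustrative only -- not the implementation proposed here."""
    centres = (torch.arange(num_bins) + 0.5) / num_bins
    # soft membership weights, shape (N, num_bins); each row sums to 1
    w = torch.softmax(-(probs.unsqueeze(1) - centres) ** 2 / temperature, dim=1)
    mass = w.sum(dim=0) + 1e-12                                # weight per bin
    conf = (w * probs.unsqueeze(1)).sum(dim=0) / mass          # soft confidence
    acc = (w * labels.float().unsqueeze(1)).sum(dim=0) / mass  # soft accuracy
    return (conf - acc).abs().mean()
```

Because every operation is differentiable in `probs`, the result can be added to a Dice or cross-entropy term as an auxiliary loss, which is the role the compound classes above play.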


Describe alternatives you’ve considered

  • Third-party libraries: calibration error metrics do exist in torchmetrics and net:cal, but they are not implemented via the CumulativeIterationMetric base class used in MONAI. Differentiable calibration losses for semantic segmentation are not implemented elsewhere.

Integrating these features into MONAI directly will give users a standardised, well-tested interface for calibration, reduce duplication, and promote reproducible, robust model evaluation.


I’m happy to contribute these components as a PR to MONAI. Please let me know which of these features would be useful and how best to align with MONAI’s architecture and naming conventions; any feedback on API design or testing guidelines is welcome!
