famura/binned-cdf

binned-cdf


A PyTorch-based distribution parametrized by the logits of CDF bins

Background

The Cumulative Distribution Function (CDF) is a fundamental concept in probability theory and statistics that describes the probability that a random variable $X$ takes on a value less than or equal to a given threshold $x$. Formally, the CDF is defined as $F(x) = P(X \leq x)$, where $F(x)$ ranges from 0 to 1 as $x$ varies from negative to positive infinity. The CDF provides a complete characterization of the probability distribution of a random variable: for continuous distributions, it is the integral of the probability density function (PDF), while for discrete distributions, it is the sum of probabilities up to and including $x$. Key properties of any CDF are monotonicity (it is non-decreasing) and the boundary conditions $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$. CDFs are particularly useful for computing probabilities of intervals, quantiles, and for statistical inference.
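These properties can be checked numerically for a standard normal distribution, for example with Python's standard-library `statistics.NormalDist` (a self-contained sketch, independent of this repository):

```python
from statistics import NormalDist

# Standard normal distribution: F(x) = P(X <= x)
std_normal = NormalDist(mu=0.0, sigma=1.0)

# By symmetry, F(0) = 0.5 for the standard normal.
print(std_normal.cdf(0.0))  # 0.5

# Monotonicity: F is non-decreasing in x.
xs = [-3.0, -1.0, 0.0, 1.0, 3.0]
values = [std_normal.cdf(x) for x in xs]
assert all(a <= b for a, b in zip(values, values[1:]))

# Boundary behavior: F(x) -> 0 as x -> -inf and F(x) -> 1 as x -> +inf.
print(std_normal.cdf(-10.0))  # close to 0
print(std_normal.cdf(10.0))  # close to 1

# Interval probabilities: P(a < X <= b) = F(b) - F(a).
p_interval = std_normal.cdf(1.0) - std_normal.cdf(-1.0)
print(round(p_interval, 3))  # ~0.683, the familiar one-sigma mass
```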

Application to Machine Learning

This repository uses the CDF to model and learn flexible probability distributions in machine learning tasks. By parameterizing the CDF with binned logits, it enables differentiable training and efficient sampling, making it suitable for uncertainty estimation, probabilistic prediction, and distributional modeling in neural networks.

Implementation

The PiecewiseConstantBinnedCDF and PiecewiseLinearBinnedCDF classes inherit directly from torch.distributions.Distribution, implementing all necessary methods plus some convenience functions. They support multi-dimensional batch shapes and CUDA devices. The bins can be initialized linearly or log-spaced.
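To illustrate the underlying idea, here is a simplified pure-Python sketch of a piecewise-linear CDF parametrized by bin logits, assuming linear (equal-width) bin spacing and softmax normalization. This is a minimal conceptual model, not the repository's actual implementation (which is vectorized in PyTorch and also supports log spacing and sigmoid normalization):

```python
import math


def softmax(logits):
    """Normalize logits into positive bin masses that sum to 1."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def piecewise_linear_cdf(x, logits, bound_low, bound_up):
    """Evaluate a piecewise-linear CDF defined by bin logits on [bound_low, bound_up].

    Each bin carries a probability mass (the softmax of its logit); the CDF rises
    linearly within each bin, which corresponds to a piecewise-constant PDF.
    """
    if x <= bound_low:
        return 0.0
    if x >= bound_up:
        return 1.0
    masses = softmax(logits)
    width = (bound_up - bound_low) / len(logits)  # linear (equal-width) spacing
    idx = int((x - bound_low) // width)  # index of the bin containing x
    frac = (x - bound_low - idx * width) / width  # relative position inside the bin
    return sum(masses[:idx]) + frac * masses[idx]
```

With uniform logits the sketch reduces to a uniform distribution on the support, e.g. `piecewise_linear_cdf(0.0, [0.0] * 4, -1.0, 1.0)` evaluates to 0.5. Because the softmax masses are strictly positive, the resulting CDF is monotone by construction, which is what makes this parameterization trainable with unconstrained logits.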

torch>=2.7 is the only non-dev dependency of this repository.

Getting Started

I recommend using PiecewiseLinearBinnedCDF for most applications.

import torch

from binned_cdf import PiecewiseLinearBinnedCDF

logits = torch.randn(32)  # shape: (*batch_shape, num_bins)

distr = PiecewiseLinearBinnedCDF(
    logits=logits,
    bound_low=-5,  # adapt to your data
    bound_up=7,  # adapt to your data
    log_spacing=True,  # if False, linear spacing is used
    bin_normalization_method="sigmoid",  # "sigmoid" or "softmax"
)

# ... use it like any other torch.distributions.Distribution

👉 Please have a look at the documentation to get started.
