There are several options for the relationship between input and output for the interval score:
- One option is to think of it as a many-to-one relationship: one forecast comprises several quantiles / intervals, but you only get one score per forecast (as with, e.g., the CRPS).
- Another option is to treat every quantile / interval as a separate forecast. Then you get one score per quantile / interval.
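To make the two mappings concrete, here is a minimal Python sketch. It is not the package's code, and the function names are hypothetical. It assumes a forecast is given as a list of quantile levels with predicted values and uses the quantile score 2 * (1{y <= q} - tau) * (q - y). The one-to-one version keeps one score per quantile; the many-to-one version averages them into a single score per forecast.

```python
def quantile_score(observed: float, predicted: float, tau: float) -> float:
    """Quantile (pinball) score, scaled by 2: 2 * (1{y <= q} - tau) * (q - y)."""
    indicator = 1.0 if observed <= predicted else 0.0
    return 2.0 * (indicator - tau) * (predicted - observed)


def scores_one_to_one(observed, predicted, quantiles):
    """One-to-one (hypothetical name): every quantile is scored as its own forecast."""
    return [quantile_score(observed, q, tau) for q, tau in zip(predicted, quantiles)]


def score_many_to_one(observed, predicted, quantiles):
    """Many-to-one (hypothetical name): all quantiles of one forecast -> one score."""
    per_quantile = scores_one_to_one(observed, predicted, quantiles)
    return sum(per_quantile) / len(per_quantile)


quantiles = [0.05, 0.25, 0.5, 0.75, 0.95]
predicted = [1.0, 3.0, 5.0, 7.0, 9.0]
observed = 8.0

print(scores_one_to_one(observed, predicted, quantiles))  # five scores, one per quantile
print(score_many_to_one(observed, predicted, quantiles))  # one score for the forecast
```

With five quantiles the one-to-one version returns five numbers, and the many-to-one version returns their mean (about 1.56 in this example).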

The current implementation of `score()` returns one score per quantile. That's why I created a (horribly named) function `wis_one_to_one()` that does the same. The function `wis()` only returns one score per forecast, not one score per quantile.
Current options are:
- Use `wis()` within `score()` and change the current behaviour from "one score per quantile" to "one score per forecast".
- Create an additional format, "interval", in which one forecast really is one interval. People could then run `score()` on that and get one score per interval. (We'd have to think a bit about how to handle the median when summarising, see below.)
- Use `wis_one_to_one()` or a better-named cousin in `score()` and return one score per quantile, as we did in the past.
- Merge `wis()` and `wis_one_to_one()` into one function. That's easily possible, but it would need a number of ugly arguments (like `count_median_twice`, `output`, `mapping = c("one-to-one", "many-to-one")`).
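For the "interval" format option, a quantile-based forecast would first have to be turned into one row per central interval. A minimal sketch, assuming symmetric quantile levels and a hypothetical `IntervalForecast` record (neither is part of the package): it pairs each level tau < 0.5 with 1 - tau and represents the median as a 0% interval whose lower and upper bounds coincide.

```python
from dataclasses import dataclass


@dataclass
class IntervalForecast:
    """One row of a hypothetical "interval" format: one forecast = one interval."""
    interval_range: float  # nominal coverage in percent, e.g. 90 for [q_0.05, q_0.95]
    lower: float
    upper: float


def quantile_to_interval(predicted, quantiles):
    """Pair each quantile tau <= 0.5 with 1 - tau; the median becomes a 0% interval."""
    # Round the levels so that e.g. 1 - 0.05 and 0.95 match as dictionary keys.
    by_level = {round(tau, 10): value for tau, value in zip(quantiles, predicted)}
    rows = []
    for tau in sorted(by_level):
        if tau > 0.5:
            continue  # upper bounds are picked up via their lower partner
        partner = round(1.0 - tau, 10)
        if partner not in by_level:
            raise ValueError(f"quantile {tau} has no symmetric partner {partner}")
        rows.append(
            IntervalForecast(
                interval_range=round((1.0 - 2.0 * tau) * 100, 10),
                lower=by_level[tau],
                upper=by_level[partner],
            )
        )
    return rows


rows = quantile_to_interval([1.0, 3.0, 5.0, 7.0, 9.0], [0.05, 0.25, 0.5, 0.75, 0.95])
for row in rows:
    print(row)
```

Each row could then be scored on its own, which gives one score per interval; the median row is exactly the case that needs a decision when summarising.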
Related: Computation of coverage, see #389
Additional background:
An additional complication is the difference between the quantile score and the interval score. You can compute a quantile score for a single quantile, but you can't compute overprediction, dispersion and underprediction for it.
You can do that for a single interval (which comprises two quantiles), so you can easily compute one score per interval.
If you want to return one score per quantile but still have over-/underprediction and dispersion, you actually need to compute an interval score and then merge the result back onto the individual quantiles.
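A minimal Python sketch of the "compute an interval score, then merge back" route. The helper names are hypothetical, and weighting every component by alpha / 2 is an assumption matching the usual WIS definition. It also assumes symmetric quantile levels (every tau < 0.5 has a partner 1 - tau) and interval ranges below 100%, so that alpha > 0.

```python
def interval_score_components(observed, lower, upper, interval_range, weigh=True):
    """Interval score split into dispersion, over- and underprediction.

    interval_range is the nominal coverage in percent (0 <= range < 100),
    so alpha = 1 - range / 100. With weigh=True every component is
    multiplied by alpha / 2.
    """
    alpha = 1.0 - interval_range / 100.0
    dispersion = upper - lower
    overprediction = (2.0 / alpha) * (lower - observed) if observed < lower else 0.0
    underprediction = (2.0 / alpha) * (observed - upper) if observed > upper else 0.0
    weight = alpha / 2.0 if weigh else 1.0
    parts = {
        "dispersion": weight * dispersion,
        "overprediction": weight * overprediction,
        "underprediction": weight * underprediction,
    }
    parts["interval_score"] = sum(parts.values())
    return parts


def components_per_quantile(observed, predicted, quantiles):
    """Hypothetical helper: score each central interval once, then copy the
    result back onto both quantiles that form that interval."""
    by_level = {round(tau, 10): value for tau, value in zip(quantiles, predicted)}
    per_interval = {}
    for tau in by_level:
        if tau <= 0.5:
            upper = by_level[round(1.0 - tau, 10)]
            interval_range = round((1.0 - 2.0 * tau) * 100, 10)
            per_interval[interval_range] = interval_score_components(
                observed, by_level[tau], upper, interval_range
            )
    # Each quantile inherits the components of "its" interval (the median maps to range 0).
    return {
        tau: dict(per_interval[round(abs(1.0 - 2.0 * tau) * 100, 10)])
        for tau in by_level
    }


result = components_per_quantile(8.0, [1.0, 3.0, 5.0, 7.0, 9.0], [0.05, 0.25, 0.5, 0.75, 0.95])
print(result[0.25])
print(result[0.75])  # same interval, hence the same components
```

Note that the per-quantile rows carry interval-level numbers: the 25% and 75% quantiles report identical components, which is the price of having a decomposition at the quantile level.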
Adding to that complication, there are two ways to conceptualise the WIS:
a) as an average of quantile scores or
b) as an average of interval scores.
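Written out, under the usual assumptions (a median $m$ plus $K$ central intervals $[l_k, u_k]$ with nominal coverage $1-\alpha_k$, and interval weights $\alpha_k/2$), the two conceptualisations read as follows, with $\mathrm{QS}_\tau$ the quantile score and $\mathrm{IS}_\alpha$ the interval score of an observation $y$:

$$
\begin{aligned}
\mathrm{QS}_\tau(q, y) &= 2\,\big(\mathbf{1}\{y \le q\} - \tau\big)(q - y), \\
\mathrm{IS}_\alpha(l, u, y) &= (u - l) + \tfrac{2}{\alpha}(l - y)\,\mathbf{1}\{y < l\} + \tfrac{2}{\alpha}(y - u)\,\mathbf{1}\{y > u\}, \\
\text{(a)}\quad \mathrm{WIS}_{\text{quantile}} &= \frac{1}{2K+1}\Big[\mathrm{QS}_{0.5}(m, y) + \sum_{k=1}^{K}\big(\mathrm{QS}_{\alpha_k/2}(l_k, y) + \mathrm{QS}_{1-\alpha_k/2}(u_k, y)\big)\Big], \\
\text{(b)}\quad \mathrm{WIS}_{\text{interval}} &= \frac{1}{K+1}\sum_{k=0}^{K}\frac{\alpha_k}{2}\,\mathrm{IS}_{\alpha_k}(l_k, u_k, y), \qquad \alpha_0 = 1,\; l_0 = u_0 = m.
\end{aligned}
$$

The link between the two is the identity $\tfrac{\alpha}{2}\,\mathrm{IS}_\alpha(l, u, y) = \tfrac{1}{2}\big(\mathrm{QS}_{\alpha/2}(l, y) + \mathrm{QS}_{1-\alpha/2}(u, y)\big)$. Applied to the $k = 0$ term, it turns (b) into an average of $2K + 2$ quantile scores in which $\mathrm{QS}_{0.5}(m, y)$ appears twice; giving that term weight $\tfrac12$ and dividing by $K + \tfrac12$ instead recovers (a).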
The difference lies in how the median is treated: if you think of the median as a 0% prediction interval (which I think is illegal and you're not allowed to do, but which we do computationally), you end up with a difference between the average quantile score and the average interval score. If you average over interval scores, the median effectively appears twice (once as lower and once as upper bound), whereas if you average over quantile scores, it appears only once. That is why we have an argument `count_median_twice`.
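A minimal Python sketch of what `count_median_twice` switches between. The function names are hypothetical and this is not the package's implementation; it assumes symmetric quantile levels and the usual weight alpha / 2 per interval, and treats the median as a 0% interval (alpha = 1, lower = upper = median).

```python
def quantile_score(observed, predicted, tau):
    """Quantile score 2 * (1{y <= q} - tau) * (q - y)."""
    indicator = 1.0 if observed <= predicted else 0.0
    return 2.0 * (indicator - tau) * (predicted - observed)


def weighted_interval_score(observed, lower, upper, alpha):
    """(alpha / 2) * interval score of the central (1 - alpha) interval."""
    score = upper - lower
    if observed < lower:
        score += (2.0 / alpha) * (lower - observed)
    elif observed > upper:
        score += (2.0 / alpha) * (observed - upper)
    return alpha / 2.0 * score


def wis_interval_average(observed, predicted, quantiles, count_median_twice=False):
    """Hypothetical sketch: WIS as a (weighted) average of interval scores.

    With count_median_twice=False the 0% "median interval" gets weight 1/2,
    which reproduces the plain average of quantile scores.
    """
    by_level = {round(tau, 10): value for tau, value in zip(quantiles, predicted)}
    total, weight_sum = 0.0, 0.0
    for tau, lower in by_level.items():
        if tau > 0.5:
            continue  # handled as the upper bound of its partner interval
        upper = by_level[round(1.0 - tau, 10)]
        alpha = 2.0 * tau  # [q_tau, q_(1 - tau)] has coverage 1 - 2 * tau
        is_median = tau == 0.5
        weight = 0.5 if (is_median and not count_median_twice) else 1.0
        total += weight * weighted_interval_score(observed, lower, upper, alpha)
        weight_sum += weight
    return total / weight_sum


def wis_quantile_average(observed, predicted, quantiles):
    """WIS as the plain average of quantile scores (median counted once)."""
    scores = [quantile_score(observed, q, tau) for q, tau in zip(predicted, quantiles)]
    return sum(scores) / len(scores)


quantiles = [0.05, 0.25, 0.5, 0.75, 0.95]
predicted = [1.0, 3.0, 5.0, 7.0, 9.0]
observed = 8.0

print(wis_quantile_average(observed, predicted, quantiles))
print(wis_interval_average(observed, predicted, quantiles))
print(wis_interval_average(observed, predicted, quantiles, count_median_twice=True))
```

In this example the first two calls agree (about 1.56), while the third gives about 1.8, which is the average of six quantile scores with the median's score included twice.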