Summary
score() does not correctly handle the scale column when scoring forecast_multivariate_sample objects produced by transform_forecasts(append = TRUE). Instead of scoring the two scales (natural and transformed) separately, it returns a single set of scored rows and drops the scale column from the output. The univariate forecast_sample path handles this correctly.
Reprex
library(scoringutils)
#> scoringutils_2.1.2.9000
# === Univariate: works correctly ===
transformed_uni <- transform_forecasts(
example_sample_continuous, fun = sqrt, append = TRUE, label = "sqrt"
)
scored_uni <- score(transformed_uni)
table(scored_uni$scale, useNA = "always")
#> natural sqrt <NA>
#> 887 878 0
# Two sets of scores, one per scale -- correct.
# === Multivariate: broken ===
transformed_mv <- suppressWarnings(transform_forecasts(
example_multivariate_sample, fun = sqrt, append = TRUE, label = "sqrt"
))
nrow(example_multivariate_sample)
#> [1] 35624
nrow(transformed_mv)
#> [1] 71248
# Correctly doubled.
table(transformed_mv$scale, useNA = "always")
#> natural sqrt <NA>
#> 35624 35624 0
# Both scales present in transformed data.
scored_mv <- score(transformed_mv)
nrow(scored_mv)
#> [1] 224
# Expected 448 (224 per scale). Only one set of scores returned.
table(scored_mv$scale, useNA = "always")
#> <NA>
#> 0
# scale column dropped entirely.
Session info
R version 4.5.1 (2025-06-13)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.7.3
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scoringutils_2.1.2.9000
Expected behaviour
score() should group by scale (in addition to the forecast unit columns) when scoring, producing separate scores for each scale. This is how it works for univariate forecast types (e.g., forecast_sample, forecast_quantile).
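To illustrate the difference in grouping, here is a minimal base-R sketch on toy data. It does not use scoringutils internals: the `toy` data frame, the `target` column, and the mean absolute error metric are all hypothetical stand-ins chosen for illustration. It only shows how the number of scored rows changes when `scale` is included in the grouping. Whether the actual multivariate code path pools the two scales or scores only one of them is not established by this report.

```r
# Toy long-format data (hypothetical, not scoringutils output):
# 2 targets x 2 scales x 3 samples.
toy <- expand.grid(
  target = c("a", "b"),
  scale = c("natural", "sqrt"),
  sample_id = 1:3,
  stringsAsFactors = FALSE
)

# Predictions on the natural scale, and their sqrt-transformed counterparts.
natural_pred <- c(9, 16, 25)[toy$sample_id]
is_sqrt <- toy$scale == "sqrt"
toy$observed <- ifelse(is_sqrt, sqrt(16), 16)
toy$predicted <- ifelse(is_sqrt, sqrt(natural_pred), natural_pred)

# Stand-in metric (assumption): absolute error per sample.
toy$abs_error <- abs(toy$predicted - toy$observed)

# Grouping by forecast unit AND scale: one score per target per scale.
by_scale <- aggregate(abs_error ~ target + scale, data = toy, FUN = mean)

# Grouping by forecast unit only: both scales end up in a single score.
collapsed <- aggregate(abs_error ~ target, data = toy, FUN = mean)

nrow(by_scale)
#> [1] 4
nrow(collapsed)
#> [1] 2
```

In the reprex above, the analogous check would be that score(transformed_mv) returns 448 rows and retains a scale column.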
Context
Related to #1071 / #1072 -- the transform_forecasts(append=TRUE) fix resolved the immediate error for multivariate forecasts, but this downstream issue in score() remains.
Discovered while implementing sample forecast scoring in hubverse-org/hubEvals#94.