pairwise_comparison() outputs both NA and NaN

## Reprex 

``` r
d <- tempfile()

download.file(
  "https://gist.githubusercontent.com/Bisaloo/62615a3821da71b8f542f5d5bc096327/raw/1415d04901e8168ceed2ca1f554b8799cf3ba6ac/reprex_NA_NaN_scoringutils.txt",
  d
)

df <- dget(d)

library(tidyverse)
library(scoringutils)
#> Note: The definition of the weighted interval score has slightly changed in version 0.1.5. If you want to use the old definition, use the argument `count_median_twice = TRUE` in the function `eval_forecasts()`
#> 
#> Attaching package: 'scoringutils'
#> The following object is masked from 'package:purrr':
#> 
#>     update_list

suppressWarnings({
  df %>%
    filter(n_quantiles == 23) %>%
    select(model, target_variable, horizon, location, location_name,
           forecast_date, interval_score = wis) %>%
    pairwise_comparison(
      metric = "interval_score",
      baseline = "EuroCOVIDhub-baseline",
      by = c("model", "target_variable", "forecast_date", "horizon",
             "location", "location_name"),
      summarise_by = c("model", "target_variable", "horizon", "location")
    ) %>%
    select(model, target_variable, horizon, location, rel_wis = scaled_rel_skill) %>%
    distinct()
})
#>                    model target_variable horizon location rel_wis
#> 1: EuroCOVIDhub-baseline       inc death       3       IS     NaN
#> 2: EuroCOVIDhub-ensemble       inc death       3       IS      NA
#> 3: EuroCOVIDhub-ensemble       inc death       3       IS     NaN
#> 4:               ILM-EKF       inc death       3       IS     NaN
#> 5:            MUNI-ARIMA       inc death       3       IS     NaN
#> 6:    RobertWalraven-ESG       inc death       3       IS     NaN
#> 7:       UMass-MechBayes       inc death       3       IS     NaN
#> 8:         USC-SIkJalpha       inc death       3       IS     NaN
#> 9:  epiforecasts-EpiNow2       inc death       3       IS       0
```

<sup>Created on 2021-11-10 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1.9000)</sup>

As you can see here, there are two almost identical rows for `EuroCOVIDhub-ensemble`, except that one has `NA` and the other has `NaN`. The fact that a model can have more than one row causes issues in downstream analyses in our case.

## Description of the problem

It looks like `pairwise_comparison()` sometimes returns `NA` and sometimes `NaN` when it cannot compute the value. This leads to confusion because `NA` and `NaN` indicate almost the same thing, yet they have strange incompatibilities:

``` r
identical(NA, NaN)
#> [1] FALSE

NA + NaN
#> [1] NA

NaN + NA
#> [1] NaN
```

<sup>Created on 2021-11-10 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1.9000)</sup>
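The same asymmetry shows up in the base R predicates and in deduplication. The sketch below uses a made-up vector `x` (not taken from the reprex data): `is.na()` treats both values as missing, `is.nan()` only flags `NaN`, and `unique()` keeps them as two distinct values, which is presumably why `distinct()` leaves two rows for the same model above.

``` r
# Hypothetical toy vector, for illustration only
x <- c(1, NA, NaN)

# is.na() flags both NA and NaN as missing
na_flags <- is.na(x)

# is.nan() only flags NaN, not NA
nan_flags <- is.nan(x)

# unique() treats NA and NaN as different values,
# so both survive deduplication
distinct_missing <- unique(c(NA, NaN, NA, NaN))

print(na_flags)
print(nan_flags)
print(distinct_missing)
```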

## Proposed solution

`NaN` is rarely used and confusing (IMO). `NaN` values often appear because a function doesn't check its output for errors or impossible computations, and they can cause serious issues in downstream analyses (as in our case). Tomas Kalibera gives a good overview of the hell that is `NA` vs `NaN`:

> The result of binary operations involving NA and NaN is hardware dependent (the propagation of NaN payload) - on some hardware, it actually works the way we would like - NA is returned - but on some hardware you get NaN or sometimes NA and sometimes NaN. Also there are C compiler optimizations re-ordering code, as mentioned in ?NaN. Then there are also external numerical libraries that do not distinguish NA from NaN (NA is an R concept)

So we should stick to only one of these. As far as I know, `NA` always propagates as `NA`, while `f(NaN)` can return `NA` or `NaN` depending on `f()`, so a conscious choice to always output `NA` would be much better and clearer IMO.
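As a rough illustration of what "always output `NA`" could look like, here is a minimal sketch of a normalising helper. The name `nan_to_na()` is hypothetical and not part of `scoringutils`; where such a step would belong inside `pairwise_comparison()` is left open, and the `scores` vector is made up.

``` r
# Hypothetical helper (not part of scoringutils):
# replace every NaN in a numeric vector with NA
nan_to_na <- function(x) {
  stopifnot(is.numeric(x))
  x[is.nan(x)] <- NA
  x
}

# Made-up scores containing both kinds of missing value
scores <- c(0.5, NaN, NA, 0)
cleaned <- nan_to_na(scores)

print(cleaned)
```

After this step only one kind of missing value remains, so rows that differ only by `NA` vs `NaN` become identical and collapse under deduplication. The same helper can be applied downstream as a workaround until the function itself is consistent.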



