holdout function to specify length using common language #180

@davidtedfordholt

It would be nice to specify a holdout period without having to use filter. It gets tricky if different series in a tsibble have different dates for their final observation. I suggest a function that takes either a length of holdout (e.g. "14 days") or a number of observations, as well as an optional start_date (or, to deal with the problem of different end dates, a number giving how many observations from the end the holdout should start).

aus_retail %>%
    holdout(length = "6 months") %>%
    model(snaive = SNAIVE(Turnover)) %>%
    forecast() %>%
    autoplot(aus_retail)
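
For comparison, something close to the observation-count variant can already be approximated with a grouped slice. This is only a rough sketch: `holdout_train()` and `holdout_test()` are hypothetical helper names (not part of tsibble or fable), `toy` is made-up data with two series that end on different dates, and `h` is assumed to be a positive number of observations rather than a duration such as "6 months".

```r
library(tsibble)
library(dplyr)

# Made-up monthly data: series "a" has 12 observations, "b" only 10,
# so the two series end on different dates.
toy <- tsibble(
  Month  = yearmonth("2020 Jan") + c(0:11, 0:9),
  series = rep(c("a", "b"), times = c(12, 10)),
  value  = 1:22,
  key = series, index = Month
)

# Hypothetical helper: drop the last h observations of every series.
holdout_train <- function(.data, h) {
  .data %>%
    group_by_key() %>%
    slice(head(seq_len(n()), -h)) %>%
    ungroup()
}

# Hypothetical helper: keep only the last h observations of every series.
holdout_test <- function(.data, h) {
  .data %>%
    group_by_key() %>%
    slice(tail(seq_len(n()), h)) %>%
    ungroup()
}

train <- holdout_train(toy, 3)
test  <- holdout_test(toy, 3)
```

Because the slice is applied per key, each series loses its own last three observations regardless of where it ends, which is the behaviour a date-based filter cannot give directly.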

This would be a step in the direction of cross-validation, especially if you passed the length of the holdout into the models such that forecast could automatically forecast for the length of the holdout. The above would work relatively simply, and a CV would, in a sense, just be mapping that over a list of cutoff_dates (or inverse-ordered observation numbers) with the same holdout period, leading to a cv-fbl that adds a column for the cutoff date.

The cross-validation would be relatively simple as:

aus_retail %>%
    CV(cv_length = "2 years", cv_horizon = "6 months") %>%
    model(snaive = SNAIVE(Turnover))

This can generate a nested table or the like (converting cv_length and cv_horizon to numbers of observations in the frequency of the tsbl) with rows corresponding to the keys and the cutoff_dates per key, generated as simply as:

# cv_length and cv_horizon are assumed to already be numbers of observations
aus_retail %>% 
    as_tibble() %>% 
    group_by(State, Industry) %>%
    summarise(
        cutoff_dates = list(
            Month[
                which(
                    Month >= nth(Month, -(cv_length + cv_horizon)) &
                    Month < nth(Month, -cv_horizon)
                )]))

From there, for a given key, you are just slicing the tsbl off at each cutoff_date in the list, modeling and forecasting, then binding the fbl results with a column for the cutoff as an additional index.
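
The slicing step could look roughly like the following. It is a sketch under several assumptions: `cv_slices()` is a hypothetical helper, the index column is assumed to be called `Month` and the key column `series`, the same cutoffs are applied to every series for simplicity, and `toy` is made-up data. Each training slice is tagged with a `fold` id (added to the key) and its `cutoff`, so that model() and forecast() could then be run once per series and fold.

```r
library(tsibble)
library(dplyr)

# Made-up monthly data with two series of 12 observations each.
toy <- tsibble(
  Month  = yearmonth("2020 Jan") + rep(0:11, 2),
  series = rep(c("a", "b"), each = 12),
  value  = 1:24,
  key = series, index = Month
)

# Hypothetical helper: one training slice per cutoff date.
cv_slices <- function(.data, cutoffs) {
  slices <- lapply(seq_along(cutoffs), function(i) {
    .data %>%
      as_tibble() %>%
      filter(Month <= cutoffs[i]) %>%          # keep data up to the cutoff
      mutate(fold = i, cutoff = cutoffs[i])    # record which slice this is
  })
  # fold becomes part of the key so each slice is its own series
  bind_rows(slices) %>%
    as_tsibble(key = c(series, fold), index = Month)
}

cutoffs <- yearmonth(c("2020 Jun", "2020 Sep"))
folds <- cv_slices(toy, cutoffs)
```

tsibble's existing stretch_tsibble() builds similar expanding-window folds from observation counts, so a date-based version like this would mainly add the cutoff column and the common-language interface.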
