Description
It would be nice to specify a holdout period without having to use filter(). This gets tricky when different series in a tsbl have different dates for their final observation. I suggest a function that takes either a holdout length (e.g. "14 days") or a number of observations, plus an optional start_date (or a number giving how many observations from the end the holdout should start, to handle series with different end dates).
aus_retail %>%
  holdout(length = "6 months") %>%
  model(snaive = SNAIVE(Turnover)) %>%
  forecast() %>%
  autoplot(aus_retail)
This would be a step toward cross-validation, especially if the holdout length were passed on to the models so that forecast() could automatically forecast over the holdout period. The above would work relatively simply, and a CV would, in a sense, just map it over a list of cutoff_dates (or observation numbers counted from the end) with the same holdout period, producing a cv-fbl with an added column for the cutoff date.
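To make the "number of observations from the end" idea concrete, here is a minimal sketch of the split on a toy tsibble whose two series end on different dates. The helper `holdout_split()`, the column `.from_end`, and the toy data are all hypothetical and only illustrate the idea; the proposed `holdout()` would presumably also carry the holdout length through to `forecast()`.

```r
library(dplyr)
library(tsibble)

# Toy data: series "a" has 12 monthly observations, series "b" only 10,
# so the two series have different final dates.
all_months <- yearmonth(seq(as.Date("2020-01-01"), by = "1 month", length.out = 12))
toy <- tsibble(
  id = rep(c("a", "b"), times = c(12, 10)),
  month = all_months[c(1:12, 1:10)],
  value = c(1:12, 1:10),
  key = id, index = month
)

# Hypothetical helper (not part of any package): split off the last
# `n_obs` observations of every series, counted from each series' own end.
holdout_split <- function(data, n_obs) {
  flagged <- data %>%
    group_by_key() %>%
    mutate(.from_end = n() - row_number()) %>%  # 0 for the last observation
    ungroup()
  list(
    train = flagged %>% filter(.from_end >= n_obs) %>% select(-.from_end),
    test  = flagged %>% filter(.from_end < n_obs) %>% select(-.from_end)
  )
}

split <- holdout_split(toy, n_obs = 3)
```

Counting from the end of each series keeps the holdout the same length for every key, which a single date-based filter() cannot guarantee when end dates differ.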
The cross-validation call itself could then be as simple as:
aus_retail %>%
  CV(cv_length = "2 years", cv_horizon = "6 months") %>%
  model(snaive = SNAIVE(Turnover))
Internally, this could generate a nested table or the like (converting cv_length and cv_horizon to numbers of observations at the frequency of the tsbl), with rows corresponding to the keys and a list of cutoff_dates per key, generated as simply as:
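# assumes dplyr, tsibble and tsibbledata are attached
# cv_length and cv_horizon as numbers of monthly observations
cv_length <- 24  # "2 years"
cv_horizon <- 6  # "6 months"

aus_retail %>%
  as_tibble() %>%
  group_by(State, Industry) %>%
  summarise(
    cutoff_dates = list(
      Month[
        Month >= nth(Month, -(cv_length + cv_horizon)) &
          Month < nth(Month, -cv_horizon)
      ]
    )
  )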
From there, for a given key, you are just slicing the tsbl off at each cutoff_date in the list, modeling and forecasting, then binding the fbl results with a column for the cutoff as an additional index.
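The slicing step could look roughly like the sketch below, which stacks one training set per cutoff into a single tsibble. The function `cv_training_folds()`, the columns `.fold`, `.from_end` and `.cutoff`, and the toy data are hypothetical, and the column names `id` and `month` are hard-coded for brevity. Here cv_length and cv_horizon are already numbers of observations.

```r
library(dplyr)
library(tsibble)

# Toy data: two monthly series with different end dates.
all_months <- yearmonth(seq(as.Date("2020-01-01"), by = "1 month", length.out = 12))
toy <- tsibble(
  id = rep(c("a", "b"), times = c(12, 10)),
  month = all_months[c(1:12, 1:10)],
  value = c(1:12, 1:10),
  key = id, index = month
)

# Hypothetical helper: one training slice per cutoff, where a cutoff is
# identified by the number of observations left after it (`.fold`).
cv_training_folds <- function(data, cv_length, cv_horizon) {
  flagged <- data %>%
    group_by_key() %>%
    mutate(.from_end = n() - row_number()) %>%  # 0 for the last observation
    ungroup() %>%
    as_tibble()
  # Every cutoff leaves at least cv_horizon observations to forecast.
  offsets <- seq(cv_horizon, cv_horizon + cv_length - 1)
  folds <- bind_rows(lapply(offsets, function(k) {
    flagged %>%
      filter(.from_end >= k) %>%
      group_by(id) %>%
      mutate(.fold = k, .cutoff = max(month)) %>%  # cutoff date per series
      ungroup()
  }))
  folds %>%
    select(-.from_end) %>%
    as_tsibble(key = c(id, .fold), index = month)
}

folds <- cv_training_folds(toy, cv_length = 2, cv_horizon = 3)
```

Because `.fold` is part of the key, piping `folds` into model() and then forecast(h = cv_horizon) should fit and forecast each slice separately, which is close to the cv-fbl described above. tsibble's existing stretch_tsibble() builds similar expanding windows, but it counts from the start of each series rather than from the end.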