Merged
41 commits
2a60eef
refactored base_forecast and prophet_forecast to enable easier testing
jaredsnyder Jul 9, 2024
340fabf
Apply suggestions from code review
jaredsnyder Jul 10, 2024
6c7d3f2
add test for fit
jaredsnyder Jul 10, 2024
38e721d
revert signatures
jaredsnyder Jul 10, 2024
9b17337
made timezone-aware stamps naive
jaredsnyder Jul 10, 2024
90a822e
finished base_forecast tests
jaredsnyder Jul 10, 2024
72fabef
added tests for prophet class
jaredsnyder Jul 11, 2024
1ece1dd
linting
jaredsnyder Jul 11, 2024
606e2e4
fixed divide by zero
jaredsnyder Jul 11, 2024
585f2ca
linting again
jaredsnyder Jul 11, 2024
97bd46c
adding tests to funnel_forecast
jaredsnyder Jul 23, 2024
0e0ea91
Merge branch 'main' into kpi_forecasting_funnel_unit_tests
jaredsnyder Jul 23, 2024
c35247d
added tests for funnel_forecast
jaredsnyder Jul 29, 2024
e54d2c3
Merge branch 'main' into kpi_forecasting_funnel_unit_tests
jaredsnyder Jul 29, 2024
6ab0527
feat(workday):remove unwanted fields (#249)
JCMOSCON1976 Jul 29, 2024
07e5388
fix(exit):Added sys.exit() call (#250)
JCMOSCON1976 Jul 30, 2024
b102a7a
fix issue with call to _get_crossvalidation_metric
jaredsnyder Jul 30, 2024
0726287
fixed type check
jaredsnyder Aug 5, 2024
65f8e27
Merge branch 'main' into kpi_forecasting_funnel_unit_tests
jaredsnyder Aug 5, 2024
d8db825
added string case to aggregate_to_period and added tests
jaredsnyder Aug 6, 2024
6b6dac6
merge main
jaredsnyder Aug 7, 2024
2358ee3
update
jaredsnyder Aug 7, 2024
83aa229
revert file
jaredsnyder Aug 7, 2024
d5a0e63
added more tests to prophet_forecast
jaredsnyder Aug 8, 2024
b3edd10
removed DotMap
jaredsnyder Aug 9, 2024
fd1435b
modified README to make it match better between FunnelForecast and Pr…
jaredsnyder Aug 9, 2024
f551f4c
Update jobs/kpi-forecasting/kpi_forecasting/models/base_forecast.py
jaredsnyder Aug 9, 2024
1a63912
Brad easy fixes
jaredsnyder Aug 9, 2024
6a8c90c
remove magic year
jaredsnyder Aug 12, 2024
963a116
removed DotMap
jaredsnyder Aug 9, 2024
0f2f509
modified README to make it match better between FunnelForecast and Pr…
jaredsnyder Aug 9, 2024
e93162c
added test for more complex segments
jaredsnyder Aug 12, 2024
bfe4a54
Merge branch 'kpi_refactor' of github.com:mozilla/docker-etl into kpi…
jaredsnyder Aug 12, 2024
5f0536d
renamed use_holidays to use_all_us_holidays
jaredsnyder Aug 13, 2024
e0903b3
typo
m-d-bowerman Aug 13, 2024
109dff7
added detail to prophet parameter descriptions
m-d-bowerman Aug 13, 2024
2dea195
Merge branch 'main' into kpi_refactor
jaredsnyder Aug 14, 2024
3b7452b
Merge branch 'kpi_refactor' of github.com:mozilla/docker-etl into kpi…
jaredsnyder Aug 14, 2024
c6ed03c
updated setting of default start date and added tests
jaredsnyder Aug 14, 2024
2f8b3d0
remove print
jaredsnyder Aug 14, 2024
877d07c
moved filter and updated tests to relfect this
jaredsnyder Aug 14, 2024
83 changes: 81 additions & 2 deletions jobs/kpi-forecasting/README.md
@@ -85,8 +85,87 @@ The tests can be run locally with `python -m pytest` in the root directory of th

# YAML Configs

-Each of the sections in the YAML files contains a list of arguments that are passed to their relevant objects or methods.
-Definitions should be documented in the code.
+Configuration for each forecast is found in the `configs` folder. Below is an example config file with sample values; where a field's meaning is not self-evident, a comment describes it.
+
+```
+metric_hub: # this configures the observed data fed to the model, which is obtained via metric-hub
+  app_name: "multi_product" # metric-hub app name
+  slug: "search_forecasting_ad_clicks" # metric-hub slug
+  alias: "search_forecasting_ad_clicks" # metric-hub alias
+  start_date: "2018-01-01" # date at which the observed data should start
+  end_date: "last complete month"
+    # date at which the observed data will end; can be a date or "last complete month",
+    # which uses `utils.parse_end_date` to determine the last complete month
+  segments:
+    # this section is optional and currently only used in the funnel forecast.
+    # It specifies which segments are used to partition the data,
+    # enabling separate models to be fit for each partition.
+    # Values underneath are a map of column names to be output by the
+    # metric-hub call and the SQL queries to populate those columns
+    device: "device"
+    channel: "'all'"
+    country: "CASE WHEN country = 'US' THEN 'US' ELSE 'ROW' END"
+    partner: "partner"
+  where: "partner = 'Google'" # filter to apply to the metric-hub pull
+
+forecast_model: # this section configures the model
+  model_type: "funnel"
+    # type of model object to use; current options are "funnel" for FunnelForecast and "prophet" for ProphetForecast
+  start_date: NULL
+    # starting date for the predicted data (unless predict_historical_dates is set);
+    # if unset, the value depends on predict_historical_dates.
+  end_date: NULL
+    # final date for the predicted data
+  use_all_us_holidays: False
+    # for prophet-based models; when True, `model.add_country_holidays(country_name="US")` is called on the model
+  predict_historical_dates: True
+    # when start_date is unset: if True, start_date is set to the first date of the observed data;
+    # if False, it defaults to the day after the last day in the observed data
+  number_of_simulations: 1000
+    # for prophet-based models, number of simulations to run
+  parameters:
+    # this section can be a map or a list.
+    # If it's a map, these parameters are used for all models
+    # (recall that multiple models are trained if there is a metric_hub.segments section).
+    # If it's a list, it will set different parameters
+    # for different subsets of the partition specified in `metric_hub.segments`.
+    - segment:
+        # specifies which subset of the partitions this applies to
+        # key is a column specified in metric_hub.segments
+        # value is a value that column can take to which the configuration is applied
+        device: desktop
+      start_date: "2018-01-01" # only applies to FunnelForecast, allows one to set start date for each sub-model
+      end_date: NULL # only applies to FunnelForecast, allows one to set end date for each sub-model
+      holidays: ["easter", "covid_sip11"] # holidays specified in `configs.model_inputs.holidays` to use.
+      regressors: ["post_esr_migration", "in_covid", "ad_click_bug"] # regressors specified in `configs.model_inputs.regressors`
+      grid_parameters:
+        # sets grid for hyperparameter tuning
+        changepoint_prior_scale: [0.001, 0.01, 0.1, 0.2, 0.5] # parameter of prior distribution controlling how much the trend fluctuates at changepoints
+        changepoint_range: [0.8, 0.9] # the proportion of the time series over which the changepoints are distributed
+        n_changepoints: [25, 50] # number of trend changepoints, equally spaced over the time series
+        weekly_seasonality: True # if weekly seasonality is included in the model
+        yearly_seasonality: True # if yearly seasonality is included in the model
+      cv_settings:
+        # sets parameters for prophet cross-validation used in FunnelForecast
+        initial: "1296 days" # the initial training period, used to train the first iteration of the model for CV
+        period: "30 days" # spacing between cutoff dates, the sliding window over which each round of cross validation is performed
+        horizon: "30 days" # forecast horizon used to make predictions and calculate model fit metrics for optimization
+        parallel: "processes" # how parallelization is performed by Prophet, or None if no parallelization is used
+    ...
+
+summarize:
+  # parameters used to summarize and aggregate the predictions
+  periods: ["day", "month"] # periods to aggregate up to
+  numpy_aggregations: ["mean"] # numpy aggregation functions to use when aggregating predictions
+  percentiles: [10, 50, 90] # percentiles to calculate on aggregation
+
+write_results:
+  # set the project, dataset and table for output data
+  project: "moz-fx-data-shared-prod"
+  dataset: "search_derived"
+  table: "search_funnel_forecasts_v1"
+  components_table: "search_forecast_model_components_v1"
+```
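
One way to read the map-or-list rule for `forecast_model.parameters` is sketched below. The helper `parameters_for_segment` is hypothetical (it is not part of `kpi_forecasting`), and matching a list entry by comparing every key under `segment` is an assumption about how the lookup could work.

```python
from typing import Any, Dict, List, Union

Parameters = Union[Dict[str, Any], List[Dict[str, Any]]]


def parameters_for_segment(parameters: Parameters, segment: Dict[str, str]) -> Dict[str, Any]:
    """Return the settings that apply to one segment of the partition.

    Hypothetical helper, for illustration only.
    """
    # A map applies to every model, regardless of segment.
    if isinstance(parameters, dict):
        return parameters

    # A list holds one entry per subset; an entry applies when every
    # column/value pair under its `segment` key matches the segment.
    for entry in parameters:
        wanted = entry["segment"]
        if all(segment.get(column) == value for column, value in wanted.items()):
            return {key: value for key, value in entry.items() if key != "segment"}

    raise KeyError(f"No parameters configured for segment {segment}")


shared = {"changepoint_prior_scale": 0.01}
per_segment = [
    {"segment": {"device": "desktop"}, "start_date": "2018-01-01"},
    {"segment": {"device": "mobile"}, "start_date": "2022-01-01"},
]

print(parameters_for_segment(shared, {"device": "mobile", "country": "US"}))
print(parameters_for_segment(per_segment, {"device": "mobile", "country": "US"}))
```

Raising on an unmatched segment is a design choice of this sketch; falling back to shared defaults would be an equally plausible behaviour.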

# Development

14 changes: 7 additions & 7 deletions jobs/kpi-forecasting/kpi_forecasting.py
@@ -1,4 +1,4 @@
-from kpi_forecasting.inputs import CLI, YAML
+from kpi_forecasting.inputs import CLI, load_yaml
 from kpi_forecasting.models.prophet_forecast import ProphetForecast
 from kpi_forecasting.models.funnel_forecast import FunnelForecast
 from kpi_forecasting.metric_hub import MetricHub
@@ -13,17 +13,17 @@

 def main() -> None:
     # Load the config
-    config = YAML(filepath=CLI().args.config).data
-    model_type = config.forecast_model.model_type
+    config = load_yaml(filepath=CLI().args.config)
+    model_type = config["forecast_model"]["model_type"]

     if model_type in MODELS:
-        metric_hub = MetricHub(**config.metric_hub)
-        model = MODELS[model_type](metric_hub=metric_hub, **config.forecast_model)
+        metric_hub = MetricHub(**config["metric_hub"])
+        model = MODELS[model_type](metric_hub=metric_hub, **config["forecast_model"])

         model.fit()
         model.predict()
-        model.summarize(**config.summarize)
-        model.write_results(**config.write_results)
+        model.summarize(**config["summarize"])
+        model.write_results(**config["write_results"])

     else:
         raise ValueError(f"Don't know how to forecast using {model_type}.")
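
The move from attribute access to key access suggests that `load_yaml` returns a plain dictionary rather than a `DotMap`-style object. The snippet below is a minimal sketch of that calling pattern; the inline `config` dictionary and the `summarize` function are stand-ins for illustration, not the project's code.

```python
from typing import Any, Dict, List


def summarize(periods: List[str], numpy_aggregations: List[str], percentiles: List[int]) -> str:
    """Stand-in for a method that receives one config section as keyword arguments."""
    return f"{periods} {numpy_aggregations} {percentiles}"


# Stand-in for the dictionary a YAML loader would return.
config: Dict[str, Any] = {
    "forecast_model": {"model_type": "prophet"},
    "summarize": {
        "periods": ["day", "month"],
        "numpy_aggregations": ["mean"],
        "percentiles": [10, 50, 90],
    },
}

# Nested sections are reached with keys rather than attributes.
model_type = config["forecast_model"]["model_type"]

# `**` unpacks a section so that each key becomes a keyword argument.
result = summarize(**config["summarize"])
print(model_type, result)
```

With a plain dictionary, a missing section raises `KeyError` at the lookup, so a malformed config fails at the point where the section is first used.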
@@ -10,7 +10,9 @@ forecast_model:
   model_type: "prophet"
   start_date: NULL
   end_date: NULL
-  use_holidays: False
+  use_all_us_holidays: False
+  predict_historical_dates: False
+  number_of_simulations: 1000
   parameters:
     seasonality_prior_scale: 0.00825
     changepoint_prior_scale: 0.15983
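
The README text in this change describes how `start_date` is filled in when it is left as `NULL`. A minimal sketch of that rule, assuming daily observed data; the function name `default_start_date` is hypothetical and the project's own implementation may differ.

```python
from datetime import date, timedelta
from typing import List


def default_start_date(observed_dates: List[date], predict_historical_dates: bool) -> date:
    """Pick the first predicted date when `start_date` is not set (illustrative only)."""
    if predict_historical_dates:
        # Predict over the history as well: start at the first observed date.
        return min(observed_dates)
    # Otherwise start the day after the last observed date.
    return max(observed_dates) + timedelta(days=1)


observed = [date(2024, 6, 28), date(2024, 6, 29), date(2024, 6, 30)]
print(default_start_date(observed, predict_historical_dates=True))   # 2024-06-28
print(default_start_date(observed, predict_historical_dates=False))  # 2024-07-01
```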
@@ -10,7 +10,9 @@ forecast_model:
   model_type: "prophet"
   start_date: NULL
   end_date: NULL
-  use_holidays: True
+  use_all_us_holidays: True
+  predict_historical_dates: False
+  number_of_simulations: 1000
   parameters:
     seasonality_prior_scale: 0.01
     changepoint_prior_scale: 0.01
@@ -3,15 +3,15 @@
 from pathlib import Path


-from kpi_forecasting.inputs import YAML
+from kpi_forecasting.inputs import load_yaml


 PARENT_PATH = Path(__file__).parent
 HOLIDAY_PATH = PARENT_PATH / "holidays.yaml"
 REGRESSOR_PATH = PARENT_PATH / "regressors.yaml"

-holiday_collection = YAML(HOLIDAY_PATH)
-regressor_collection = YAML(REGRESSOR_PATH)
+holiday_collection = load_yaml(HOLIDAY_PATH)
+regressor_collection = load_yaml(REGRESSOR_PATH)


 @attr.s(auto_attribs=True, frozen=False)
@@ -16,43 +16,45 @@ forecast_model:
   model_type: "funnel"
   start_date: NULL
   end_date: NULL
-  use_holidays: False
+  use_all_us_holidays: False
+  predict_historical_dates: True
+  number_of_simulations: 1000
   parameters:
-    model_setting_split_dim: "device"
-    segment_settings:
-      desktop:
-        start_date: "2018-01-01"
-        end_date: NULL
-        holidays: ["easter", "covid_sip11"]
-        regressors: ["post_esr_migration", "in_covid", "ad_click_bug"]
-        grid_parameters:
-          changepoint_prior_scale: [0.001, 0.01, 0.1, 0.2, 0.5]
-          changepoint_range: [0.8, 0.9]
-          n_changepoints: [25, 50]
-          weekly_seasonality: True
-          yearly_seasonality: True
-        cv_settings:
-          initial: "1296 days"
-          period: "30 days"
-          horizon: "30 days"
-          parallel: "processes"
-      mobile:
-        start_date: "2022-01-01"
-        end_date: NULL
-        holidays: ["easter"]
-        regressors: ["after_fenix", "in_covid"]
-        grid_parameters:
-          changepoint_prior_scale: [.01, .1, .15, .2]
-          changepoint_range: [0.8, 0.9, 1]
-          n_changepoints: [30]
-          weekly_seasonality: True
-          yearly_seasonality: True
-          growth: "logistic"
-        cv_settings:
-          initial: "366 days"
-          period: "30 days"
-          horizon: "30 days"
-          parallel: "processes"
+    - segment:
+        device: desktop
+      start_date: "2018-01-01"
+      end_date: NULL
+      holidays: ["easter", "covid_sip11"]
+      regressors: ["post_esr_migration", "in_covid", "ad_click_bug"]
+      grid_parameters:
+        changepoint_prior_scale: [0.001, 0.01, 0.1, 0.2, 0.5]
+        changepoint_range: [0.8, 0.9]
+        n_changepoints: [25, 50]
+        weekly_seasonality: True
+        yearly_seasonality: True
+      cv_settings:
+        initial: "1296 days"
+        period: "30 days"
+        horizon: "30 days"
+        parallel: "processes"
+    - segment:
+        device: mobile
+      start_date: "2022-01-01"
+      end_date: NULL
+      holidays: ["easter"]
+      regressors: ["after_fenix", "in_covid"]
+      grid_parameters:
+        changepoint_prior_scale: [.01, .1, .15, .2]
+        changepoint_range: [0.8, 0.9, 1]
+        n_changepoints: [30]
+        weekly_seasonality: True
+        yearly_seasonality: True
+        growth: "logistic"
+      cv_settings:
+        initial: "366 days"
+        period: "30 days"
+        horizon: "30 days"
+        parallel: "processes"

 summarize:
   periods: ["day", "month"]
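
This hunk replaces the `model_setting_split_dim` / `segment_settings` map with a list of entries that each carry their own `segment` key. A rough sketch of that mapping, which may help when porting an older config by hand; `to_segment_list` is a hypothetical helper, not part of the repository.

```python
from typing import Any, Dict, List


def to_segment_list(old_parameters: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert the old map-style parameters into the new list-style layout."""
    split_dim = old_parameters["model_setting_split_dim"]
    new_parameters = []
    for segment_value, settings in old_parameters["segment_settings"].items():
        # Each entry names its segment explicitly, then keeps its settings as-is.
        new_parameters.append({"segment": {split_dim: segment_value}, **settings})
    return new_parameters


old = {
    "model_setting_split_dim": "device",
    "segment_settings": {
        "desktop": {"start_date": "2018-01-01", "holidays": ["easter", "covid_sip11"]},
        "mobile": {"start_date": "2022-01-01", "holidays": ["easter"]},
    },
}

for entry in to_segment_list(old):
    print(entry)
```

Because each new-style entry holds a map under `segment`, the list layout can describe subsets defined by more than one column, which the single `model_setting_split_dim` could not.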
@@ -16,40 +16,42 @@ forecast_model:
   model_type: "funnel"
   start_date: NULL
   end_date: NULL
-  use_holidays: False
+  use_all_us_holidays: False
+  predict_historical_dates: True
+  number_of_simulations: 1000
   parameters:
-    model_setting_split_dim: "device"
-    segment_settings:
-      desktop:
-        start_date: "2018-01-01"
-        end_date: NULL
-        holidays: ["easter", "covid_sip11"]
-        regressors: ["post_esr_migration", "in_covid"]
-        grid_parameters:
-          changepoint_prior_scale: [0.001, 0.01, 0.1, 0.2, 0.5]
-          changepoint_range: [0.8, 0.9]
-          weekly_seasonality: True
-          yearly_seasonality: True
-        cv_settings:
-          initial: "1296 days"
-          period: "30 days"
-          horizon: "30 days"
-          parallel: "processes"
-      mobile:
-        start_date: "2021-01-01"
-        end_date: NULL
-        holidays: ["easter"]
-        regressors: ["after_fenix", "in_covid"]
-        grid_parameters:
-          changepoint_prior_scale: [0.001, 0.01, 0.1]
-          weekly_seasonality: True
-          yearly_seasonality: True
-          growth: "logistic"
-        cv_settings:
-          initial: "366 days"
-          period: "30 days"
-          horizon: "30 days"
-          parallel: "processes"
+    - segment:
+        device: desktop
+      start_date: "2018-01-01"
+      end_date: NULL
+      holidays: ["easter", "covid_sip11"]
+      regressors: ["post_esr_migration", "in_covid"]
+      grid_parameters:
+        changepoint_prior_scale: [0.001, 0.01, 0.1, 0.2, 0.5]
+        changepoint_range: [0.8, 0.9]
+        weekly_seasonality: True
+        yearly_seasonality: True
+      cv_settings:
+        initial: "1296 days"
+        period: "30 days"
+        horizon: "30 days"
+        parallel: "processes"
+    - segment:
+        device: mobile
+      start_date: "2021-01-01"
+      end_date: NULL
+      holidays: ["easter"]
+      regressors: ["after_fenix", "in_covid"]
+      grid_parameters:
+        changepoint_prior_scale: [0.001, 0.01, 0.1]
+        weekly_seasonality: True
+        yearly_seasonality: True
+        growth: "logistic"
+      cv_settings:
+        initial: "366 days"
+        period: "30 days"
+        horizon: "30 days"
+        parallel: "processes"

 summarize:
   periods: ["day", "month"]
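
`grid_parameters` mixes list-valued entries (candidates to search over) with scalar entries. A minimal sketch of expanding such a section into individual parameter sets, assuming scalars are held fixed while every combination of the listed values is tried; `expand_grid` is a hypothetical name and the project's tuning code may work differently.

```python
from itertools import product
from typing import Any, Dict, List


def expand_grid(grid_parameters: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one parameter dict per combination of the list-valued entries."""
    # Treat scalars as single-candidate lists so every key takes part in the product.
    candidates = {
        name: value if isinstance(value, list) else [value]
        for name, value in grid_parameters.items()
    }
    names = list(candidates)
    return [dict(zip(names, combo)) for combo in product(*candidates.values())]


grid = {
    "changepoint_prior_scale": [0.001, 0.01, 0.1],
    "changepoint_range": [0.8, 0.9],
    "weekly_seasonality": True,
    "growth": "logistic",
}

combinations = expand_grid(grid)
print(len(combinations))  # 6
print(combinations[0])
```

The number of candidate models is the product of the list lengths, so adding values to several lists at once grows the search quickly.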
@@ -16,40 +16,42 @@ forecast_model:
   model_type: "funnel"
   start_date: NULL
   end_date: NULL
-  use_holidays: False
+  use_all_us_holidays: False
+  predict_historical_dates: True
+  number_of_simulations: 1000
   parameters:
-    model_setting_split_dim: "device"
-    segment_settings:
-      desktop:
-        start_date: "2018-01-01"
-        end_date: NULL
-        holidays: ["easter", "covid_sip11"]
-        regressors: ["post_esr_migration", "in_covid"]
-        grid_parameters:
-          changepoint_prior_scale: [0.001, 0.01, 0.1, 0.2, 0.5]
-          changepoint_range: [0.8, 0.9]
-          weekly_seasonality: True
-          yearly_seasonality: True
-        cv_settings:
-          initial: "1296 days"
-          period: "30 days"
-          horizon: "30 days"
-          parallel: "processes"
-      mobile:
-        start_date: "2020-01-01"
-        end_date: NULL
-        holidays: ["easter"]
-        regressors: ["after_fenix", "in_covid"]
-        grid_parameters:
-          changepoint_prior_scale: [0.001, 0.01, 0.1]
-          weekly_seasonality: True
-          yearly_seasonality: True
-          growth: "logistic"
-        cv_settings:
-          initial: "366 days"
-          period: "30 days"
-          horizon: "30 days"
-          parallel: "processes"
+    - segment:
+        device: desktop
+      start_date: "2018-01-01"
+      end_date: NULL
+      holidays: ["easter", "covid_sip11"]
+      regressors: ["post_esr_migration", "in_covid"]
+      grid_parameters:
+        changepoint_prior_scale: [0.001, 0.01, 0.1, 0.2, 0.5]
+        changepoint_range: [0.8, 0.9]
+        weekly_seasonality: True
+        yearly_seasonality: True
+      cv_settings:
+        initial: "1296 days"
+        period: "30 days"
+        horizon: "30 days"
+        parallel: "processes"
+    - segment:
+        device: mobile
+      start_date: "2020-01-01"
+      end_date: NULL
+      holidays: ["easter"]
+      regressors: ["after_fenix", "in_covid"]
+      grid_parameters:
+        changepoint_prior_scale: [0.001, 0.01, 0.1]
+        weekly_seasonality: True
+        yearly_seasonality: True
+        growth: "logistic"
+      cv_settings:
+        initial: "366 days"
+        period: "30 days"
+        horizon: "30 days"
+        parallel: "processes"

 summarize:
   periods: ["day", "month"]
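
The `cv_settings` values (`initial`, `period`, `horizon`) determine where the cross-validation cutoffs fall. The sketch below is a simplified reading of those three settings, not Prophet's implementation: it assumes cutoffs are placed backwards from the end of the data, `period` apart, while at least `initial` of training history remains. The names `parse_days` and `cv_cutoffs` are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def parse_days(text: str) -> timedelta:
    """Turn a setting such as "30 days" into a timedelta (only the "N days" form is handled)."""
    count, unit = text.split()
    if unit != "days":
        raise ValueError(f"Unsupported unit in {text!r}")
    return timedelta(days=int(count))


def cv_cutoffs(first: date, last: date, initial: str, period: str, horizon: str) -> List[date]:
    """List cutoff dates, oldest first, under the simplified rule described above."""
    cutoffs = []
    # The latest cutoff leaves exactly one horizon of data to score against.
    cutoff = last - parse_days(horizon)
    # Step backwards while the training window is at least `initial` long.
    while cutoff >= first + parse_days(initial):
        cutoffs.append(cutoff)
        cutoff -= parse_days(period)
    return sorted(cutoffs)


cutoffs = cv_cutoffs(
    first=date(2020, 1, 1),
    last=date(2020, 12, 31),
    initial="300 days",
    period="30 days",
    horizon="30 days",
)
print(cutoffs)
```

A longer `initial` or a shorter history leaves fewer cutoffs, which is one reason a segment with a later `start_date` may need a smaller `initial` value.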