Skip to content

Refactor Config files#263

Merged
jaredsnyder merged 41 commits into
mainfrom
kpi_refactor
Aug 16, 2024
Merged

Refactor Config files#263
jaredsnyder merged 41 commits into
mainfrom
kpi_refactor

Conversation

@jaredsnyder
Copy link
Copy Markdown
Contributor

@jaredsnyder jaredsnyder commented Aug 9, 2024

This PR modifies the config files, especially for the FunnelForecast jobs, to bring them more closely into alignment. This is a first step toward adding the ability to fit different "sub-models" to different partitions of the data (metric_hub.segments set in the FunnelForecast config) to the base class.

DotMap was also removed because managing the conversion for dict to DotMap and back was beginning to be more trouble than it was worth as the config files got bigger and used in more complex ways.

Checklist for reviewer:

  • Commits should reference a bug or github issue, if relevant (if a bug is
    referenced, the pull request should include the bug number in the title)
  • Scan the PR and verify that no changes (particularly to
    .circleci/config.yml) will cause environment variables (particularly
    credentials) to be exposed in test logs
  • Ensure the container image will be using permissions granted to
    telemetry-airflow
    responsibly.

@jaredsnyder jaredsnyder closed this Aug 9, 2024
@jaredsnyder jaredsnyder reopened this Aug 9, 2024
@jaredsnyder jaredsnyder changed the base branch from kpi_forecasting_funnel_unit_tests to kpi_forecast_add_more_prophet_tests August 9, 2024 18:19
@jaredsnyder
Copy link
Copy Markdown
Contributor Author

Question for @m-d-bowerman on this one: I'm trying to clean up/document all the start dates. As part of that I added the predict_historical_dates parameter so that we wouldn't have to overwite dates_to_predict in the __post_init__. I would like to use predict_historical_dates to modify the start_date attribute via _default_start_date rather than overwriting dates_to_predict. The only place in FunnelForecast I can see this causing problems is here:

What is the purpose of this line? Also what do you think of the general approach?

Comment thread jobs/kpi-forecasting/kpi_forecasting/models/base_forecast.py
@m-d-bowerman
Copy link
Copy Markdown
Contributor

Question for @m-d-bowerman on this one: I'm trying to clean up/document all the start dates. As part of that I added the predict_historical_dates parameter so that we wouldn't have to overwite dates_to_predict in the __post_init__. I would like to use predict_historical_dates to modify the start_date attribute via _default_start_date rather than overwriting dates_to_predict. The only place in FunnelForecast I can see this causing problems is here:

What is the purpose of this line? Also what do you think of the general approach?

In the FunnelForecast class, we also capture the intra-time series predictions, along with the Prophet components used to make that prediction, done here:

components_df = segment_settings.segment_model.predict(dates_to_predict)[

That's the motivation behind overwriting dates_to_predict, while retaining the start_date param to trim down the forecast_df to only dates in the future. I'll review the changes you made to see if we can remove the repeated application of that condition; I agree, it's incredibly clunky.

@jaredsnyder
Copy link
Copy Markdown
Contributor Author

Question for @m-d-bowerman on this one: I'm trying to clean up/document all the start dates. As part of that I added the predict_historical_dates parameter so that we wouldn't have to overwite dates_to_predict in the __post_init__. I would like to use predict_historical_dates to modify the start_date attribute via _default_start_date rather than overwriting dates_to_predict. The only place in FunnelForecast I can see this causing problems is here:

What is the purpose of this line? Also what do you think of the general approach?

In the FunnelForecast class, we also capture the intra-time series predictions, along with the Prophet components used to make that prediction, done here:

components_df = segment_settings.segment_model.predict(dates_to_predict)[

That's the motivation behind overwriting dates_to_predict, while retaining the start_date param to trim down the forecast_df to only dates in the future. I'll review the changes you made to see if we can remove the repeated application of that condition; I agree, it's incredibly clunky.

Given that it gets filtered to dates in the future using self.start_date how is this different than just having dates_to_predict start at self.start_date?

@m-d-bowerman
Copy link
Copy Markdown
Contributor

Question for @m-d-bowerman on this one: I'm trying to clean up/document all the start dates. As part of that I added the predict_historical_dates parameter so that we wouldn't have to overwite dates_to_predict in the __post_init__. I would like to use predict_historical_dates to modify the start_date attribute via _default_start_date rather than overwriting dates_to_predict. The only place in FunnelForecast I can see this causing problems is here:

What is the purpose of this line? Also what do you think of the general approach?

In the FunnelForecast class, we also capture the intra-time series predictions, along with the Prophet components used to make that prediction, done here:

components_df = segment_settings.segment_model.predict(dates_to_predict)[

That's the motivation behind overwriting dates_to_predict, while retaining the start_date param to trim down the forecast_df to only dates in the future. I'll review the changes you made to see if we can remove the repeated application of that condition; I agree, it's incredibly clunky.

Given that it gets filtered to dates in the future using self.start_date how is this different than just having dates_to_predict start at self.start_date?

The forecast dates get filtered that way, yeah, but the components table doesn't -- it'll have a row for every date in the observed data and the forecast dates.

Base automatically changed from kpi_forecast_add_more_prophet_tests to main August 14, 2024 14:30
@jaredsnyder
Copy link
Copy Markdown
Contributor Author

I'm going to leave dealing with start_date for the next PR where I generalize the use of SegmentSettings objects. So this one is really for a full review

Copy link
Copy Markdown
Contributor

@bochocki bochocki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, although I'll defer to @m-d-bowerman for the approval since there are some funnel forecast changes that I'm not familiar with.

Comment thread jobs/kpi-forecasting/kpi_forecasting/results_processing.py Outdated
@jaredsnyder
Copy link
Copy Markdown
Contributor Author

Created a notebook to validate that the outputs are the same. Ended up finding an error that was corrected in the latest commit, but now everything matches

https://colab.research.google.com/drive/1dLeLUz_99ln9PC1AG-izZILj9-zIJHmJ#scrollTo=oukMs3d3qiEJ

@jaredsnyder jaredsnyder merged commit 9109957 into main Aug 16, 2024
@jaredsnyder jaredsnyder deleted the kpi_refactor branch August 16, 2024 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants