DS-3791: Groupid aggregation #273

Merged
danielkberry merged 43 commits into main from groupid_aggregation
Aug 22, 2024

@danielkberry
Contributor

@danielkberry danielkberry commented Jul 31, 2024

Fulfills DS-3791: Modify mozanalysis to support groupID level aggregation.

Changes:

  • Adds an AnalysisUnit Enum to denote the experimental unit of the experiment. Possible values are CLIENT (mapping to client_id) or GROUP (mapping to profile_group_id)
  • Adds an analysis_unit: AnalysisUnit attribute to the Experiment class to denote the unit for the experiment. The experimental unit is now used to define the join key in the query builders:
    • Experiment.build_enrollments_query
    • Experiment._build_enrollments_query now gates Experiment._build_enrollments_query_<subtype> methods by checking for valid values of experimental_unit for the given enrollment_query_type.
    • Experiment.build_exposure_query (Experiment._build_exposures_query performs similar gating as Experiment._build_enrollments_query)
    • Experiment.build_metrics_query
  • Adds an analysis_unit: AnalysisUnit attribute to the TimeSeriesResult class (used when joining multiple windows together).
  • DataSource:
    • DataSources can support both analysis units simultaneously (e.g., data may be keyed by both client_id and profile_group_id). Thus, the analysis unit should be passed in at runtime, not at instantiation time.
    • Adds a group_id_column: str attribute to DataSource to mark the name of the group_id column.
    • Adds an analysis_unit: AnalysisUnit parameter to DataSource.build_query to define the level to which data should be aggregated when building a particular query.
  • Adds an IncompatibleAnalysisUnit error type.
  • Makes some minor typing cleanups (mypy now runs without complaint for experiment.py).
  • Adds unit tests for enrollments and metrics query building at both the client and group levels.
  • Disallows downsampling for group-level experiments until a decision is made on group-level sample_ids.
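
To make the list above concrete, here is a minimal sketch of how an analysis unit passed at query-build time could select the join key. It is not the mozanalysis implementation: `DataSourceSketch`, `unit_column`, the simplified `build_query` signature, the `enrollments` table alias, and the `client_id_column` default are all assumptions made for illustration. Only the `AnalysisUnit` names and their column mappings, and the `group_id_column` attribute, come from the description.

```python
from dataclasses import dataclass
from enum import Enum


class AnalysisUnit(Enum):
    """Stand-in enum; values are the column names given in the PR description."""

    CLIENT = "client_id"
    GROUP = "profile_group_id"


@dataclass(frozen=True)
class DataSourceSketch:
    """Hypothetical, simplified data source keyed by both unit columns."""

    name: str
    from_expr: str
    client_id_column: str = "client_id"  # assumed default
    group_id_column: str = "profile_group_id"

    def unit_column(self, analysis_unit: AnalysisUnit) -> str:
        """Pick this data source's column for the requested unit."""
        if analysis_unit is AnalysisUnit.GROUP:
            return self.group_id_column
        return self.client_id_column

    def build_query(self, metric_sql: str, analysis_unit: AnalysisUnit) -> str:
        """Build an illustrative query joined and grouped on the unit's key.

        The unit is a runtime argument, so one data source instance can
        serve both client-level and group-level experiments.
        """
        key = analysis_unit.value
        unit_col = self.unit_column(analysis_unit)
        return (
            f"SELECT e.{key}, {metric_sql} AS metric\n"
            f"FROM enrollments e\n"
            f"LEFT JOIN {self.from_expr} ds ON ds.{unit_col} = e.{key}\n"
            f"GROUP BY e.{key}"
        )


ds = DataSourceSketch(name="example", from_expr="some_dataset.some_table")
print(ds.build_query("COUNT(*)", AnalysisUnit.GROUP))
```

Because the unit is never stored on the data source, the same `ds` object can be reused with `AnalysisUnit.CLIENT` without reconstruction.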

Possible Extensions:

  • We could consider moving EnrollmentsQueryType from a runtime parameter passed to the query methods to being an attribute of the Experiment object.
    • As best I can tell, the query type should be static across the lifetime of the Experiment object, so we could make it an attribute. This would allow us to move the above validity check to Experiment instantiation time, so that the check happens as early as possible and isn't necessary to repeat in Experiment._build_enrollments_query and Experiment._build_exposure_query.
    • Currently, CLIENT experimental units are supported for all EnrollmentsQueryTypes, while GROUP is only supported for EnrollmentsQueryType.NORMANDY.
  • Should AnalysisUnit live in metric-config-parser instead?
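
A rough sketch of the first extension, assuming the query type becomes an attribute and the compatibility check runs once at construction. `ExperimentSketch` and the `OTHER` member are hypothetical placeholders, and having `IncompatibleAnalysisUnit` subclass `ValueError` is an assumption; only the rule that GROUP is supported solely for `EnrollmentsQueryType.NORMANDY` comes from the description.

```python
from dataclasses import dataclass
from enum import Enum


class AnalysisUnit(Enum):
    CLIENT = "client_id"
    GROUP = "profile_group_id"


class EnrollmentsQueryType(Enum):
    NORMANDY = "normandy"
    OTHER = "other"  # hypothetical placeholder for the remaining query types


class IncompatibleAnalysisUnit(ValueError):
    """Assumed base class; the PR only names the error type."""


# Per the description: GROUP is currently supported only for NORMANDY.
GROUP_SUPPORTED_QUERY_TYPES = frozenset({EnrollmentsQueryType.NORMANDY})


@dataclass(frozen=True)
class ExperimentSketch:
    """Hypothetical experiment that validates its unit at instantiation."""

    slug: str
    analysis_unit: AnalysisUnit = AnalysisUnit.CLIENT
    enrollments_query_type: EnrollmentsQueryType = EnrollmentsQueryType.NORMANDY

    def __post_init__(self) -> None:
        # Fail as early as possible instead of inside each query builder.
        if (
            self.analysis_unit is AnalysisUnit.GROUP
            and self.enrollments_query_type not in GROUP_SUPPORTED_QUERY_TYPES
        ):
            raise IncompatibleAnalysisUnit(
                f"{self.analysis_unit.name} is not supported for "
                f"{self.enrollments_query_type.name} enrollments queries"
            )
```

With this shape, the enrollments and exposures builders would not need to repeat the check, since an invalid combination can never be constructed.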

Considered Alternatives:

  • I considered simply modifying the initial SELECT portions of the query to rename group_id to client_id (i.e., SELECT profile_group_id AS client_id), which would have simplified the necessary changes. This option was rejected due to the risk of subtle errors (e.g., one step is missed and group-level data is joined to client-level data).
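
The trade-off can be illustrated with a small sketch. When every subquery carries its unit explicitly, a mismatch fails loudly; with the aliasing approach both sides would expose a column named client_id and the join would succeed silently. `KeyedQuery` and `join_on_unit` are hypothetical helpers written for this illustration, not part of mozanalysis.

```python
from dataclasses import dataclass
from enum import Enum


class AnalysisUnit(Enum):
    CLIENT = "client_id"
    GROUP = "profile_group_id"


@dataclass(frozen=True)
class KeyedQuery:
    """Hypothetical wrapper pairing a SQL fragment with the unit it is keyed by."""

    sql: str
    analysis_unit: AnalysisUnit


def join_on_unit(left: KeyedQuery, right: KeyedQuery) -> str:
    """Join two fragments, refusing to mix client-level and group-level data."""
    if left.analysis_unit is not right.analysis_unit:
        raise ValueError(
            f"cannot join {left.analysis_unit.name}-level data "
            f"to {right.analysis_unit.name}-level data"
        )
    key = left.analysis_unit.value
    return f"SELECT * FROM ({left.sql}) l JOIN ({right.sql}) r USING ({key})"
```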

Notes:

  • profile_group_id is not yet on the telemetry.events view, but is scheduled to be.

@codecov-commenter

codecov-commenter commented Jul 31, 2024

Codecov Report

Attention: Patch coverage is 90.69767% with 8 lines in your changes missing coverage. Please review.

Project coverage is 84.46%. Comparing base (595cca3) to head (b8ec995).
Report is 2 commits behind head on main.

Files with missing lines         Patch %   Lines
src/mozanalysis/metrics.py       88.23%    4 Missing ⚠️
src/mozanalysis/experiment.py    92.30%    3 Missing ⚠️
src/mozanalysis/config.py        90.90%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #273      +/-   ##
==========================================
+ Coverage   83.93%   84.46%   +0.52%     
==========================================
  Files          19       19              
  Lines        1376     1429      +53     
==========================================
+ Hits         1155     1207      +52     
- Misses        221      222       +1     
Flag Coverage Δ
project 84.46% <90.69%> (+0.52%) ⬆️

@danielkberry danielkberry changed the title Groupid aggregation DS-3791: Groupid aggregation Aug 1, 2024
@danielkberry danielkberry marked this pull request as ready for review August 1, 2024 16:56
@danielkberry danielkberry requested a review from mikewilli August 1, 2024 16:58
Comment thread src/mozanalysis/types.py Outdated
Comment thread src/mozanalysis/experiment.py Outdated
Comment thread src/mozanalysis/experiment.py Outdated
Comment thread src/mozanalysis/experiment.py Outdated
Contributor

@mikewilli mikewilli left a comment

Haven't had a chance to review the tests but thought it might be helpful to get these comments to you today. I don't think I saw any major issues though, nice work!

Comment thread src/mozanalysis/types.py Outdated
Comment thread src/mozanalysis/experiment.py Outdated
Comment thread src/mozanalysis/experiment.py Outdated
Comment thread src/mozanalysis/experiment.py Outdated
Comment thread src/mozanalysis/experiment.py Outdated
Comment thread src/mozanalysis/metrics.py Outdated
Comment thread src/mozanalysis/metrics.py Outdated
Comment thread src/mozanalysis/metrics.py Outdated
Comment thread src/mozanalysis/types.py Outdated
Comment thread src/mozanalysis/metrics.py Outdated
Co-authored-by: Mike Williams <102263964+mikewilli@users.noreply.github.com>
danielkberry and others added 8 commits August 2, 2024 08:25
Co-authored-by: Mike Williams <102263964+mikewilli@users.noreply.github.com>
Co-authored-by: Mike Williams <102263964+mikewilli@users.noreply.github.com>
Co-authored-by: Mike Williams <102263964+mikewilli@users.noreply.github.com>
Comment thread src/mozanalysis/metrics.py
@danielkberry danielkberry requested a review from mikewilli August 20, 2024 17:53
Contributor

@mikewilli mikewilli left a comment

Heck yea, great work and love all the tests!

A couple outstanding questions to address (including leftover from previous comments), but they're minor enough that I'm happy to approve in the meantime.

Comment thread .circleci/config.yml
Comment thread src/mozanalysis/experiment.py Outdated
Comment thread src/mozanalysis/metrics.py Outdated
@danielkberry danielkberry merged commit 8abcbb5 into main Aug 22, 2024
@danielkberry danielkberry deleted the groupid_aggregation branch August 22, 2024 15:34

4 participants