DS-3791: Groupid aggregation#273
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #273 +/- ##
==========================================
+ Coverage 83.93% 84.46% +0.52%
==========================================
Files 19 19
Lines 1376 1429 +53
==========================================
+ Hits 1155 1207 +52
- Misses 221 222 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
…into groupid_aggregation
mikewilli
left a comment
There was a problem hiding this comment.
Haven't had a chance to review the tests but thought it might be helpful to get these comments to you today. I don't think I saw any major issues though, nice work!
Co-authored-by: Mike Williams <102263964+mikewilli@users.noreply.github.com>
Co-authored-by: Mike Williams <102263964+mikewilli@users.noreply.github.com>
Co-authored-by: Mike Williams <102263964+mikewilli@users.noreply.github.com>
Co-authored-by: Mike Williams <102263964+mikewilli@users.noreply.github.com>
mikewilli
left a comment
There was a problem hiding this comment.
Heck yea, great work and love all the tests!
A couple outstanding questions to address (including leftover from previous comments), but they're minor enough that I'm happy to approve in the meantime.
Fulfills DS-3791: Modify mozanalysis to support groupID level aggregation.
Changes:
AnalysisUnitEnum to denote the experimental unit of the experiment. Possible values areCLIENT(mapping toclient_id) orGROUP(mapping toprofile_group_id)analysis_unit: AnalysisUnitattribute to theExperimentclass to denote the unit for the experiment. The experimental unit is now used to define the join key in the query builders:Experiment.build_enrollments_queryExperiment._build_enrollments_querynow gatesExperiment._build_enrollments_query_<subtype>methods by checking for valid values ofexperimental_unitfor the givenenrollment_query_type.Experiment.build_exposure_query(Experiment._build_exposures_queryperforms similar gating asExperiment._build_enrollments_query)Experiment.build_metrics_queryanalysis_unit: AnalysisUnitattribute to theTimeSeriesResultclass (used when joining multiple windows together).DataSource:DataSources can support both analysis units simultaneously (e.g., data may be keyed by bothclient_idandprofile_group_id). Thus, the analysis unit should be passed in at runtime, not at instantiation time.group_id_column: strattribute toDataSourceto mark the name of the group_id column.analysis_unit: AnalysisUnitparameter toDataSource.build_queryto define the level to which data should be aggregated when building a particular query.IncompatibleAnalysisUniterror type.mypynow runs without complaint forexperiment.py)Possible Extensions:
EnrollmentsQueryTypefrom a runtime parameter passed to the query methods to being an attribute of theExperimentobject.Experimentobject, so we could make it an attribute. This would allow us to move the above validity check toExperimentinstantiation time, so that the check happens as early as possible and isn't necessary to repeat inExperiment._build_enrollments_queryandExperiment._build_exposure_query.CLIENTexperimental units are supported for allEnrollmentsQueryTypes, whileGROUPis only supported forEnrollmentsQueryType.NORMANDY.AnalysisUnitlive in metric-config-parser instead?Considered Alternatives:
SELECTportions of the query to renamegroup_idtoclient_id(i.e.,SELECT profile_group_id AS client_id), which would have simplified the necessary changes. This option was rejected due to risk of subtle errors (e.g., one step is missed and group-level data was joined to client-level data).Notes:
profile_group_idis not yet on thetelemetry.eventsview, but is scheduled to be.