Skip to content

Bug2020962 sessions#3438

Draft
travis79 wants to merge 6 commits intomozilla:mainfrom
travis79:Bug2020962-Sessions
Draft

Bug2020962 sessions#3438
travis79 wants to merge 6 commits intomozilla:mainfrom
travis79:Bug2020962-Sessions

Conversation

@travis79
Copy link
Copy Markdown
Member

@travis79 travis79 commented Apr 16, 2026

I attempted to structure the commits here to make this easier to review, so I recommend reviewing it that way rather than as a whole.

Introduce the `out_of_session` flag on `CommonMetricData` (defaulting to
`true`) so metrics can declare whether they participate in session-level
sampling.  Add session configuration fields (`session_mode`,
`session_sample_rate`, `session_inactivity_timeout_ms`) to
`InternalConfiguration` and the RLB `Configuration` struct, and add
`session_sample_rate` to `RemoteSettingsConfig` for server-side overrides.

Update all existing test, benchmark, and example call sites to supply
the new configuration fields.
Implement the session management engine for Glean:

- New `session` module with `SessionManager`, `SessionMode` (Auto,
  Lifecycle, Manual), `SessionState`, `SessionMetadata`, and
  `EventSessionContext` types, plus persistence helpers for session_id,
  session_seq, event_seq, inactive_since, and session_start_time.

- Core lifecycle integration in `Glean`: session_start/session_end,
  handle_client_active/inactive transitions, inactivity timeout
  evaluation, dirty-flag crash recovery emitting synthetic
  session_end("abnormal"/"abnormal_inactive"), orphaned session cleanup
  on mode-mismatch across builds (session_end("abandoned")), and
  session state restoration across clean restarts in AUTO mode.

- Boundary events: `glean.session_start` and `glean.session_end` defined
  in metrics.yaml and emitted as out-of-session events carrying
  session_id, session_seq, session_start_time, and sampled_in.

- Session sampling gate in `MetricType::should_record()` suppresses
  in-session metrics when the active session is sampled out, applying
  uniformly to all metric types.

- Event-session integration: `EventSessionContext` flows through
  `event_database::record()` so in-session events carry per-event
  session metadata (session_id, session_seq, event_seq,
  session_sample_rate, session_start_time).

- Remote Settings override for session_sample_rate is intentionally
  in-memory only; RS refreshes on every startup so persistence would
  risk making stale values sticky.

- session_start_time uses millisecond-precision RFC 3339 serialization
  for consistency with millisecond event timestamps.
Add session lifecycle methods to the UniFFI interface definition and
wire them through all language bindings:

- UDL: export SessionMode enum, session_start(), session_end(),
  handle_client_active/inactive session hooks, and
  apply_server_knobs_config session_sample_rate support.

- Android/Kotlin: forward handle_client_active/inactive calls through
  the Glean singleton; update test configuration for session fields.

- iOS/Swift: forward handle_client_active/inactive calls; update test
  helpers and assertions for session-aware event payloads.

- Python: forward handle_client_active/inactive calls; update
  _RecordingUploader to accept a list of filter strings (needed because
  session lifecycle now submits additional pings on inactive); update
  test configuration.

- RLB: expose session_start/session_end and configuration fields on the
  public Configuration struct.
Comprehensive test coverage for session management:

- tests/session.rs (30 tests): Auto/Lifecycle/Manual mode basics,
  session_seq persistence and monotonic increase across restarts,
  event_seq increment and reset semantics, event_seq continuity across
  AUTO mode restarts, session metadata on in-session events,
  out-of-session event bypass, sampling gate enforcement, boundary
  event emission, inactivity timeout evaluation, session resumption
  vs. new session on restart, session_start_time persistence,
  sampled-out session resumption, sessions_seen diagnostic counter,
  and mode-mismatch orphaned session cleanup.

- lib_unit_tests.rs (4 tests): dirty-flag crash recovery emitting
  session_end("abnormal") for active crashes and
  session_end("abnormal_inactive") for mid-shutdown crashes, no-op
  when no session is persisted, and session_seq continuity after
  crash recovery.
@travis79 travis79 force-pushed the Bug2020962-Sessions branch from 7471f21 to 00f8341 Compare April 16, 2026 11:30
Copy link
Copy Markdown
Member

@badboy badboy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a first somewhat-thorough review and left lots of comments.

High-level I think this is a good approach, some clarifications in the details needed maybe.
I have not yet gone through all the new test; I'd like to do that before I'm out tomorrow.

What's currently missing though is docs, both dev docs about all the new state we're storing (and the state changes! oh is it time for a new diagram?), and user-facing docs.
Fine if they land in followups, but should be there before we get this out to applications.

@@ -83,6 +83,9 @@ pub fn metric_dispatcher_benchmark(c: &mut Criterion) {
ping_schedule: Default::default(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a note to the very first line of the diff for the first commit, because one still can't leave a comment for the whole commit (or the message).

Thanks for splitting it up into individual commits. That's certainly helpful, though unfortunately this commit is not working on its own, introducing use of the SessionMode before it's defined.
I think I can cope with it in review though, just comments might be a bit out of order then.

ping_lifetime_max_time: 0,
session_mode: glean_core::SessionMode::Auto,
session_sample_rate: 1.0,
session_inactivity_timeout_ms: 1_800_000,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gnah, I really wish we could make this a Duration. But eh, it's the UDL-exposed type, we stick with integers for now.

/// Note: the naming is an inversion of the spec's `in_session` field.
/// `out_of_session: false` ↔ `in_session: true`.
#[serde(default = "default_out_of_session")]
pub out_of_session: bool,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the inversion?
The spec says in_session and that's also what ends up in the metrics.yaml in the end.
It seems confusing to me to use the inverted one here then.

Event metrics that participate in session tracking must explicitly set this to false

Setting something to false to participate/opt-in seems also the wrong way around in my mind.

// Non-event metrics are out-of-session by default per the spec:
// "By default, all non-event metrics will be in_session: false."
// Event metrics must explicitly set `out_of_session: false`.
out_of_session: true,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using in_session instead would also make Default derivable again because the default for bool is false

/// Server knobs configuration received from remote settings.
pub server_knobs_config: ObjectMetric,

/// The total number of sessions started, regardless of sampling outcome.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

started per what? (looks like this is ping lifetime on the health ping, so ... sessions started between two full starts of the application [I think that's the only time we sent the health ping? though the health ping's lifecycle is not as strict, we could at any point change it, making this counter even more ambiguous]

/// Returns `None` if no session ID is stored or if it was cleared.
pub(crate) fn read_session_id(glean: &Glean) -> Option<String> {
let id = make_session_id_metric().get_value(glean, INTERNAL_STORAGE)?;
if id.is_empty() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under which circumstances would it be the empty string?


/// Clears the persisted session start timestamp.
pub(crate) fn clear_session_start_time(glean: &Glean) {
make_session_start_time_metric().set_sync(glean, "");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh. got it.

# It's an empty ping.
assert "metrics" not in payload
# No duration is recorded on an active ping.
assert "timespan" not in payload.get("metrics", {})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these changes?

Comment thread glean-core/src/lib_unit_tests.rs Outdated
Comment on lines +1415 to +1416
"abnormal",
extra.get("reason").expect("reason missing from synthetic session_end"),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to simplify this into a single assertion like:

Suggested change
"abnormal",
extra.get("reason").expect("reason missing from synthetic session_end"),
Some("abnormal"),
extra.get("reason"),

Saves us one long line and expect, gives the same good error message if it's not right.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I tried and that won't work as-is because expected `Option<&str>`, found `Option<&String>`
Maybe we keep that expect

"synthetic session_end must carry the crashed session's session_id"
);

// After recovery, session state must be fully cleared.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I tend to also add these comments, but to be honest, the assert message has the same info. We could remove the comment (up to you though)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants