
Refactor metrics definitions to be easily understandable#10215

Merged
jerop merged 1 commit into main from bobcatfish/9209-refactor-really-real
Oct 1, 2025

Conversation

@bobcatfish
Contributor

TLDR

As part of #9209, in preparation for adding / modifying telemetry to comply with the convention set architecture, move all the definitions into one place so we can see exactly what metrics we are exposing and what their structure is.

Dive Deeper

Not quite sure that the initialization of the performance-based metrics actually needs to be done separately from the rest, but keeping it that way for now to be consistent.

(FYI @eLyiN ! I don't think this will cause too much trouble with your pending changes 🙏 )

Reviewer Test Plan

If you wanted, you could hook up telemetry and try it out (https://github.com/google-gemini/gemini-cli/blob/main/docs/telemetry.md), but we should have pretty good coverage via the unit + integration tests.

Testing Matrix

|          | 🍏 | 🪟 | 🐧 |
| -------- | -- | -- | -- |
| npm run  |    |    |    |
| npx      |    |    |    |
| Docker   |    |    |    |
| Podman   |    | -  | -  |
| Seatbelt |    | -  | -  |

Linked issues / bugs

Part of #9209

@bobcatfish bobcatfish requested a review from a team as a code owner September 30, 2025 04:05
@gemini-code-assist
Contributor

Summary of Changes

Hello @bobcatfish, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the telemetry system by centralizing metric definitions and simplifying the API for recording metrics. The changes aim to make the telemetry code more understandable, maintainable, and extensible, aligning with the goal of preparing for future additions and modifications to the system's telemetry. This refactoring improves the clarity of what metrics are being exposed and their structure, while also enhancing type safety.

Highlights

  • Centralized Metric Definitions: All telemetry metric definitions have been moved from constants.ts to metrics.ts, consolidating them into structured objects (COUNTER_DEFINITIONS, HISTOGRAM_DEFINITIONS, etc.) for better organization and maintainability.
  • Simplified Metric Recording API: Metric recording functions (e.g., recordToolCallMetrics, recordTokenUsageMetrics) now accept a single attributes object instead of multiple individual parameters, streamlining their usage and improving readability.
  • Dynamic Metric Initialization: Metric instruments are now initialized dynamically by iterating over the new definition objects, reducing boilerplate and making it easier to add new metrics.
  • Type Safety for Attributes: A new MetricDefinitions type has been introduced to provide strong type checking for metric attributes, enhancing code reliability.
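The dynamic-initialization pattern described in the highlights can be sketched roughly as follows. This is a hedged illustration, not the actual gemini-cli code: `Counter` and `Meter` are minimal stand-ins for the OpenTelemetry interfaces, and the single definition entry is taken loosely from the names visible in this diff.

```typescript
// Minimal stand-ins for the OpenTelemetry instrument and meter interfaces.
type Counter = { add(value: number): void };
type Meter = {
  createCounter(name: string, opts: { description: string }): Counter;
};

let toolCallCounter: Counter | undefined;

// Each definition carries its metadata plus an `assign` callback that
// stashes the created instrument in a module-level variable.
const COUNTER_DEFINITIONS = {
  'gemini_cli.tool.call.count': {
    description: 'Counts tool calls, tagged by function name and success.',
    assign: (c: Counter) => {
      toolCallCounter = c;
    },
  },
};

// One loop replaces per-metric boilerplate: every definition creates its
// instrument and hands it back through `assign`.
function initializeCounters(meter: Meter): void {
  for (const [name, def] of Object.entries(COUNTER_DEFINITIONS)) {
    def.assign(meter.createCounter(name, { description: def.description }));
  }
}

// A stub meter is enough to exercise the wiring.
initializeCounters({
  createCounter: () => ({ add: () => {} }),
});
```

Adding a new metric then means adding one entry to the definitions object rather than another block of initialization code.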
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a great refactoring of the telemetry metrics. Centralizing the metric definitions in packages/core/src/telemetry/metrics.ts and switching to attribute objects for recording metrics significantly improves code clarity and maintainability. The changes are consistently applied across the codebase, including the test files. I've found one high-severity issue related to a regression in how API error metrics are recorded, which I've detailed in a specific comment.

@github-actions

github-actions bot commented Sep 30, 2025

Size Change: +2 kB (+0.01%)

Total Size: 17.5 MB

| Filename | Size | Change |
| --- | --- | --- |
| ./bundle/gemini.js | 17.4 MB | +2 kB (+0.01%) |

Unchanged:

| Filename | Size |
| --- | --- |
| ./bundle/sandbox-macos-permissive-closed.sb | 1.03 kB |
| ./bundle/sandbox-macos-permissive-open.sb | 830 B |
| ./bundle/sandbox-macos-permissive-proxied.sb | 1.31 kB |
| ./bundle/sandbox-macos-restrictive-closed.sb | 3.29 kB |
| ./bundle/sandbox-macos-restrictive-open.sb | 3.36 kB |
| ./bundle/sandbox-macos-restrictive-proxied.sb | 3.56 kB |

compressed-size-action

@jerop jerop force-pushed the bobcatfish/9209-refactor-really-real branch from 2377d9f to 35eccf6 Compare October 1, 2025 12:34
As part of #9209, in preparation for adding / modifying telemetry to
comply with the convention set architecture, move all the definitions
into one place so we can see exactly what metrics we are exposing and
what their structure is.

Not quite sure that the initialization of the performance-based metrics
actually needs to be done separately from the rest, but keeping it that
way for now to be consistent.
@jerop jerop force-pushed the bobcatfish/9209-refactor-really-real branch from 35eccf6 to 2123577 Compare October 1, 2025 12:36
@jerop jerop enabled auto-merge October 1, 2025 12:40
@jerop jerop added this pull request to the merge queue Oct 1, 2025
Merged via the queue into main with commit 5c6f006 Oct 1, 2025
46 of 56 checks passed
@jerop jerop deleted the bobcatfish/9209-refactor-really-real branch October 1, 2025 13:39
@jerop jerop linked an issue Oct 1, 2025 that may be closed by this pull request
2 tasks
import type { Config } from '../config/config.js';
import type { ModelRoutingEvent, ModelSlashCommandEvent } from './types.js';

const TOOL_CALL_COUNT = 'gemini_cli.tool.call.count';
Member


Opinion: All of these constants (to me) feel like they'd be a good candidate for an enum. That will increase the type safety of the usage site.

Benefits:

  • an enum type is effectively a union type of all of these options, meaning that the strictness at the call site could go from string -> a fixed list of specific items, which is always better and helps catch errors
  • I like to reach for enums when I have a fixed list of things, which is exactly what this is.

Also note:

  • Unlike some other languages, TypeScript has "string enums", which allow for string values (and not just numbers). This is quite handy in cases like this.
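A minimal sketch of the string-enum suggestion. The names are drawn from constants visible in this diff, but the exact enum shape is illustrative, not the PR's actual code:

```typescript
// String enum over the metric names; the member set is illustrative.
enum MetricName {
  ToolCallCount = 'gemini_cli.tool.call.count',
  BaselineComparison = 'gemini_cli.performance.baseline.comparison',
}

// Call sites now accept only the fixed list of names instead of arbitrary
// strings; getCounter('not.a.metric') would be a compile-time error.
function getCounter(name: MetricName): string {
  return `counter:${name}`;
}

const id = getCounter(MetricName.ToolCallCount);
```

The strictness the reviewer describes falls out for free: any call site passing a plain string no longer type-checks.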

'gemini_cli.performance.regression.percentage_change';
const BASELINE_COMPARISON = 'gemini_cli.performance.baseline.comparison';

const baseMetricDefinition = {
Member


Is there a type that can be declared on this object? Right now, the type will be inferred, which technically works, but the errors can be a bit obscure when you try to shove this object into a shape that isn't expecting it. A structural type (or even a partial structural type) is suggested.

go/tsjs-style#use-structural-types

Member


Also, it looks like you're just using this at the callsites as baseMetricDefinition.whateverWhatever. I'd consider just making this an unexported file-level function (using the function keyword, plz) and avoid the noise of the object. It (the object) feels like unnecessary baggage here.

tokens_after: number;
},
},
} as const;
Member


as const is a good start, but there should/must be a type here that can assert this shape.

You can have the best of as const and type safety by doing

as const satisfies Record<SomeEnum, SomeInterface>

Ideally this type would be Record<SomeEnum, SomeInterface>
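A hedged sketch of that combination (requires TypeScript 4.9+ for `satisfies`); `MetricName` and `MetricDefinition` here are illustrative stand-ins, not the actual types in this PR:

```typescript
// Illustrative enum and interface standing in for the real definitions.
enum MetricName {
  SessionCount = 'gemini_cli.session.count',
}

interface MetricDefinition {
  description: string;
  unit?: string;
}

// `satisfies` checks the object against Record<MetricName, MetricDefinition>
// at compile time, while `as const` keeps the literal types at each key.
const DEFINITIONS = {
  [MetricName.SessionCount]: {
    description: 'Count of CLI sessions started.',
  },
} as const satisfies Record<MetricName, MetricDefinition>;

// The literal type survives: this annotation compiles only because
// `description` is still the exact string literal, not widened to `string`.
const description: 'Count of CLI sessions started.' =
  DEFINITIONS[MetricName.SessionCount].description;
```

With a plain `as Record<...>` cast the literal types would be lost; with `as const` alone the shape would go unchecked. `satisfies` gives both.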

description: 'Count of CLI sessions started.',
valueType: ValueType.INT,
assign: (c: Counter) => (sessionCounter = c),
attributes: {} as Record<string, never>,
Member


IIRC, I think as never would suffice.

'routing.decision_source': string;
},
},
} as const;
Member


Same commentary here (and the next one, and so on).


const COUNTER_DEFINITIONS = {
[TOOL_CALL_COUNT]: {
description: 'Counts tool calls, tagged by function name and success.',
Member


As I read on down the CL here, I definitely feel like an interface that defines each of these fields (maybe with some nice docblocks that describe each field) would help. I've nearly read this entire CL, and I still don't understand what the assign does. I think an interface with a docblock would make that more obvious to the reader.
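One possible shape for such an interface, sketched with stand-in types (the real `Counter` would come from OpenTelemetry; this local type and the entry below are illustrative):

```typescript
// Local stand-in for the OpenTelemetry Counter instrument.
type Counter = { add(value: number): void };

interface CounterDefinition {
  /** Human-readable description attached to the instrument. */
  description: string;
  /**
   * Called once at initialization time with the freshly created instrument;
   * implementations stash it in a module-level variable so the record*
   * functions can use it later.
   */
  assign: (c: Counter) => void;
}

let sessionCounter: Counter | undefined;

const sessionDefinition: CounterDefinition = {
  description: 'Count of CLI sessions started.',
  assign: (c) => {
    sessionCounter = c;
  },
};

// At init time, the created instrument is handed back through `assign`.
sessionDefinition.assign({ add: () => {} });
```

The docblock on `assign` answers in-place the question the reviewer raises: it is the hook that stores the created instrument.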

description: 'Latency of API requests in milliseconds.',
unit: 'ms',
valueType: ValueType.INT,
Object.entries(COUNTER_DEFINITIONS).forEach(
Member


Prefer a classical for...of loop.

.forEach is kind of discouraged because the other iteration methods on Array.prototype return a value (e.g. map, filter, reduce). forEach is the one weird child that returns void. For that reason, I almost always prefer for...of loops, as they make my intent very explicit.

status_code: statusCode ?? 'ok',
...baseMetricDefinition.getCommonAttributes(config),
model: attributes.model,
status_code: attributes.status_code ?? 'ok',
Member


Sanity check: I think all of your other status_code have been 200, 500, etc (maybe I'm hallucinating). It's weird that this is ok.

Member


I left a comment above about this, but it feels like the status_code thing could use some work.

I think we probably want to avoid a world where we have a bunch of random strings in the DB indicating "success" vs "ok", or "failure" vs "failed" vs "error". This data could get messy fast, and I think the stricter we make the logging calls, the better, IMHO. Our logging system (the typing around it) should really help us enforce clean telemetry data.

typeof PERFORMANCE_COUNTER_DEFINITIONS &
typeof PERFORMANCE_HISTOGRAM_DEFINITIONS;

export type MetricDefinitions = {
Member


nit: rather than type =, prefer interface
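For illustration, the same kind of object shape declared with `interface` rather than `type =`; the name and fields here are stand-ins, not the PR's actual type:

```typescript
// Interface form of an attribute shape; declarations merge and tend to
// produce clearer error messages than type aliases for plain object shapes.
interface ApiRequestAttributes {
  model: string;
  status_code?: number | string;
}

const example: ApiRequestAttributes = { model: 'gemini-pro', status_code: 200 };
```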

assign: (c: Counter) => (apiRequestCounter = c),
attributes: {} as {
model: string;
status_code?: number | string;
Member


Personally, I'd probably hoist a specific type for status_code and use either:

  • a number (like http status)
  • an enum of fixed options (success, failure, etc)
  • (least preferred) a string union of success|failure|etc
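A sketch combining the options above into one hoisted type; all names here are illustrative, not the PR's actual code:

```typescript
// Fixed outcome enum plus numeric HTTP codes, hoisted into one status type.
enum RequestOutcome {
  Success = 'success',
  Failure = 'failure',
}

type StatusCode = number | RequestOutcome;

// A single normalization point keeps stray strings like 'ok' out of the
// telemetry data: every caller funnels through the same default.
function normalizeStatus(code?: StatusCode): StatusCode {
  return code ?? RequestOutcome.Success;
}

const httpStatus = normalizeStatus(500);
const defaulted = normalizeStatus();
```

Call sites can still pass raw HTTP codes, but symbolic statuses are restricted to the enum's fixed vocabulary.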

...baseMetricDefinition.getCommonAttributes(config),
model: attributes.model,
status_code: attributes.status_code ?? 'error',
error_type: attributes.error_type ?? 'unknown',
Member


If possible, I feel the same commentary for status_code could potentially apply to error_type



Development

Successfully merging this pull request may close these issues.

Refactor(telemetry): Refactor Metrics

4 participants