Refactor metrics definitions to be easily understandable#10215
Conversation
Summary of ChangesHello @bobcatfish, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the telemetry system by centralizing metric definitions and simplifying the API for recording metrics. The changes aim to make the telemetry code more understandable, maintainable, and extensible, aligning with the goal of preparing for future additions and modifications to the system's telemetry. This refactoring improves the clarity of what metrics are being exposed and their structure, while also enhancing type safety. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request is a great refactoring of the telemetry metrics. Centralizing the metric definitions in packages/core/src/telemetry/metrics.ts and switching to attribute objects for recording metrics significantly improves code clarity and maintainability. The changes are consistently applied across the codebase, including the test files. I've found one high-severity issue related to a regression in how API error metrics are recorded, which I've detailed in a specific comment.
|
Size Change: +2 kB (+0.01%) Total Size: 17.5 MB
ℹ️ View Unchanged
|
2377d9f to
35eccf6
Compare
As part of #9209 in preparation for adding / modifying telemetry to comply with the convention set architecture, move all the definitions into one place so we can see exactly what metrics are are exposing and what their structure is. Not quite sure that the initialization of the peformance based metrics actually needs to be done separately from the rest, but keeping it that way for now to be consistent.
35eccf6 to
2123577
Compare
| import type { Config } from '../config/config.js'; | ||
| import type { ModelRoutingEvent, ModelSlashCommandEvent } from './types.js'; | ||
|
|
||
| const TOOL_CALL_COUNT = 'gemini_cli.tool.call.count'; |
There was a problem hiding this comment.
Opinion: All of these constants (to me) feel like they'd be a good candidate for an enum. That will increase the type safety of the usage site.
Benefits:
- an enum type is effectively a union type of all of these options, meaning that the strictness at the call site could go from
string->a fixed list of specific items, which is always better and helps catch errors - I like to reach for enums when I have a fixed list of things, which is exactly what this is.
Also note:
- Unlike some other languages, Typescript has "string enums", which allow for string values (and not just numbers). This is quite handy in cases like this.
| 'gemini_cli.performance.regression.percentage_change'; | ||
| const BASELINE_COMPARISON = 'gemini_cli.performance.baseline.comparison'; | ||
|
|
||
| const baseMetricDefinition = { |
There was a problem hiding this comment.
Is there a type that can be declared on this object? Right now, the type will be inferred, which technically works, but the errors can be a bit obscure when you try to shove this object into a shape this isn't expecting. A structural type (or even a partial structural type) is suggested.
go/tsjs-style#use-structural-types
There was a problem hiding this comment.
Also, it looks like you're just using this at the callsites as baseMetricDefintion.whateverWhatever. I'd consider just making this an unexported file-level function (using the function keyword, plz) and avoid the noise of the object. It (the object) feels like unnecessary baggage here.
| tokens_after: number; | ||
| }, | ||
| }, | ||
| } as const; |
There was a problem hiding this comment.
as const is a good start, but there should/must be a type here that can assert this shape.
You can have the best of as const and type safety by doing
as const satisfied Record<SomeEnum, SomeInterface>
Ideally this type would be Record<SomeEnum, SomeInterface>
| description: 'Count of CLI sessions started.', | ||
| valueType: ValueType.INT, | ||
| assign: (c: Counter) => (sessionCounter = c), | ||
| attributes: {} as Record<string, never>, |
There was a problem hiding this comment.
IIRC, I think as never would suffice.
| 'routing.decision_source': string; | ||
| }, | ||
| }, | ||
| } as const; |
There was a problem hiding this comment.
Same commentary here. (and the next one, and so on.
|
|
||
| const COUNTER_DEFINITIONS = { | ||
| [TOOL_CALL_COUNT]: { | ||
| description: 'Counts tool calls, tagged by function name and success.', |
There was a problem hiding this comment.
As I read on down the CL here, I definitely feel like an interface that defines each of these fields (maybe with some nice docblocks that describe each field). I've nearly read this entire CL, and I still don't understand what the assign does. I think an interface with a docblock would make that more obvious to the reader.
| description: 'Latency of API requests in milliseconds.', | ||
| unit: 'ms', | ||
| valueType: ValueType.INT, | ||
| Object.entries(COUNTER_DEFINITIONS).forEach( |
There was a problem hiding this comment.
Prefer classical for... of... loop.
The .foreach is kind of discouraged because all of the other methods on Array.prototype return a new array (e.g. filter, reduce, etc, etc). Foreach is the one weird child that returns void. For that reason, I almost always prefer for ... of... loops as it makes my intent very explicit.
| status_code: statusCode ?? 'ok', | ||
| ...baseMetricDefinition.getCommonAttributes(config), | ||
| model: attributes.model, | ||
| status_code: attributes.status_code ?? 'ok', |
There was a problem hiding this comment.
Sanity check: I think all of your other status_code have been 200, 500, etc (maybe I'm hallucinating). It's weird that this is ok.
There was a problem hiding this comment.
I left a comment above about this, but it feels like the status_code thing could use some work.
I think we probably want to avoid a world where we have a bunch of rando strings in the DB indicated "success" vs "ok" || "failure" vs "failed" vs "error". This data could get messy fast and I think the stricter we make the logging calls, ultimately the better, IMHO. Our logging system (the typing around it) should really help us enforce "clean telemetry data"
| typeof PERFORMANCE_COUNTER_DEFINITIONS & | ||
| typeof PERFORMANCE_HISTOGRAM_DEFINITIONS; | ||
|
|
||
| export type MetricDefinitions = { |
There was a problem hiding this comment.
nit, rather than type = prefer interface
| assign: (c: Counter) => (apiRequestCounter = c), | ||
| attributes: {} as { | ||
| model: string; | ||
| status_code?: number | string; |
There was a problem hiding this comment.
Personally, I'd probably hoist a specific type for status_code and use either:
- a number (like http status)
- an enum of fixed options (success, failure, etc)
- (least preferred) a string union of success|failure|etc
| ...baseMetricDefinition.getCommonAttributes(config), | ||
| model: attributes.model, | ||
| status_code: attributes.status_code ?? 'error', | ||
| error_type: attributes.error_type ?? 'unknown', |
There was a problem hiding this comment.
If possible, I feel the same commentary for status_code could potentially apply to error_type
TLDR
As part of #9209 in preparation for adding / modifying telemetry to comply with the convention set architecture, move all the definitions into one place so we can see exactly what metrics are are exposing and what their structure is.
Dive Deeper
Not quite sure that the initialization of the peformance based metrics actually needs to be done separately from the rest, but keeping it that way for now to be consistent.
(FYI @eLyiN ! I don't think this will cause too much trouble with your pending changes 🙏 )
Reviewer Test Plan
If you wanted, you could hook up telemetry and try it out (https://github.com/google-gemini/gemini-cli/blob/main/docs/telemetry.md) but we should have pretty good coverage via the unit + integration tests.
Testing Matrix
Linked issues / bugs
Part of #9209