HYPERFLEET-382: Add metrics conventions standard #62
Walkthrough
Adds a HyperFleet Metrics Standard and updates component docs to adopt it: renames adapter metrics from `adapter_*` to `hyperfleet_adapter_*`.
Sequence Diagram(s): (omitted)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
hyperfleet/components/adapter/framework/adapter-metrics.md (1)
67-78: Update all metric examples to include required `component` and `version` labels per the metrics standard.
The metrics standard (hyperfleet/standards/metrics.md lines 60–65) mandates that all metrics include `component` and `version` labels. However, all examples in this file use `adapter_name` instead of `component` and omit the `version` label entirely.
Apply the following pattern to all metric examples throughout the document:
🔎 Example fix for `hyperfleet_adapter_events_processed_total` (lines 74–77)
    - hyperfleet_adapter_events_processed_total{adapter_name="validation",resource_kind="Cluster",status="success"} 1523
    - hyperfleet_adapter_events_processed_total{adapter_name="validation",resource_kind="Cluster",status="error"} 12
    - hyperfleet_adapter_events_processed_total{adapter_name="validation",resource_kind="Cluster",status="skipped"} 89
    - hyperfleet_adapter_events_processed_total{adapter_name="validation",resource_kind="NodePool",status="success"} 342
    + hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success"} 1523
    + hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="error"} 12
    + hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="skipped"} 89
    + hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="NodePool",status="success"} 342
Apply the same pattern to all other metric examples (lines 101–105, 130–132, 182–185, 210–213, 237–239, 264–267, 291–293, etc.).
🧹 Nitpick comments (1)
hyperfleet/standards/metrics.md (1)
28-30: Add language identifier to code fences (markdown lint MD040).
Lines 28–30 and 90–94 have fenced code blocks without a language identifier. These should specify a language for syntax highlighting and linting compliance.
🔎 Suggested fixes
Lines 28–30 (format example):
    -```
    +```text
     hyperfleet_<component>_<metric_name>_<unit>
    -```
    +```
Lines 90–94 (endpoint sanitization):
    -```
    +```text
     /clusters/cls-abc123 → /clusters/{id}
     /clusters/cls-abc/nodepools/np-1 → /clusters/{id}/nodepools/{id}
     /namespaces/ns-123/jobs/job-456 → /namespaces/{ns}/jobs/{name}
    -```
    +```
Also applies to: 90-94
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- hyperfleet/components/adapter/framework/adapter-metrics.md
- hyperfleet/components/sentinel/sentinel-deployment.md
- hyperfleet/standards/metrics.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
hyperfleet/standards/metrics.md
28-28: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
90-90: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (3)
hyperfleet/components/sentinel/sentinel-deployment.md (1)
220-221: Reference path and placement are correct.
The link to the metrics standard is properly formatted and the relative path is correct. This addition appropriately directs implementers to the standard.
hyperfleet/components/adapter/framework/adapter-metrics.md (2)
11-11: Metric naming convention updates are consistently applied.
The renaming from `adapter_*` to `hyperfleet_adapter_*` is thorough and consistent across:
- All metric definitions and their documentation
- Example outputs and PromQL queries
- Alerting rules
- Dashboard queries
This aligns well with the new naming standard.
Also applies to: 62-78, 87-106, 117-156, 168-214, 225-240, 251-268, 279-300, 305-320, 329-340
354-357: Cross-reference to metrics standard is correct.
Line 354 appropriately directs to the standard for naming conventions. The path at line 11 (`[HyperFleet Metrics Standard](../../../standards/metrics.md)`) is correctly formatted.
236e656 to 02045b2
Actionable comments posted: 2
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- hyperfleet/components/adapter/framework/adapter-metrics.md
- hyperfleet/components/sentinel/sentinel-deployment.md
- hyperfleet/standards/metrics.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
hyperfleet/standards/metrics.md
28-28: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
90-90: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (20)
hyperfleet/components/adapter/framework/adapter-metrics.md (19)
47-56: Excellent header and format documentation.
The metrics format section clearly specifies Prometheus format, OpenMetrics compatibility, and explicitly mandates component and version labels with a reference to the Metrics Standard. Health endpoints reference is clear.
64-80: Event processing metric naming and labeling is consistent.
`hyperfleet_adapter_events_processed_total` is properly named with the `hyperfleet_adapter_` prefix, and all three examples correctly include `component` and `version` labels alongside adapter-specific dimensions (adapter_name, resource_kind, status).
89-108: Histogram metric examples properly labeled.
`hyperfleet_adapter_event_processing_duration_seconds` examples correctly show the metric name, bucket structure, and required labels (component, version) along with domain-specific labels (adapter_name, resource_kind, status). Buckets (0.1, 0.5, 1, 2, 5, 10, 30, 60, 120) are well-reasoned for event processing.
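To make the bucket layout concrete, here is a minimal Go sketch of how observations are counted into cumulative `le` buckets. The bounds are the ones quoted in this comment; `bucketCounts` is a hypothetical helper written for this note, not code from the adapter or from a Prometheus client library.

```go
package main

import "fmt"

// eventBuckets are the upper bounds (in seconds) quoted in the review comment.
var eventBuckets = []float64{0.1, 0.5, 1, 2, 5, 10, 30, 60, 120}

// bucketCounts returns cumulative counts, one per bound plus a final +Inf
// slot: an observation increments every bucket whose bound is >= its value.
func bucketCounts(bounds []float64, observations []float64) []int {
	counts := make([]int, len(bounds)+1) // last slot is the +Inf bucket
	for _, v := range observations {
		for i, le := range bounds {
			if v <= le {
				counts[i]++
			}
		}
		counts[len(bounds)]++ // +Inf counts every observation
	}
	return counts
}

func main() {
	counts := bucketCounts(eventBuckets, []float64{0.3, 4.2, 75})
	for i, le := range eventBuckets {
		fmt.Printf("le=%g count=%d\n", le, counts[i])
	}
	fmt.Printf("le=+Inf count=%d\n", counts[len(eventBuckets)])
}
```

With these bounds, a 75-second event is only counted in the 120 and +Inf buckets, so the upper buckets are what make slow events visible in percentile queries.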
119-135: Resource management metrics consistent with standard.
Both `hyperfleet_adapter_resources_created_total` and `hyperfleet_adapter_resources_deleted_total` include required labels (component, version) and resource-specific labels (adapter_name, resource_type, namespace, status). Naming and labeling conventions are uniform.
170-188: API metrics properly labeled and sanitized.
`hyperfleet_adapter_api_requests_total` examples demonstrate correct prefix, required labels (component, version), and sanitized endpoint paths (e.g., `/clusters/{id}`, `/namespaces/{ns}/jobs`) with no high-cardinality IDs. PromQL examples are updated to reflect new metric names.
197-216: API request duration histogram is well-structured.
`hyperfleet_adapter_api_request_duration_seconds` examples show proper bucket values (0.01, 0.05, 0.1, 0.5, 1, 2, 5) tuned for API latency, and all examples include required labels (component, version). Histogram sum and count are correctly formatted.
227-242: Precondition evaluation metric follows conventions.
`hyperfleet_adapter_preconditions_evaluated_total` properly names, labels (component, version, adapter_name, precondition_name, result), and documents label values (pass, fail, error).
253-270: Status reporting metric is comprehensive.
`hyperfleet_adapter_status_reports_total` examples include required labels (component, version) and contextual boolean labels (applied, available). Labels are kept to low cardinality.
281-296: Error metric labels refined for clarity.
`hyperfleet_adapter_errors_total` now uses `error_component` (line 289) instead of a generic "component" for internal error location (event_processor, precondition_evaluator, resource_manager, status_reporter). This correctly disambiguates the HyperFleet-standard `component` label (adapter name) from internal error source. Naming is clear and examples are correct.
307-322: Workload monitoring metric is properly structured.
`hyperfleet_adapter_workload_status_total` includes required labels (component, version) and domain labels (adapter_name, workload_type, status). Label values are documented (Job, Deployment, StatefulSet; running, succeeded, failed, unknown).
331-342: Health metric (dead man's switch) correctly positioned.
`hyperfleet_adapter_last_processed_timestamp_seconds` is a gauge with Unix timestamp and includes required labels (component, version, adapter_name). The purpose (detecting silent failures via timestamp staleness) is clear and examples are correct.
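The staleness idea behind a dead man's switch can be sketched in a few lines of Go. This is only an illustration: `isStale` is a hypothetical helper, and the 300-second window is an assumed threshold, not a value taken from the adapter code.

```go
package main

import (
	"fmt"
	"time"
)

// isStale reports whether the gauge value lastProcessedUnix (Unix seconds)
// lags nowUnix by more than maxAgeSeconds.
func isStale(lastProcessedUnix, nowUnix, maxAgeSeconds float64) bool {
	return nowUnix-lastProcessedUnix > maxAgeSeconds
}

func main() {
	now := float64(time.Now().Unix())
	lastProcessed := now - 600 // pretend the last event finished 10 minutes ago
	fmt.Println("stale:", isStale(lastProcessed, now, 300))
}
```

An alert built on the gauge performs the same subtraction on the server side, comparing the current time against the exported timestamp.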
351-360: Implementation guidelines correctly reference Metrics Standard.
Section 1 instructs developers to follow "Prometheus naming best practices and HyperFleet standards," with explicit reference to the Metrics Standard (line 356) and the `hyperfleet_adapter_` prefix requirement. Guidance on snake_case, metric names, and label consistency is aligned with the standard.
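The prefix and snake_case guidance can be checked mechanically. A minimal sketch in Go, assuming a lint-style helper: `validAdapterMetricName` is hypothetical, and the regular expression only encodes the two rules named in this comment (the `hyperfleet_adapter_` prefix and snake_case), not the full standard.

```go
package main

import (
	"fmt"
	"regexp"
)

// adapterMetricName accepts the hyperfleet_adapter_ prefix followed by one or
// more lowercase snake_case words separated by single underscores.
var adapterMetricName = regexp.MustCompile(`^hyperfleet_adapter_[a-z0-9]+(?:_[a-z0-9]+)*$`)

// validAdapterMetricName reports whether name follows the prefix and
// snake_case rules. It does not check unit suffixes.
func validAdapterMetricName(name string) bool {
	return adapterMetricName.MatchString(name)
}

func main() {
	for _, name := range []string{
		"hyperfleet_adapter_events_processed_total",
		"adapter_events_processed_total",
		"hyperfleet_adapter_EventsProcessed",
	} {
		fmt.Printf("%s valid=%t\n", name, validAdapterMetricName(name))
	}
}
```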
361-382: Label best practices section is well-documented.
DO/DON'T list covers cardinality, consistency, and sanitization. The "Example of Sanitized Endpoints" (lines 374-381) is concrete and prevents high-cardinality issues. Guidance aligns with the standard's endpoint sanitization section.
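For readers who want to see what endpoint sanitization could look like in code, here is a rough Go sketch. It assumes a simple rule: the path segment that follows a known collection name is replaced by a placeholder. The `placeholders` table and `sanitizeEndpoint` are hypothetical and only mirror the three mappings quoted earlier in this review (`/clusters/{id}`, `/clusters/{id}/nodepools/{id}`, `/namespaces/{ns}/jobs/{name}`).

```go
package main

import (
	"fmt"
	"strings"
)

// placeholders maps a collection segment to the placeholder that replaces the
// identifier following it. A real adapter would list its own routes here.
var placeholders = map[string]string{
	"clusters":   "{id}",
	"nodepools":  "{id}",
	"namespaces": "{ns}",
	"jobs":       "{name}",
}

// sanitizeEndpoint replaces identifier segments so that the endpoint label
// stays low-cardinality.
func sanitizeEndpoint(path string) string {
	segments := strings.Split(path, "/")
	for i := 1; i < len(segments); i++ {
		// segments[i-1] is the (possibly already rewritten) previous segment.
		if ph, ok := placeholders[segments[i-1]]; ok && segments[i] != "" {
			segments[i] = ph
		}
	}
	return strings.Join(segments, "/")
}

func main() {
	for _, p := range []string{
		"/clusters/cls-abc123",
		"/clusters/cls-abc/nodepools/np-1",
		"/namespaces/ns-123/jobs/job-456",
	} {
		fmt.Println(p, "->", sanitizeEndpoint(p))
	}
}
```

A table keyed by collection name keeps the rule explicit; routers that already expose the matched route template can use that template directly instead.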
385-415: Go code example lacks component and version label injection.
The metric collection example (lines 387-414) demonstrates instrumenting `hyperfleet_adapter_events_processed_total` and `hyperfleet_adapter_event_processing_duration_seconds`, but it only shows `WithLabelValues(a.config.Name, event.Data.Kind, status)`. It does not illustrate how `component` and `version` labels are injected. This creates a gap: the code example doesn't show how to satisfy the mandatory label requirement.
Consider adding a comment or brief explanation of how `component` and `version` labels are set (e.g., via middleware, client library initialization, or a wrapper). Alternatively, confirm that label injection is handled elsewhere in the codebase.
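One way to picture the "wrapper" option is a helper that merges the two required labels into every per-observation label set. The Go sketch below is library-independent and purely illustrative: `withRequiredLabels` and `formatSample` are hypothetical names, and the label values are taken from the examples in this review.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// withRequiredLabels copies the per-observation labels and adds component and
// version. It refuses to overwrite them, which guards against a dynamic label
// that is accidentally named "component" or "version".
func withRequiredLabels(component, version string, dynamic map[string]string) (map[string]string, error) {
	out := map[string]string{"component": component, "version": version}
	for k, v := range dynamic {
		if k == "component" || k == "version" {
			return nil, fmt.Errorf("label %q is reserved", k)
		}
		out[k] = v
	}
	return out, nil
}

// formatSample renders one sample line with labels in sorted order.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	labels, err := withRequiredLabels("adapter-validation", "v1.0.0", map[string]string{
		"adapter_name":  "validation",
		"resource_kind": "Cluster",
		"status":        "success",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(formatSample("hyperfleet_adapter_events_processed_total", labels, 1523))
}
```

The reserved-name check is the reason a label such as `error_component` is safer than reusing `component` for a second meaning.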
509-533: PromQL queries updated for new metric names.
Event processing rate (line 511), latency percentiles (lines 525-532), and average processing time queries all use the new `hyperfleet_adapter_` metric names. Queries are syntactically correct and reference the appropriate metric variants (_bucket, _sum, _count).
537-559: Resource and API query examples are correct.
Resource creation rate (line 539), success rate (lines 542-546), API latency (lines 553-555), and error rate (line 558) queries all reference updated metric names and filter appropriately. Status code patterns (e.g., `5..` for server errors) are idiomatic PromQL.
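A note on the `5..` pattern: Prometheus regex matchers are fully anchored, so the pattern must match the entire label value. The Go sketch below imitates that behaviour with the standard `regexp` package; `matchesPromRegex` is a hypothetical helper for illustration, not a Prometheus API.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesPromRegex mimics how a regex label matcher is applied: the pattern
// must match the whole label value, not just a substring of it.
func matchesPromRegex(pattern, value string) bool {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	return re.MatchString(value)
}

func main() {
	for _, code := range []string{"500", "503", "200", "5030"} {
		fmt.Printf("%s matches 5..: %t\n", code, matchesPromRegex("5..", code))
	}
}
```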
564-580: Precondition and error rate queries are well-formed.
Precondition pass rate (lines 565-569) and error rate queries (lines 576-579) use the updated metric names and aggregation patterns. The `sum by(component)` error aggregation (line 579) correctly references the HyperFleet-standard `component` label.
589-650: Alerting rules correctly reference updated metrics.
All four alert definitions (lines 589-650):
- AdapterNotProcessing (line 591): References `hyperfleet_adapter_last_processed_timestamp_seconds` with correct time-since-update logic
- AdapterHighErrorRate (lines 606-608): Uses updated metric names with correct rate aggregation
- AdapterSlowEventProcessing (line 624): References histogram bucket metric with correct percentile calculation
- AdapterHighAPIErrorRate (lines 640-642): Uses updated metric name and status code patterns
All rules correctly reference `$labels.adapter_name` for per-adapter alerting. Severities and thresholds are reasonable.
719-724: References section comprehensively updated.
Cross-component reference to Metrics Standard (line 721) is added. Prometheus, OpenMetrics, and Go client library references remain. No broken or stale links are evident.
hyperfleet/components/sentinel/sentinel-deployment.md (1)
220-220: No action needed.
The path reference `../../docs/health-endpoints.md` on line 220 is correct: the file exists in the docs directory, and the relative path from the sentinel-deployment.md file is accurate.
02045b2 to c81ce68
Actionable comments posted: 1
🧹 Nitpick comments (2)
hyperfleet/standards/metrics.md (1)
116-128: Build info and health metrics are well-defined.
The examples correctly include the required `component` and `version` labels. However, consider documenting how these labels are injected at runtime (e.g., via middleware, client library configuration, or environment variables), as this is a prerequisite for consistent implementation across components.
hyperfleet/components/sentinel/sentinel-deployment.md (1)
210-218: Implementation requirements correctly mandate standard labels.
Line 212 explicitly states that all metrics must include `component` and `version` labels per the Metrics Standard, and lines 215-217 define label value constraints. This provides clear implementation guidance. One minor suggestion: consider adding a note about the label value injection mechanism (e.g., via middleware or environment variables) to ease implementation.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- hyperfleet/components/adapter/framework/adapter-metrics.md
- hyperfleet/components/sentinel/sentinel-deployment.md
- hyperfleet/standards/metrics.md
🔇 Additional comments (12)
hyperfleet/standards/metrics.md (3)
50-73: LGTM: Metric examples correctly include required labels.
All metric examples include both mandatory `component` and `version` labels alongside dimension-specific labels. Examples clearly demonstrate the naming convention and label structure for Counter, Gauge, and Histogram metrics. This provides a solid reference for component teams implementing metrics.
75-94: Label best practices guidance is thorough and well-organized.
The "DO" and "DON'T" guidelines address common pitfalls (cardinality, high-cardinality IDs, endpoint sanitization). The sanitization examples are particularly helpful for preventing high-cardinality metrics. This section will serve as a valuable reference for component teams.
233-234: Cross-component reference links are valid.
The anchor link #metrics-and-observability exists in sentinel-deployment.md, and adapter-metrics.md is accessible from this location. No changes needed.
hyperfleet/components/sentinel/sentinel-deployment.md (2)
202-208: LGTM: Sentinel metrics now align with standard label requirements.
The metrics table now includes `component` and `version` as required labels alongside component-specific labels (`resource_selector`, `resource_type`, and operation-specific labels like `ready_state`, `broker_type`). This addresses the alignment gap flagged in prior review. The metrics definition clearly maps to the HyperFleet Metrics Standard format.
220-222: No action needed; the path is correct.
Line 220's reference to `../../docs/health-endpoints.md` is correct. The file exists in the `/docs` directory, not in `/standards/`. While line 222 correctly references `../../standards/metrics.md`, these two files reside in different directories by design.
hyperfleet/components/adapter/framework/adapter-metrics.md (7)
11-11: LGTM: Cross-references to Metrics Standard are correctly integrated.
The file now consistently references the HyperFleet Metrics Standard (line 11) and embeds the required label statement (line 54) and health-endpoints reference (line 56). These additions tie the adapter metrics to the broader HyperFleet standards and make implementation requirements explicit.
Also applies to: 54-54, 56-56
64-80: Metric examples consistently include required labels.
All metric examples throughout the file (event processing, resource management, API calls, preconditions, status reporting, errors, and workload monitoring) include both `component` and `version` labels alongside domain-specific labels. This ensures alignment with the Metrics Standard and provides a clear implementation template for adapter authors.
Also applies to: 89-108, 119-159
287-296: Error labeling correctly distinguishes error source from component identity.
The `error_type` and `error_component` labels serve different purposes: `error_type` categorizes the error, while `error_component` identifies the internal adapter component where the error originated. This separation avoids overloading the required `component` label and provides finer-grained observability.
384-415: LGTM: Implementation example demonstrates correct label injection pattern.
The code example clearly shows how to inject `adapter_name`, `resource_kind`, and `status` labels via `WithLabelValues()`, following the Prometheus Go client library pattern. This will serve as a helpful reference for adapter implementers.
509-559: Dashboard PromQL queries are updated and syntactically sound.
All PromQL examples have been updated to use the new `hyperfleet_adapter_*` metric names and include appropriate label filters and aggregations. The queries cover key observability areas: event processing rate/latency, resource creation, API performance, preconditions, and errors. These will form a solid foundation for dashboards.
586-650: Alerting rules correctly map to updated metric names.
All alert rules (silent failure, high error rate, slow event processing, API errors) have been updated to reference the new `hyperfleet_adapter_*` metric names. The thresholds and alert logic appear reasonable (5m for silent failure, 5% error rate, 60s for slow processing, 1% API error rate).
574-580: Clarify intent of "Error rate by component" query and use consistent label.
Line 579's query `sum by(component)` aggregates by the global adapter component (e.g., "adapter-validation"), but the metric also defines `error_component` (line 289) to indicate where errors originated (`event_processor`, `precondition_evaluator`, `resource_manager`, `status_reporter`). The comment "Error rate by component" is ambiguous; confirm whether you intend to show error rates per adapter (keep `component`) or per internal error component (use `error_component`). Update the query and comment accordingly for clarity.
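The difference between the two groupings is easy to see on a toy data set. The Go sketch below groups the same samples by either label; it sums raw values rather than rates, and `sample` and `sumBy` are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// sample is a single series value together with its label set.
type sample struct {
	labels map[string]string
	value  float64
}

// sumBy adds up sample values grouped by the value of one label, similar in
// spirit to "sum by(<label>)" in PromQL.
func sumBy(samples []sample, label string) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range samples {
		out[s.labels[label]] += s.value
	}
	return out
}

func main() {
	errors := []sample{
		{map[string]string{"component": "adapter-validation", "error_component": "event_processor"}, 3},
		{map[string]string{"component": "adapter-validation", "error_component": "resource_manager"}, 2},
	}
	fmt.Println("by component:", sumBy(errors, "component"))
	fmt.Println("by error_component:", sumBy(errors, "error_component"))
}
```

Grouping by `component` collapses both series into one total per adapter, while grouping by `error_component` keeps the internal error sources apart.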
c81ce68 to 00062dd
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
hyperfleet/components/adapter/framework/adapter-metrics.md (1)
64-80: Add component and version labels to all metric definition label tables.
Every metric example shows `component` and `version` labels (e.g., `component="adapter-validation",version="v1.0.0",...`), and line 54 correctly states that these labels are required by the Metrics Standard. However, the "Labels" section of each metric definition omits them.
For example, `hyperfleet_adapter_events_processed_total` (lines 69–72) lists only `adapter_name`, `resource_kind`, and `status`, but the example (lines 76–79) shows all five labels including `component` and `version`.
This inconsistency between the label tables and the examples makes it unclear to implementers which labels are mandatory. For each metric definition, either:
- Add `component` and `version` to the label list, or
- Add a note such as: "Also includes required labels: `component`, `version` (see Metrics Standard)"
This should be applied to all 11 metric definitions (events_processed_total, event_processing_duration_seconds, resources_created_total, resources_deleted_total, api_requests_total, api_request_duration_seconds, preconditions_evaluated_total, status_reports_total, errors_total, workload_status_total, last_processed_timestamp_seconds).
Also applies to: 89-107, 119-135, 144-159, 170-188, 197-216, 227-242, 253-270, 281-296, 307-327, 331-342
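A documentation or CI check for the mandatory labels could be sketched as below. This is a rough Go illustration, not an existing tool: `missingRequiredLabels` is a hypothetical helper, and the regular expression is a simplified reading of the label syntax in a single sample line.

```go
package main

import (
	"fmt"
	"regexp"
)

// labelPair matches name="value" pairs inside the braces of a sample line,
// allowing backslash-escaped characters inside the value.
var labelPair = regexp.MustCompile(`([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"`)

// missingRequiredLabels returns which of component and version are absent
// from one exposition-format sample line.
func missingRequiredLabels(line string) []string {
	present := map[string]bool{}
	for _, m := range labelPair.FindAllStringSubmatch(line, -1) {
		present[m[1]] = true
	}
	var missing []string
	for _, name := range []string{"component", "version"} {
		if !present[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	before := `hyperfleet_adapter_events_processed_total{adapter_name="validation",resource_kind="Cluster",status="success"} 1523`
	after := `hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success"} 1523`
	fmt.Println("before:", missingRequiredLabels(before))
	fmt.Println("after:", missingRequiredLabels(after))
}
```

Because label names are matched whole, a label such as `error_component` does not count as `component`.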
🧹 Nitpick comments (2)
hyperfleet/components/sentinel/sentinel-deployment.md (1)
200-208: Clarify that component and version labels are required for all metrics.
The metrics table shows metric definitions with labels like `resource_selector`, `resource_type`, `ready_state`, and `operation`. However, the table does not explicitly list `component` and `version` as columns, even though they are required by the Metrics Standard (referenced on line 212) and appear in the Implementation Requirements section (line 212).
To improve clarity, either: (1) add `component` and `version` to each metric's label list in the table, or (2) add a note stating "All metrics include required labels: `component` and `version` (see Metrics Standard)".
The current approach of stating the requirement once in Implementation Requirements is reasonable but may cause readers who focus on the table to miss this mandatory requirement.
hyperfleet/components/adapter/framework/adapter-metrics.md (1)
385-415: Clarify how component and version labels are initialized in the Go example.
The code example shows metric instrumentation using `WithLabelValues()` for metric-specific labels (`adapter_name`, `resource_kind`, `status`), but does not show how the required `component` and `version` labels are set. Since every metric example includes these labels, the implementation guidance should clarify:
- Are `component` and `version` labels applied at collector initialization (e.g., via `NewCounterVec` with these label names)?
- Or are they added to each `WithLabelValues()` call?
This detail is important for implementers to follow the standard correctly.
🔎 Example of how to clarify this
Consider expanding the code example to show metrics initialization or adding a comment that explains label initialization strategy:
    // Metrics are initialized with component and version labels that are set
    // once at startup (e.g., via flag or config) and applied to all observations.
    // Metric-specific labels (adapter_name, resource_kind, status, etc.) are
    // added per observation.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- hyperfleet/components/adapter/framework/adapter-metrics.md
- hyperfleet/components/sentinel/sentinel-deployment.md
- hyperfleet/standards/metrics.md
🚧 Files skipped from review as they are similar to previous changes (1)
- hyperfleet/standards/metrics.md
🔇 Additional comments (2)
hyperfleet/components/adapter/framework/adapter-metrics.md (1)
56-56: Ensure health-endpoints path is consistent across documentation.
Line 56 correctly references `../../../standards/health-endpoints.md`. However, the sentinel-deployment.md file (line 220) references `../../docs/health-endpoints.md` instead. Verify that both files use the correct path for the health-endpoints standard (which should be in `/standards/` per PR #60).
Both references should point to the same location. Based on the PR dependency, update sentinel-deployment.md to use `../../standards/health-endpoints.md`.
hyperfleet/components/sentinel/sentinel-deployment.md (1)
220-220: Path reference is correct; no changes needed.
The current path `../../docs/health-endpoints.md` on line 220 correctly resolves to the actual file location at `./hyperfleet/docs/health-endpoints.md`. While `metrics.md` was moved to the `/standards/` directory, `health-endpoints.md` remains in `/docs/` and should not be changed.
Likely an incorrect or invalid review comment.
00062dd to 3983e30
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- hyperfleet/components/adapter/framework/adapter-metrics.md
- hyperfleet/components/sentinel/sentinel-deployment.md
- hyperfleet/standards/metrics.md
🚧 Files skipped from review as they are similar to previous changes (2)
- hyperfleet/standards/metrics.md
- hyperfleet/components/sentinel/sentinel-deployment.md
🔇 Additional comments (1)
hyperfleet/components/adapter/framework/adapter-metrics.md (1)
1-56: Excellent consistency in metrics standard adoption.
The refactoring comprehensively and consistently applies the HyperFleet Metrics Standard:
- All metrics properly renamed to `hyperfleet_adapter_<metric>_<unit>` format ✓
- Every example (8 sections, 25+ instances) includes mandatory `component` and `version` labels ✓
- All PromQL queries and alerting rules updated to reference new metric names ✓
- Cross-component standard references added (lines 11, 56, 352, 717) ✓
- Error labeling updated from `adapter_*` to `error_component` (line 285) ✓
- Link paths validated (e.g., `../../../standards/metrics.md` correctly navigates to shared standard)
The implementation guidance and examples are thorough and align well with the standard.
Also applies to: 347-411, 505-646
- Create hyperfleet/standards/metrics.md with:
  - Naming convention: hyperfleet_<component>_<metric>_<unit>
  - Required labels: component, version
  - Standard metrics: build_info, up
  - Histogram bucket recommendations
  - Metrics exposition: port 9090, path /metrics
- Update adapter-metrics.md:
  - Change prefix from adapter_* to hyperfleet_adapter_*
  - Add reference to metrics standard
- Update sentinel-deployment.md:
  - Add reference to metrics standard

Follow-up tickets created:
- HYPERFLEET-426: Add metrics standard to Claude plugin
- HYPERFLEET-427: Add required labels to Sentinel metrics
3983e30 to aaf82ad
Actionable comments posted: 0
🧹 Nitpick comments (2)
hyperfleet/components/adapter/framework/adapter-metrics.md (2)
381-411: Clarify how required labels (`component` and `version`) are initialized in the code example.
The metric recording code (lines 396–407) shows `WithLabelValues()` with 3 label values, but these metrics MUST include `component` and `version` labels (per line 54). The example doesn't demonstrate how these mandatory labels are set, which could lead implementers to omit them.
With the Prometheus Go client library, these labels are typically set once through the `ConstLabels` field of the metric options at initialization. Please add a note or extend the example to show this initialization pattern.
🔎 Example clarification to add (not a code change but documentation)
Add a comment or section explaining metric initialization:
    // At adapter initialization:
    labels := prometheus.Labels{
        "component": "adapter-validation", // from deployment/config
        "version":   "v1.0.0",             // from build/config
    }
    a.metrics.eventsProcessed = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name:        "hyperfleet_adapter_events_processed_total",
            Help:        "...",
            ConstLabels: labels,
        },
        []string{"adapter_name", "resource_kind", "status"},
    )
Then at recording time, use the 3 dynamic labels as shown in the current example.
478-497: Clarify how `component` and `version` labels are propagated to the ServiceMonitor.
The ServiceMonitor example (lines 478–497) shows how Prometheus discovers and scrapes metrics, but it doesn't explain how the mandatory `component` and `version` labels (required by line 54) reach the scraped metrics.
In a typical setup, these labels would be:
- Added via pod labels and Prometheus relabeling rules, or
- Set via the ServiceMonitor's `labels` field and relabeling configuration
Please add a note or example showing how to ensure these labels are present in the scraped metrics (e.g., via relabeling rules or pod labels).
🔎 Example relabeling configuration (for reference)
In the Prometheus scrape config or ServiceMonitor relabeling:
    relabel_configs:
      # Add component label from pod label or fixed value
      - target_label: component
        replacement: "adapter-validation"
      # Add version label from pod annotation or fixed value
      - source_labels: [__meta_kubernetes_pod_label_version]
        target_label: version
Consider adding this or a similar pattern to the ServiceMonitor section.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- hyperfleet/components/adapter/framework/adapter-metrics.md
- hyperfleet/components/sentinel/sentinel-deployment.md
- hyperfleet/standards/metrics.md
🚧 Files skipped from review as they are similar to previous changes (1)
- hyperfleet/standards/metrics.md
🔇 Additional comments (5)
hyperfleet/components/sentinel/sentinel-deployment.md (3)
202-208: Metrics table properly updated with required labels.
All metrics now include the mandatory `component` and `version` labels as specified in the new metrics standard. The addition of component-specific labels (resource_selector, resource_type, ready_state, operation, broker_type) is well-scoped and documented.
215-217: Label value constraints clearly documented.
The explicit allowed values for ready_state, operation, and broker_type are well-defined and will help ensure consistency across implementations.
212-212: All path references in sentinel-deployment.md are correct.
The relative paths `../../standards/metrics.md` and `../../standards/health-endpoints.md` at lines 212, 220, and 222 correctly resolve to the expected locations. The directory structure is in place: sentinel-deployment.md is at `hyperfleet/components/sentinel/` and the standards files are at `hyperfleet/standards/`. No changes needed.
hyperfleet/components/adapter/framework/adapter-metrics.md (2)
570-579: Well-resolved query ambiguity with clarifying comments.
The error rate PromQL queries now properly distinguish between:
- Line 574–575: Error rate by adapter deployment (`component` label)
- Line 577–578: Error rate by internal error source (`error_component` label)
The clarifying comments resolve the prior ambiguity flagged in the previous review. This makes the intent explicit for operators writing dashboards or alerts.
11-11: All cross-file references verified successfully.
The HyperFleet Metrics Standard is properly referenced at lines 11, 352, and 720, and the Health Endpoints Specification is correctly referenced at line 56. Both `hyperfleet/standards/metrics.md` and `hyperfleet/standards/health-endpoints.md` exist, and all relative paths (`../../../standards/`) are correctly formed and point to the expected locations.
ciaranRoche
left a comment
/lgtm
Summary
Defines the standard conventions for Prometheus metrics across all HyperFleet components (API, Sentinel, Adapters).
New Standard Document
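Creates `hyperfleet/standards/metrics.md` defining:
- Naming convention: `hyperfleet_<component>_<metric>_<unit>`
- Required labels: `component`, `version` (MUST for all metrics)
- Standard metrics: `build_info`, `up`
- Metrics exposition: `/metrics`, OpenMetrics compatible
Updated Documents
- Change prefix from `adapter_*` to `hyperfleet_adapter_*`
Follow-up Tickets
Test Plan
/standards)
Related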
Summary by CodeRabbit