@rafabene rafabene commented Dec 30, 2025

Summary

Defines the standard conventions for Prometheus metrics across all HyperFleet components (API, Sentinel, Adapters).

New Standard Document

Creates hyperfleet/standards/metrics.md defining:

  • Naming convention: hyperfleet_<component>_<metric>_<unit>
  • Required labels: component, version (MUST for all metrics)
  • Standard metrics: build_info, up
  • Metric types: Counter, Gauge, Histogram usage guidelines
  • Histogram buckets: Recommendations for API, event processing, database
  • Exposition: Port 9090, path /metrics, OpenMetrics compatible
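As a rough illustration of the naming convention listed above, a name check could look like the following. This is a minimal sketch, not part of the PR: the function name `validMetricName` and the exact regular expression are assumptions, and a regex alone cannot tell which segment is the component and which is the unit.

```go
package main

import (
	"fmt"
	"regexp"
)

// namePattern is a hypothetical approximation of
// hyperfleet_<component>_<metric>_<unit>: a "hyperfleet_" prefix, a
// lowercase component segment, and at least one more snake_case segment.
var namePattern = regexp.MustCompile(`^hyperfleet_[a-z][a-z0-9]*(_[a-z0-9]+)+$`)

// validMetricName reports whether name has the expected overall shape.
func validMetricName(name string) bool {
	return namePattern.MatchString(name)
}

func main() {
	for _, name := range []string{
		"hyperfleet_adapter_events_processed_total",
		"adapter_events_processed_total", // old prefix, rejected
	} {
		fmt.Printf("%-45s valid=%v\n", name, validMetricName(name))
	}
}
```

A check like this would fit a unit test or a lint step, so that names using the old adapter_* prefix are caught before they reach a dashboard.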

Updated Documents

  • adapter-metrics.md: Updated prefix from adapter_* to hyperfleet_adapter_*
  • sentinel-deployment.md: Added reference to metrics standard

Follow-up Tickets

Test Plan

Related

Summary by CodeRabbit

  • Documentation
    • Added the HyperFleet Metrics Standard: naming format, required labels (component, version), metric types, histogram guidance, exposure port/path, and examples.
    • Renamed adapter metrics to hyperfleet_adapter_* and updated all examples, PromQL, dashboards, alerts, and health-endpoint references.
    • Expanded metric label sets and value constraints (ready_state, operation, broker_type); replaced adapter error label with error_component and added cross-component conventions references.

coderabbitai bot commented Dec 30, 2025

Walkthrough

Adds a HyperFleet Metrics Standard and updates the component docs to adopt it:

  • Renames adapter metrics from adapter_* to hyperfleet_adapter_*.
  • Mandates component and version labels on all metrics.
  • Replaces component-based error labeling with error_component.
  • Standardizes metrics exposition (port/path) and health endpoint references.
  • Expands the sentinel metrics label sets (adds ready_state, operation, broker_type) and constrains their allowed values.
  • Updates all examples, PromQL, dashboards, alerting rules, implementation guidance, and references to reflect the new naming and labeling conventions.

Minor formatting and sample adjustments are also applied.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • xueli181114
  • 86254860
  • ciaranRoche

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding a metrics conventions standard to HyperFleet, which aligns with the PR's primary objective of introducing hyperfleet/standards/metrics.md with standardized Prometheus naming conventions and required labels. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
hyperfleet/components/adapter/framework/adapter-metrics.md (1)

67-78: Update all metric examples to include required component and version labels per the metrics standard.

The metrics standard (hyperfleet/standards/metrics.md lines 60–65) mandates that all metrics include component and version labels. However, all examples in this file use adapter_name instead of component and omit the version label entirely.

Apply the following pattern to all metric examples throughout the document:

🔎 Example fix for `hyperfleet_adapter_events_processed_total` (lines 74–77)
- hyperfleet_adapter_events_processed_total{adapter_name="validation",resource_kind="Cluster",status="success"} 1523
- hyperfleet_adapter_events_processed_total{adapter_name="validation",resource_kind="Cluster",status="error"} 12
- hyperfleet_adapter_events_processed_total{adapter_name="validation",resource_kind="Cluster",status="skipped"} 89
- hyperfleet_adapter_events_processed_total{adapter_name="validation",resource_kind="NodePool",status="success"} 342
+ hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success"} 1523
+ hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="error"} 12
+ hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="skipped"} 89
+ hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="NodePool",status="success"} 342

Apply the same pattern to all other metric examples (lines 101–105, 130–132, 182–185, 210–213, 237–239, 264–267, 291–293, etc.).
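One way to catch examples or label sets that omit the mandatory labels is a small presence check. The sketch below is illustrative only: `missingRequired` is a hypothetical helper, and treating an empty value as missing is an assumption rather than something the standard states.

```go
package main

import "fmt"

// requiredLabels lists the labels the metrics standard mandates on all metrics.
var requiredLabels = []string{"component", "version"}

// missingRequired returns the required label names that are absent
// (or empty) in the given label set, in a stable order.
func missingRequired(labels map[string]string) []string {
	var missing []string
	for _, name := range requiredLabels {
		if labels[name] == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	before := map[string]string{"adapter_name": "validation", "status": "success"}
	after := map[string]string{
		"component":    "adapter-validation",
		"version":      "v1.0.0",
		"adapter_name": "validation",
		"status":       "success",
	}
	fmt.Println("before:", missingRequired(before))
	fmt.Println("after: ", missingRequired(after))
}
```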

🧹 Nitpick comments (1)
hyperfleet/standards/metrics.md (1)

28-30: Add language identifier to code fences (markdown lint MD040).

Lines 28–30 and 90–94 have fenced code blocks without a language identifier. These should specify a language for syntax highlighting and linting compliance.

🔎 Suggested fixes

Line 28–30 (format example):

-```
+```text
 hyperfleet_<component>_<metric_name>_<unit>
-```
+```

Line 90–94 (endpoint sanitization):

-```
+```text
 /clusters/cls-abc123              → /clusters/{id}
 /clusters/cls-abc/nodepools/np-1  → /clusters/{id}/nodepools/{id}
 /namespaces/ns-123/jobs/job-456   → /namespaces/{ns}/jobs/{name}
-```
+```

Also applies to: 90-94
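The endpoint sanitization mappings quoted above can be sketched in a few lines of Go. This is an assumption-laden illustration, not the adapters' implementation: `sanitizePath` and the `placeholders` table are hypothetical, and the table covers only the collections that appear in the three examples.

```go
package main

import (
	"fmt"
	"strings"
)

// placeholders maps a collection segment to the placeholder that should
// replace the identifier following it (hypothetical table).
var placeholders = map[string]string{
	"clusters":   "{id}",
	"nodepools":  "{id}",
	"namespaces": "{ns}",
	"jobs":       "{name}",
}

// sanitizePath replaces the identifier after each known collection segment
// so the resulting label value stays low-cardinality.
func sanitizePath(path string) string {
	segments := strings.Split(path, "/")
	for i := 0; i+1 < len(segments); i++ {
		if ph, ok := placeholders[segments[i]]; ok && segments[i+1] != "" {
			segments[i+1] = ph
			i++ // skip the identifier that was just replaced
		}
	}
	return strings.Join(segments, "/")
}

func main() {
	for _, p := range []string{
		"/clusters/cls-abc123",
		"/clusters/cls-abc/nodepools/np-1",
		"/namespaces/ns-123/jobs/job-456",
	} {
		fmt.Printf("%s -> %s\n", p, sanitizePath(p))
	}
}
```

Keying on the collection name instead of guessing what an ID looks like keeps the mapping explicit, at the cost of having to extend the table when a new resource type is added.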

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb4ff0f and 236e656.

📒 Files selected for processing (3)
  • hyperfleet/components/adapter/framework/adapter-metrics.md
  • hyperfleet/components/sentinel/sentinel-deployment.md
  • hyperfleet/standards/metrics.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
hyperfleet/standards/metrics.md

28-28: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


90-90: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (3)
hyperfleet/components/sentinel/sentinel-deployment.md (1)

220-221: Reference path and placement are correct.

The link to the metrics standard is properly formatted and the relative path is correct. This addition appropriately directs implementers to the standard.

hyperfleet/components/adapter/framework/adapter-metrics.md (2)

11-11: Metric naming convention updates are consistently applied.

The renaming from adapter_* to hyperfleet_adapter_* is thorough and consistent across:

  • All metric definitions and their documentation
  • Example outputs and PromQL queries
  • Alerting rules
  • Dashboard queries

This aligns well with the new naming standard.

Also applies to: 62-78, 87-106, 117-156, 168-214, 225-240, 251-268, 279-300, 305-320, 329-340


354-357: Cross-reference to metrics standard is correct.

Line 354 appropriately directs to the standard for naming conventions. The path at line 11 ([HyperFleet Metrics Standard](../../../standards/metrics.md)) is correctly formatted.

@rafabene rafabene force-pushed the HYPERFLEET-382-metrics-standard branch from 236e656 to 02045b2 Compare December 30, 2025 19:27
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 236e656 and 02045b2.

📒 Files selected for processing (3)
  • hyperfleet/components/adapter/framework/adapter-metrics.md
  • hyperfleet/components/sentinel/sentinel-deployment.md
  • hyperfleet/standards/metrics.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
hyperfleet/standards/metrics.md

28-28: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


90-90: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (20)
hyperfleet/components/adapter/framework/adapter-metrics.md (19)

47-56: Excellent header and format documentation.

The metrics format section clearly specifies Prometheus format, OpenMetrics compatibility, and explicitly mandates component and version labels with a reference to the Metrics Standard. Health endpoints reference is clear.


64-80: Event processing metric naming and labeling is consistent.

hyperfleet_adapter_events_processed_total is properly named with the hyperfleet_adapter_ prefix, and all three examples correctly include component and version labels alongside adapter-specific dimensions (adapter_name, resource_kind, status).


89-108: Histogram metric examples properly labeled.

hyperfleet_adapter_event_processing_duration_seconds examples correctly show the metric name, bucket structure, and required labels (component, version) along with domain-specific labels (adapter_name, resource_kind, status). Buckets (0.1, 0.5, 1, 2, 5, 10, 30, 60, 120) are well-reasoned for event processing.
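For readers less familiar with how Prometheus histogram buckets behave, the sketch below counts observations into the bucket boundaries quoted above. It is a stdlib-only illustration with made-up observations, not the client library's implementation; the point is that each `le` bucket is cumulative, so one observation increments every bucket whose upper bound it does not exceed, plus the implicit +Inf bucket.

```go
package main

import "fmt"

// eventBuckets are the upper bounds (in seconds) quoted in the review.
var eventBuckets = []float64{0.1, 0.5, 1, 2, 5, 10, 30, 60, 120}

// cumulativeCounts returns one cumulative count per bucket, followed by
// the +Inf bucket, which always equals the number of observations.
func cumulativeCounts(buckets, observations []float64) []int {
	counts := make([]int, len(buckets)+1)
	for _, v := range observations {
		for i, le := range buckets {
			if v <= le {
				counts[i]++
			}
		}
		counts[len(buckets)]++ // +Inf bucket
	}
	return counts
}

func main() {
	// Hypothetical event processing durations in seconds.
	observed := []float64{0.05, 0.7, 3, 45, 300}
	counts := cumulativeCounts(eventBuckets, observed)
	for i, le := range eventBuckets {
		fmt.Printf("le=%v count=%d\n", le, counts[i])
	}
	fmt.Printf("le=+Inf count=%d\n", counts[len(eventBuckets)])
}
```

The 300 second observation lands only in +Inf, which is why the top finite bucket should sit above the slowest processing time worth distinguishing.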


119-135: Resource management metrics consistent with standard.

Both hyperfleet_adapter_resources_created_total and hyperfleet_adapter_resources_deleted_total include required labels (component, version) and resource-specific labels (adapter_name, resource_type, namespace, status). Naming and labeling conventions are uniform.


170-188: API metrics properly labeled and sanitized.

hyperfleet_adapter_api_requests_total examples demonstrate correct prefix, required labels (component, version), and sanitized endpoint paths (e.g., /clusters/{id}, /namespaces/{ns}/jobs) with no high-cardinality IDs. PromQL examples are updated to reflect new metric names.


197-216: API request duration histogram is well-structured.

hyperfleet_adapter_api_request_duration_seconds examples show proper bucket values (0.01, 0.05, 0.1, 0.5, 1, 2, 5) tuned for API latency, and all examples include required labels (component, version). Histogram sum and count are correctly formatted.


227-242: Precondition evaluation metric follows conventions.

hyperfleet_adapter_preconditions_evaluated_total is properly named, carries the expected labels (component, version, adapter_name, precondition_name, result), and documents its label values (pass, fail, error).


253-270: Status reporting metric is comprehensive.

hyperfleet_adapter_status_reports_total examples include required labels (component, version) and contextual boolean labels (applied, available). Labels are kept to low cardinality.


281-296: Error metric labels refined for clarity.

hyperfleet_adapter_errors_total now uses error_component (line 289) instead of a generic "component" for internal error location (event_processor, precondition_evaluator, resource_manager, status_reporter). This correctly disambiguates the HyperFleet-standard component label (adapter name) from internal error source. Naming is clear and examples are correct.


307-322: Workload monitoring metric is properly structured.

hyperfleet_adapter_workload_status_total includes required labels (component, version) and domain labels (adapter_name, workload_type, status). Label values are documented (Job, Deployment, StatefulSet; running, succeeded, failed, unknown).


331-342: Health metric (dead man's switch) correctly positioned.

hyperfleet_adapter_last_processed_timestamp_seconds is a gauge with Unix timestamp and includes required labels (component, version, adapter_name). The purpose—detecting silent failures via timestamp staleness—is clear and examples are correct.
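The staleness check behind a dead man's switch amounts to comparing the gauge value against the current time. The sketch below shows that comparison in Go; `isStale` is a hypothetical helper and the 300 second window is an assumed threshold for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// isStale reports whether the last-processed Unix timestamp is older than
// maxAgeSeconds relative to nowUnix. All arguments are in seconds.
func isStale(lastProcessedUnix, nowUnix, maxAgeSeconds float64) bool {
	return nowUnix-lastProcessedUnix > maxAgeSeconds
}

func main() {
	now := float64(time.Now().Unix())
	lastProcessed := now - 600 // pretend the adapter last processed an event 10 minutes ago
	fmt.Println("stale:", isStale(lastProcessed, now, 300))
}
```

Because the gauge only moves when an event is processed, a quiet period with no events looks the same as a stuck adapter, so the window has to be longer than the longest expected idle gap.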


351-360: Implementation guidelines correctly reference Metrics Standard.

Section 1 instructs developers to follow "Prometheus naming best practices and HyperFleet standards," with explicit reference to the Metrics Standard (line 356) and the hyperfleet_adapter_ prefix requirement. Guidance on snake_case, metric names, and label consistency is aligned with the standard.


361-382: Label best practices section is well-documented.

DO/DON'T list covers cardinality, consistency, and sanitization. The "Example of Sanitized Endpoints" (lines 374-381) is concrete and prevents high-cardinality issues. Guidance aligns with the standard's endpoint sanitization section.


385-415: Go code example lacks component and version label injection.

The metric collection example (lines 387-414) demonstrates instrumenting hyperfleet_adapter_events_processed_total and hyperfleet_adapter_event_processing_duration_seconds, but it only shows WithLabelValues(a.config.Name, event.Data.Kind, status). It does not illustrate how component and version labels are injected. This creates a gap: the code example doesn't show how to satisfy the mandatory label requirement.

Consider adding a comment or brief explanation of how component and version labels are set (e.g., via middleware, client library initialization, or a wrapper). Alternatively, confirm that label injection is handled elsewhere in the codebase.


509-533: PromQL queries updated for new metric names.

Event processing rate (line 511), latency percentiles (lines 525-532), and average processing time queries all use the new hyperfleet_adapter_ metric names. Queries are syntactically correct and reference the appropriate metric variants (_bucket, _sum, _count).


537-559: Resource and API query examples are correct.

Resource creation rate (line 539), success rate (lines 542-546), API latency (lines 553-555), and error rate (line 558) queries all reference updated metric names and filter appropriately. Status code patterns (e.g., 5.. for server errors) are idiomatic PromQL.
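One detail worth knowing about the 5.. pattern: PromQL regex matchers are fully anchored, so the pattern must match the whole label value. The Go sketch below mimics that by wrapping the pattern in ^(?:...)$; `matchLabel` is a hypothetical helper written only to demonstrate the behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchLabel applies pattern the way a PromQL =~ matcher would: the whole
// value has to match, not just a substring.
func matchLabel(pattern, value string) bool {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	return re.MatchString(value)
}

func main() {
	for _, code := range []string{"200", "404", "500", "503", "5030"} {
		fmt.Printf("%-4s matches 5..: %v\n", code, matchLabel("5..", code))
	}
}
```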


564-580: Precondition and error rate queries are well-formed.

Precondition pass rate (lines 565-569) and error rate queries (lines 576-579) use the updated metric names and aggregation patterns. The sum by(component) error aggregation (line 579) correctly references the HyperFleet-standard component label.


589-650: Alerting rules correctly reference updated metrics.

All four alert definitions (lines 589-650):

  • AdapterNotProcessing (line 591): References hyperfleet_adapter_last_processed_timestamp_seconds with correct time-since-update logic
  • AdapterHighErrorRate (lines 606-608): Uses updated metric names with correct rate aggregation
  • AdapterSlowEventProcessing (line 624): References histogram bucket metric with correct percentile calculation
  • AdapterHighAPIErrorRate (lines 640-642): Uses updated metric name and status code patterns

All rules correctly reference $labels.adapter_name for per-adapter alerting. Severities and thresholds are reasonable.


719-724: References section comprehensively updated.

Cross-component reference to Metrics Standard (line 721) is added. Prometheus, OpenMetrics, and Go client library references remain. No broken or stale links are evident.

hyperfleet/components/sentinel/sentinel-deployment.md (1)

220-220: No action needed. The path reference ../../docs/health-endpoints.md on line 220 is correct—the file exists in the docs directory, and the relative path from the sentinel-deployment.md file is accurate.

@rafabene rafabene force-pushed the HYPERFLEET-382-metrics-standard branch from 02045b2 to c81ce68 Compare December 30, 2025 19:31
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
hyperfleet/standards/metrics.md (1)

116-128: Build info and health metrics are well-defined.

The examples correctly include the required component and version labels. However, consider documenting how these labels are injected at runtime (e.g., via middleware, client library configuration, or environment variables), as this is a prerequisite for consistent implementation across components.

hyperfleet/components/sentinel/sentinel-deployment.md (1)

210-218: Implementation requirements correctly mandate standard labels.

Line 212 explicitly states that all metrics must include component and version labels per the Metrics Standard, and lines 215-217 define label value constraints. This provides clear implementation guidance. One minor suggestion: consider adding a note about label value injection mechanism (e.g., via middleware or environment variables) to ease implementation.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 02045b2 and c81ce68.

📒 Files selected for processing (3)
  • hyperfleet/components/adapter/framework/adapter-metrics.md
  • hyperfleet/components/sentinel/sentinel-deployment.md
  • hyperfleet/standards/metrics.md
🔇 Additional comments (12)
hyperfleet/standards/metrics.md (3)

50-73: LGTM — Metric examples correctly include required labels.

All metric examples include both mandatory component and version labels alongside dimension-specific labels. Examples clearly demonstrate the naming convention and label structure for Counter, Gauge, and Histogram metrics. This provides a solid reference for component teams implementing metrics.


75-94: Label best practices guidance is thorough and well-organized.

The "DO" and "DON'T" guidelines address common pitfalls (cardinality, high-cardinality IDs, endpoint sanitization). The sanitization examples are particularly helpful for preventing high-cardinality metrics. This section will serve as a valuable reference for component teams.


233-234: Cross-component reference links are valid.

The anchor link #metrics-and-observability exists in sentinel-deployment.md, and adapter-metrics.md is accessible from this location. No changes needed.

hyperfleet/components/sentinel/sentinel-deployment.md (2)

202-208: LGTM — Sentinel metrics now align with standard label requirements.

The metrics table now includes component and version as required labels alongside component-specific labels (resource_selector, resource_type, and operation-specific labels like ready_state, broker_type). This addresses the alignment gap flagged in prior review. The metrics definition clearly maps to the HyperFleet Metrics Standard format.


220-222: No action needed—the path is correct.

Line 220's reference to ../../docs/health-endpoints.md is correct. The file exists in the /docs directory, not in /standards/. While line 222 correctly references ../../standards/metrics.md, these two files reside in different directories by design.

hyperfleet/components/adapter/framework/adapter-metrics.md (7)

11-11: LGTM — Cross-references to Metrics Standard are correctly integrated.

The file now consistently references the HyperFleet Metrics Standard (line 11) and embeds the required label statement (line 54) and health-endpoints reference (line 56). These additions tie the adapter metrics to the broader HyperFleet standards and make implementation requirements explicit.

Also applies to: 54-54, 56-56


64-80: Metric examples consistently include required labels.

All metric examples throughout the file (event processing, resource management, API calls, preconditions, status reporting, errors, and workload monitoring) include both component and version labels alongside domain-specific labels. This ensures alignment with the Metrics Standard and provides a clear implementation template for adapter authors.

Also applies to: 89-108, 119-159


287-296: Error labeling correctly distinguishes error source from component identity.

The error_type and error_component labels serve different purposes: error_type categorizes the error, while error_component identifies the internal adapter component where the error originated. This separation avoids overloading the required component label and provides finer-grained observability.


384-415: LGTM — Implementation example demonstrates correct label injection pattern.

The code example clearly shows how to inject adapter_name, resource_kind, and status labels via WithLabelValues(), following the Prometheus Go client library pattern. This will serve as a helpful reference for adapter implementers.


509-559: Dashboard PromQL queries are updated and syntactically sound.

All PromQL examples have been updated to use the new hyperfleet_adapter_* metric names and include appropriate label filters and aggregations. The queries cover key observability areas: event processing rate/latency, resource creation, API performance, preconditions, and errors. These will form a solid foundation for dashboards.


586-650: Alerting rules correctly map to updated metric names.

All alert rules (silent failure, high error rate, slow event processing, API errors) have been updated to reference the new hyperfleet_adapter_* metric names. The thresholds and alert logic appear reasonable (5m for silent failure, 5% error rate, 60s for slow processing, 1% API error rate).
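The ratio-style thresholds mentioned above (5% error rate, 1% API error rate) reduce to a simple comparison. The sketch below is illustrative only: `exceedsErrorRate` is a hypothetical helper, the counts stand in for rate() results over some window, and treating a zero total as "not firing" is an assumption.

```go
package main

import "fmt"

// exceedsErrorRate reports whether errors/total is strictly above threshold.
// A zero total is treated as not exceeding, to avoid dividing by zero.
func exceedsErrorRate(errors, total, threshold float64) bool {
	if total == 0 {
		return false
	}
	return errors/total > threshold
}

func main() {
	fmt.Println("event errors 6/100 over 5%:", exceedsErrorRate(6, 100, 0.05))
	fmt.Println("API errors 1/200 over 1%:  ", exceedsErrorRate(1, 200, 0.01))
}
```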


574-580: Clarify intent of "Error rate by component" query and use consistent label.

Line 579's query sum by(component) aggregates by the global adapter component (e.g., "adapter-validation"), but the metric also defines error_component (line 289) to indicate where errors originated (event_processor, precondition_evaluator, resource_manager, status_reporter). The comment "Error rate by component" is ambiguous—confirm whether you intend to show error rates per adapter (keep component) or per internal error component (use error_component). Update the query and comment accordingly for clarity.

@rafabene rafabene force-pushed the HYPERFLEET-382-metrics-standard branch from c81ce68 to 00062dd Compare December 30, 2025 19:36
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
hyperfleet/components/adapter/framework/adapter-metrics.md (1)

64-80: Add component and version labels to all metric definition label tables.

Every metric example shows component and version labels (e.g., component="adapter-validation",version="v1.0.0",...), and line 54 correctly states that these labels are required by the Metrics Standard. However, the "Labels" section of each metric definition omits them.

For example, hyperfleet_adapter_events_processed_total (lines 69–72) lists only adapter_name, resource_kind, and status, but the example (lines 76–79) shows all five labels including component and version.

This inconsistency between the label tables and the examples makes it unclear to implementers which labels are mandatory. For each metric definition, either:

  1. Add component and version to the label list, or
  2. Add a note such as: "Also includes required labels: component, version (see Metrics Standard)"

This should be applied to all 11 metric definitions (events_processed_total, event_processing_duration_seconds, resources_created_total, resources_deleted_total, api_requests_total, api_request_duration_seconds, preconditions_evaluated_total, status_reports_total, errors_total, workload_status_total, last_processed_timestamp_seconds).

Also applies to: 89-107, 119-135, 144-159, 170-188, 197-216, 227-242, 253-270, 281-296, 307-327, 331-342

🧹 Nitpick comments (2)
hyperfleet/components/sentinel/sentinel-deployment.md (1)

200-208: Clarify that component and version labels are required for all metrics.

The metrics table shows metric definitions with labels like resource_selector, resource_type, ready_state, and operation. However, the table does not explicitly list component and version among each metric's labels, even though the Implementation Requirements section (line 212) requires them per the Metrics Standard.

To improve clarity, either: (1) add component and version to each metric's label list in the table, or (2) add a note stating "All metrics include required labels: component and version (see Metrics Standard)".

The current approach of stating the requirement once in Implementation Requirements is reasonable but may cause readers who focus on the table to miss this mandatory requirement.

hyperfleet/components/adapter/framework/adapter-metrics.md (1)

385-415: Clarify how component and version labels are initialized in the Go example.

The code example shows metric instrumentation using WithLabelValues() for metric-specific labels (adapter_name, resource_kind, status), but does not show how the required component and version labels are set. Since every metric example includes these labels, the implementation guidance should clarify:

  • Are component and version labels applied at collector initialization (e.g., via NewCounterVec with these label names)?
  • Or are they added to each WithLabelValues() call?

This detail is important for implementers to follow the standard correctly.

🔎 Example of how to clarify this

Consider expanding the code example to show metrics initialization or adding a comment that explains label initialization strategy:

// Metrics are initialized with component and version labels that are set
// once at startup (e.g., via flag or config) and applied to all observations.
// Metric-specific labels (adapter_name, resource_kind, status, etc.) are 
// added per observation.
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between c81ce68 and 00062dd.

📒 Files selected for processing (3)
  • hyperfleet/components/adapter/framework/adapter-metrics.md
  • hyperfleet/components/sentinel/sentinel-deployment.md
  • hyperfleet/standards/metrics.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • hyperfleet/standards/metrics.md
🔇 Additional comments (2)
hyperfleet/components/adapter/framework/adapter-metrics.md (1)

56-56: Ensure health-endpoints path is consistent across documentation.

Line 56 correctly references ../../../standards/health-endpoints.md. However, the sentinel-deployment.md file (line 220) references ../../docs/health-endpoints.md instead. Verify that both files use the correct path for the health-endpoints standard (which should be in /standards/ per PR #60).

Both references should point to the same location. Based on the PR dependency, update sentinel-deployment.md to use ../../standards/health-endpoints.md.

hyperfleet/components/sentinel/sentinel-deployment.md (1)

220-220: Path reference is correct; no changes needed.

The current path ../../docs/health-endpoints.md on line 220 correctly resolves to the actual file location at ./hyperfleet/docs/health-endpoints.md. While metrics.md was moved to the /standards/ directory, health-endpoints.md remains in /docs/ and should not be changed.

Likely an incorrect or invalid review comment.

@rafabene rafabene force-pushed the HYPERFLEET-382-metrics-standard branch from 00062dd to 3983e30 Compare January 2, 2026 13:52
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 00062dd and 3983e30.

📒 Files selected for processing (3)
  • hyperfleet/components/adapter/framework/adapter-metrics.md
  • hyperfleet/components/sentinel/sentinel-deployment.md
  • hyperfleet/standards/metrics.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • hyperfleet/standards/metrics.md
  • hyperfleet/components/sentinel/sentinel-deployment.md
🔇 Additional comments (1)
hyperfleet/components/adapter/framework/adapter-metrics.md (1)

1-56: Excellent consistency in metrics standard adoption.

The refactoring comprehensively and consistently applies the HyperFleet Metrics Standard:

  • All metrics properly renamed to hyperfleet_adapter_<metric>_<unit> format ✓
  • Every example (8 sections, 25+ instances) includes mandatory component and version labels ✓
  • All PromQL queries and alerting rules updated to reference new metric names ✓
  • Cross-component standard references added (lines 11, 56, 352, 717) ✓
  • Error label renamed from component to error_component (line 285) ✓
  • Link paths validated (e.g., ../../../standards/metrics.md correctly navigates to shared standard)

The implementation guidance and examples are thorough and align well with the standard.

Also applies to: 347-411, 505-646

- Create hyperfleet/standards/metrics.md with:
  - Naming convention: hyperfleet_<component>_<metric>_<unit>
  - Required labels: component, version
  - Standard metrics: build_info, up
  - Histogram bucket recommendations
  - Metrics exposition: port 9090, path /metrics

- Update adapter-metrics.md:
  - Change prefix from adapter_* to hyperfleet_adapter_*
  - Add reference to metrics standard

- Update sentinel-deployment.md:
  - Add reference to metrics standard

Follow-up tickets created:
- HYPERFLEET-426: Add metrics standard to Claude plugin
- HYPERFLEET-427: Add required labels to Sentinel metrics
@rafabene rafabene force-pushed the HYPERFLEET-382-metrics-standard branch from 3983e30 to aaf82ad Compare January 2, 2026 13:56
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
hyperfleet/components/adapter/framework/adapter-metrics.md (2)

381-411: Clarify how required labels (component and version) are initialized in the code example.

The metric recording code (lines 396–407) shows WithLabelValues() with 3 label values, but these metrics MUST include component and version labels (per line 54). The example doesn't demonstrate how these mandatory labels are set, which could lead implementers to omit them.

In the Prometheus Go client library, such labels are typically attached through the ConstLabels field of the metric options (for example, prometheus.CounterOpts) at metric initialization. Please add a note or extend the example to show this initialization pattern.

🔎 Example clarification to add (not a code change but documentation)

Add a comment or section explaining metric initialization:

// At adapter initialization:
labels := prometheus.Labels{
    "component": "adapter-validation",  // from deployment/config
    "version":   "v1.0.0",              // from build/config
}

a.metrics.eventsProcessed = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name:        "hyperfleet_adapter_events_processed_total",
        Help:        "...",
        ConstLabels: labels,
    },
    []string{"adapter_name", "resource_kind", "status"},
)

Then at recording time, use the 3 dynamic labels as shown in the current example.


478-497: Clarify how component and version labels are propagated to the ServiceMonitor.

The ServiceMonitor example (lines 478–497) shows how Prometheus discovers and scrapes metrics, but it doesn't explain how the mandatory component and version labels (required by line 54) reach the scraped metrics.

In a typical setup, these labels would be:

  1. Added via pod labels and Prometheus relabeling rules, or
  2. Set via the ServiceMonitor's targetLabels or podTargetLabels fields, or via its relabelings configuration

Please add a note or example showing how to ensure these labels are present in the scraped metrics (e.g., via relabeling rules or pod labels).

🔎 Example relabeling configuration (for reference)

In a Prometheus scrape config (a ServiceMonitor expresses the same rules under relabelings, with camelCase keys such as targetLabel and sourceLabels):

relabel_configs:
  # Add component label as a fixed value (it could also be copied from a pod label)
  - target_label: component
    replacement: "adapter-validation"
  # Add version label from the pod's "version" label
  - source_labels: [__meta_kubernetes_pod_label_version]
    target_label: version

Consider adding this or a similar pattern to the ServiceMonitor section.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 3983e30 and aaf82ad.

📒 Files selected for processing (3)
  • hyperfleet/components/adapter/framework/adapter-metrics.md
  • hyperfleet/components/sentinel/sentinel-deployment.md
  • hyperfleet/standards/metrics.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • hyperfleet/standards/metrics.md
🔇 Additional comments (5)
hyperfleet/components/sentinel/sentinel-deployment.md (3)

202-208: Metrics table properly updated with required labels.

All metrics now include the mandatory component and version labels as specified in the new metrics standard. The addition of component-specific labels (resource_selector, resource_type, ready_state, operation, broker_type) is well-scoped and documented.


215-217: Label value constraints clearly documented.

The explicit allowed values for ready_state, operation, and broker_type are well-defined and will help ensure consistency across implementations.


212-212: All path references in sentinel-deployment.md are correct.

The relative paths ../../standards/metrics.md and ../../standards/health-endpoints.md at lines 212, 220, and 222 correctly resolve to the expected locations. The directory structure is in place: sentinel-deployment.md is at hyperfleet/components/sentinel/ and the standards files are at hyperfleet/standards/. No changes needed.

hyperfleet/components/adapter/framework/adapter-metrics.md (2)

570-579: Well-resolved query ambiguity with clarifying comments.

The error rate PromQL queries now properly distinguish between:

  • Line 574–575: Error rate by adapter deployment (component label)
  • Line 577–578: Error rate by internal error source (error_component label)

The clarifying comments resolve the prior ambiguity flagged in the previous review. This makes the intent explicit for operators writing dashboards or alerts.
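The difference between the two aggregations is easy to see with a handful of samples. The sketch below is a stdlib-only stand-in for `sum by(<label>)`; the `sample` type, `sumBy`, the "adapter-dns" component, and the counts are all hypothetical and exist only for illustration.

```go
package main

import "fmt"

// sample is a simplified stand-in for one labeled series value.
type sample struct {
	labels map[string]string
	value  float64
}

// demoSamples are made-up error counts for two adapter deployments.
var demoSamples = []sample{
	{map[string]string{"component": "adapter-validation", "error_component": "event_processor"}, 3},
	{map[string]string{"component": "adapter-validation", "error_component": "status_reporter"}, 2},
	{map[string]string{"component": "adapter-dns", "error_component": "event_processor"}, 4},
}

// sumBy adds up sample values grouped by the given label, similar in
// spirit to PromQL's sum by(label).
func sumBy(samples []sample, label string) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range samples {
		out[s.labels[label]] += s.value
	}
	return out
}

func main() {
	fmt.Println("by component:      ", sumBy(demoSamples, "component"))
	fmt.Println("by error_component:", sumBy(demoSamples, "error_component"))
}
```

Grouping by component answers "which adapter deployment is failing", while grouping by error_component answers "which internal stage is failing across adapters", so the two queries are complementary and not interchangeable.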


11-11: All cross-file references verified successfully. The HyperFleet Metrics Standard is properly referenced at lines 11, 352, and 720, and the Health Endpoints Specification is correctly referenced at line 56. Both hyperfleet/standards/metrics.md and hyperfleet/standards/health-endpoints.md exist, and all relative paths (../../../standards/) are correctly formed and point to the expected locations.

@ciaranRoche ciaranRoche left a comment

/lgtm

@ciaranRoche ciaranRoche merged commit ded5d17 into openshift-hyperfleet:main Jan 2, 2026
1 check passed

2 participants