Proposal: Lightweight observability and Prometheus-compatible metrics service #233

@ksapru

Problem

NemoClaw currently offers limited visibility into the performance and health of its operations. Tracking blueprint execution latency, API validation success rates, and sandbox lifecycle operations (such as launch) requires manual log parsing or external wrappers. Without built-in telemetry, it is difficult to monitor NemoClaw in automated CI/CD pipelines or production-like environments, where performance regressions and intermittent API failures need to be surfaced programmatically.

Proposed Solution

Introduce an optional, lightweight metrics service built directly into the plugin. When enabled via the NEMOCLAW_METRICS_ENABLED environment variable, NemoClaw will:

  • Maintain an internal registry of counters and histograms for key operations.
  • Start a minimal HTTP server (defaulting to port 9090) to export these metrics in Prometheus text format at a /metrics endpoint.
  • Instrument critical paths, including execBlueprint (reported as blueprint_execution for clarity) and API key validation.
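
To make the registry and export steps above concrete, here is a minimal sketch of an in-memory registry that renders counters and histograms in the Prometheus text format. It is not taken from PR #230: the class name `MetricsRegistry`, its methods, the metric names, and the bucket bounds are all assumptions made for illustration, and labels are omitted to keep it short.

```javascript
// Hypothetical in-memory registry. Names and bucket bounds are illustrative
// assumptions, not NemoClaw's actual API.
class MetricsRegistry {
  constructor() {
    this.counters = new Map();   // name -> { help, value }
    this.histograms = new Map(); // name -> { help, bounds, counts, sum, count }
  }

  // Increment a monotonically increasing counter, creating it on first use.
  inc(name, help, by = 1) {
    const c = this.counters.get(name) ?? { help, value: 0 };
    c.value += by;
    this.counters.set(name, c);
  }

  // Record one observation; bounds are upper limits in the metric's unit.
  observe(name, help, value, bounds = [0.1, 0.5, 1, 5]) {
    let h = this.histograms.get(name);
    if (!h) {
      h = { help, bounds, counts: bounds.map(() => 0), sum: 0, count: 0 };
      this.histograms.set(name, h);
    }
    // Store per-bucket counts; they are made cumulative at render time.
    const i = h.bounds.findIndex((b) => value <= b);
    if (i !== -1) h.counts[i] += 1;
    h.sum += value;
    h.count += 1;
  }

  // Render every metric in the Prometheus text exposition format.
  render() {
    const lines = [];
    for (const [name, c] of this.counters) {
      lines.push(`# HELP ${name} ${c.help}`, `# TYPE ${name} counter`, `${name} ${c.value}`);
    }
    for (const [name, h] of this.histograms) {
      lines.push(`# HELP ${name} ${h.help}`, `# TYPE ${name} histogram`);
      let cumulative = 0;
      h.bounds.forEach((b, i) => {
        cumulative += h.counts[i];
        lines.push(`${name}_bucket{le="${b}"} ${cumulative}`);
      });
      lines.push(
        `${name}_bucket{le="+Inf"} ${h.count}`,
        `${name}_sum ${h.sum}`,
        `${name}_count ${h.count}`
      );
    }
    return lines.join("\n") + "\n";
  }
}

// Usage with made-up metric names and values.
const registry = new MetricsRegistry();
registry.inc("nemoclaw_api_key_validation_total", "API key validation attempts.");
registry.observe("nemoclaw_blueprint_execution_seconds", "Blueprint execution latency in seconds.", 0.25);
registry.observe("nemoclaw_blueprint_execution_seconds", "Blueprint execution latency in seconds.", 2);
const output = registry.render();
console.log(output);
```

Bucket counts are stored per bucket and summed at render time because the exposition format expects each `le` bucket to be cumulative, ending in a `+Inf` bucket equal to the total count.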

Design Goals

  • Zero Overhead: When disabled (default), the metrics logic is bypassed to ensure no performance impact for standard CLI users.
  • Zero Dependencies: The implementation uses native node:http and process.hrtime to maintain a minimal footprint without adding to the dependency tree.
  • Prometheus Compatibility: Adheres to standard exposition formats for immediate integration with existing monitoring stacks.
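
One way these goals could fit together is sketched below, under stated assumptions. Only `NEMOCLAW_METRICS_ENABLED`, port 9090, the `/metrics` path, `node:http`, and `process.hrtime` come from the proposal. The helper names (`createMetrics`, `timed`, `startServer`), the accepted flag values (`"1"` or `"true"`), and the synchronous wrapper are hypothetical. An asynchronous operation would need a variant that awaits the wrapped call before recording the duration.

```javascript
// Hypothetical glue code, not the implementation in PR #230.
function createMetrics(env = process.env) {
  // Assumption: "1" or "true" enables metrics; the proposal does not say
  // which values are accepted.
  const flag = env.NEMOCLAW_METRICS_ENABLED;
  const enabled = flag === "1" || flag === "true";
  const durations = []; // stand-in for a real histogram

  // Wrap a synchronous function so its latency is recorded when enabled.
  function timed(name, fn) {
    if (!enabled) return fn; // disabled: return the original function as-is
    return (...args) => {
      const start = process.hrtime.bigint(); // monotonic clock, nanoseconds
      try {
        return fn(...args);
      } finally {
        const seconds = Number(process.hrtime.bigint() - start) / 1e9;
        durations.push({ name, seconds });
      }
    };
  }

  // Serve the rendered metrics text at /metrics when enabled.
  function startServer(render, port = 9090) {
    if (!enabled) return null;
    // Loaded lazily so that disabled runs never load the HTTP module.
    const http = require("node:http");
    const server = http.createServer((req, res) => {
      if (req.method === "GET" && req.url === "/metrics") {
        res.writeHead(200, { "Content-Type": "text/plain; version=0.0.4; charset=utf-8" });
        res.end(render());
      } else {
        res.writeHead(404);
        res.end();
      }
    });
    server.listen(port);
    server.unref(); // assumption: metrics should not keep the process alive
    return server;
  }

  return { enabled, durations, timed, startServer };
}

// Usage with an explicit env object so the example does not depend on the shell.
const metrics = createMetrics({ NEMOCLAW_METRICS_ENABLED: "true" });
const runBlueprint = metrics.timed("blueprint_execution", () => "done");
runBlueprint();
console.log(metrics.durations.length); // one recorded duration
```

When the flag is off, `timed` returns the original function and `startServer` returns `null`, so the only cost on the default path is a single boolean check at wrap time.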

Open Questions

  • Does this built-in approach align with the project's long-term roadmap, or is there a preference for moving toward OpenTelemetry despite the additional dependency weight?
  • Are there specific sandbox lifecycle events (e.g., eject, migrate) that should be prioritized for initial instrumentation?

I have opened PR #230 with a working implementation of this proposal. I look forward to your feedback on the architectural direction.

Metadata

Labels: enhancement, observability
