diff --git a/docs/METRICS_CATALOG.md b/docs/METRICS_CATALOG.md new file mode 100644 index 000000000..148d0dad7 --- /dev/null +++ b/docs/METRICS_CATALOG.md @@ -0,0 +1,341 @@ +# Metrics Catalog for VTEX IO Node Apps + +This document provides a comprehensive catalog of all metrics available in the `@vtex/api` library, organized by their implementation (diagnostics-based vs legacy). + +> **Looking for migration guidance?** See [METRICS_OVERVIEW.md](./METRICS_OVERVIEW.md) for migration patterns and best practices. + +## Table of Contents + +- [Metrics Architecture Overview](#metrics-architecture-overview) +- [Complete Metrics Visual Summary](#complete-metrics-visual-summary) +- [Diagnostics-Related Metrics](#diagnostics-related-metrics) +- [Legacy Metrics (Non-Diagnostics)](#legacy-metrics-non-diagnostics) + +--- + +## Metrics Architecture Overview + +The `@vtex/api` library has two coexisting metrics systems during the migration period: + +1. **Diagnostics-Based Metrics** (New) - Uses `@vtex/diagnostics-nodejs` with OpenTelemetry +2. **Legacy Metrics** (Existing) - Uses `prom-client`, `MetricsAccumulator`, and console.log exports + +Both systems operate independently and can coexist. The goal is to gradually migrate to diagnostics-based metrics while maintaining backward compatibility. 
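To make the coexistence concrete, here is a minimal, self-contained sketch of a handler emitting the same measurement to both systems during migration. The sinks are stubbed with illustrative shapes (in a real service they are `global.metrics` and `global.diagnosticsMetrics`), and the operation name is hypothetical.

```typescript
// Illustrative stubs standing in for global.metrics (legacy
// MetricsAccumulator) and global.diagnosticsMetrics (OTel-based).
type HrTime = [number, number]

const recorded: string[] = []

const legacyMetrics = {
  // Mirrors the MetricsAccumulator.batch(name, elapsed, extensions) shape
  batch(name: string, _elapsed: HrTime, _extensions?: Record<string, number>): void {
    recorded.push(`legacy:${name}`)
  },
}

const diagnosticsMetrics = {
  // Mirrors the DiagnosticsMetrics.recordLatency(elapsed, attributes) shape
  recordLatency(_elapsed: HrTime, attrs: Record<string, string>): void {
    recorded.push(`otel:${attrs.operation}`)
  },
}

function handleRequest(): void {
  const start = process.hrtime()
  // ... handler work ...
  const elapsed = process.hrtime(start)
  // During the migration period, both sinks receive the same measurement
  legacyMetrics.batch('my-operation', elapsed, { success: 1 })
  diagnosticsMetrics.recordLatency(elapsed, { operation: 'my-operation', status: 'success' })
}
```

Because the two systems are independent, dropping the legacy call later requires no changes to the diagnostics path.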
+ +### Two Categories of Metrics + +| Category | Description | Initialization | Customization | +|----------|-------------|----------------|---------------| +| **Runtime/Infrastructure** | System-wide metrics for capacity planning and SLOs | Once at startup | Limited (configured at startup) | +| **App/Middleware** | Operation-specific metrics for debugging and optimization | Per-request/operation | Rich (can add custom attributes) | + +--- + +## Complete Metrics Visual Summary + +``` +All Metrics in node-vtex-api +│ +├── 🆕 Diagnostics-Related Metrics (OpenTelemetry-based) +│ │ +│ ├── 🏗 Runtime/Infrastructure Metrics +│ │ │ +│ │ ├── OTel Request Instruments (service/metrics/metrics.ts) +│ │ │ ├── io_http_requests_current (Gauge) +│ │ │ ├── runtime_http_requests_duration_milliseconds (Histogram) +│ │ │ ├── runtime_http_requests_total (Counter) +│ │ │ ├── runtime_http_response_size_bytes (Histogram) +│ │ │ └── runtime_http_aborted_requests_total (Counter) +│ │ │ +│ │ ├── Auto-instrumentation (telemetry/client.ts) +│ │ │ ├── http.server.duration (Histogram - HttpInstrumentation) +│ │ │ ├── http.server.request.size (Histogram) +│ │ │ ├── http.server.response.size (Histogram) +│ │ │ ├── http.client.duration (Histogram - HttpInstrumentation) +│ │ │ ├── http.client.request.size (Histogram) +│ │ │ ├── http.client.response.size (Histogram) +│ │ │ └── Koa-enhanced HTTP metrics (KoaInstrumentation) +│ │ │ +│ │ └── Host Metrics (HostMetricsInstrumentation) +│ │ ├── process.runtime.nodejs.memory.heap.used (Gauge) +│ │ ├── process.runtime.nodejs.memory.heap.total (Gauge) +│ │ ├── process.runtime.nodejs.memory.rss (Gauge) +│ │ ├── process.runtime.nodejs.memory.external (Gauge) +│ │ ├── process.runtime.nodejs.memory.arrayBuffers (Gauge) +│ │ ├── process.runtime.nodejs.event_loop.lag.max (Gauge) +│ │ ├── process.runtime.nodejs.event_loop.lag.min (Gauge) +│ │ ├── process.cpu.utilization (Gauge) +│ │ ├── system.cpu.utilization (Gauge) +│ │ ├── system.memory.usage (Gauge) +│ │ ├── 
system.memory.utilization (Gauge) +│ │ ├── system.network.io (Counter) +│ │ └── system.network.errors (Counter) +│ │ +│ └── 📱 App/Middleware Metrics +│ │ +│ ├── HTTP Client (HttpClient/middlewares/metrics.ts) +│ │ ├── latency histogram (via recordLatency) +│ │ ├── http_client_requests_total (Counter) +│ │ ├── http_client_cache_total (Counter) +│ │ └── http_client_requests_retried_total (Counter) +│ │ +│ ├── HTTP Handler (worker/runtime/http/middlewares/*) +│ │ ├── latency histogram (via recordLatency) +│ │ ├── http_handler_requests_total (Counter) +│ │ ├── http_server_requests_total (Counter) +│ │ ├── http_server_requests_closed_total (Counter) +│ │ └── http_server_requests_aborted_total (Counter) +│ │ +│ ├── GraphQL (worker/runtime/graphql/schema/schemaDirectives/Metric.ts) +│ │ ├── latency histogram (via recordLatency) +│ │ └── graphql_field_requests_total (Counter) +│ │ +│ └── HTTP Agent (HttpClient/middlewares/request/HttpAgentSingleton.ts) +│ ├── http_agent_sockets_current (Gauge) +│ ├── http_agent_free_sockets_current (Gauge) +│ └── http_agent_pending_requests_current (Gauge) +│ +└── 🏛 Legacy Metrics (Non-Diagnostics) + │ + ├── 📊 Prometheus Metrics (prom-client, exposed on /metrics) + │ │ + │ ├── Request Metrics (service/tracing/metrics/*) + │ │ ├── runtime_http_requests_total (Counter) - labels: status_code, handler + │ │ ├── runtime_http_aborted_requests_total (Counter) - labels: handler + │ │ ├── runtime_http_requests_duration_milliseconds (Histogram) + │ │ ├── runtime_http_response_size_bytes (Histogram) + │ │ └── io_http_requests_current (Gauge) + │ │ + │ ├── Event Loop Metrics (service/tracing/metrics/measurers/*) + │ │ ├── runtime_event_loop_lag_max_between_scrapes_seconds (Gauge) + │ │ └── runtime_event_loop_lag_percentiles_between_scrapes_seconds (Gauge) + │ │ + │ └── Default Node.js Metrics (collectDefaultMetrics) + │ ├── nodejs_gc_duration_seconds (Histogram) + │ ├── nodejs_active_handles_total (Gauge) + │ ├── nodejs_active_requests_total 
(Gauge) + │ ├── nodejs_heap_size_total_bytes (Gauge) + │ ├── nodejs_heap_size_used_bytes (Gauge) + │ ├── nodejs_external_memory_bytes (Gauge) + │ ├── nodejs_version_info (Gauge) + │ ├── process_cpu_user_seconds_total (Counter) + │ ├── process_cpu_system_seconds_total (Counter) + │ ├── process_resident_memory_bytes (Gauge) + │ └── process_start_time_seconds (Gauge) + │ + ├── 📝 MetricsAccumulator (console.log exports via trackStatus) + │ │ + │ ├── HTTP Handler Metrics (worker/runtime/http/middlewares/timings.ts) + │ │ └── http-handler-{route_id} + │ │ ├── Aggregates: count, mean, median, percentile95, percentile99, max + │ │ └── Extensions: success, error, timeout, aborted, cancelled + │ │ + │ ├── HTTP Client Metrics (HttpClient/middlewares/metrics.ts) + │ │ └── http-client-{metric_name} + │ │ ├── Aggregates: count, mean, median, percentile95, percentile99, max + │ │ └── Extensions: + │ │ ├── Status: success, error, timeout, aborted, cancelled + │ │ ├── Cache: success-hit, success-miss, success-inflight, success-memoized + │ │ └── Retry: retry-{status}-{count} + │ │ + │ ├── GraphQL Metrics (worker/runtime/graphql/schema/schemaDirectives/Metric.ts) + │ │ └── graphql-metric-{field_name} + │ │ ├── Aggregates: count, mean, median, percentile95, percentile99, max + │ │ └── Extensions: success, error + │ │ + │ ├── System Metrics (metrics/MetricsAccumulator.ts) + │ │ ├── cpu - user (Ξs), system (Ξs) + │ │ ├── memory - rss, heapTotal, heapUsed, external, arrayBuffers + │ │ ├── httpAgent - sockets, freeSockets, pendingRequests + │ │ └── incomingRequest - total, closed, aborted + │ │ + │ └── Cache Metrics (via trackCache) + │ └── {cache_name}-cache + │ ├── LRU: itemCount, length, disposedItems, hitRate, hits, max, total + │ ├── Disk: hits, total + │ └── Multilayer: hitRate, hits, total + │ + └── 💰 Billing Metrics (console.log with __VTEX_IO_BILLING) + └── Process time per handler + ├── account, app, handler + ├── production, routeType (public_route/private_route) + ├── 
timestamp, value (milliseconds) + └── vendor, workspace +``` + +--- + +## Diagnostics-Related Metrics + +### Runtime/Infrastructure Metrics + +These are system-wide metrics declared at service initialization level. + +#### OTel Request Instruments + +**Source:** `service/metrics/metrics.ts` + +| Metric Name | Type | Description | +|-------------|------|-------------| +| `io_http_requests_current` | Gauge | Current number of requests in progress | +| `runtime_http_requests_duration_milliseconds` | Histogram | Incoming HTTP request duration | +| `runtime_http_requests_total` | Counter | Total number of HTTP requests | +| `runtime_http_response_size_bytes` | Histogram | Outgoing response sizes | +| `runtime_http_aborted_requests_total` | Counter | Total aborted HTTP requests | + +#### Auto-instrumentation Metrics + +**Source:** `telemetry/client.ts` (via OpenTelemetry instrumentations) + +| Metric Name | Type | Source | Description | +|-------------|------|--------|-------------| +| `http.server.duration` | Histogram | HttpInstrumentation | HTTP server request duration | +| `http.client.duration` | Histogram | HttpInstrumentation | HTTP client request duration | +| `process.runtime.nodejs.memory.*` | Gauge | HostMetrics | Node.js memory metrics | +| `process.cpu.utilization` | Gauge | HostMetrics | Process CPU utilization | +| `system.cpu.utilization` | Gauge | HostMetrics | System CPU utilization | +| `system.memory.usage` | Gauge | HostMetrics | System memory usage | + +### App/Middleware Metrics + +These are operation-specific metrics recorded in middleware components. 
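As an illustration of how these per-operation metrics get produced, here is a minimal sketch of a Koa-style timing middleware. The sink is a local stub standing in for `global.diagnosticsMetrics`, and the `Ctx` shape is simplified; the actual call sites in `@vtex/api` live in the source files cited in the tables that follow.

```typescript
// Sketch of a timing middleware recording a latency histogram and a
// request counter with handler-level attributes. The sink below is a
// self-contained stub, not the real DiagnosticsMetrics implementation.
type HrTime = [number, number]
type Attributes = Record<string, string>

const counters = new Map<string, number>()
const diagnosticsMetrics = {
  recordLatency(_elapsed: HrTime, _attrs: Attributes): void {},
  incrementCounter(name: string, value: number, _attrs: Attributes): void {
    counters.set(name, (counters.get(name) ?? 0) + value)
  },
}

interface Ctx { routeId: string; status: number }

async function metricsMiddleware(ctx: Ctx, next: () => Promise<void>): Promise<void> {
  const start = process.hrtime()
  let status = 'success'
  try {
    await next()
  } catch (err) {
    status = 'error'
    throw err
  } finally {
    // Metrics are emitted even when the downstream handler throws
    diagnosticsMetrics.recordLatency(process.hrtime(start), {
      component: 'http_handler',
      route_id: ctx.routeId,
      status,
    })
    diagnosticsMetrics.incrementCounter('http_handler_requests_total', 1, {
      route_id: ctx.routeId,
      status_code: String(ctx.status),
      status,
    })
  }
}
```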
+ +#### HTTP Client Metrics + +**Source:** `HttpClient/middlewares/metrics.ts` + +| Metric Name | Type | Attributes | +|-------------|------|------------| +| Latency histogram | Histogram | `component`, `client_metric`, `status_code`, `status`, `cache_state` | +| `http_client_requests_total` | Counter | `component`, `client_metric`, `status_code`, `status` | +| `http_client_cache_total` | Counter | `component`, `client_metric`, `status_code`, `status`, `cache_state` | +| `http_client_requests_retried_total` | Counter | `component`, `client_metric`, `status_code`, `status`, `retry_count` | + +#### HTTP Handler Metrics + +**Source:** `worker/runtime/http/middlewares/timings.ts`, `requestStats.ts` + +| Metric Name | Type | Attributes | +|-------------|------|------------| +| Latency histogram | Histogram | `component`, `route_id`, `route_type`, `status_code`, `status` | +| `http_handler_requests_total` | Counter | `component`, `route_id`, `route_type`, `status_code`, `status` | +| `http_server_requests_total` | Counter | `route_id`, `route_type`, `status_code` | +| `http_server_requests_closed_total` | Counter | `route_id`, `route_type`, `status_code` | +| `http_server_requests_aborted_total` | Counter | `route_id`, `route_type`, `status_code` | + +#### GraphQL Metrics + +**Source:** `worker/runtime/graphql/schema/schemaDirectives/Metric.ts` + +| Metric Name | Type | Attributes | +|-------------|------|------------| +| Latency histogram | Histogram | `component`, `field_name`, `status` | +| `graphql_field_requests_total` | Counter | `component`, `field_name`, `status` | + +#### HTTP Agent Metrics + +**Source:** `HttpClient/middlewares/request/HttpAgentSingleton.ts` + +| Metric Name | Type | Description | +|-------------|------|-------------| +| `http_agent_sockets_current` | Gauge | Active sockets | +| `http_agent_free_sockets_current` | Gauge | Free sockets in pool | +| `http_agent_pending_requests_current` | Gauge | Pending requests waiting for socket | + +--- + +## 
Legacy Metrics (Non-Diagnostics) + +### Prometheus Metrics + +Exposed on the `/metrics` endpoint via `prom-client`. + +#### Request Metrics + +**Source:** `service/tracing/metrics/MetricNames.ts` + +| Metric Name | Type | Labels | Description | +|-------------|------|--------|-------------| +| `runtime_http_requests_total` | Counter | `status_code`, `handler` | Total HTTP requests | +| `runtime_http_aborted_requests_total` | Counter | `handler` | Aborted HTTP requests | +| `runtime_http_requests_duration_milliseconds` | Histogram | `handler` | Request duration (buckets: 10-5120ms) | +| `runtime_http_response_size_bytes` | Histogram | `handler` | Response sizes (buckets: 500B-4MB) | +| `io_http_requests_current` | Gauge | - | Concurrent requests | + +#### Event Loop Metrics + +**Source:** `service/tracing/metrics/measurers/EventLoopLagMeasurer.ts` + +| Metric Name | Type | Labels | Description | +|-------------|------|--------|-------------| +| `runtime_event_loop_lag_max_between_scrapes_seconds` | Gauge | - | Max event loop lag | +| `runtime_event_loop_lag_percentiles_between_scrapes_seconds` | Gauge | `percentile` | Event loop lag percentiles (95, 99) | + +#### Default Node.js Metrics + +Via `collectDefaultMetrics()` from `prom-client`: + +- `nodejs_gc_duration_seconds` - GC duration histogram +- `nodejs_active_handles_total` - Active handles +- `nodejs_active_requests_total` - Active requests +- `nodejs_heap_size_*_bytes` - Heap metrics +- `nodejs_external_memory_bytes` - External memory +- `nodejs_version_info` - Node.js version +- `process_cpu_*_seconds_total` - CPU counters +- `process_resident_memory_bytes` - RSS memory +- `process_start_time_seconds` - Process start time + +### MetricsAccumulator + +Exported via `console.log` as JSON and collected by Splunk. 
+ +**Source:** `metrics/MetricsAccumulator.ts` + +#### Aggregated Metrics Format + +Each metric includes: +- `name` - Metric identifier +- `count` - Number of samples +- `mean`, `median` - Average and middle values +- `percentile95`, `percentile99` - Tail latencies +- `max` - Maximum value +- `production` - Environment flag +- Plus any custom extensions + +#### System Metrics + +| Metric Name | Properties | +|-------------|------------| +| `cpu` | `user` (Ξs), `system` (Ξs) | +| `memory` | `rss`, `heapTotal`, `heapUsed`, `external`, `arrayBuffers` | +| `httpAgent` | `sockets`, `freeSockets`, `pendingRequests` | +| `incomingRequest` | `total`, `closed`, `aborted` | + +### Billing Metrics + +**Source:** `worker/runtime/http/middlewares/timings.ts` + +Exported with `__VTEX_IO_BILLING` flag for usage tracking: + +```json +{ + "__VTEX_IO_BILLING": "true", + "account": "...", + "app": "...", + "handler": "...", + "production": true, + "routeType": "public_route", + "timestamp": 1234567890, + "type": "process-time", + "value": 150, + "vendor": "vtex", + "workspace": "master" +} +``` + +--- + +## Related Documentation + +- [Migration Guide](./METRICS_OVERVIEW.md) - Patterns and best practices for migrating to diagnostics-based metrics + diff --git a/docs/METRICS_OVERVIEW.md b/docs/METRICS_OVERVIEW.md new file mode 100644 index 000000000..c6aa1a535 --- /dev/null +++ b/docs/METRICS_OVERVIEW.md @@ -0,0 +1,573 @@ +# Metrics Migration Guide for VTEX IO Apps + +This document provides comprehensive guidance for migrating from the legacy `MetricsAccumulator` API to the new `DiagnosticsMetrics` API, including patterns, best practices, and production-validated examples. + +> **Looking for the complete metrics catalog?** See [METRICS_CATALOG.md](./METRICS_CATALOG.md) for a comprehensive list of all available metrics. 
+ +## Table of Contents + +- [Why Migrate?](#why-migrate) +- [Quick Start](#quick-start) +- [Common Migration Patterns](#common-migration-patterns) +- [What Doesn't Need Migration](#what-doesnt-need-migration) +- [Additional Examples from Production Apps](#additional-examples-from-production-apps) +- [Best Practices for Metrics Design](#best-practices-for-metrics-design) +- [Troubleshooting](#troubleshooting) +- [FAQ](#faq) + +--- + +## Why Migrate? + +The new `DiagnosticsMetrics` API provides: + +✅ **Better Performance**: No in-memory aggregation, lower memory overhead +✅ **Modern Observability**: OpenTelemetry-based metrics exported to backend +✅ **Better Dashboards**: Attribute-based metrics for flexible querying +✅ **Cardinality Control**: Built-in limits to prevent metric explosion +✅ **Type Safety**: Full TypeScript support with clear APIs + +--- + +## Quick Start + +### Before (Legacy API) + +```typescript +import { MetricsAccumulator } from '@vtex/api' + +const metrics = new MetricsAccumulator() + +const start = process.hrtime() +const result = await fetchData() +metrics.batch('fetch-data', process.hrtime(start), { success: 1 }) +``` + +### After (New API) + +```typescript +// DiagnosticsMetrics is available globally +const { diagnosticsMetrics } = global + +const start = process.hrtime() +const result = await fetchData() +diagnosticsMetrics.recordLatency(process.hrtime(start), { + operation: 'fetch-data', + status: 'success' +}) +diagnosticsMetrics.incrementCounter('fetch_data_total', 1, { + status: 'success' +}) +``` + +--- + +## Common Migration Patterns + +### Pattern 1: Simple Latency Recording + +**Before:** +```typescript +const start = process.hrtime() +const result = await apiCall() +metrics.batch('api-call', process.hrtime(start)) +``` + +**After:** +```typescript +const start = process.hrtime() +const result = await apiCall() +global.diagnosticsMetrics.recordLatency(process.hrtime(start), { + operation: 'api-call', + status: 'success' +}) +``` + 
+> 📌 **Production Example:** See this pattern in action in [render-to-string's `trackOperation()` utility](https://github.com/vtex/render-to-string/blob/master/node/utils/metrics.ts), which wraps operations with `recordLatency()`. + +### Pattern 2: Latency with Success/Error Tracking + +**Before:** +```typescript +const start = process.hrtime() +try { + const result = await apiCall() + metrics.batch('api-call', process.hrtime(start), { success: 1 }) + return result +} catch (error) { + metrics.batch('api-call', process.hrtime(start), { error: 1 }) + throw error +} +``` + +**After:** +```typescript +const start = process.hrtime() +try { + const result = await apiCall() + global.diagnosticsMetrics.recordLatency(process.hrtime(start), { + operation: 'api-call', + status: 'success' + }) + return result +} catch (error) { + global.diagnosticsMetrics.recordLatency(process.hrtime(start), { + operation: 'api-call', + status: 'error' + }) + throw error +} +``` + +> 📌 **Production Example:** The [render-to-string's `emitMetrics()` function](https://github.com/vtex/render-to-string/blob/master/node/utils/metrics.ts) records latency with `status: 'success'` or `status: 'error'` based on whether the operation succeeded or failed. + +### Pattern 3: Mixed Extensions (Numbers and Strings) + +**Before:** +```typescript +metrics.batch('http-request', elapsed, { + success: 1, // Counter + '2xx': 1, // Counter + 'cache-hit': 1, // Counter + region: 'us' // Attribute +}) +``` + +**After:** +```typescript +// Record latency with attributes +global.diagnosticsMetrics.recordLatency(elapsed, { + operation: 'http-request', + status: '2xx', + cache: 'hit', + region: 'us' +}) +``` + +> 📌 **Production Example:** In [render-to-string's render middleware](https://github.com/vtex/render-to-string/blob/master/node/middleware/render.ts), extra attributes like `{ template: templateName }` are passed alongside latency recordings. 
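When many call sites need the same legacy-extensions-to-attributes translation shown in Pattern 3, a small helper can do it mechanically. This is a hypothetical sketch, not part of `@vtex/api`; the classification rules simply encode the conventions used in the example above.

```typescript
// Hypothetical helper: fold a legacy extensions object (numeric flags
// like `success: 1` or `'cache-hit': 1`, plus string attributes) into a
// flat attribute map suitable for recordLatency().
type Extensions = Record<string, string | number>
type Attributes = Record<string, string>

function extensionsToAttributes(extensions: Extensions): Attributes {
  const attrs: Attributes = {}
  for (const [key, value] of Object.entries(extensions)) {
    if (typeof value === 'string') {
      attrs[key] = value                        // already an attribute, e.g. region: 'us'
    } else if (/^\dxx$/.test(key)) {
      attrs.status = key                        // status-class flags, e.g. '2xx': 1
    } else if (key.startsWith('cache-')) {
      attrs.cache = key.slice('cache-'.length)  // cache flags, e.g. 'cache-hit': 1
    }
    // plain success/error flags are implied by `status` and dropped here
  }
  return attrs
}
```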
+ +### Pattern 4: Cache Tracking + +**Before:** +```typescript +// Manual cache stats collection +const stats = cache.getStats() +console.log(`Cache hits: ${stats.hits}, misses: ${stats.misses}`) +``` + +**After:** +```typescript +// Direct API calls for cache metrics +const stats = cache.getStats() +global.diagnosticsMetrics.incrementCounter('cache_hits_total', stats.hits, { + cache: 'my-cache' +}) +global.diagnosticsMetrics.incrementCounter('cache_misses_total', stats.misses, { + cache: 'my-cache' +}) +global.diagnosticsMetrics.setGauge('cache_items_current', stats.size, { + cache: 'my-cache' +}) +``` + +> 📌 **Production Example:** The [render-to-string's `recordCacheMetric()` function](https://github.com/vtex/render-to-string/blob/master/node/utils/metrics.ts) uses `incrementCounter('cache_operations_total', 1, { cache, cache_state })` for unified cache tracking. + +--- + +## What Doesn't Need Migration + +### HTTP Client with `metric:` Config Option + +**No changes needed!** The HTTP client middleware was already migrated internally. + +```typescript +// This already uses DiagnosticsMetrics internally +this.http.get(`/user/${email}/isAdmin`, { + metric: 'sphinx-is-admin' // ✅ Works automatically +}) +``` + +> 📌 **Production Example:** See this in [render-to-string's Assets client](https://github.com/vtex/render-to-string/blob/master/node/clients/assets.ts) using `metric: 'assets-fetch'`. + +### GraphQL `@metric` Directive + +**No changes needed!** The directive was already migrated internally. It is applied in your schema (SDL), not in resolver code: + +```graphql +# ✅ Already using DiagnosticsMetrics internally +type Query { + products: [Product] @metric +} +``` + +--- + +## Additional Examples from Production Apps + +Beyond the basic migration patterns (1-4), apps with complex instrumentation needs can benefit from additional patterns. 
The **[render-to-string](https://github.com/vtex/render-to-string)** app demonstrates these advanced techniques in production. + +### Centralized Metrics Utility + +When your app has many operations that need instrumentation, reduce boilerplate by creating a centralized utility that combines metrics, logging, and backward compatibility. + +**render-to-string implementation:** [`node/utils/metrics.ts`](https://github.com/vtex/render-to-string/blob/master/node/utils/metrics.ts) + +```typescript +// The trackOperation() utility wraps any operation with standardized instrumentation +import { hrToMillis } from '@vtex/api' + +export interface OperationContext { + account: string + workspace: string + operationId: string + logger: { + info: (data: Record<string, unknown>) => void + error: (data: Record<string, unknown>) => void + } +} + +export interface TrackOperationOptions { + name: string + ctx: OperationContext + extraAttributes?: Record<string, string> +} + +function emitMetrics( + options: TrackOperationOptions, + elapsed: [number, number], + status: 'success' | 'error' +): void { + const { name, ctx, extraAttributes = {} } = options + const { account, workspace, operationId, logger } = ctx + const timeMs = hrToMillis(elapsed) + + // Legacy metrics API (backward compatibility) + metrics.batchMetric(name, timeMs, { account, workspace, status }) + + // New diagnostics metrics API (OTel-compliant) + global.diagnosticsMetrics?.recordLatency(elapsed, { + operation: name, + status, + ...extraAttributes, + }) + + // Structured logging + const logData = { message: name, operationId, timeMs, ...extraAttributes } + status === 'success' ? 
logger.info(logData) : logger.error({ ...logData, error: true }) +} + +export async function trackOperation<T>( + options: TrackOperationOptions, + fn: () => T | Promise<T> +): Promise<T> { + const start = process.hrtime() + try { + const result = await fn() + emitMetrics(options, process.hrtime(start), 'success') + return result + } catch (error) { + emitMetrics(options, process.hrtime(start), 'error') + throw error + } +} +``` + +**Why this works:** Reduces ~250 lines of boilerplate to ~30 lines while maintaining dual API support. + +### Tracking Multiple Operations in a Flow + +For complex middleware with multiple sub-operations, use the utility for each logical step. + +**render-to-string implementation:** [`node/middleware/render.ts`](https://github.com/vtex/render-to-string/blob/master/node/middleware/render.ts) + +```typescript +export async function render(ctx: Context, next: () => Promise<void>) { + const { vtex: { account, workspace, operationId, logger } } = ctx + const metricsCtx = { account, workspace, operationId, logger } + + // Each operation is tracked independently + const compiledScripts = await trackOperation( + { name: 'vm-script', ctx: metricsCtx }, + () => getCompiledScripts(assetsClient, assets) + ) + + await trackOperation( + { name: 'vm-run-in-context', ctx: metricsCtx }, + () => compiledScripts.forEach(script => vm.run(script)) + ) + + const rendered = await trackOperation( + { name: 'vm-global-rendered', ctx: metricsCtx }, + () => vm.run('global.rendered') + ) + + await next() +} +``` + +### Recording Pre-Measured Metrics + +When timing data comes from external sources (e.g., VM sandbox, third-party libraries), record it separately. 
+ +**render-to-string implementation:** [`node/middleware/render.ts`](https://github.com/vtex/render-to-string/blob/master/node/middleware/render.ts) + +```typescript +// Timings captured inside VM are recorded after extraction +const { getDataFromTree, renderToString } = renderMetrics[templateName] +if (getDataFromTree) { + recordExternalMetrics( + { name: 'data-from-tree-ssr', ctx: metricsCtx, extraAttributes: { template: templateName } }, + getDataFromTree + ) +} +if (renderToString) { + recordExternalMetrics( + { name: 'to-string-ssr', ctx: metricsCtx, extraAttributes: { template: templateName } }, + renderToString + ) +} +``` + +### Error Classification for Debugging + +Classify error types as attributes for better debugging and alerting. + +**render-to-string implementation:** [`node/middleware/render.ts`](https://github.com/vtex/render-to-string/blob/master/node/middleware/render.ts) + +```typescript +try { + // ... operation +} catch (e) { + // Classify error type for metrics + let errorType = 'unknown' + if (e instanceof SSRFailError) { + errorType = 'ssr-fail' + } else if (e.code === 'ETIMEDOUT') { + errorType = 'timeout' + } else if (e.code === 'ECONNREFUSED') { + errorType = 'connection-refused' + } else if (e.name === 'TimeoutError') { + errorType = 'vm-timeout' + } + + global.diagnosticsMetrics?.incrementCounter('render_errors_total', 1, { + error_type: errorType, + }) + + throw e +} +``` + +### Unified Cache Metrics + +Create a reusable function for consistent cache tracking across the app. 
+ +**render-to-string implementation:** [`node/utils/metrics.ts`](https://github.com/vtex/render-to-string/blob/master/node/utils/metrics.ts) + +```typescript +export function recordCacheMetric( + cacheName: string, + state: 'hit' | 'miss' | 'bypass' +): void { + global.diagnosticsMetrics?.incrementCounter('cache_operations_total', 1, { + cache: cacheName, + cache_state: state, + }) +} +``` + +--- + +## Best Practices for Metrics Design + +These best practices are based on OpenTelemetry guidelines and the DiagnosticsMetrics API design. They have been validated in production through apps like render-to-string. + +### 1. Use a Single Histogram with Attributes + +**Don't:** Create separate histograms for each operation +```typescript +// ❌ Bad - creates cardinality explosion +global.diagnosticsMetrics.recordLatency(elapsed, { operation: 'fetch-user-123' }) +``` + +**Do:** Use consistent operation names with attributes +```typescript +// ✅ Good - low cardinality +global.diagnosticsMetrics.recordLatency(elapsed, { + operation: 'fetch-user', + status: 'success' +}) +``` + +### 2. Maintain Backward Compatibility + +**Do:** Emit to both APIs during migration +```typescript +// ✅ Emit to both during migration +metrics.batchMetric(name, timeMs, { account, workspace, status }) +global.diagnosticsMetrics?.recordLatency(elapsed, { operation: name, status }) +``` + +### 3. Use Optional Chaining for Safety + +**Do:** Handle uninitialized state gracefully +```typescript +// ✅ Safe - won't crash if not initialized +global.diagnosticsMetrics?.recordLatency(elapsed, attributes) +``` + +### 4. Keep Attributes Low Cardinality + +**Don't:** +```typescript +// ❌ Millions of unique values +{ user_id: '12345', request_id: 'abc-123' } +``` + +**Do:** +```typescript +// ✅ Limited set of values +{ endpoint: '/users', status: 'success', region: 'us-east' } +``` + +### 5. 
Limit to 5 Attributes Maximum + +```typescript +// ✅ Good - 5 attributes +global.diagnosticsMetrics.recordLatency(elapsed, { + operation: 'api-call', + status: 'success', + endpoint: '/users', + region: 'us-east', + cache: 'hit' +}) + +// ❌ Too many - extra will be dropped +global.diagnosticsMetrics.recordLatency(elapsed, { + attr1: 'val1', attr2: 'val2', attr3: 'val3', + attr4: 'val4', attr5: 'val5', attr6: 'val6', // Dropped! + attr7: 'val7' // Dropped! +}) +``` + +### 6. Follow Naming Conventions + +| Metric Type | Pattern | Example | +|-------------|---------|---------| +| Histogram | `{component}_{measurement}_duration_ms` | `http_client_request_duration_ms` | +| Counter | `{component}_{event}_total` | `http_requests_total` | +| Gauge | `{component}_{measurement}_current` | `cache_items_current` | + +### 7. Include Context in Logs + +```typescript +// ✅ Good - traceable logs +logger.info({ + message: 'operation-name', + operationId, // Correlation ID + timeMs, // Duration + ...extraAttributes +}) +``` + +--- + +## Troubleshooting + +### Problem: `diagnosticsMetrics is undefined` + +**Cause:** Accessing `global.diagnosticsMetrics` before service initialization. + +**Solution:** Add a check: +```typescript +if (!global.diagnosticsMetrics) { + console.warn('DiagnosticsMetrics not initialized') + return +} + +global.diagnosticsMetrics.recordLatency(...) +``` + +Or use optional chaining: +```typescript +global.diagnosticsMetrics?.recordLatency(...) +``` + +### Problem: Metrics not appearing in dashboards + +**Checklist:** +1. ✅ Verify `@vtex/api` version is up to date +2. ✅ Check `DIAGNOSTICS_TELEMETRY_ENABLED` environment variable is set +3. ✅ Ensure operation names are consistent (no typos) +4. ✅ Verify attributes have low cardinality (avoid unique IDs) +5. ✅ Check observability backend for metric ingestion + +### Problem: High cardinality warnings + +**Cause:** Too many unique attribute combinations. 
+ +**Solution:** Normalize attribute values to a limited set: +```typescript +// ❌ Bad +{ user_id: userId } + +// ✅ Good +{ user_type: 'premium' } // or 'standard', 'guest' +``` + +### Problem: More than 5 attributes warning + +**Solution:** Reduce to most important attributes. The library will truncate to 5. + +--- + +## FAQ + +### Q: Do I need to migrate immediately? + +**A:** No. The legacy `MetricsAccumulator` API continues to work. Both APIs coexist independently. Migrate at your own pace. + +### Q: Can I use both APIs in the same app? + +**A:** Yes! Both `global.metrics` (legacy) and `global.diagnosticsMetrics` (new) are available. You can migrate gradually. + +### Q: What happens to my existing dashboards? + +**A:** Legacy metrics continue to be exported. New metrics have different names and use attributes. You'll need to update dashboards when you migrate. + +### Q: How do I know which metric names to use? + +**A:** Follow these conventions: +- Histograms: `{component}_{measurement}_duration_ms` (e.g., `http_client_request_duration_ms`) +- Counters: `{component}_{event}_total` (e.g., `http_requests_total`) +- Gauges: `{component}_{measurement}_current` (e.g., `cache_items_current`) + +### Q: What about the `metric:` parameter in HTTP client config? + +**A:** It continues to work! The HTTP client was updated internally to use `DiagnosticsMetrics` while maintaining backward compatibility. + +### Q: Should I remove `MetricsAccumulator` imports? + +**A:** Not required, but recommended for new code. For existing code, migrate when you touch that code. + +### Q: What's the performance impact? + +**A:** The new API has lower overhead (<500ns per recording) and uses less memory (no in-memory aggregation). 
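The cardinality and attribute-count issues described in Troubleshooting can also be prevented mechanically before recording. The guard below is a hypothetical sketch (its name and heuristics are assumptions, not part of `@vtex/api`): it keeps at most five attributes and drops values that look like unbounded identifiers.

```typescript
// Hypothetical pre-recording guard enforcing the attribute rules above.
type Attributes = Record<string, string>

const MAX_ATTRIBUTES = 5
// Heuristic: long digit runs or UUID-ish fragments suggest unbounded cardinality
const HIGH_CARDINALITY = /\d{4,}|[0-9a-f]{8}-[0-9a-f]{4}/i

function sanitizeAttributes(attrs: Attributes): Attributes {
  const out: Attributes = {}
  for (const [key, value] of Object.entries(attrs)) {
    if (Object.keys(out).length >= MAX_ATTRIBUTES) break // limit reached: drop the rest
    if (HIGH_CARDINALITY.test(value)) continue           // skip likely-unique values
    out[key] = value
  }
  return out
}
```

A call site would then pass `sanitizeAttributes(attrs)` to `recordLatency()` instead of raw attributes.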
+ +--- + +## Additional Resources + +- [Metrics Catalog](./METRICS_CATALOG.md) - Complete list of all available metrics +- [VTEX IO Documentation](https://developers.vtex.com/docs/guides/vtex-io-documentation-what-is-vtex-io) + +## Support + +If you need help with migration: +- Check the [Troubleshooting](#troubleshooting) section +- Review the [production examples](#additional-examples-from-production-apps) +- Open an issue in the node-vtex-api repository