Monitoring
Pre-Alpha. This page describes behavior that may change.
Ze gives you three different ways to watch a running daemon: an auto-refreshing peer dashboard for the operator at the keyboard, a live event stream for the engineer chasing a bug, and a Prometheus endpoint for everything else. They all read the same in-process state, so a number you see in the dashboard matches a metric you scrape from Prometheus.
ze cli monitor bgp
Auto-refreshing every two seconds. Sortable, color-coded peer table with router identity, state, uptime, and update rates. Navigate with j and k, sort with s or S, hit Enter to drop into a peer's detail page, and Esc to exit.
This is the screen you leave open on a side monitor when you are about to touch something risky. It is also the fastest way to confirm at a glance that a session you just brought up is actually moving messages.
ze cli monitor event
The raw firehose. Filter it before it overwhelms you.
ze cli monitor event peer upstream event update direction received
ze cli monitor event event state # Just up and down
The filters compose. peer <selector> narrows to one peer, event <type>[,<type>] selects which event types to include, and direction received or direction sent narrows to one side of the conversation. There is no exclude capability; you specify which types to include via event. The recognised event types are update, open, notification, keepalive, refresh, state, negotiated, eor, and rpki.
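If you launch the monitor from a script, the filter grammar above can be assembled programmatically. The following is a minimal sketch, not part of Ze: the helper name `monitor_event_argv` is hypothetical, and it assumes the filters are accepted in the same order as the example command (peer, then event, then direction). It only builds the argument list; it does not run anything.

```python
import shlex

# Event types listed in the text above. Types registered by plugins are not
# known to this sketch and would be rejected.
EVENT_TYPES = {
    "update", "open", "notification", "keepalive",
    "refresh", "state", "negotiated", "eor", "rpki",
}


def monitor_event_argv(peer=None, events=(), direction=None):
    """Build an argument list for `ze cli monitor event` (hypothetical helper).

    `events` is a list of event type names; it is joined with commas.
    """
    argv = ["ze", "cli", "monitor", "event"]
    if peer is not None:
        argv += ["peer", peer]
    if events:
        unknown = sorted(set(events) - EVENT_TYPES)
        if unknown:
            raise ValueError("unrecognised event type(s): " + ", ".join(unknown))
        argv += ["event", ",".join(events)]
    if direction is not None:
        if direction not in ("received", "sent"):
            raise ValueError("direction must be 'received' or 'sent'")
        argv += ["direction", direction]
    return argv


# Rebuild the first example command from the text.
print(shlex.join(monitor_event_argv(peer="upstream", events=["update"],
                                    direction="received")))
```

Because there is no exclude filter, "everything except keepalives" has to be written as an explicit include list of the remaining types.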
For scripts, pipe the stream through | json and parse it. Every event uses the same envelope: a peer block, a message block with id, direction, and type, and the per-event payload (update carries next-hop, as-path, local-preference, and the per-family NLRI lists; state carries the new FSM state).
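As a rough illustration of consuming that stream from a script, the sketch below splits each event into its envelope and payload. It rests on assumptions not confirmed by this page: that the stream is one JSON object per line, that the envelope keys are literally named `peer` and `message`, and that the payload sits alongside them at the top level. The sample lines are invented for the example, including every field inside them.

```python
import json

# Hypothetical sample lines; real key names and values may differ.
SAMPLE_LINES = [
    '{"peer": {"name": "upstream"}, '
    '"message": {"id": 1, "direction": "received", "type": "state"}, '
    '"state": "established"}',
    '',
    '{"peer": {"name": "upstream"}, '
    '"message": {"id": 2, "direction": "received", "type": "keepalive"}}',
]


def iter_events(lines):
    """Yield one dict per non-empty line (assumes line-delimited JSON)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def select(events, wanted_type):
    """Yield (peer, message, payload) for events of one message type."""
    for event in events:
        message = event.get("message", {})
        if message.get("type") != wanted_type:
            continue
        # Everything outside the two envelope blocks is treated as payload.
        payload = {k: v for k, v in event.items() if k not in ("peer", "message")}
        yield event.get("peer"), message, payload


for peer, message, payload in select(iter_events(SAMPLE_LINES), "state"):
    print(peer, message["id"], payload)
```

In real use you would pass `sys.stdin` to `iter_events` instead of the sample list and feed it the piped monitor output.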
Ze exposes a Prometheus endpoint when you set telemetry { prometheus { ... } } in the config. Metrics refresh every ten seconds.
The metrics you actually need most of the time are these.
| Metric | Why you care |
|---|---|
| ze_peer_state{peer} | 3 means Established. Anything else is a graphable problem. |
| ze_peer_messages_received_total{peer,type} | Update rate, broken down by message type. |
| ze_peer_messages_sent_total{peer,type} | Same for outbound. |
| ze_bgp_prefix_count{peer,family} | Current per-family prefix count for a peer. |
| ze_bgp_prefix_ratio{peer,family} | count / maximum. Alert above 0.9. |
| ze_bgp_prefix_warning_exceeded{peer,family} | 1 if the warning threshold has been crossed. |
| ze_bgp_pool_used_ratio | Forwarding pool utilisation. Anything above 0.8 means you are close to congestion teardown. |
| ze_uptime_seconds | The reactor's uptime. |
| ze_info{version,router_id,local_as} | Tag for joins. |
There are also histograms for connection timing (ze_peer_dial_seconds, ze_peer_connect_attempt_seconds, ze_peer_backoff_seconds), per-peer overflow counters, and prefix-limit teardown counters. The full list lives in the in-tree monitoring guide.
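The thresholds in the table can be checked without a full Prometheus stack. The sketch below parses a scrape in the standard Prometheus text exposition format and flags the three conditions the table calls out: a peer state other than 3, a prefix ratio above 0.9, and a pool ratio above 0.8. The parser is deliberately simplified (it ignores timestamps and does not unescape label values), and the sample scrape, including its peer names and family label value, is invented for illustration.

```python
import re

# metric_name{labels} value   (labels are optional)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical scrape output; label values are made up.
SAMPLE_SCRAPE = """\
# TYPE ze_peer_state gauge
ze_peer_state{peer="upstream"} 3
ze_peer_state{peer="ix-rs1"} 1
ze_bgp_prefix_ratio{peer="upstream",family="ipv4/unicast"} 0.95
ze_bgp_pool_used_ratio 0.42
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(LABEL_RE.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


def find_problems(samples):
    """Apply the thresholds from the metrics table to parsed samples."""
    problems = []
    for name, labels, value in samples:
        if name == "ze_peer_state" and value != 3:
            problems.append(f"peer {labels.get('peer')}: state {value:g}, not Established")
        elif name == "ze_bgp_prefix_ratio" and value > 0.9:
            problems.append(
                f"peer {labels.get('peer')} {labels.get('family')}: prefix ratio {value:g}")
        elif name == "ze_bgp_pool_used_ratio" and value > 0.8:
            problems.append(f"forwarding pool ratio {value:g}")
    return problems


for problem in find_problems(parse_metrics(SAMPLE_SCRAPE)):
    print(problem)
```

In production these checks are better expressed as Prometheus alerting rules; a script like this is mainly useful for a quick check against a single scrape.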
Plugins are first-class consumers of the same event stream. A plugin binding under a peer declares which events it wants:
peer upstream {
    process my-plugin {
        receive [ update state ]
    }
}
The plugin gets each event through its OnEvent callback. The set of recognised event types is the same as the CLI filter list, plus any extra types a plugin registers itself.
- Show commands for the read-only snapshot view.
- Plugins for the event-bus consumer model.
- Logging for log levels and backends.
Adapted from main/docs/guide/monitoring.md.
Unreviewed draft. This wiki was authored in bulk and has not been reviewed. File corrections on the issue tracker.