A network scanner that discovers Prometheus exporters by scanning subnets for configurable TCP ports and serves the results via HTTP for Prometheus/vmagent http_sd_configs.
Multi-port scanning: Scan for any TCP ports (node_exporter, custom exporters, etc.)
Concurrent scanning: Worker pool with configurable parallelism
Rate limiting: Prevent network storms with configurable rate limits
Reverse DNS: Resolve IPs to hostnames via PTR records with caching
http_sd compatible: Native Prometheus HTTP service discovery format
Port filtering: Filter targets by port via query parameter
Prometheus metrics: Built-in /metrics endpoint for monitoring the scanner itself
Single binary: No runtime dependencies
┌─────────────────────────────────────────────────────────────────────────────┐
│ HOW IT WORKS │
└─────────────────────────────────────────────────────────────────────────────┘
Network Subnets Scanner Consumers
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ │ │ │ │ │
│ 10.174.41.0/24 │ │ prometheus-sd- │ │ vmagent │
│ 10.174.64.0/24 │──TCP───▶ │ scanner │──HTTP──│ (Prometheus) │
│ 10.174.65.0/24 │ scan │ │ GET │ │
│ 10.174.71.0/23 │ │ :8080 │ │ http_sd_configs │
│ │ │ │ │ │
└──────────────────┘ └──────────────────┘ └──────────────────┘
Ports scanned: Endpoints: Refresh interval:
9100 (node_exporter) /targets.json 30 seconds
8080 (livesegmenter) /health
8081 (mediamuxer) /metrics
4999, 9090
┌─────────────────────────────────────────────────────────────────────────────┐
│ SCANNER INTERNAL ARCHITECTURE │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ main.go │
│ - CLI flags & environment variable parsing │
│ - Signal handling (SIGINT, SIGTERM) for graceful shutdown │
│ - Orchestrates: initial scan → HTTP server → periodic scans │
└─────────────────────────────────────────────────────────────────────────┘
│
┌──────────────────────────┼──────────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│ config.go │ │ scanner.go │ │ http.go │
│ │ │ │ │ │
│ • Load config │ │ • CIDR parsing │ │ • /targets.json │
│ • CLI flags │ │ • Worker pool │ │ • /health │
│ • Env overrides │ │ • Rate limiter │ │ • /metrics │
│ • Validation │ │ • TCP connect │ │ • Port filtering │
└───────────────────┘ │ • Result storage │ └───────────────────┘
└─────────┬─────────┘
│
┌──────────────┴──────────────┐
│ │
▼ ▼
┌───────────────────┐ ┌───────────────────┐
│ dns.go │ │ metrics.go │
│ │ │ │
│ • PTR lookups │ │ • Prometheus │
│ • Result cache │ │ gauges │
│ • TTL (10 min) │ │ • Counters │
│ • Fallback: IP │ │ • Histograms │
└───────────────────┘ └───────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ SCANNING PROCESS │
└─────────────────────────────────────────────────────────────────────────────┘
Step 1: ENUMERATE HOSTS
───────────────────────────────────────────────────────────────────────────
CIDR: 10.174.71.0/23 (a /23 starts on an even third octet, so standard masking gives 10.174.70.0/23)
Network: 10.174.70.0 ──▶ (skipped)
Broadcast: 10.174.71.255 ──▶ (skipped)
Usable: 10.174.70.1 → 10.174.71.254 ──▶ 510 hosts
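A minimal Go sketch of Step 1, assuming IPv4 prefixes no longer than /30 and using the standard library's net/netip (the README does not show how scanner.go actually iterates). The function name usableHosts is hypothetical. Masking the prefix first mirrors what net.ParseCIDR does, which is why 10.174.71.0/23 yields 10.174.70.1 through 10.174.71.254:

```go
package main

import (
	"fmt"
	"net/netip"
)

// usableHosts returns every address in cidr except the network and
// broadcast addresses. Unaligned prefixes (such as 10.174.71.0/23) are
// masked first, matching net.ParseCIDR.
// Assumption: IPv4 only, prefix length <= 30.
func usableHosts(cidr string) ([]netip.Addr, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	p = p.Masked() // 10.174.71.0/23 -> 10.174.70.0/23
	if !p.Addr().Is4() || p.Bits() > 30 {
		return nil, fmt.Errorf("unsupported prefix %s", p)
	}
	var hosts []netip.Addr
	// Start after the network address; stop before the broadcast address
	// (the last address whose successor is still inside the prefix).
	for a := p.Addr().Next(); p.Contains(a.Next()); a = a.Next() {
		hosts = append(hosts, a)
	}
	return hosts, nil
}

func main() {
	hosts, err := usableHosts("10.174.71.0/23")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(hosts), hosts[0], hosts[len(hosts)-1])
}
```

Materializing every address is fine at the sizes shown here (a few thousand hosts); for /16 and larger ranges, streaming addresses straight into the jobs channel would avoid holding them all in memory.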
Step 2: GENERATE SCAN JOBS
───────────────────────────────────────────────────────────────────────────
For each host × each port:
┌─────────────────────────────────────────────────────────────────────────┐
│ {10.174.71.1, 9100} {10.174.71.1, 8080} {10.174.71.1, 8081} ... │
│ {10.174.71.2, 9100} {10.174.71.2, 8080} {10.174.71.2, 8081} ... │
│ ... │
└─────────────────────────────────────────────────────────────────────────┘
Example: 1272 hosts (3 × /24 at 254 + 1 × /23 at 510) × 5 ports = 6360 jobs
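The job grid is a plain cross product. A sketch with a hypothetical Job type and buildJobs helper; the 1272-host figure matches the four subnets in the first diagram (three /24s at 254 usable hosts each plus one /23 at 510):

```go
package main

import "fmt"

// Job is one (host, port) probe. Hypothetical type for illustration.
type Job struct {
	IP   string
	Port int
}

// buildJobs returns hosts × ports in host-major order, like the grid above.
func buildJobs(hosts []string, ports []int) []Job {
	jobs := make([]Job, 0, len(hosts)*len(ports))
	for _, h := range hosts {
		for _, p := range ports {
			jobs = append(jobs, Job{IP: h, Port: p})
		}
	}
	return jobs
}

func main() {
	// Usable hosts in the example subnets: three /24s and one /23.
	hostCount := 254 + 254 + 254 + 510
	ports := []int{9100, 8080, 8081, 4999, 9090}
	fmt.Println(hostCount, "hosts ×", len(ports), "ports =", hostCount*len(ports), "jobs")

	jobs := buildJobs([]string{"10.174.71.1", "10.174.71.2"}, []int{9100, 8080, 8081})
	fmt.Println(jobs[:3])
}
```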
Step 3: WORKER POOL PROCESSING
───────────────────────────────────────────────────────────────────────────
Jobs Channel Worker Pool (N=64)
┌───────────────┐ ┌─────────────────────────────┐
│Job│Job│Job│...│─────────▶│ W1 │ W2 │ W3 │ ... │ W64 │
└───────────────┘ └─────────────┬───────────────┘
│
▼
Rate Limiter: 200 conn/sec (Production profile; defaults are -workers 16, -rate-limit 50)
Step 4: TCP CONNECTION TEST
───────────────────────────────────────────────────────────────────────────
Scanner Target
│ │
│────── SYN ──────────────────▶│
│ │
│◀───── SYN-ACK ──────────────│ Port OPEN ✓
│ │
│────── ACK ──────────────────▶│ (full connect)
│ │
│────── FIN ──────────────────▶│ (close)
│ │
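Timeout: 2 seconds (Production profile; default -timeout 1s)
RST (connection refused) → port closed
No response within the timeout → port filtered or host down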
Step 5: DNS RESOLUTION (for open ports)
───────────────────────────────────────────────────────────────────────────
IP: 10.174.71.50
┌─────────────────────────────────────────────────────────────────┐
│ 1. Check cache (TTL: 10 min) │
│ 2. If miss → PTR lookup (equivalent to: dig -x 10.174.71.50) │
│ 3. Result: stsgl01p1.psr-paytv.smf1.mobitv │
│ 4. If PTR fails → use IP as hostname │
└─────────────────────────────────────────────────────────────────┘
Step 6: STORE RESULTS (atomic)
───────────────────────────────────────────────────────────────────────────
type Target struct {
IP string // "10.174.71.50"
Port int // 9100
Hostname string // "stsgl01p1.psr-paytv.smf1.mobitv"
}
Results swapped atomically (mutex-protected)
No partial results visible during scan
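The swap pattern can be sketched with a read/write mutex: each scan builds a private slice and publishes it in a single assignment, so HTTP handlers see either the previous complete result or the new one. Store, Swap, and Snapshot are illustrative names; sync/atomic's Pointer type would work equally well for the publish step:

```go
package main

import (
	"fmt"
	"sync"
)

type Target struct {
	IP       string
	Port     int
	Hostname string
}

// Store holds the most recent complete scan.
type Store struct {
	mu      sync.RWMutex
	targets []Target
}

// Swap replaces the published results in one step.
func (s *Store) Swap(next []Target) {
	s.mu.Lock()
	s.targets = next
	s.mu.Unlock()
}

// Snapshot returns a copy so callers cannot mutate shared state.
func (s *Store) Snapshot() []Target {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]Target, len(s.targets))
	copy(out, s.targets)
	return out
}

func main() {
	var s Store
	s.Swap([]Target{{IP: "10.174.71.50", Port: 9100, Hostname: "stsgl01p1.psr-paytv.smf1.mobitv"}})

	// A new scan accumulates into a private slice...
	var next []Target
	next = append(next, Target{IP: "10.174.71.51", Port: 9100, Hostname: "10.174.71.51"})
	fmt.Println(len(s.Snapshot())) // readers still see the previous scan

	// ...and only becomes visible when the scan finishes.
	s.Swap(next)
	fmt.Println(s.Snapshot()[0].IP)
}
```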
┌─────────────────────────────────────────────────────────────────────────────┐
│ GOROUTINE ARCHITECTURE │
└─────────────────────────────────────────────────────────────────────────────┘
Main Goroutine
┌─────────────────────┐
│ • Context management│
│ • Signal handling │
│ • Scan ticker (5min)│
└──────────┬──────────┘
│
┌────────────────────────┼────────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────────┐
│ Scanner │ │ HTTP Server │ │ Periodic Scan │
│ Goroutine │ │ Goroutine │ │ Goroutine │
└──────┬──────┘ └─────────────┘ └─────────────────┘
│
│ Spawns worker pool for each scan
▼
┌─────────────────────────────────────────────────────────────────┐
│ Worker Pool (64 goroutines) │
│ │
│ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ │
│ │ W1 │ │ W2 │ │ W3 │ │ W4 │ │ W5 │ . . . │W64 │ │
│ └──┬─┘ └──┬─┘ └──┬─┘ └──┬─┘ └──┬─┘ └──┬─┘ │
│ │ │ │ │ │ │ │
│ └──────┴──────┴──────┴──────┴──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Results Channel │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Collector │ │
│ │ Goroutine │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Synchronization:
• WaitGroup - ensures all workers complete
• Mutex - protects result storage
• Channels - job distribution & result collection
• Context - graceful cancellation
Integration with Monitoring Stack
┌─────────────────────────────────────────────────────────────────────────────┐
│ MONITORING STACK INTEGRATION │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────┐
│ Target Hosts │
│ │
│ node_exporter │◀────────────────────────────────────────┐
│ :9100 │ │
│ │ │
│ livesegmenter │◀────────────────────────────────────┐ │
│ :8080 │ │ │
│ │ │ │ Scrapes
│ mediamuxer │◀────────────────────────────────┐ │ │ metrics
│ :8081 │ │ │ │
└─────────────────┘ │ │ │
▲ │ │ │
│ TCP scan │ │ │
│ │ │ │
┌───────┴─────────┐ HTTP GET ┌────┴───┴───┴────┐
│ prometheus-sd- │ /targets.json │ │
│ scanner │◀──────────────────────────│ vmagent │
│ │ (every 30s) │ │
│ :8080 │ └────────┬────────┘
└─────────────────┘ │
│ Remote write
▼
┌─────────────────────┐
│ VictoriaMetrics │
│ │
│ vminsert:8480 │
│ vmstorage │
│ vmselect:8481 │
└──────────┬──────────┘
│
┌─────────────────────────────┼─────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ VMAlert │ │ Grafana │ │ Icinga │
│ :8880 │ │ (dashboards)│ │ (checks) │
└──────┬──────┘ └─────────────┘ └─────────────┘
│
▼
┌─────────────┐
│Alertmanager │──────▶ Email / PagerDuty / Slack
│ :9093 │
└─────────────┘
# Build
make build
# Run locally
./bin/prometheus-sd-scanner \
  -networks "10.174.71.0/23" \
  -ports "9100,8080" \
  -interval "5m"
# Test endpoints
curl http://localhost:8080/targets.json
curl 'http://localhost:8080/targets.json?port=9100'
curl http://localhost:8080/health
curl http://localhost:8080/metrics
Command-line flags:
Flag              Default                     Description
-networks         10.174.71.0/23              Comma-separated CIDR ranges to scan
-ports            9100,8080,8081,4999,9090    Comma-separated TCP ports to scan
-interval         5m                          Scan interval (requires a time unit, e.g. 5m or 300s)
-workers          16                          Number of concurrent scanning goroutines
-timeout          1s                          TCP connection timeout
-http-port        8080                        HTTP server port
-rate-limit       50                          Maximum connections per second
-log-level        info                        Log level: debug, info, warn, error
-skip-broadcast   true                        Skip network and broadcast addresses
Environment variables override CLI flags:
export SCANNER_NETWORKS="10.174.71.0/23,192.168.1.0/24"
export SCANNER_PORTS="9100,8080"
export SCANNER_INTERVAL="5m"
export SCANNER_WORKERS="32"
export SCANNER_HTTP_PORT="8080"
export SCANNER_RATE_LIMIT="100"
export SCANNER_LOG_LEVEL="debug"
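The stated precedence (flags parsed first, SCANNER_* variables win) might look like this with the standard flag package. Only -networks and -workers are shown, and load and Config are hypothetical; getenv is passed in so the behaviour can be tested without touching the real environment:

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

type Config struct {
	Networks string
	Workers  int
}

// load parses CLI flags, then applies SCANNER_* overrides on top.
// The other settings would follow the same pattern.
func load(args []string, getenv func(string) string) (Config, error) {
	fs := flag.NewFlagSet("prometheus-sd-scanner", flag.ContinueOnError)
	var c Config
	fs.StringVar(&c.Networks, "networks", "10.174.71.0/23", "Comma-separated CIDR ranges to scan")
	fs.IntVar(&c.Workers, "workers", 16, "Number of concurrent scanning goroutines")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if v := getenv("SCANNER_NETWORKS"); v != "" {
		c.Networks = v
	}
	if v := getenv("SCANNER_WORKERS"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil {
			return c, fmt.Errorf("SCANNER_WORKERS: %w", err)
		}
		c.Workers = n
	}
	return c, nil
}

func main() {
	c, err := load(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", c)
}
```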
┌────────────────────────────────────────────────────────────────────────────┐
│ PERFORMANCE GUIDELINES │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ Scan Time Calculation: │
│ ───────────────────────────────────────────────────────────────────── │
│ Total jobs = hosts × ports │
│ Minimum time = Total jobs / rate-limit │
│ │
│ Example: 1272 hosts × 5 ports = 6360 jobs │
│ 6360 / 200 conn/sec = 31.8 seconds (theoretical) │
│ Actual: ~3 minutes (includes DNS, timeouts) │
│ │
├────────────────────────────────────────────────────────────────────────────┤
│ Setting │ Impact │
│ ────────────────┼──────────────────────────────────────────────────── │
│ workers ↑ │ More parallelism, higher memory/network load │
│ rate-limit ↑ │ Faster scans, risk of network congestion │
│ timeout ↓ │ Faster scans, may miss slow-responding hosts │
│ interval ↓ │ More frequent updates, higher resource usage │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ Recommended Profiles: │
│ ───────────────────────────────────────────────────────────────────── │
│ Production: -workers 64 -rate-limit 200 -timeout 2s -interval 5m │
│ High-freq: -workers 128 -rate-limit 500 -timeout 1s -interval 2m │
│ Low-impact: -workers 16 -rate-limit 50 -timeout 3s -interval 10m │
│ │
└────────────────────────────────────────────────────────────────────────────┘
HTTP endpoints:
Endpoint                      Description
/targets.json                 All discovered targets in http_sd format
/targets.json?port=9100       Filter by a single port
/targets.json?port=9100,8080  Filter by multiple ports
/health                       Health check (returns 503 until the first scan completes)
/metrics                      Prometheus metrics
/                             API information
Example /health response:
{
  "status": "healthy",
  "last_scan": "2026-02-03T14:32:00Z",
  "last_scan_ago": "2m30s",
  "last_scan_duration": "3m14s",
  "scan_interval": "5m0s",
  "targets_count": 218
}
The /targets.json endpoint returns Prometheus http_sd compatible JSON:
[
  {
    "targets": ["stsgl01p1.psr-paytv.smf1.mobitv:9100"],
    "labels": {
      "__meta_sd_hostname": "stsgl01p1.psr-paytv.smf1.mobitv",
      "__meta_sd_ip": "10.174.71.50",
      "__meta_sd_port": "9100"
    }
  }
]
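Label                 Description
__meta_sd_hostname    FQDN from reverse DNS (or the IP if there is no PTR record)
__meta_sd_ip          Raw IP address
__meta_sd_port        Port number as a string
Labels prefixed with __meta_ are only available during relabeling and are dropped afterwards; copy them to regular labels with relabel_configs to keep them.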
Prometheus/vmagent Configuration
scrape_configs:
  - job_name: 'node_exporter'
    http_sd_configs:
      - url: 'http://scanner-host:8080/targets.json?port=9100'
        refresh_interval: 30s
    relabel_configs:
      # Extract node_type from hostname prefix
      - source_labels: [__meta_sd_hostname]
        regex: '([a-zA-Z]+)\d+.*'
        target_label: node_type
        replacement: '${1}'
  - job_name: 'livesegmenter'
    http_sd_configs:
      - url: 'http://scanner-host:8080/targets.json?port=8080'
        refresh_interval: 30s
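Prometheus relabel regexes use RE2 (the engine behind Go's regexp package) and are anchored at both ends, so the node_type rule can be checked offline. A small sketch; relabel is a hypothetical helper that mimics only the default replace action for this one rule. Hosts that fell back to an IP because they have no PTR record do not match, so they get no node_type label:

```go
package main

import (
	"fmt"
	"regexp"
)

// The YAML regex '([a-zA-Z]+)\d+.*' behaves like this fully anchored form.
var nodeType = regexp.MustCompile(`^(?:([a-zA-Z]+)\d+.*)$`)

// relabel returns the node_type value and whether the label would be set.
// With the replace action, a non-matching regex leaves the label unset.
func relabel(hostname string) (string, bool) {
	m := nodeType.FindStringSubmatch(hostname)
	if m == nil {
		return "", false
	}
	return m[1], true // replacement '${1}'
}

func main() {
	for _, h := range []string{"stsgl01p1.psr-paytv.smf1.mobitv", "10.174.71.50"} {
		v, ok := relabel(h)
		fmt.Printf("%-35s node_type=%q set=%v\n", h, v, ok)
	}
}
```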
# Build for current platform
make build
# Build for Linux (deployment)
make build-linux
# Build for all platforms
make build-all
# Run tests
make test
# Copy binary
sudo cp bin/prometheus-sd-scanner-linux-amd64 /opt/net-discovery/prometheus-sd-scanner
sudo chmod +x /opt/net-discovery/prometheus-sd-scanner
# Copy service file
sudo cp prometheus-sd-scanner.service /etc/systemd/system/
# Enable and start
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus-sd-scanner
# Check status
sudo systemctl status prometheus-sd-scanner
journalctl -u prometheus-sd-scanner -f
make deploy # Deploys to invim01p2
The scanner exposes the following Prometheus metrics at /metrics:
Scan metrics:
Metric                                        Type       Description
scanner_scan_duration_seconds                 Histogram  Duration of network scans
scanner_scans_total                           Counter    Total number of scans performed
scanner_scan_errors_total                     Counter    Total number of scan errors
Target metrics:
Metric                                        Type       Description
scanner_targets_total{port}                   Gauge      Discovered targets by port
scanner_targets_discovered                    Gauge      Total unique targets discovered
Connection metrics:
Metric                                        Type       Description
scanner_connection_attempts_total{port}       Counter    TCP connection attempts
scanner_connection_successes_total{port}      Counter    Successful TCP connections
scanner_connection_timeouts_total{port}       Counter    TCP connection timeouts
DNS metrics:
Metric                                        Type       Description
scanner_dns_lookups_total                     Counter    DNS reverse lookups
scanner_dns_cache_hits_total                  Counter    DNS cache hits
scanner_dns_lookup_errors_total               Counter    DNS lookup errors
HTTP metrics:
Metric                                        Type       Description
scanner_http_requests_total{path,status}      Counter    HTTP requests by path and status
scanner_http_request_duration_seconds{path}   Histogram  Request duration by endpoint
Issue: Health endpoint returns 503
─────────────────────────────────────────────────────────────────────────────
Cause: Initial scan not yet complete
Fix: Wait for first scan to finish (check logs)
Issue: Low target count
─────────────────────────────────────────────────────────────────────────────
Causes: Firewall blocking scanner, exporters not running, wrong subnet
Debug: curl http://scanner:8080/metrics | grep connection_timeouts
Issue: Scan taking too long
─────────────────────────────────────────────────────────────────────────────
Causes: Low workers/rate-limit, network latency, slow DNS
Fix: Increase -workers and -rate-limit
Issue: Missing hostnames (showing IPs)
─────────────────────────────────────────────────────────────────────────────
Cause: PTR records not configured for those IPs
Debug: dig -x <IP_ADDRESS>
# Check service status
systemctl status prometheus-sd-scanner
# View logs
journalctl -u prometheus-sd-scanner -f
# Test endpoints
curl -s http://localhost:8080/health | jq .
curl -s http://localhost:8080/targets.json | jq length
curl -s http://localhost:8080/metrics | grep scanner_
prometheus-sd-scanner/
├── main.go # Entry point, lifecycle management
├── config.go # Configuration loading (CLI + env)
├── scanner.go # Network scanning logic, worker pool
├── dns.go # Reverse DNS resolution with caching
├── http.go # HTTP server and endpoints
├── metrics.go # Prometheus metrics definitions
├── logger.go # Structured logging
├── go.mod
├── go.sum
├── Makefile # Build targets
├── prometheus-sd-scanner.service # Systemd unit file
└── README.md
License: MIT