A BEAM VM stress testing tool with full observability: Phoenix LiveDashboard, Grafana (Loki, Tempo, Mimir), OpenTelemetry distributed tracing, structured logging, and Prometheus metrics. Exercises memory, CPU, disk, processes, ETS, message passing, ports, GC, and the OTel pipeline itself.
- Elixir ~> 1.19
- Erlang/OTP
- Docker (Desktop on macOS, Engine on Linux) for the Grafana LGTM stack
Setup scripts install all prerequisites, start Docker/Grafana, compile the app, and open the browser — one command to go from zero to running.
./setupMac.shInstalls (if missing): Homebrew → Erlang → Elixir → Docker Desktop check → Zscaler TLS proxy fix → mix deps → Grafana LGTM → Elixir app.
./setupLinux.shInstalls (if missing): build tools → Erlang → Elixir (via apt/dnf/yum or asdf) → Docker Engine → Docker Compose plugin → Grafana LGTM → Elixir app.
| Flag | What it does |
|---|---|
| (no flag) | Full install + start everything |
--start |
Skip installs, just start Docker + app |
--stop |
Stop Elixir app + docker compose down |
--status |
Show what's running and all URLs |
Example:
./setupMac.sh --status # Check if everything is up
./setupMac.sh --stop # Shut it all down
./setupLinux.sh --start # Quick restart (skip installs)If you prefer to set things up step by step:
cd elixir_stress
mix deps.get
mix compileNote: If behind a corporate TLS proxy (e.g. Zscaler) on macOS:
security find-certificate -a -p /Library/Keychains/System.keychain \
/System/Library/Keychains/SystemRootCertificates.keychain > /tmp/all_cas.pem
HEX_CACERTS_PATH=/tmp/all_cas.pem mix deps.getMake sure Docker is running, then:
docker compose up -dThis starts a single container (grafana/otel-lgtm) with:
- Grafana on port 3404 (dashboards UI)
- Mimir on port 9090 (Prometheus-compatible metrics storage)
- Tempo on port 4418 (distributed trace storage)
- Loki on port 3100 (structured log storage)
- OTel Collector on ports 4317 (gRPC) and 4318 (HTTP) — receives traces/metrics/logs from the app and scrapes
/metricsevery 5s
Grafana dashboards are automatically provisioned — they load on container start via volume mounts, no manual import needed.
Wait ~10 seconds for the container to become healthy:
docker compose ps # should show "healthy"mix run --no-haltThis starts three HTTP services:
| Port | Service | URL |
|---|---|---|
| 4001 | Main web app | http://localhost:4001 |
| 4002 | Phoenix LiveDashboard | http://localhost:4002/dashboard |
| 4003 | Worker service (Tier 5) | http://localhost:4003/work/* |
Grafana dashboards are pre-loaded and available at:
- http://localhost:3404/d/elixir-stress-test — BEAM resources, worker activity, traces, logs, OTel pipeline stress
- http://localhost:3404/d/elixir-app-metrics — application-level metrics (latency histograms, throughput, gauges)
Login: admin / admin
Go to http://localhost:4001, select a duration, and click Run Full Stress Test.
Then watch all the panels in Grafana light up in real time.
The dashboard at http://localhost:3404/d/elixir-stress-test/elixir-stress-test has 6 sections:
- Memory Usage — total, process, binary, ETS, atom, system memory over time
- Run Queue Lengths — scheduler pressure (CPU, IO, total)
- Process / Port / Atom Counts — live system resource gauges
- Worker Cycles (rate/s) — how fast each worker type is completing cycles
- Worker Start/Stop Rate — worker lifecycle events
- Stress Runs — total runs started/completed (stat panel)
- Worker Cycle Totals — cumulative cycles per worker (bar gauge)
- Recent Traces — table of traces from the
elixir_stressservice - Click any trace to see the full waterfall with nested child spans, span events, and cross-service calls
- Application Logs — live stream of structured logs with trace correlation
- Log Volume by Severity — stacked time series of DEBUG/INFO/WARNING/ERROR
- Error Logs — filtered view of ERROR-level logs only
- OTel Stress Worker Cycles — rates for span_flood, high_cardinality, large_payloads, metric_flood, log_flood
- Metric Flood Events/s — telemetry event throughput by endpoint
- Distributed Call Cycles — rate of cross-service HTTP calls
- Worker Service Logs — logs from the downstream worker service
- Go to the dashboard or click Explore in the sidebar
- Select Tempo as the datasource
- Search by service name
elixir_stress - Click a trace to open the waterfall view — you'll see:
- Parent span
stress_test.runcontaining all worker spans - Per-worker spans like
stress.worker.memory_hogwith child spans per cycle - Span events like
allocation,gc_forced,processes_spawned - Cross-service spans from
distributed.http_calltoworker_service.compute/store/transform
- Parent span
- Click Explore → select Loki datasource
- Query:
{service_name="elixir_stress"} - Each log entry includes
trace_idandspan_id— click the trace link to jump directly to the correlated trace in Tempo
+------------------+
| Grafana (:3404) |
| Dashboards UI |
+--------+---------+
|
+------------+------------+
| | |
+----+----+ +----+----+ +-----+-----+
| Mimir | | Tempo | | Loki |
| metrics | | traces | | logs |
| (:9090) | | (:4418) | | (:3100) |
+----+----+ +----+----+ +-----+-----+
| | |
+----+------------+------------+----+
| OTel Collector |
| OTLP (:4317/:4318) |
| Prometheus scrape (:4001/metrics) |
+---+-------------------+------------+
| |
OTLP traces/logs Prometheus scrape
| |
+--------------+---+ +------+--------+
| Elixir App | | /metrics |
| (:4001) main +--------+ endpoint |
| (:4002) dashboard| | (Prom format) |
| (:4003) worker | +---------------+
+------------------+
| Port | Service | Purpose |
|---|---|---|
| 4001 | Elixir Plug.Cowboy | Web control panel + /metrics Prometheus endpoint |
| 4002 | Phoenix Endpoint | LiveDashboard at /dashboard |
| 4003 | Worker Service | Downstream microservice for distributed tracing (Tier 5) |
| 3404 | Grafana (Docker) | Dashboards, Explore UI |
| 4317 | OTLP gRPC (Docker) | Receives traces from app |
| 4318 | OTLP HTTP (Docker) | Receives traces/metrics/logs from app |
| 9090 | Mimir (Docker) | Prometheus-compatible metrics DB |
| 3100 | Loki (Docker) | Structured log storage |
All workers have per-cycle child spans, span events, error recording, and structured logging.
| Worker | Count | What it does |
|---|---|---|
| Memory hog | 10 | Hold 50-200MB each with sawtooth pattern, allocating lists/binaries/maps/nested structures |
| CPU saturate | 2x schedulers | Pin all schedulers: fibonacci, sorting, SHA-256 chains, matrix multiply, Ackermann, permutations |
| Disk thrash | 4 | Write/read/hash/delete 20-100MB files per cycle |
| Process explosion | 2 | Spawn 2,000-10,000 processes per cycle, maintain up to 20,000 alive |
| ETS bloat | 2 | 50,000 row inserts, full table scans, concurrent read/write on shared tables |
| GC torture | 4 | Allocate massive garbage + forced GC, spawn 50 sub-processes per cycle |
| Binary abuse | 4 | 2-8MB binaries with sub-binary slices across spawned processes |
| Message queue | 2 | 10,000 messages flooding slow consumers (100ms per message) |
| Port churn | 2 | Open/pump/close 20-60 OS ports per cycle |
| Atom growth | 1 | 500-1,000 unique atoms per batch (never GC'd) |
| Worker | Count | What it stresses |
|---|---|---|
| Span flood | 2 | Thousands of micro-spans per second — tests batch processor and Tempo ingestion |
| High cardinality | 2 | Unique attribute values per span — tests Tempo indexing |
| Large payloads | 1 | Spans with massive event data — tests OTLP exporter throughput |
| Metric flood | 1 | Thousands of telemetry events per second — tests Prometheus scrape and Mimir writes |
| Log flood | 1 | Structured logs flooding Loki via OTLP HTTP |
| Worker | Count | What it does |
|---|---|---|
| Distributed call | 2 | HTTP calls to worker service (:4003) with W3C traceparent header propagation |
The worker service exposes three endpoints, each with nested child spans:
POST /work/compute— fibonacci, sorting, SHA-256 hashingPOST /work/store— ETS insert, scan, cleanupPOST /work/transform— data generation, filtering, aggregation
elixir_stress/
├── config/
│ └── config.exs # Phoenix endpoint + OpenTelemetry config
├── grafana/
│ ├── dashboards/ # Provisioned dashboard JSON (auto-loaded by Grafana)
│ │ ├── stress-test-dashboard.json
│ │ └── app-metrics-dashboard.json
│ ├── provisioning/
│ │ └── dashboards/
│ │ └── dashboards.yml # Grafana provisioning config
│ ├── stress-test-dashboard.json # API-format JSON (for manual import)
│ └── app-metrics-dashboard.json
├── lib/
│ ├── elixir_stress.ex
│ └── elixir_stress/
│ ├── application.ex # OTP app (starts all services)
│ ├── router.ex # Main web routes (:4001)
│ ├── endpoint.ex # Phoenix endpoint for dashboard (:4002)
│ ├── dashboard_router.ex # LiveDashboard route config
│ ├── telemetry.ex # Telemetry metrics definitions
│ ├── prom_metrics.ex # Prometheus metrics exporter
│ ├── otel_logger.ex # Structured logger -> OTLP HTTP -> Loki
│ ├── otel_stress.ex # Tier 4: OTel pipeline stress workers
│ ├── worker_service.ex # Tier 5: Downstream microservice (:4003)
│ └── stress.ex # Main stress test suite with deep tracing
├── docker-compose.yml # Grafana LGTM stack (with dashboard provisioning)
├── otel-collector-config.yml # OTel Collector: receivers, exporters, pipelines
├── setupMac.sh # One-command setup for macOS
├── setupLinux.sh # One-command setup for Linux
├── mix.exs
└── README.md
# Using setup scripts (recommended)
./setupMac.sh --stop # macOS
./setupLinux.sh --stop # Linux
# Or manually
Ctrl+C (twice) # Stop the Elixir app
docker compose down # Stop Grafana/LGTM