Elixir Stress — BEAM + OpenTelemetry Stress Test

A BEAM VM stress testing tool with full observability: Phoenix LiveDashboard, Grafana (Loki, Tempo, Mimir), OpenTelemetry distributed tracing, structured logging, and Prometheus metrics. Exercises memory, CPU, disk, processes, ETS, message passing, ports, GC, and the OTel pipeline itself.

Prerequisites

Elixir ~> 1.19
Erlang/OTP
Docker (Desktop on macOS, Engine on Linux) for the Grafana LGTM stack

Automated Setup (Recommended)

Setup scripts install all prerequisites, start Docker/Grafana, compile the app, and open the browser — one command to go from zero to running.

macOS

./setupMac.sh

Installs (if missing): Homebrew → Erlang → Elixir → Docker Desktop check → Zscaler TLS proxy fix → mix deps → Grafana LGTM → Elixir app.

Linux (Ubuntu/Debian, Fedora/RHEL/CentOS)

./setupLinux.sh

Installs (if missing): build tools → Erlang → Elixir (via apt/dnf/yum or asdf) → Docker Engine → Docker Compose plugin → Grafana LGTM → Elixir app.

Script Flags

Flag	What it does
(no flag)	Full install + start everything
`--start`	Skip installs, just start Docker + app
`--stop`	Stop Elixir app + docker compose down
`--status`	Show what's running and all URLs

Example:

./setupMac.sh --status    # Check if everything is up
./setupMac.sh --stop      # Shut it all down
./setupLinux.sh --start   # Quick restart (skip installs)

Manual Setup

If you prefer to set things up step by step:

1. Install dependencies

cd elixir_stress
mix deps.get
mix compile

Note: If behind a corporate TLS proxy (e.g. Zscaler) on macOS:

security find-certificate -a -p /Library/Keychains/System.keychain \
  /System/Library/Keychains/SystemRootCertificates.keychain > /tmp/all_cas.pem

HEX_CACERTS_PATH=/tmp/all_cas.pem mix deps.get

2. Start the Grafana LGTM stack (Docker)

Make sure Docker is running, then:

docker compose up -d

This starts a single container (grafana/otel-lgtm) with:

Grafana on port 3404 (dashboards UI)
Mimir on port 9090 (Prometheus-compatible metrics storage)
Tempo on port 4418 (distributed trace storage)
Loki on port 3100 (structured log storage)
OTel Collector on ports 4317 (gRPC) and 4318 (HTTP) — receives traces/metrics/logs from the app and scrapes /metrics every 5s

Grafana dashboards are automatically provisioned — they load on container start via volume mounts, no manual import needed.

Wait ~10 seconds for the container to become healthy:

docker compose ps   # should show "healthy"

3. Start the Elixir application

mix run --no-halt

This starts three HTTP services:

Port	Service	URL
4001	Main web app	http://localhost:4001
4002	Phoenix LiveDashboard	http://localhost:4002/dashboard
4003	Worker service (Tier 5)	http://localhost:4003/work/*

4. Open the dashboards

Grafana dashboards are pre-loaded and available at:

http://localhost:3404/d/elixir-stress-test — BEAM resources, worker activity, traces, logs, OTel pipeline stress
http://localhost:3404/d/elixir-app-metrics — application-level metrics (latency histograms, throughput, gauges)

Login: admin / admin

5. Run a stress test

Go to http://localhost:4001, select a duration, and click Run Full Stress Test.

Then watch all the panels in Grafana light up in real time.

Grafana Dashboard Sections

The dashboard at http://localhost:3404/d/elixir-stress-test/elixir-stress-test has 6 sections:

BEAM Resources

Memory Usage — total, process, binary, ETS, atom, system memory over time
Run Queue Lengths — scheduler pressure (CPU, IO, total)
Process / Port / Atom Counts — live system resource gauges

Worker Activity

Worker Cycles (rate/s) — how fast each worker type is completing cycles
Worker Start/Stop Rate — worker lifecycle events
Stress Runs — total runs started/completed (stat panel)
Worker Cycle Totals — cumulative cycles per worker (bar gauge)

Distributed Traces (Tempo)

Recent Traces — table of traces from the elixir_stress service
Click any trace to see the full waterfall with nested child spans, span events, and cross-service calls

Structured Logs (Loki)

Application Logs — live stream of structured logs with trace correlation
Log Volume by Severity — stacked time series of DEBUG/INFO/WARNING/ERROR
Error Logs — filtered view of ERROR-level logs only

OTel Pipeline Stress (Tier 4)

OTel Stress Worker Cycles — rates for span_flood, high_cardinality, large_payloads, metric_flood, log_flood
Metric Flood Events/s — telemetry event throughput by endpoint

Distributed Tracing (Tier 5)

Distributed Call Cycles — rate of cross-service HTTP calls
Worker Service Logs — logs from the downstream worker service

Exploring Traces in Grafana

Go to the dashboard or click Explore in the sidebar
Select Tempo as the datasource
Search by service name elixir_stress
Click a trace to open the waterfall view — you'll see:
- Parent span stress_test.run containing all worker spans
- Per-worker spans like stress.worker.memory_hog with child spans per cycle
- Span events like allocation, gc_forced, processes_spawned
- Cross-service spans from distributed.http_call to worker_service.compute/store/transform

Exploring Logs in Grafana

Click Explore → select Loki datasource
Query: {service_name="elixir_stress"}
Each log entry includes trace_id and span_id — click the trace link to jump directly to the correlated trace in Tempo

Architecture

                        +------------------+
                        |  Grafana (:3404) |
                        |  Dashboards UI   |
                        +--------+---------+
                                 |
                    +------------+------------+
                    |            |            |
               +----+----+ +----+----+ +-----+-----+
               |  Mimir  | |  Tempo  | |   Loki    |
               | metrics | | traces  | |   logs    |
               | (:9090) | | (:4418) | |  (:3100)  |
               +----+----+ +----+----+ +-----+-----+
                    |            |            |
               +----+------------+------------+----+
               |        OTel Collector             |
               |  OTLP (:4317/:4318)               |
               |  Prometheus scrape (:4001/metrics) |
               +---+-------------------+------------+
                   |                   |
          OTLP traces/logs    Prometheus scrape
                   |                   |
    +--------------+---+        +------+--------+
    | Elixir App       |        | /metrics      |
    | (:4001) main     +--------+ endpoint      |
    | (:4002) dashboard|        | (Prom format) |
    | (:4003) worker   |        +---------------+
    +------------------+

Services Summary

Port	Service	Purpose
4001	Elixir Plug.Cowboy	Web control panel + `/metrics` Prometheus endpoint
4002	Phoenix Endpoint	LiveDashboard at `/dashboard`
4003	Worker Service	Downstream microservice for distributed tracing (Tier 5)
3404	Grafana (Docker)	Dashboards, Explore UI
4317	OTLP gRPC (Docker)	Receives traces from app
4318	OTLP HTTP (Docker)	Receives traces/metrics/logs from app
9090	Mimir (Docker)	Prometheus-compatible metrics DB
3100	Loki (Docker)	Structured log storage

Stress Test Workers

BEAM Stress (Tier 1 — Deep Tracing)

All workers have per-cycle child spans, span events, error recording, and structured logging.

Worker	Count	What it does
Memory hog	10	Hold 50-200MB each with sawtooth pattern, allocating lists/binaries/maps/nested structures
CPU saturate	2x schedulers	Pin all schedulers: fibonacci, sorting, SHA-256 chains, matrix multiply, Ackermann, permutations
Disk thrash	4	Write/read/hash/delete 20-100MB files per cycle
Process explosion	2	Spawn 2,000-10,000 processes per cycle, maintain up to 20,000 alive
ETS bloat	2	50,000 row inserts, full table scans, concurrent read/write on shared tables
GC torture	4	Allocate massive garbage + forced GC, spawn 50 sub-processes per cycle
Binary abuse	4	2-8MB binaries with sub-binary slices across spawned processes
Message queue	2	10,000 messages flooding slow consumers (100ms per message)
Port churn	2	Open/pump/close 20-60 OS ports per cycle
Atom growth	1	500-1,000 unique atoms per batch (never GC'd)

OTel Pipeline Stress (Tier 4)

Worker	Count	What it stresses
Span flood	2	Thousands of micro-spans per second — tests batch processor and Tempo ingestion
High cardinality	2	Unique attribute values per span — tests Tempo indexing
Large payloads	1	Spans with massive event data — tests OTLP exporter throughput
Metric flood	1	Thousands of telemetry events per second — tests Prometheus scrape and Mimir writes
Log flood	1	Structured logs flooding Loki via OTLP HTTP

Distributed Tracing (Tier 5)

Worker	Count	What it does
Distributed call	2	HTTP calls to worker service (:4003) with W3C `traceparent` header propagation

The worker service exposes three endpoints, each with nested child spans:

POST /work/compute — fibonacci, sorting, SHA-256 hashing
POST /work/store — ETS insert, scan, cleanup
POST /work/transform — data generation, filtering, aggregation

Project Structure

elixir_stress/
├── config/
│   └── config.exs                 # Phoenix endpoint + OpenTelemetry config
├── grafana/
│   ├── dashboards/                # Provisioned dashboard JSON (auto-loaded by Grafana)
│   │   ├── stress-test-dashboard.json
│   │   └── app-metrics-dashboard.json
│   ├── provisioning/
│   │   └── dashboards/
│   │       └── dashboards.yml     # Grafana provisioning config
│   ├── stress-test-dashboard.json # API-format JSON (for manual import)
│   └── app-metrics-dashboard.json
├── lib/
│   ├── elixir_stress.ex
│   └── elixir_stress/
│       ├── application.ex         # OTP app (starts all services)
│       ├── router.ex              # Main web routes (:4001)
│       ├── endpoint.ex            # Phoenix endpoint for dashboard (:4002)
│       ├── dashboard_router.ex    # LiveDashboard route config
│       ├── telemetry.ex           # Telemetry metrics definitions
│       ├── prom_metrics.ex        # Prometheus metrics exporter
│       ├── otel_logger.ex         # Structured logger -> OTLP HTTP -> Loki
│       ├── otel_stress.ex         # Tier 4: OTel pipeline stress workers
│       ├── worker_service.ex      # Tier 5: Downstream microservice (:4003)
│       └── stress.ex              # Main stress test suite with deep tracing
├── docker-compose.yml             # Grafana LGTM stack (with dashboard provisioning)
├── otel-collector-config.yml      # OTel Collector: receivers, exporters, pipelines
├── setupMac.sh                    # One-command setup for macOS
├── setupLinux.sh                  # One-command setup for Linux
├── mix.exs
└── README.md

Stopping

# Using setup scripts (recommended)
./setupMac.sh --stop     # macOS
./setupLinux.sh --stop   # Linux

# Or manually
Ctrl+C (twice)           # Stop the Elixir app
docker compose down      # Stop Grafana/LGTM

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
config		config
grafana		grafana
lib		lib
test		test
.formatter.exs		.formatter.exs
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml
mix.exs		mix.exs
mix.lock		mix.lock
otel-collector-config.yml		otel-collector-config.yml
setupLinux.sh		setupLinux.sh
setupMac.sh		setupMac.sh

Folders and files

Latest commit

History

Repository files navigation

Elixir Stress — BEAM + OpenTelemetry Stress Test

Prerequisites

Automated Setup (Recommended)

macOS

Linux (Ubuntu/Debian, Fedora/RHEL/CentOS)

Script Flags

Manual Setup

1. Install dependencies

2. Start the Grafana LGTM stack (Docker)

3. Start the Elixir application

4. Open the dashboards

5. Run a stress test

Grafana Dashboard Sections

BEAM Resources

Worker Activity

Distributed Traces (Tempo)

Structured Logs (Loki)

OTel Pipeline Stress (Tier 4)

Distributed Tracing (Tier 5)

Exploring Traces in Grafana

Exploring Logs in Grafana

Architecture

Services Summary

Stress Test Workers

BEAM Stress (Tier 1 — Deep Tracing)

OTel Pipeline Stress (Tier 4)

Distributed Tracing (Tier 5)

Project Structure

Stopping

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages