From cfb17641ec525c0edbce588c5166c9c83072cba7 Mon Sep 17 00:00:00 2001
From: Jenny
Date: Mon, 20 Apr 2026 17:34:32 -0700
Subject: [PATCH 1/4] Update load-testing skill: relax M2M OAuth requirement, shrink test matrix, add docs draft
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- M2M OAuth no longer required for short tests (<1hr) — U2M from `databricks auth login` works automatically
- Shrink recommended test matrix from 8 to 6 configs (Medium w2/w3/w4, Large w6/w8/w10)
- Update estimated duration: 6 apps ~45min with defaults, under U2M token lifetime
- Example commands: quick test uses U2M, overnight/multi-run examples keep M2M
- Add draft docs page for Databricks docs (agent-load-testing/docs/load-test-apps.md)

Co-Authored-By: Claude Opus 4.6
---
 .claude/skills/load-testing/SKILL.md          |  52 ++-
 .../.claude/skills/load-testing/SKILL.md      |  52 ++-
 .../.claude/skills/load-testing/SKILL.md      |  52 ++-
 agent-load-testing/docs/load-test-apps.md     | 411 ++++++++++++++++++
 .../.claude/skills/load-testing/SKILL.md      |  52 ++-
 .../.claude/skills/load-testing/SKILL.md      |  52 ++-
 .../.claude/skills/load-testing/SKILL.md      |  52 ++-
 .../.claude/skills/load-testing/SKILL.md      |  52 ++-
 8 files changed, 649 insertions(+), 126 deletions(-)
 create mode 100644 agent-load-testing/docs/load-test-apps.md

diff --git a/.claude/skills/load-testing/SKILL.md b/.claude/skills/load-testing/SKILL.md
index c0538bd9..7c1cfe30 100644
--- a/.claude/skills/load-testing/SKILL.md
+++ b/.claude/skills/load-testing/SKILL.md
@@ -14,8 +14,10 @@ Before beginning, use the `AskUserQuestion` tool to collect the following from t
 1. **Do they already have deployed apps to test, or do they need to set up new apps?**
 2. **Do they want to mock LLM calls?** Mocking isolates infrastructure throughput from LLM latency — useful for capacity planning. Testing without mocks measures end-to-end performance.
 3. **What compute sizes do they want to test?** (Medium, Large, or both)
-4. **How many worker configurations do they want to test?** (e.g., 2, 4, 6, 8 workers)
-5. **Do they have M2M OAuth credentials (service principal client_id/client_secret)?** — Recommended for tests longer than ~30 minutes. If not, guide them to create one.
+4. **How many worker configurations do they want to test?** (e.g., 2, 3, 4 for Medium; 6, 8, 10 for Large)
+5. **How long do they expect the test to run?**
+   - **Under ~1 hour** (e.g., 6 apps with defaults ≈ 45 min): Just run `databricks auth login` beforehand. The scripts automatically pick up the U2M token via the Databricks CLI — no extra setup needed.
+   - **Over ~1 hour** (e.g., large matrix, high max-users, multiple runs): Use M2M OAuth with a service principal because U2M tokens can expire mid-run. If they don't have one, guide them to create a service principal.
 6. **What is their `DATABRICKS_HOST`?** (workspace URL)

 ---
@@ -47,7 +49,7 @@ Create a `load-test-scripts/` directory in the project with the following files.
 - Sends `POST /invocations` with `{"input": [...], "stream": true}` to the app
 - Parses SSE stream (`data: {json}` lines) and counts chunks until `data: [DONE]`
 - Tracks **TTFT** (time to first `data:` line) as a custom Locust metric
-- Uses M2M OAuth token exchange (`client_credentials` grant to `{host}/oidc/v1/token`) with auto-refresh
+- Authenticates via U2M (default, from `databricks auth login`) or M2M OAuth (`client_credentials` grant to `{host}/oidc/v1/token`) with auto-refresh
 - Implements `StepRampShape` — ramps users from `step_size` to `max_users`, holding each level for `step_duration` seconds

 **`run_load_test.py`** — CLI orchestrator that:
@@ -129,13 +131,13 @@ Deploy multiple Databricks Apps with varying compute sizes and worker counts.
 | Compute Size | Workers | App Name |
 |-------------|---------|----------|
 | Medium | 2 | `-medium-w2` |
+| Medium | 3 | `-medium-w3` |
 | Medium | 4 | `-medium-w4` |
-| Medium | 6 | `-medium-w6` |
-| Medium | 8 | `-medium-w8` |
 | Large | 6 | `-large-w6` |
 | Large | 8 | `-large-w8` |
 | Large | 10 | `-large-w10` |
-| Large | 12 | `-large-w12` |
+
+This 6-config matrix covers the sweet spot identified from prior testing. If the user wants broader coverage, they can add more configurations (e.g., Medium w6/w8, Large w12).

 ### Configuring Compute Size

@@ -199,9 +201,12 @@ databricks apps get --output json | jq '{app_status, compute_status,

 ## Step 4: Run Load Tests

-### Authentication — M2M OAuth (Required for Long Tests)
+### Authentication
+
+The load testing scripts authenticate to your Databricks App automatically:

-Load tests can run for hours. **U2M OAuth tokens expire** and break your test mid-run. Use M2M (machine-to-machine) OAuth with a service principal instead.
+- **Short tests (under ~1 hour):** Just run `databricks auth login` beforehand. The scripts automatically pick up the U2M token via `databricks auth token` — no extra setup needed.
+- **Long tests (over ~1 hour):** U2M tokens can expire mid-run. Use M2M OAuth with a service principal instead:

 ```bash
 export DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
@@ -209,13 +214,15 @@ export DATABRICKS_CLIENT_ID=
 export DATABRICKS_CLIENT_SECRET=
 ```

+With the default parameters (6 apps, `--max-users 300`, `--step-duration 30`), a full run takes ~45 minutes — well under the U2M token lifetime.
+
 ### Parameters Reference

 | Parameter | Required | Default | Description |
 |-----------|----------|---------|-------------|
 | `--app-url` | Yes | — | App URL(s) to test (repeatable) |
-| `--client-id` | Recommended | `DATABRICKS_CLIENT_ID` env | Service principal client ID |
-| `--client-secret` | Recommended | `DATABRICKS_CLIENT_SECRET` env | Service principal client secret |
+| `--client-id` | No | `DATABRICKS_CLIENT_ID` env | Service principal client ID (for long tests) |
+| `--client-secret` | No | `DATABRICKS_CLIENT_SECRET` env | Service principal client secret (for long tests) |
 | `--label` | No | Auto-derived from URL | Human-readable label per app (repeatable) |
 | `--compute-size` | No | Auto-detected or `medium` | Compute size tag per app: `medium`, `large` (repeatable) |
 | `--max-users` | No | `300` | Maximum concurrent simulated users |
@@ -230,24 +237,33 @@ export DATABRICKS_CLIENT_SECRET=
 ```bash
 cd load-test-scripts/

-# Quick single-app test:
+# Quick single-app test (uses U2M auth from `databricks auth login`):
 uv run run_load_test.py \
   --app-url https://my-app.aws.databricksapps.com \
-  --client-id --client-secret \
   --dashboard --run-name quick-test

-# Full matrix — 8 apps, overnight:
+# Full 6-app matrix (~45 min with defaults, U2M is fine):
 uv run run_load_test.py \
   --app-url https://my-app-medium-w2.aws.databricksapps.com \
+  --app-url https://my-app-medium-w3.aws.databricksapps.com \
   --app-url https://my-app-medium-w4.aws.databricksapps.com \
+  --app-url https://my-app-large-w6.aws.databricksapps.com \
   --app-url https://my-app-large-w8.aws.databricksapps.com \
   --app-url https://my-app-large-w10.aws.databricksapps.com \
-  --compute-size medium --compute-size medium \
-  --compute-size large --compute-size large \
+  --compute-size medium --compute-size medium --compute-size medium \
+  --compute-size large --compute-size large --compute-size large \
+  --dashboard --run-name full-sweep
+
+# Overnight high-concurrency test (use M2M OAuth — duration > 1 hour):
+uv run run_load_test.py \
+  --app-url https://my-app-medium-w2.aws.databricksapps.com \
+  --app-url https://my-app-large-w8.aws.databricksapps.com \
+  --compute-size medium --compute-size large \
+  --client-id --client-secret \
   --max-users 1000 --step-size 20 --step-duration 10 \
   --dashboard --run-name overnight-sweep

-# Multiple runs for statistical consistency:
+# Multiple runs for statistical consistency (use M2M if conducting prolonged 1hr+ load testing):
 for RUN in r1 r2 r3 r4 r5; do
   uv run run_load_test.py \
     --app-url ... \
@@ -268,7 +284,7 @@ done

 - `(max_users / step_size) * step_duration` seconds per app
 - With defaults: `(300 / 20) * 30 = 15 steps * 30s = ~7.5 min` per app
-- For 4 apps: ~30 min per run
+- For 6 apps (recommended matrix): ~45 min per run — well under the ~1 hour U2M token lifetime

 ---

@@ -311,7 +327,7 @@ uv run dashboard_template.py ../load-test-runs//

 | Issue | Solution |
 |-------|----------|
-| Auth token expired mid-test | Use M2M OAuth (`--client-id`/`--client-secret`) instead of static tokens |
+| Auth token expired mid-test | Test is likely over ~1 hour. Use M2M OAuth (`--client-id`/`--client-secret`) for long tests |
 | Healthcheck fails | Verify app is ACTIVE: `databricks apps get --output json` |
 | 0 QPS / no results | Check `load-test-runs//