Various Performance Improvements#932

Merged
pranaygp merged 18 commits into main from
pranaygp/step-handler-parallelization
Feb 6, 2026

Conversation

@pranaygp pranaygp commented Feb 4, 2026

Summary

This PR includes performance optimizations for the step handler and comprehensive OpenTelemetry tracing improvements.

Performance Optimizations

  1. Skip initial HTTP call - Remove redundant world.steps.get() call and rely on step_started to return the step entity
  2. Parallelize completion operations - Run step_completed event creation and serializeTraceCarrier() concurrently
  3. Add server-side retryAfter validation - Local and postgres worlds now validate retryAfter timestamps and return HTTP 425 when not reached
  4. Lazy ref resolution for fire-and-forget events - Use remoteRefBehavior: 'lazy' for event types where the client discards the response data (e.g. step_created, step_completed, step_failed, run_completed), skipping expensive S3 ref resolution (~200-460ms savings per event)
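
Item 2 above (running step_completed event creation and serializeTraceCarrier() concurrently) can be illustrated with a minimal sketch. The two functions below are hypothetical stand-ins that only simulate latency; their signatures and return shapes are assumptions, not the code in step-handler.ts.

```typescript
// Hypothetical stand-ins for the two independent async operations.
// Signatures and return shapes are assumptions made for illustration.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function createStepCompletedEvent(
  stepId: string,
): Promise<{ stepId: string; eventType: string }> {
  await sleep(20); // simulated HTTP round-trip
  return { stepId, eventType: "step_completed" };
}

async function serializeTraceCarrier(): Promise<Record<string, string>> {
  await sleep(20); // simulated serialization work
  return {
    traceparent: "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
  };
}

// Sequential: total latency is roughly the sum of both operations.
async function completeSequential(stepId: string) {
  const event = await createStepCompletedEvent(stepId);
  const carrier = await serializeTraceCarrier();
  return { event, carrier };
}

// Concurrent: both operations start immediately, so total latency is
// roughly the slower of the two.
async function completeConcurrent(stepId: string) {
  const [event, carrier] = await Promise.all([
    createStepCompletedEvent(stepId),
    serializeTraceCarrier(),
  ]);
  return { event, carrier };
}
```

Both variants produce the same result; only the waiting overlaps. This is safe only because neither operation depends on the other's output.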

Bug Fixes

  1. Fix race condition - Await step_started before hydration to ensure correct attempt count in error handlers
  2. Fix HTTP status codes - Use 409 (Conflict) for terminal state steps instead of 410 across all worlds

OpenTelemetry Improvements

  1. W3C trace context propagation - Add traceparent/tracestate headers to step queue messages for cross-service trace linking
  2. Service attribution - Add peer.service and RPC semantic conventions for Datadog service maps (workflow-server, VQS)
  3. Enhanced span coverage - Add step.hydrate, step.dehydrate, workflow.replay spans
  4. Improved naming - Rename queueMessage to queue.publish, use uppercase WORKFLOW/STEP span names
  5. Baggage propagation - Propagate workflow context via OTEL baggage (workflow.run_id, workflow.name)
  6. Milestone events - Add span events: retry.scheduled, step.skipped, step.delayed
  7. Error telemetry - Use recordException() with error categorization (fatal/retryable/transient)
  8. Event type in span names - world.events.create spans now include event type (e.g., world.events.create step_started)
  9. world-local instrumentation - Add OTEL instrumentation matching world-vercel
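
For item 1 above, the traceparent header follows the W3C Trace Context layout: version, 32-hex trace ID, 16-hex parent span ID, and 2-hex flags, joined by hyphens. The sketch below formats and parses that header by hand so it stays dependency-free; the PR itself presumably goes through the OpenTelemetry propagation API, and the TraceContext type and withTraceHeaders helper are hypothetical names.

```typescript
// Hypothetical shape for the trace context carried on a queue message.
interface TraceContext {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters
  sampled: boolean;
}

// Version "00" of the W3C traceparent header.
function formatTraceparent(ctx: TraceContext): string {
  const flags = ctx.sampled ? "01" : "00";
  return `00-${ctx.traceId}-${ctx.spanId}-${flags}`;
}

const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceContext | undefined {
  const match = TRACEPARENT_RE.exec(header);
  if (!match) return undefined;
  const traceId = match[1];
  const spanId = match[2];
  const flags = match[3];
  if (traceId === undefined || spanId === undefined || flags === undefined) {
    return undefined;
  }
  // All-zero IDs are invalid per the specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 0x01 };
}

// Hypothetical helper: copy the headers and attach the trace context.
function withTraceHeaders(
  headers: Record<string, string>,
  ctx: TraceContext,
): Record<string, string> {
  return { ...headers, traceparent: formatTraceparent(ctx) };
}
```

A consumer that reads the message can call parseTraceparent on the header and start its span as a child of the parsed context, which is what links the step trace to the workflow trace.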

Estimated savings: 50-80ms per step (one fewer HTTP round-trip) + 200-460ms per fire-and-forget event (skip S3 ref resolution)

Related

Test plan

  • Unit tests pass (pnpm test - 304 tests passing)
  • TypeScript compiles
  • E2E tests with sleep workflow
  • Test edge cases:
    • Step already completed → 409 → re-enqueue workflow
    • retryAfter not reached → 425 → return timeout
    • Max retries exceeded → fail step

Observability Impact

After these changes, Datadog traces will show:

  • Complete trace linking across workflow and step executions
  • Service maps with workflow-server and VQS as external services
  • Detailed timing breakdown for hydration, execution, and dehydration
  • Error categorization for easier debugging (fatal vs retryable vs transient)
  • Event type visibility in span names for filtering

Copilot AI review requested due to automatic review settings February 4, 2026 17:38

changeset-bot Bot commented Feb 4, 2026

🦋 Changeset detected

Latest commit: 0058bb4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 18 packages
Name Type
@workflow/core Patch
@workflow/world-local Patch
@workflow/world-postgres Patch
@workflow/world-vercel Patch
@workflow/builders Patch
@workflow/cli Patch
@workflow/docs-typecheck Patch
@workflow/next Patch
@workflow/nitro Patch
@workflow/web-shared Patch
workflow Patch
@workflow/astro Patch
@workflow/nest Patch
@workflow/rollup Patch
@workflow/sveltekit Patch
@workflow/vite Patch
@workflow/world-testing Patch
@workflow/nuxt Patch

vercel Bot commented Feb 4, 2026

github-actions Bot commented Feb 4, 2026

📊 Benchmark Results

📈 Comparing against baseline from main branch. Green 🟢 = faster, Red 🔺 = slower.

workflow with no steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 0.040s (-7.9% 🟢) 1.018s (~) 0.978s 10 1.00x
💻 Local Nitro 0.042s (-9.9% 🟢) 1.007s (~) 0.965s 10 1.05x
💻 Local Express 0.044s (~) 1.008s (~) 0.964s 10 1.10x
🐘 Postgres Express 0.234s (-41.3% 🟢) 1.015s (~) 0.781s 10 5.88x
🐘 Postgres Nitro 0.292s (-27.1% 🟢) 1.015s (-2.5%) 0.723s 10 7.34x
🐘 Postgres Next.js (Turbopack) 0.420s (+113.0% 🔺) 1.019s (~) 0.599s 10 10.56x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 0.575s (+0.6%) 1.458s (-4.0%) 0.883s 10 1.00x
▲ Vercel Express 0.621s (-13.3% 🟢) 1.676s (+4.2%) 1.055s 10 1.08x
▲ Vercel Next.js (Turbopack) 0.784s (-2.4%) 1.635s (-7.3% 🟢) 0.850s 10 1.36x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

workflow with 1 step

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 1.107s (+1.1%) 2.015s (~) 0.908s 10 1.00x
💻 Local Nitro 1.115s (~) 2.006s (~) 0.891s 10 1.01x
💻 Local Express 1.117s (~) 2.008s (~) 0.892s 10 1.01x
🐘 Postgres Nitro 2.135s (-7.0% 🟢) 3.015s (~) 0.880s 10 1.93x
🐘 Postgres Express 2.176s (-7.2% 🟢) 3.015s (~) 0.839s 10 1.97x
🐘 Postgres Next.js (Turbopack) 2.218s (+20.6% 🔺) 3.020s (+42.6% 🔺) 0.803s 10 2.00x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Next.js (Turbopack) 2.463s (-4.8%) 3.605s (-2.6%) 1.142s 10 1.00x
▲ Vercel Nitro 2.551s (-1.9%) 3.694s (+2.5%) 1.143s 10 1.04x
▲ Vercel Express 2.589s (+0.8%) 3.526s (-0.8%) 0.937s 10 1.05x

🔍 Observability: Next.js (Turbopack) | Nitro | Express

workflow with 10 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 10.715s (~) 11.023s (~) 0.308s 3 1.00x
💻 Local Nitro 10.822s (~) 11.013s (~) 0.191s 3 1.01x
💻 Local Express 10.835s (~) 11.016s (~) 0.181s 3 1.01x
🐘 Postgres Next.js (Turbopack) 20.078s (+30.9% 🔺) 21.059s (+31.3% 🔺) 0.981s 2 1.87x
🐘 Postgres Nitro 20.296s (+0.6%) 21.034s (~) 0.739s 2 1.89x
🐘 Postgres Express 20.451s (+0.8%) 21.033s (~) 0.582s 2 1.91x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 20.300s (-7.4% 🟢) 21.021s (-6.9% 🟢) 0.721s 2 1.00x
▲ Vercel Express 22.044s (+2.2%) 22.657s (+1.7%) 0.612s 2 1.09x
▲ Vercel Next.js (Turbopack) 22.790s (+7.3% 🔺) 23.409s (+7.1% 🔺) 0.620s 2 1.12x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

workflow with 25 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 27.177s (~) 28.044s (~) 0.867s 3 1.00x
💻 Local Nitro 27.426s (~) 28.030s (~) 0.604s 3 1.01x
💻 Local Express 27.495s (~) 28.028s (~) 0.533s 3 1.01x
🐘 Postgres Next.js (Turbopack) 49.191s (+30.2% 🔺) 49.586s (+30.3% 🔺) 0.395s 2 1.81x
🐘 Postgres Express 50.325s (~) 51.086s (~) 0.761s 2 1.85x
🐘 Postgres Nitro 50.459s (+31.7% 🔺) 51.083s (+30.8% 🔺) 0.624s 2 1.86x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 48.126s (-12.8% 🟢) 48.760s (-12.1% 🟢) 0.634s 2 1.00x
▲ Vercel Express 48.919s (-9.1% 🟢) 50.411s (-7.6% 🟢) 1.492s 2 1.02x
▲ Vercel Next.js (Turbopack) 49.059s (-8.1% 🟢) 50.154s (-7.6% 🟢) 1.094s 2 1.02x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

workflow with 50 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 56.538s (~) 57.066s (~) 0.529s 2 1.00x
💻 Local Nitro 57.136s (~) 58.048s (~) 0.912s 2 1.01x
💻 Local Express 57.200s (~) 58.045s (~) 0.845s 2 1.01x
🐘 Postgres Express 75.591s (-20.6% 🟢) 76.097s (-20.8% 🟢) 0.507s 2 1.34x
🐘 Postgres Next.js (Turbopack) 75.648s (-4.7%) 76.134s (-5.0%) 0.486s 2 1.34x
🐘 Postgres Nitro 100.291s (+33.0% 🔺) 101.188s (+32.9% 🔺) 0.897s 1 1.77x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 104.568s (-10.1% 🟢) 105.637s (-9.7% 🟢) 1.069s 1 1.00x
▲ Vercel Express 106.814s (-1.7%) 108.347s (-0.6%) 1.533s 1 1.02x
▲ Vercel Next.js (Turbopack) 108.113s (-4.3%) 108.717s (-4.7%) 0.604s 1 1.03x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

Promise.all with 10 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 1.401s (+1.7%) 2.011s (~) 0.611s 15 1.00x
💻 Local Nitro 1.412s (~) 2.006s (~) 0.594s 15 1.01x
💻 Local Express 1.420s (~) 2.007s (~) 0.587s 15 1.01x
🐘 Postgres Next.js (Turbopack) 1.701s (-2.6%) 2.158s (-3.4%) 0.457s 14 1.21x
🐘 Postgres Express 1.885s (-13.1% 🟢) 2.399s (-12.6% 🟢) 0.514s 13 1.35x
🐘 Postgres Nitro 2.448s (+4.5%) 3.014s (+10.0% 🔺) 0.565s 10 1.75x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 2.575s (-17.0% 🟢) 3.669s (-8.9% 🟢) 1.093s 9 1.00x
▲ Vercel Nitro 2.603s (-17.6% 🟢) 3.407s (-17.9% 🟢) 0.804s 9 1.01x
▲ Vercel Next.js (Turbopack) 2.997s (+9.4% 🔺) 3.903s (+1.0%) 0.906s 8 1.16x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Promise.all with 25 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 2.514s (~) 3.043s (-0.8%) 0.529s 10 1.00x
💻 Local Nitro 2.557s (-0.7%) 3.017s (~) 0.460s 10 1.02x
💻 Local Express 2.595s (+1.1%) 3.019s (~) 0.424s 10 1.03x
🐘 Postgres Nitro 7.894s (-18.3% 🟢) 8.587s (-14.5% 🟢) 0.694s 4 3.14x
🐘 Postgres Express 9.543s (-2.8%) 10.068s (~) 0.525s 3 3.80x
🐘 Postgres Next.js (Turbopack) 11.123s (+1.5%) 11.753s (~) 0.631s 3 4.42x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 2.802s (-14.6% 🟢) 3.801s (-5.3% 🟢) 1.000s 8 1.00x
▲ Vercel Next.js (Turbopack) 3.098s (+5.5% 🔺) 3.871s (-1.1%) 0.774s 8 1.11x
▲ Vercel Express 3.406s (+9.4% 🔺) 4.417s (+12.9% 🔺) 1.011s 8 1.22x

🔍 Observability: Nitro | Next.js (Turbopack) | Express

Promise.all with 50 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 7.126s (-2.9%) 7.615s (-6.0% 🟢) 0.489s 5 1.00x
💻 Local Nitro 7.269s (+0.9%) 8.278s (+6.1% 🔺) 1.009s 4 1.02x
💻 Local Next.js (Turbopack) 7.851s (+13.7% 🔺) 8.305s (+8.1% 🔺) 0.455s 4 1.10x
🐘 Postgres Express 44.327s (-10.8% 🟢) 45.179s (-9.9% 🟢) 0.852s 1 6.22x
🐘 Postgres Nitro 50.072s (+9.5% 🔺) 50.326s (+8.9% 🔺) 0.254s 1 7.03x
🐘 Postgres Next.js (Turbopack) 51.462s (-1.3%) 52.299s (-1.6%) 0.837s 1 7.22x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 3.599s (+5.7% 🔺) 4.289s (~) 0.690s 8 1.00x
▲ Vercel Nitro 3.978s (+1.2%) 4.693s (~) 0.716s 7 1.11x
▲ Vercel Next.js (Turbopack) 4.426s (+34.0% 🔺) 5.257s (+25.4% 🔺) 0.831s 6 1.23x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Promise.race with 10 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Nitro 1.443s (+0.9%) 2.006s (~) 0.563s 15 1.00x
💻 Local Express 1.454s (+0.7%) 2.007s (~) 0.553s 15 1.01x
💻 Local Next.js (Turbopack) 1.457s (+3.5%) 2.012s (~) 0.555s 15 1.01x
🐘 Postgres Nitro 2.063s (-1.8%) 2.602s (+3.5%) 0.539s 12 1.43x
🐘 Postgres Next.js (Turbopack) 2.151s (-4.1%) 2.842s (+3.3%) 0.691s 11 1.49x
🐘 Postgres Express 2.171s (-1.1%) 2.595s (~) 0.424s 12 1.50x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 2.733s (+6.8% 🔺) 3.619s (+2.5%) 0.886s 9 1.00x
▲ Vercel Next.js (Turbopack) 2.737s (+4.0%) 3.704s (-0.8%) 0.967s 9 1.00x
▲ Vercel Express 2.827s (-3.8%) 3.753s (-4.2%) 0.926s 8 1.03x

🔍 Observability: Nitro | Next.js (Turbopack) | Express

Promise.race with 25 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 2.603s (-6.8% 🟢) 3.033s (-8.7% 🟢) 0.430s 10 1.00x
💻 Local Express 2.732s (+4.3%) 3.032s (+0.7%) 0.299s 10 1.05x
💻 Local Nitro 2.766s (+3.0%) 3.067s (+1.9%) 0.301s 10 1.06x
🐘 Postgres Nitro 10.692s (-10.2% 🟢) 11.397s (-7.9% 🟢) 0.706s 3 4.11x
🐘 Postgres Express 11.676s (-1.0%) 12.024s (~) 0.349s 3 4.49x
🐘 Postgres Next.js (Turbopack) 11.725s (-12.7% 🟢) 12.028s (-12.3% 🟢) 0.304s 3 4.50x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 2.706s (-6.1% 🟢) 3.561s (-8.6% 🟢) 0.855s 9 1.00x
▲ Vercel Express 2.716s (-3.0%) 3.750s (+0.9%) 1.035s 8 1.00x
▲ Vercel Next.js (Turbopack) 2.854s (~) 3.663s (-5.3% 🟢) 0.809s 9 1.05x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

Promise.race with 50 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 7.218s (-12.2% 🟢) 8.657s (-2.2%) 1.440s 4 1.00x
💻 Local Express 7.904s (+5.7% 🔺) 8.804s (+3.8%) 0.900s 4 1.10x
💻 Local Nitro 8.038s (+3.6%) 9.005s (+3.5%) 0.967s 4 1.11x
🐘 Postgres Express 49.435s (-5.4% 🟢) 50.208s (-5.5% 🟢) 0.773s 1 6.85x
🐘 Postgres Next.js (Turbopack) 52.390s (-3.0%) 53.194s (-1.8%) 0.804s 1 7.26x
🐘 Postgres Nitro 53.461s (-2.8%) 54.182s (-1.7%) 0.721s 1 7.41x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 3.208s (~) 4.042s (+2.1%) 0.834s 8 1.00x
▲ Vercel Nitro 3.463s (+8.8% 🔺) 4.215s (+12.4% 🔺) 0.751s 8 1.08x
▲ Vercel Next.js (Turbopack) 3.566s (+5.2% 🔺) 4.181s (+2.9%) 0.615s 8 1.11x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Stream Benchmarks (includes TTFB metrics)
workflow with stream

💻 Local Development

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 0.142s (~) 1.004s (~) 0.015s (-28.6% 🟢) 1.025s (-0.5%) 0.884s 10 1.00x
💻 Local Nitro 0.180s (-2.0%) 0.992s (~) 0.014s (-2.8%) 1.020s (~) 0.840s 10 1.27x
💻 Local Express 0.186s (+0.8%) 0.992s (~) 0.014s (+0.7%) 1.021s (~) 0.835s 10 1.31x
🐘 Postgres Next.js (Turbopack) 1.289s (+22.2% 🔺) 1.754s (+37.0% 🔺) 0.000s (NaN%) 2.017s (+32.8% 🔺) 0.727s 10 9.10x
🐘 Postgres Express 1.318s (-0.9%) 1.790s (+4.6%) 0.000s (NaN%) 2.014s (~) 0.695s 10 9.31x
🐘 Postgres Nitro 2.460s (+120.4% 🔺) 2.579s (+31.4% 🔺) 0.000s (NaN%) 3.016s (+49.8% 🔺) 0.556s 10 17.37x

▲ Production (Vercel)

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Next.js (Turbopack) 2.383s (-9.6% 🟢) 2.835s (-11.8% 🟢) 0.200s (+1.2%) 3.515s (-10.5% 🟢) 1.132s 10 1.00x
▲ Vercel Express 2.425s (-4.9%) 2.645s (-14.9% 🟢) 0.156s (-39.1% 🟢) 3.344s (-11.2% 🟢) 0.919s 10 1.02x
▲ Vercel Nitro 2.506s (-7.4% 🟢) 3.485s (+10.0% 🔺) 0.161s (-44.5% 🟢) 4.135s (+4.8%) 1.629s 10 1.05x

🔍 Observability: Next.js (Turbopack) | Express | Nitro

Summary

Fastest Framework by World

Winner determined by most benchmark wins

World 🥇 Fastest Framework Wins
💻 Local Next.js (Turbopack) 10/12
🐘 Postgres Express 4/12
▲ Vercel Nitro 7/12

Fastest World by Framework

Winner determined by most benchmark wins

Framework 🥇 Fastest World Wins
Express 💻 Local 9/12
Next.js (Turbopack) 💻 Local 10/12
Nitro 💻 Local 9/12

Column Definitions
  • Workflow Time: Runtime reported by workflow (completedAt - createdAt) - primary metric
  • TTFB: Time to First Byte - time from workflow start until first stream byte received (stream benchmarks only)
  • Slurp: Time from first byte to complete stream consumption (stream benchmarks only)
  • Wall Time: Total testbench time (trigger workflow + poll for result)
  • Overhead: Testbench overhead (Wall Time - Workflow Time)
  • Samples: Number of benchmark iterations run
  • vs Fastest: How much slower compared to the fastest configuration for this benchmark
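
The Overhead and vs Fastest columns defined above are simple derived values. A small sketch of the arithmetic, using the three Local rows from the "workflow with no steps" table as input; the BenchmarkRow type and function names are hypothetical, and the ratio is assumed to be each row's Workflow Time divided by the fastest Workflow Time in the same benchmark.

```typescript
// Hypothetical row type; times are in seconds, as printed in the tables.
interface BenchmarkRow {
  framework: string;
  workflowTimeS: number;
  wallTimeS: number;
}

// Overhead = Wall Time - Workflow Time
function overheadSeconds(row: BenchmarkRow): number {
  return row.wallTimeS - row.workflowTimeS;
}

// vs Fastest = Workflow Time / fastest Workflow Time in the group (assumed).
function vsFastest(
  rows: BenchmarkRow[],
): { framework: string; ratio: number }[] {
  const fastest = Math.min(...rows.map((row) => row.workflowTimeS));
  return rows.map((row) => ({
    framework: row.framework,
    ratio: row.workflowTimeS / fastest,
  }));
}

const nextRow: BenchmarkRow = {
  framework: "Next.js (Turbopack)",
  workflowTimeS: 0.04,
  wallTimeS: 1.018,
};
const nitroRow: BenchmarkRow = {
  framework: "Nitro",
  workflowTimeS: 0.042,
  wallTimeS: 1.007,
};
const expressRow: BenchmarkRow = {
  framework: "Express",
  workflowTimeS: 0.044,
  wallTimeS: 1.008,
};

const localOverhead = overheadSeconds(nextRow).toFixed(3);
const localRatios = vsFastest([nextRow, nitroRow, expressRow]).map(
  (entry) => `${entry.framework}: ${entry.ratio.toFixed(2)}x`,
);
console.log(localOverhead, localRatios);
```

Because the tables print rounded times, recomputing a ratio from the printed values can differ slightly from the printed ratio in other rows.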

Worlds:

  • 💻 Local: In-memory filesystem world (local development)
  • 🐘 Postgres: PostgreSQL database world (local development)
  • ▲ Vercel: Vercel production/preview deployment
  • 🌐 Starter: Community world (local development)
  • 🌐 Turso: Community world (local development)
  • 🌐 MongoDB: Community world (local development)
  • 🌐 Redis: Community world (local development)
  • 🌐 Jazz: Community world (local development)

📋 View full workflow run

github-actions Bot commented Feb 4, 2026

🧪 E2E Test Results

Some tests failed

Summary

Passed Failed Skipped Total
✅ ▲ Vercel Production 479 0 38 517
✅ 💻 Local Development 438 0 32 470
✅ 📦 Local Production 438 0 32 470
✅ 🐘 Local Postgres 438 0 32 470
✅ 🪟 Windows 47 0 0 47
❌ 🌍 Community Worlds 31 169 0 200
✅ 📋 Other 129 0 12 141
Total 2000 169 146 2315

❌ Failed Tests

🌍 Community Worlds (169 failed)

mongodb (42 failed):

  • addTenWorkflow
  • addTenWorkflow
  • should work with react rendering in step
  • promiseAllWorkflow
  • promiseRaceWorkflow
  • promiseAnyWorkflow
  • readableStreamWorkflow
  • hookWorkflow
  • webhookWorkflow
  • sleepingWorkflow
  • nullByteWorkflow
  • workflowAndStepMetadataWorkflow
  • outputStreamWorkflow
  • outputStreamInsideStepWorkflow - getWritable() called inside step functions
  • fetchWorkflow
  • promiseRaceStressTestWorkflow
  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling error propagation workflow errors cross-file imports preserve message and stack trace
  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries
  • error handling retry behavior RetryableError respects custom retryAfter delay
  • error handling retry behavior maxRetries=0 disables retries
  • error handling catchability FatalError can be caught and detected with FatalError.is()
  • hookCleanupTestWorkflow - hook token reuse after workflow completion
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously
  • stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars)
  • stepFunctionWithClosureWorkflow - step function with closure variables passed as argument
  • closureVariableWorkflow - nested step functions with closure variables
  • spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step
  • pathsAliasWorkflow - TypeScript path aliases resolve correctly
  • Calculator.calculate - static workflow method using static step methods from another class
  • AllInOneService.processNumber - static workflow method using sibling static step methods
  • ChainableService.processWithThis - static step methods using this to reference the class
  • thisSerializationWorkflow - step function invoked with .call() and .apply()
  • customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE
  • instanceMethodStepWorkflow - instance methods with "use step" directive
  • crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context
  • pages router addTenWorkflow via pages router
  • pages router promiseAllWorkflow via pages router
  • pages router sleepingWorkflow via pages router

redis (42 failed):

  • addTenWorkflow
  • addTenWorkflow
  • should work with react rendering in step
  • promiseAllWorkflow
  • promiseRaceWorkflow
  • promiseAnyWorkflow
  • readableStreamWorkflow
  • hookWorkflow
  • webhookWorkflow
  • sleepingWorkflow
  • nullByteWorkflow
  • workflowAndStepMetadataWorkflow
  • outputStreamWorkflow
  • outputStreamInsideStepWorkflow - getWritable() called inside step functions
  • fetchWorkflow
  • promiseRaceStressTestWorkflow
  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling error propagation workflow errors cross-file imports preserve message and stack trace
  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries
  • error handling retry behavior RetryableError respects custom retryAfter delay
  • error handling retry behavior maxRetries=0 disables retries
  • error handling catchability FatalError can be caught and detected with FatalError.is()
  • hookCleanupTestWorkflow - hook token reuse after workflow completion
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously
  • stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars)
  • stepFunctionWithClosureWorkflow - step function with closure variables passed as argument
  • closureVariableWorkflow - nested step functions with closure variables
  • spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step
  • pathsAliasWorkflow - TypeScript path aliases resolve correctly
  • Calculator.calculate - static workflow method using static step methods from another class
  • AllInOneService.processNumber - static workflow method using sibling static step methods
  • ChainableService.processWithThis - static step methods using this to reference the class
  • thisSerializationWorkflow - step function invoked with .call() and .apply()
  • customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE
  • instanceMethodStepWorkflow - instance methods with "use step" directive
  • crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context
  • pages router addTenWorkflow via pages router
  • pages router promiseAllWorkflow via pages router
  • pages router sleepingWorkflow via pages router

starter (43 failed):

  • addTenWorkflow
  • addTenWorkflow
  • should work with react rendering in step
  • promiseAllWorkflow
  • promiseRaceWorkflow
  • promiseAnyWorkflow
  • readableStreamWorkflow
  • hookWorkflow
  • webhookWorkflow
  • sleepingWorkflow
  • nullByteWorkflow
  • workflowAndStepMetadataWorkflow
  • outputStreamWorkflow
  • outputStreamInsideStepWorkflow - getWritable() called inside step functions
  • fetchWorkflow
  • promiseRaceStressTestWorkflow
  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling error propagation workflow errors cross-file imports preserve message and stack trace
  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries
  • error handling retry behavior RetryableError respects custom retryAfter delay
  • error handling retry behavior maxRetries=0 disables retries
  • error handling catchability FatalError can be caught and detected with FatalError.is()
  • hookCleanupTestWorkflow - hook token reuse after workflow completion
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously
  • stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars)
  • stepFunctionWithClosureWorkflow - step function with closure variables passed as argument
  • closureVariableWorkflow - nested step functions with closure variables
  • spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step
  • health check (CLI) - workflow health command reports healthy endpoints
  • pathsAliasWorkflow - TypeScript path aliases resolve correctly
  • Calculator.calculate - static workflow method using static step methods from another class
  • AllInOneService.processNumber - static workflow method using sibling static step methods
  • ChainableService.processWithThis - static step methods using this to reference the class
  • thisSerializationWorkflow - step function invoked with .call() and .apply()
  • customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE
  • instanceMethodStepWorkflow - instance methods with "use step" directive
  • crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context
  • pages router addTenWorkflow via pages router
  • pages router promiseAllWorkflow via pages router
  • pages router sleepingWorkflow via pages router

turso (42 failed):

  • addTenWorkflow
  • addTenWorkflow
  • should work with react rendering in step
  • promiseAllWorkflow
  • promiseRaceWorkflow
  • promiseAnyWorkflow
  • readableStreamWorkflow
  • hookWorkflow
  • webhookWorkflow
  • sleepingWorkflow
  • nullByteWorkflow
  • workflowAndStepMetadataWorkflow
  • outputStreamWorkflow
  • outputStreamInsideStepWorkflow - getWritable() called inside step functions
  • fetchWorkflow
  • promiseRaceStressTestWorkflow
  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling error propagation workflow errors cross-file imports preserve message and stack trace
  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries
  • error handling retry behavior RetryableError respects custom retryAfter delay
  • error handling retry behavior maxRetries=0 disables retries
  • error handling catchability FatalError can be caught and detected with FatalError.is()
  • hookCleanupTestWorkflow - hook token reuse after workflow completion
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously
  • stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars)
  • stepFunctionWithClosureWorkflow - step function with closure variables passed as argument
  • closureVariableWorkflow - nested step functions with closure variables
  • spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step
  • pathsAliasWorkflow - TypeScript path aliases resolve correctly
  • Calculator.calculate - static workflow method using static step methods from another class
  • AllInOneService.processNumber - static workflow method using sibling static step methods
  • ChainableService.processWithThis - static step methods using this to reference the class
  • thisSerializationWorkflow - step function invoked with .call() and .apply()
  • customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE
  • instanceMethodStepWorkflow - instance methods with "use step" directive
  • crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context
  • pages router addTenWorkflow via pages router
  • pages router promiseAllWorkflow via pages router
  • pages router sleepingWorkflow via pages router

Details by Category

✅ ▲ Vercel Production
App Passed Failed Skipped
✅ astro 43 0 4
✅ example 43 0 4
✅ express 43 0 4
✅ fastify 43 0 4
✅ hono 43 0 4
✅ nextjs-turbopack 46 0 1
✅ nextjs-webpack 46 0 1
✅ nitro 43 0 4
✅ nuxt 43 0 4
✅ sveltekit 43 0 4
✅ vite 43 0 4
✅ 💻 Local Development
App Passed Failed Skipped
✅ astro-stable 43 0 4
✅ express-stable 43 0 4
✅ fastify-stable 43 0 4
✅ hono-stable 43 0 4
✅ nextjs-turbopack-stable 47 0 0
✅ nextjs-webpack-stable 47 0 0
✅ nitro-stable 43 0 4
✅ nuxt-stable 43 0 4
✅ sveltekit-stable 43 0 4
✅ vite-stable 43 0 4
✅ 📦 Local Production
App Passed Failed Skipped
✅ astro-stable 43 0 4
✅ express-stable 43 0 4
✅ fastify-stable 43 0 4
✅ hono-stable 43 0 4
✅ nextjs-turbopack-stable 47 0 0
✅ nextjs-webpack-stable 47 0 0
✅ nitro-stable 43 0 4
✅ nuxt-stable 43 0 4
✅ sveltekit-stable 43 0 4
✅ vite-stable 43 0 4
✅ 🐘 Local Postgres
App Passed Failed Skipped
✅ astro-stable 43 0 4
✅ express-stable 43 0 4
✅ fastify-stable 43 0 4
✅ hono-stable 43 0 4
✅ nextjs-turbopack-stable 47 0 0
✅ nextjs-webpack-stable 47 0 0
✅ nitro-stable 43 0 4
✅ nuxt-stable 43 0 4
✅ sveltekit-stable 43 0 4
✅ vite-stable 43 0 4
✅ 🪟 Windows
App Passed Failed Skipped
✅ nextjs-turbopack 47 0 0
❌ 🌍 Community Worlds
App Passed Failed Skipped
✅ mongodb-dev 3 0 0
❌ mongodb 5 42 0
✅ redis-dev 3 0 0
❌ redis 5 42 0
✅ starter-dev 3 0 0
❌ starter 4 43 0
✅ turso-dev 3 0 0
❌ turso 5 42 0
✅ 📋 Other
App Passed Failed Skipped
✅ e2e-local-dev-nest-stable 43 0 4
✅ e2e-local-postgres-nest-stable 43 0 4
✅ e2e-local-prod-nest-stable 43 0 4

📋 View full workflow run

Copilot AI left a comment

Pull request overview

This PR optimizes the step handler performance by parallelizing independent async operations to reduce step execution overhead. The changes introduce three strategic optimizations that overlap HTTP calls with CPU-bound work and run independent async operations concurrently.

Changes:

  • Parallelize initialization of port, span kind, and step entity fetching at handler startup
  • Overlap step_started event creation with CPU-bound argument hydration
  • Run step_completed event and trace serialization concurrently before queueing workflow continuation

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
packages/core/src/runtime/step-handler.ts Implements three parallelization optimizations to reduce step execution latency by overlapping independent async operations
.changeset/step-handler-parallelization.md Documents the patch-level change for release notes

Comment thread packages/core/src/runtime/step-handler.ts Outdated
@pranaygp pranaygp marked this pull request as draft February 4, 2026 18:08
pranaygp and others added 4 commits February 4, 2026 16:14
- Parallelize getPort(), getSpanKind(), and world.steps.get() calls
- Start step_started event creation while hydrating arguments (CPU work)
- Parallelize step_completed event with trace serialization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reverts Optimization 1 to fix a race condition where hydrateStepArguments()
could throw before stepStartedPromise was awaited, causing stale step.attempt
in the catch handler and potentially allowing extra retries.

Optimizations 0 and 2 are preserved as they don't have this issue.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
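
The race described in the revert commit above can be reproduced in a small simulation. Everything here is a hypothetical stand-in: createStepStarted is assumed to return the step entity with an incremented attempt, and hydrateStepArguments is reduced to a JSON parse that throws on bad input. Each handler returns the attempt value its catch block would observe.

```typescript
interface Step {
  id: string;
  attempt: number;
}

// Hypothetical: simulates the server recording step_started and
// returning the step entity with the attempt counter incremented.
async function createStepStarted(step: Step): Promise<Step> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return { ...step, attempt: step.attempt + 1 };
}

// Hypothetical: stands in for CPU-bound hydration that may throw.
function hydrateStepArguments(raw: string): unknown[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) throw new Error("arguments must be an array");
  return parsed;
}

// Racy ordering: hydration runs while step_started is still in flight.
// If hydration throws, the catch block only has the stale entity.
async function handleRacy(step: Step, rawArgs: string): Promise<number> {
  let current = step;
  const started = createStepStarted(step);
  try {
    hydrateStepArguments(rawArgs);
    current = await started;
    return current.attempt;
  } catch {
    return current.attempt; // stale: step_started was never awaited
  }
}

// Fixed ordering: await step_started first, then hydrate.
async function handleFixed(step: Step, rawArgs: string): Promise<number> {
  let current = step;
  try {
    current = await createStepStarted(step);
    hydrateStepArguments(rawArgs);
    return current.attempt;
  } catch {
    return current.attempt; // reflects the attempt recorded by step_started
  }
}
```

With malformed arguments, the racy handler reports the attempt count from before step_started, which is how an extra retry could slip through a max-retries check.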
Eliminate the world.steps.get() HTTP call by calling step_started first
and relying on server-side validation. This saves 50-80ms per step
execution by removing one HTTP round-trip.

The server (workflow-server) now validates:
- Step not in terminal state (returns 409)
- retryAfter timestamp reached (returns 425 with Retry-After header)
- Workflow still active (returns 410 if completed)

Changes:
- Remove world.steps.get() from initial Promise.all
- Call step_started first to get step entity and validate state
- Handle 409 (terminal state) by re-queueing workflow
- Handle 425 (retryAfter not reached) by returning timeout
- Handle 410 (workflow gone) as no-op
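The client-side handling listed above can be sketched as a small status-to-action mapping. The type and function names are hypothetical, treating 200 as the success status is an assumption, and so is the one-second fallback when no delay is supplied.

```typescript
type StepStartOutcome =
  | { action: "run" }
  | { action: "requeue-workflow" }
  | { action: "retry-later"; timeoutSeconds: number }
  | { action: "noop" };

// Maps the HTTP status of a step_started response to what the
// handler does next. Illustrative only.
function outcomeForStepStarted(
  status: number,
  retryAfterSeconds?: number
): StepStartOutcome {
  switch (status) {
    case 200: // assumed success status
      return { action: "run" };
    case 409: // step already in a terminal state
      return { action: "requeue-workflow" };
    case 425: // retryAfter not reached yet
      return { action: "retry-later", timeoutSeconds: retryAfterSeconds ?? 1 };
    case 410: // workflow is gone, nothing left to do
      return { action: "noop" };
    default:
      throw new Error(`Unexpected step_started status: ${status}`);
  }
}
```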

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add server-side retryAfter validation to match workflow-server behavior:
- Check retryAfter timestamp before allowing step_started
- Return HTTP 425 with retryAfter timestamp in response meta
- Clear retryAfter field when step starts successfully

This ensures consistent behavior across all world implementations
and allows the step-handler optimization to work correctly.
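As a minimal sketch of the server-side check, assuming a simplified in-memory step record: `StoredStep`, `StartResult`, and `startStep` are hypothetical names, and the `"running"` status value is an assumption rather than something taken from the world implementations.

```typescript
interface StoredStep {
  status: string;
  retryAfter?: Date;
}

type StartResult =
  | { ok: true; step: StoredStep }
  | { ok: false; status: 425; meta: { retryAfter: string } };

// Rejects step_started with 425 while retryAfter is still in the future;
// otherwise marks the step as started and clears retryAfter.
function startStep(step: StoredStep, now: Date): StartResult {
  if (step.retryAfter && step.retryAfter.getTime() > now.getTime()) {
    return {
      ok: false,
      status: 425,
      meta: { retryAfter: step.retryAfter.toISOString() },
    };
  }
  return { ok: true, step: { ...step, status: "running", retryAfter: undefined } };
}
```

Passing `now` in as a parameter keeps the check deterministic and easy to test.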

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
pranaygp and others added 13 commits February 4, 2026 16:51
Aligns local and postgres worlds with workflow-server, which returns
409 via InvalidOperationStateError for step in terminal state.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The world-vercel package was creating spans under a separate
'workflow-world-vercel' service name, causing HTTP spans for
workflow-server API calls (step_started, step_completed) to be
filtered out when viewing traces for the main application service.

Now uses the same 'workflow' tracer name as @workflow/core to
ensure all spans are reported under the parent application's service.
…-parallelization

* origin/main:
  Add x-workflow-run-id header to queue messages (#922)
  Bump Next.js and React in workbenches (#944)
  Add subpath export resolution for package IDs (#901)
  Consolidate console logging to structured logger utility (#935)

# Conflicts:
#	packages/core/src/runtime/step-handler.ts
- Start Jaeger container for local trace visualization
- Configure OTEL exporter environment variables for dev server
- Open Jaeger UI automatically
- Add documentation about available trace attributes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add telemetry.ts and instrumentObject.ts to world-local for tracing parity
  with world-vercel (world.runs, world.steps, world.events, world.hooks spans)
- Change workflow span name from uppercase "WORKFLOW" to lowercase "workflow"
  for consistency with step spans and OTEL naming conventions
- Add step.execute child span to trace actual user step function execution
  separately from step handler infrastructure

These changes enable local development to have the same observability as
production deployments, making performance analysis and debugging easier.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…spans

HTTP spans use uppercase methods like "GET /path" and "POST /path".
Following the same convention, workflow and step spans now use:
- WORKFLOW <workflow-name>
- STEP <step-name>

Child spans (workflow.run, workflow.loadEvents, step.execute, world.events.create)
remain lowercase as they represent internal operations, not top-level entries.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Include traceparent and tracestate headers when queueing step
execution messages. This enables automatic trace propagation
by Vercel's infrastructure, potentially linking step invocation
spans to the parent workflow trace.

The trace carrier is now serialized once and included in both:
- Payload: for manual context restoration in step handler
- Headers: for automatic HTTP-based trace propagation
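To illustrate the "serialize once, send twice" idea, here is a sketch that formats a W3C `traceparent` value by hand and attaches the same carrier to both payload and headers. The real code presumably goes through the OpenTelemetry propagation API; `SpanContextLike`, `StepQueueMessage`, `toTraceparent`, and `buildQueueMessage` are hypothetical names, and `tracestate` is omitted for brevity.

```typescript
interface SpanContextLike {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters
  sampled: boolean;
}

interface StepQueueMessage {
  payload: { stepId: string; traceCarrier: Record<string, string> };
  headers: Record<string, string>;
}

// W3C Trace Context: version "00", trace-id, parent-id, trace-flags.
function toTraceparent(ctx: SpanContextLike): string {
  if (!/^[0-9a-f]{32}$/.test(ctx.traceId) || !/^[0-9a-f]{16}$/.test(ctx.spanId)) {
    throw new Error("Invalid trace or span id");
  }
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

function buildQueueMessage(stepId: string, ctx: SpanContextLike): StepQueueMessage {
  // Serialize the carrier once, then reuse it in both places.
  const carrier: Record<string, string> = { traceparent: toTraceparent(ctx) };
  return {
    payload: { stepId, traceCarrier: carrier }, // manual restoration in the handler
    headers: { ...carrier }, // automatic HTTP-based propagation
  };
}
```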

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ation

- Add peer.service attributes for workflow-server and VQS for Datadog service maps
- Rename queueMessage span to queue.publish for consistency
- Add step.hydrate, step.dehydrate, and workflow.replay spans
- Include event type in world.events.create span names (e.g., "world.events.create step_started")
- Add span.recordException() for errors with category classification (fatal/retryable/transient)
- Add span events for milestones: retry.scheduled, step.skipped, step.delayed
- Add HTTP semantic conventions with peer.service for world-vercel HTTP calls
- Add baggage propagation for workflow context (run_id, workflow_name)
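For readers new to baggage, the sketch below shows what the propagated workflow context looks like on the wire in the W3C `baggage` header format (comma-separated `key=value` pairs with percent-encoded values). It is a hand-rolled illustration, not the OpenTelemetry API the PR uses; `encodeBaggage` is a hypothetical helper, and the keys follow the `workflow.run_id` and `workflow.name` names from the PR summary.

```typescript
// Serializes key/value pairs into a W3C baggage header value.
// Values are percent-encoded; keys are assumed to be plain tokens.
function encodeBaggage(entries: Record<string, string>): string {
  return Object.entries(entries)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join(",");
}

const baggageHeader = encodeBaggage({
  "workflow.run_id": "run_123",
  "workflow.name": "my workflow",
});
```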

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- step-handler-parallelization.md: Add race condition fix and 409 status code fix
- world-vercel-telemetry-tracer.md: Add peer.service and event type in span names
- otel-tracing-improvements.md: New changeset for comprehensive OTEL improvements

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Datadog derives the resource name from rpc.method attribute. Updated to
use the full span name (which includes event type) instead of just the
method name, so Datadog shows "world.events.create step_started" instead
of "world.events.create workflow-server".

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Skip expensive S3 ref resolution (~200-460ms) for event types where
the client doesn't use the response entity data (step_created,
step_completed, step_failed, run_completed, etc). Only resolve refs
for run_created, run_started, and step_started where the client
reads the response.
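The selection rule described here can be sketched as a set lookup. The three event types come from the commit message above; the helper name `remoteRefBehaviorFor` and the `RemoteRefBehavior` type are hypothetical, and the actual set in packages/world-vercel/src/events.ts may contain different entries.

```typescript
type RemoteRefBehavior = "resolve" | "lazy";

// Event types whose response entity the client actually reads.
const eventsNeedingResolve = new Set<string>([
  "run_created",
  "run_started",
  "step_started",
]);

// Everything else is fire-and-forget, so the server can skip ref resolution.
function remoteRefBehaviorFor(eventType: string): RemoteRefBehavior {
  return eventsNeedingResolve.has(eventType) ? "resolve" : "lazy";
}
```

Keeping the set at module scope means it is built once rather than on every call.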

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-parallelization

* origin/main:
  Remove "beads" config (#952)
  Upload Next.js server logs as artifact for Windows E2E (#946)
  Pass class as `this` context to custom serializer/deserializer methods (#947)
  [docs] Fix image link in NestJS getting started guide (#941)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
}

// ============================================================
// Baggage Propagation Utilities
Member

TIL about OTEL baggage. Naming is hard

Comment thread packages/world-vercel/src/events.ts Outdated
// Events where the client uses the response entity data need 'resolve' (default).
// Events where the client discards the response can use 'lazy' to skip expensive
// S3 ref resolution on the server, saving ~200-460ms per event.
const eventsNeedingResolve = new Set([
Member

@VaguelySerious VaguelySerious Feb 6, 2026

Could be moved outside of the function, maybe even to the world package or similar, since the way we expect event responses to look is consistent across worlds. It could be a helpful reference for third-party worlds, mostly, if the comment strings are framed appropriately.

Separately, it looks like we're missing fine-grained control for resolution (e.g. resolve run but not step, resolve step but not run, etc.), which could be a future optimization.

Contributor Author

Moved to module scope — good call, no reason to recreate it on every call.

Moving this to @workflow/world so all world implementations can share it, and adding fine-grained per-entity resolution control (e.g. resolve run but not step) are both great follow-up ideas. Will track those separately.

  throw new WorkflowAPIError(
    `Cannot modify step in terminal state "${validatedStep.status}"`,
-   { status: 410 }
+   { status: 409 }
Member

A lot of other similar conflicts in the local world still throw 410. Should those be changed too?

Contributor Author

Good question! I audited all three implementations (workflow-server, world-local, world-postgres) to verify status code consistency.

Summary of the error semantics:

| Condition | workflow-server error | HTTP Status | Meaning |
| --- | --- | --- | --- |
| Run state transition on terminal run | InvalidOperationStateError | 409 Conflict | Can't re-complete/re-fail a finished run |
| Creating step/hook on terminal run | InvalidOperationStateError | 409 Conflict | Can't create new work on a finished run |
| Step in terminal state | InvalidOperationStateError | 409 Conflict | Step already completed/failed |
| Non-running step on terminal run | Not applicable (falls through) | 410 Gone | Run is done, step wasn't even started |
| Run not in 'running' state | WorkflowNotRunningError | 410 Gone | Run hasn't started or is gone |
| retryAfter not reached | RetryAfterNotReachedError | 425 Too Early | Wait before retrying |
| Step/run/hook not found | EntityNotFoundError | 404 Not Found | Entity doesn't exist |

Findings: Both world-local and world-postgres were using 410 (Gone) for cases that should be 409 (Conflict), specifically:

  • "Cannot transition run from terminal state" — should be 409 (matches InvalidOperationStateError in workflow-server)
  • "Cannot create new entities on terminal run" — should be 409 (same reason)
  • step_completed/step_failed fallbacks for step in terminal state (world-postgres only) — should be 409

The remaining 410s are correct:

  • "Cannot modify non-running step on run in terminal state": the run is gone, so 410 is appropriate (matches WorkflowNotRunningError semantics)

All mismatches have been fixed in this PR.
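One way to keep these semantics aligned across world implementations is a single lookup from error name to status code. The sketch below uses the workflow-server error names from the audit summary above; the `statusByError` table and `httpStatusFor` helper are hypothetical and do not exist in the codebase as far as this thread shows.

```typescript
// Error names and status codes as listed in the audit summary.
const statusByError = {
  InvalidOperationStateError: 409, // Conflict: entity already terminal
  WorkflowNotRunningError: 410, // Gone: run not in 'running' state
  RetryAfterNotReachedError: 425, // Too Early: wait before retrying
  EntityNotFoundError: 404, // Not Found: step/run/hook missing
} as const;

type WorkflowServerErrorName = keyof typeof statusByError;

function httpStatusFor(name: WorkflowServerErrorName): number {
  return statusByError[name];
}
```

Because the key type is derived from the table, adding a new error name in one place makes it available to every caller without a second list to update.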

Comment thread packages/core/src/runtime/step-handler.ts Outdated
…us code audit

- Move eventsNeedingResolve Set to module scope (PR review #2)
- Rewrite changelog-style OPTIMIZATION comments as current-state docs (PR review #4)
- Fix run terminal state errors: 410 → 409 in world-local and world-postgres to
  match workflow-server's InvalidOperationStateError (409) (PR review #3)
- Fix remaining step terminal state 410 → 409 in world-postgres fallback paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

3 participants