Skip to content

Wire encryption into serialization and fix EventsConsumer for async replay#957

Closed
TooTallNate wants to merge 15 commits intofix/async-deserialization-orderingfrom
nate/wire-encryption
Closed

Wire encryption into serialization and fix EventsConsumer for async replay#957
TooTallNate wants to merge 15 commits intofix/async-deserialization-orderingfrom
nate/wire-encryption

Conversation

@TooTallNate
Copy link
Copy Markdown
Member

@TooTallNate TooTallNate commented Feb 6, 2026

Summary

Wires AES-GCM encryption into the serialization layer and redesigns the EventsConsumer to handle async decryption and out-of-order event logs:

Serialization

  • maybeEncrypt/maybeDecrypt in serialization.ts with encr 4-byte format prefix
  • All 8 dehydrate/hydrate functions accept optional key: CryptoKey | undefined
  • Stream encryption/decryption with cached CryptoKey per stream instance
  • 32 encryption unit tests (primitives, maybeEncrypt/maybeDecrypt, isEncrypted, complex type round-trips)

EventsConsumer redesign

  • Scan-forward consume: Scans all remaining events to find one a subscriber can match, instead of processing strictly in order. Handles out-of-order step_created events from async DB writes without deadlocking.
  • Watchdog timer (1s): Replaces the aggressive setTimeout(0) unconsumed event check. Only fires if replay is truly deadlocked — no progress for 1 second.
  • enqueueResolve(): Step/hook/sleep callbacks use this instead of raw setTimeout for async work (decryption). Tracks pendingResolves count to suppress watchdog during async operations.
  • onEventConsumed callback: Passive timestamp observation without participating in event matching.
  • Post-completion corruption check: After workflow completes, verifies no events remain unconsumed (catches duplicate/orphaned events).
  • suppressUnconsumedCheck()/unsuppressUnconsumedCheck(): Used around workflow args decryption to prevent premature watchdog firing.

Dual-mode test suite

  • All 65 workflow.test.ts tests run twice: once without encryption and once with a real AES-256-GCM key (130 total)
  • Storytime-pattern integration test: models production workflow with Promise.all, sequential steps, hooks, and out-of-order step_created events

Copilot AI review requested due to automatic review settings February 6, 2026 02:35
@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented Feb 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
example-nextjs-workflow-turbopack Error Error Mar 3, 2026 10:34pm
example-nextjs-workflow-webpack Error Error Mar 3, 2026 10:34pm
example-workflow Error Error Mar 3, 2026 10:34pm
workbench-astro-workflow Error Error Mar 3, 2026 10:34pm
workbench-express-workflow Error Error Mar 3, 2026 10:34pm
workbench-fastify-workflow Error Error Mar 3, 2026 10:34pm
workbench-hono-workflow Error Error Mar 3, 2026 10:34pm
workbench-nitro-workflow Error Error Mar 3, 2026 10:34pm
workbench-nuxt-workflow Error Error Mar 3, 2026 10:34pm
workbench-sveltekit-workflow Error Error Mar 3, 2026 10:34pm
workbench-vite-workflow Error Error Mar 3, 2026 10:34pm
workflow-docs Error Error Mar 3, 2026 10:34pm
workflow-nest Error Error Mar 3, 2026 10:34pm
workflow-swc-playground Ready Ready Preview, Comment Mar 3, 2026 10:34pm

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Feb 6, 2026

🦋 Changeset detected

Latest commit: f84db02

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 15 packages
Name Type
@workflow/core Patch
workflow Patch
@workflow/web-shared Patch
@workflow/builders Patch
@workflow/world-vercel Patch
@workflow/cli Patch
@workflow/next Patch
@workflow/nitro Patch
@workflow/world-testing Patch
@workflow/astro Patch
@workflow/nest Patch
@workflow/rollup Patch
@workflow/sveltekit Patch
@workflow/vite Patch
@workflow/nuxt Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 6, 2026

🧪 E2E Test Results

Some tests failed

Summary

Passed Failed Skipped Total
✅ ▲ Vercel Production 534 0 38 572
✅ 💻 Local Development 556 0 68 624
✅ 📦 Local Production 556 0 68 624
✅ 🐘 Local Postgres 556 0 68 624
✅ 🪟 Windows 49 0 3 52
❌ 🌍 Community Worlds 110 46 9 165
✅ 📋 Other 135 0 21 156
Total 2496 46 275 2817

❌ Failed Tests

🌍 Community Worlds (46 failed)

mongodb (1 failed):

  • webhookWorkflow

turso (45 failed):

  • addTenWorkflow
  • addTenWorkflow
  • should work with react rendering in step
  • promiseAllWorkflow
  • promiseRaceWorkflow
  • promiseAnyWorkflow
  • hookWorkflow
  • webhookWorkflow
  • sleepingWorkflow
  • parallelSleepWorkflow
  • nullByteWorkflow
  • workflowAndStepMetadataWorkflow
  • fetchWorkflow
  • promiseRaceStressTestWorkflow
  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling error propagation workflow errors cross-file imports preserve message and stack trace
  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries
  • error handling retry behavior RetryableError respects custom retryAfter delay
  • error handling retry behavior maxRetries=0 disables retries
  • error handling retry behavior workflow completes despite transient 5xx on step_completed
  • error handling catchability FatalError can be caught and detected with FatalError.is()
  • hookCleanupTestWorkflow - hook token reuse after workflow completion
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously
  • stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars)
  • stepFunctionWithClosureWorkflow - step function with closure variables passed as argument
  • closureVariableWorkflow - nested step functions with closure variables
  • spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step
  • health check (queue-based) - workflow and step endpoints respond to health check messages
  • pathsAliasWorkflow - TypeScript path aliases resolve correctly
  • Calculator.calculate - static workflow method using static step methods from another class
  • AllInOneService.processNumber - static workflow method using sibling static step methods
  • ChainableService.processWithThis - static step methods using this to reference the class
  • thisSerializationWorkflow - step function invoked with .call() and .apply()
  • customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE
  • instanceMethodStepWorkflow - instance methods with "use step" directive
  • crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context
  • stepFunctionAsStartArgWorkflow - step function reference passed as start() argument
  • cancelRun - cancelling a running workflow
  • cancelRun via CLI - cancelling a running workflow
  • pages router addTenWorkflow via pages router
  • pages router promiseAllWorkflow via pages router
  • pages router sleepingWorkflow via pages router

Details by Category

✅ ▲ Vercel Production
App Passed Failed Skipped
✅ astro 48 0 4
✅ example 48 0 4
✅ express 48 0 4
✅ fastify 48 0 4
✅ hono 48 0 4
✅ nextjs-turbopack 51 0 1
✅ nextjs-webpack 51 0 1
✅ nitro 48 0 4
✅ nuxt 48 0 4
✅ sveltekit 48 0 4
✅ vite 48 0 4
✅ 💻 Local Development
App Passed Failed Skipped
✅ astro-stable 45 0 7
✅ express-stable 45 0 7
✅ fastify-stable 45 0 7
✅ hono-stable 45 0 7
✅ nextjs-turbopack-canary 49 0 3
✅ nextjs-turbopack-stable 49 0 3
✅ nextjs-webpack-canary 49 0 3
✅ nextjs-webpack-stable 49 0 3
✅ nitro-stable 45 0 7
✅ nuxt-stable 45 0 7
✅ sveltekit-stable 45 0 7
✅ vite-stable 45 0 7
✅ 📦 Local Production
App Passed Failed Skipped
✅ astro-stable 45 0 7
✅ express-stable 45 0 7
✅ fastify-stable 45 0 7
✅ hono-stable 45 0 7
✅ nextjs-turbopack-canary 49 0 3
✅ nextjs-turbopack-stable 49 0 3
✅ nextjs-webpack-canary 49 0 3
✅ nextjs-webpack-stable 49 0 3
✅ nitro-stable 45 0 7
✅ nuxt-stable 45 0 7
✅ sveltekit-stable 45 0 7
✅ vite-stable 45 0 7
✅ 🐘 Local Postgres
App Passed Failed Skipped
✅ astro-stable 45 0 7
✅ express-stable 45 0 7
✅ fastify-stable 45 0 7
✅ hono-stable 45 0 7
✅ nextjs-turbopack-canary 49 0 3
✅ nextjs-turbopack-stable 49 0 3
✅ nextjs-webpack-canary 49 0 3
✅ nextjs-webpack-stable 49 0 3
✅ nitro-stable 45 0 7
✅ nuxt-stable 45 0 7
✅ sveltekit-stable 45 0 7
✅ vite-stable 45 0 7
✅ 🪟 Windows
App Passed Failed Skipped
✅ nextjs-turbopack 49 0 3
❌ 🌍 Community Worlds
App Passed Failed Skipped
✅ mongodb-dev 3 0 0
❌ mongodb 48 1 3
✅ redis-dev 3 0 0
✅ redis 49 0 3
✅ turso-dev 3 0 0
❌ turso 4 45 3
✅ 📋 Other
App Passed Failed Skipped
✅ e2e-local-dev-nest-stable 45 0 7
✅ e2e-local-postgres-nest-stable 45 0 7
✅ e2e-local-prod-nest-stable 45 0 7

📋 View full workflow run

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements end-to-end encryption for workflow user data by wiring encryption functionality into the serialization layer. It builds on previous PRs that generated runId client-side (#954), made serialization functions async (#955), and added the Vercel encryption implementation (#956).

Changes:

  • Adds encryption/decryption helper functions (maybeEncrypt, maybeDecrypt, isEncrypted, peekFormatPrefix) and unused stream encryption utilities (getEncryptStream, getDecryptStream)
  • Adds ENCRYPTED ('encr') format prefix to SerializationFormat enum
  • Wires encryption into all 8 dehydrate/hydrate functions by calling maybeEncrypt after serialization and maybeDecrypt before deserialization
  • Implements inline stream encryption in WorkflowServerWritableStream and WorkflowServerReadableStream
  • Passes runId to WorkflowServerReadableStream for decryption context
  • Adds 8 comprehensive integration tests using a mock XOR encryptor
  • Removes unused _ prefixes from encryptor and runId parameters (now actively used)
  • Adds type casts (as any[], as any, as TResult) to hydration call sites to preserve type information

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
packages/core/src/serialization.ts Core encryption implementation: adds format prefix, helper functions, encryption wiring in dehydrate/hydrate functions, and inline stream encryption/decryption
packages/core/src/workflow.ts Adds type cast for hydrated workflow arguments
packages/core/src/runtime/step-handler.ts Adds type cast for hydrated step arguments
packages/core/src/runtime/run.ts Adds type cast for hydrated workflow return value
packages/core/src/serialization.test.ts Adds 8 encryption integration tests with mock XOR encryptor
.changeset/e2e-encryption.md Documents the end-to-end encryption feature addition

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/core/src/serialization.ts Outdated
Comment thread packages/core/src/workflow.ts Outdated
Comment thread packages/core/src/runtime/step-handler.ts Outdated
Comment thread packages/core/src/runtime/run.ts
Comment thread packages/core/src/serialization.ts Outdated
Comment thread packages/core/src/serialization.ts Outdated
Comment thread packages/core/src/serialization.ts
Comment thread packages/core/src/serialization.ts Outdated
Copy link
Copy Markdown
Contributor

@pranaygp pranaygp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: PR #957 - Wire encryption into serialization layer

Summary: This is the "light the fuse" PR that actually enables encryption. It wires maybeEncrypt/maybeDecrypt into all 8 dehydrate/hydrate functions, adds stream encryption/decryption support, and adds the ENCRYPTED format prefix.

Strengths

  1. Opt-in design: Encryption is entirely opt-in. When no Encryptor is provided (or it has no encrypt/decrypt methods), data passes through unchanged. Existing unencrypted workflows continue to work.

  2. Backwards compatibility: maybeDecrypt checks for the encr prefix before attempting decryption, so encrypted and unencrypted data can coexist. Old runs with devl prefix are handled correctly. The test "should handle unencrypted data when encryptor is provided" explicitly validates this.

  3. Stream encryption: Both WorkflowServerWritableStream (encrypt on write) and WorkflowServerReadableStream (decrypt on read) are updated. The getEncryptStream/getDecryptStream transform streams are a clean abstraction.

  4. Error messages: Good error messages when encrypted data is encountered but no decryptor is available, pointing users to set VERCEL_DEPLOYMENT_KEY.

Concerns

  1. dehydrateWorkflowReturnValue lost v1Compat parameter: In the diff, dehydrateWorkflowReturnValue no longer accepts v1Compat. The if (v1Compat) { return revive(str); } branch is removed. This means legacy specVersion 1 workflows can no longer serialize return values. Is this intentional? If so, this is a breaking change that should be documented. Currently the caller in workflow.ts doesn't pass v1Compat for return values, but any external callers would be affected.

  2. Stream encryption uses world.encrypt directly: In WorkflowServerWritableStream, the encryption calls world.encrypt!(chunk, { runId }). But the world is obtained from getWorld() at construction time. This means stream encryption always uses the current deployment's encryptor, not a resolved per-run encryptor. For cross-deployment scenarios (reading streams from a different deployment), this could be an issue. Similarly, WorkflowServerReadableStream calls world.decrypt directly. Consider whether stream encryption should also go through resolveEncryptorForRun.

  3. Removed runId guards in step reducers: The PR removes the if (!runId) throw guards from getStepReducers for ReadableStream and WritableStream serialization. While these were likely never hit in practice (runId is always available during step execution), removing them silently means any future bugs would produce confusing errors downstream instead of a clear error message at the serialization boundary.

  4. Type assertions proliferation: Several callers now need as TResult, as any, as any[] casts after the hydrate calls because the return types changed to Promise<unknown>. Consider updating the hydrate function signatures to use generics, e.g.:

    async function hydrateWorkflowReturnValue<T = unknown>(...): Promise<T>

    This would reduce the cast noise at call sites.

  5. Changeset scope: The changeset includes @workflow/web-shared and @workflow/world-testing but the changes in those packages are just updating call sites to pass the encryptor parameter. Consider whether these packages' users need to know about this change or if the changeset entries for them add noise.

Test Coverage

The 8 new encryption integration tests using the XOR mock encryptor are good:

  • Encrypt/decrypt round-trip for workflow args, step args, step return values, workflow return values
  • Wrong runId produces decryption failure
  • No encryptor = no encryption (devl prefix preserved)
  • Encrypted data works with encryptor on hydrate
  • Unencrypted data works with encryptor on hydrate

Missing test coverage:

  • Stream encryption/decryption (getEncryptStream/getDecryptStream and the Writable/Readable stream integration)
  • Error case: encrypted data encountered but no decrypt function available
  • Mixed encrypted/unencrypted chunks in the same stream

Overall

The encryption wiring is well-designed with clean opt-in semantics and good backwards compatibility. The main concerns are the v1Compat removal (potential breaking change), the direct world.encrypt/world.decrypt usage in streams (vs per-run encryptor resolution), and missing stream encryption test coverage.

Comment thread packages/core/src/serialization.ts Outdated
Comment thread packages/core/src/serialization.ts
Copy link
Copy Markdown
Contributor

@pranaygp pranaygp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Wire encryption into serialization layer

Solid PR — the encryption wiring is clean, opt-in, and backwards-compatible. The maybeEncrypt/maybeDecrypt pattern is simple and effective, and the encr prefix nesting around the inner devl prefix is a nice layered design.

A few issues below, mostly around performance (repeated key lookups on every stream chunk/flush) and a stale comment.

@@ -725,7 +812,7 @@ export function getExternalReducers(
const streamId = ((global as any)[STABLE_ULID] || defaultUlid)();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance: encryption key fetched on every stream flush

world.getEncryptionKeyForRun?.(runId) is called inside flush(), which runs every 10ms (STREAM_FLUSH_INTERVAL_MS) when the stream is active. If this is backed by a network call (e.g., KMS or an API), that's a lot of per-flush overhead.

Consider caching the key once at construction time or on first flush:

let cachedKey: Uint8Array | undefined;
// in flush:
const key = cachedKey ??= await world.getEncryptionKeyForRun?.(runId);

Same issue exists in WorkflowServerReadableStream (line ~392) where the key is fetched on every pull() call — each chunk triggers a new key lookup.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The writable stream now resolves the key once at construction time (lazy, on first flush) and caches the CryptoKey promise. The readable stream caches the CryptoKey in a private field after the first encrypted chunk. Neither path calls getEncryptionKeyForRun more than once per stream instance.

Comment thread packages/core/src/serialization.ts
const startTime = Date.now();
const encryptionKey =
await world.getEncryptionKeyForRun?.(workflowRunId);
const result = await hydrateStepArguments(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Double as any cast — the inner cast on line 310 (hydrateStepArguments(...) as any) and the outer cast on line 303 ((await trace(...)) as any) are redundant. Only the outer one is needed since trace infers its return type from the callback. Removing the inner cast reduces noise.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Removed the inner as any on hydrateStepArguments. The outer cast now uses a descriptive type: as { args: any[]; thisVal?: any; closureVars?: any } instead of as any. Added a comment explaining why the cast is needed (serialization layer returns unknown).

Comment thread .changeset/e2e-encryption.md Outdated
@@ -0,0 +1,6 @@
---
"@workflow/core": patch
"@workflow/world-vercel": patch
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changes in world-vercel

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still no changes in world-vercel

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — removed @workflow/world-vercel from the changeset.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in this push.

Copy link
Copy Markdown
Member

@VaguelySerious VaguelySerious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with notes

Comment thread .changeset/e2e-encryption.md Outdated
@@ -0,0 +1,6 @@
---
"@workflow/core": patch
"@workflow/world-vercel": patch
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still no changes in world-vercel

encryptionKey,
ops
);
)) as any;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think any casts could warrant a comment, i.e. what's missing or why is TS confused

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I'm not sure what this is about. We shouldn't be doing as any. Will fix.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment explaining the cast and replaced as any with a typed assertion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants