Skip to content

Conversation

@dandavison
Copy link
Contributor

@dandavison dandavison commented Dec 19, 2025

What changed?

Standalone Activity feature branch

Why?

Adds Standalone Activity feature (CHASM Activity archetype)


Note

Adds standalone Activities powered by CHASM, including new Activity APIs, engine/notifier/polling, visibility, persistence/token wiring, and end-to-end tests.

  • Standalone Activity (CHASM):
    • New CHASM Activity component, state machine, tasks, handlers, metrics, config, and FX modules.
    • CHASM engine: add PollComponent, execution notifications, structured refs, transition API changes.
  • APIs & Protos:
    • Frontend: new methods Start/Describe/Poll/Terminate/RequestCancel/DeleteActivityExecution, plus List/CountActivityExecutions.
    • New internal Activity service/protos; extend history/matching requests with component_ref.
    • Persistence/token: add component_ref to TaskInfo and task token.
  • Server Integrations:
    • History: handle Activity start/heartbeat/complete/fail/cancel/terminate via CHASM; notifier for long-poll.
    • Matching: plumb component_ref through add/poll paths.
    • Visibility: activity search attributes (type/status/taskQueue), list/count support & query resolution; SQL scan fixes.
    • Clients/Interceptors/Metrics/Quotas: wire new endpoints, long-poll handling, priorities.
  • Config & Defaults:
    • Dynamic config toggles (activity.enableStandalone, history.enableChasm), long-poll timeouts/buffers.
  • Tests:
    • Extensive standalone activity functional tests (lifecycle, retries, timeouts, heartbeats, list/count).

Written by Cursor Bugbot for commit 5852b15. This will update automatically on new commits. Configure here.

bergundy and others added 30 commits December 19, 2025 11:01
Fixup feature branch after rebasing on `main` and pulling in latest
`api` protos:
- Rename `GetActivityResult` => `GetActivityExecutionResult`
- Add `WorkflowHandler` stubs

Foundations for CHASM activity

- built
Adds a state machine defining valid transitions for a CHASM activity.

The state machine does not include transitions related to cancel and
pause states: we can add those when we implement those features.

Part of foundations for CHASM activity implementation / Standalone
Activity.

- Built (a unit test was written but not committed)
Added support for starting a standalone activity execution.

Required API for standalone activity feature. Focus here is plumbing in
entity creation and handler boiler plate.

- [X] built
- [X] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

Integration tests to be added later when we have all the RPCs handled
and integrated. Tasks not plumbed in yet to support full interaction
with services.

---------

Co-authored-by: Dan Davison <dan.davison@temporal.io>
Co-authored-by: Roey Berman <roey@temporal.io>
Consolidated and refactored activity request validations to the chasm
package.
Added additional tests for the validator.

Existing workflow activities and standalone activities should share the
same validation code. Standalone activities also directly process the
frontend request and therefore has additional fields to validate
compared to embedded activities.

- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [X] added new unit test(s)
- [ ] added new functional test(s)

This refactors the existing stack, but the validation is basically
exactly copied over and all relevant tests are passing. The standalone
start activity request is now cloned before sanitizing the request
attributes to preserve idempotent behavior during retries, but can
potentially impact performance if there are large inputs.

---------

Co-authored-by: Roey Berman <roey.berman@gmail.com>
Added standalone activity Chasm tasks. Added handling of start activity
and e2e implementation of standalone activity start execution with
existing services. Updated protos related to standalone activities.

The Chasm tasks are needed to kick off standalone activity execution via
the existing services. Proto changes needed to so that the component ref
can be passed and handled via service stack.

- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

---------

Co-authored-by: Roey Berman <roey.berman@gmail.com>
Co-authored-by: Dan Davison <dan.davison@temporal.io>
## What changed?
Added timeout tasks for standalone activities. Refactored existing
backoff algorithm to common. Added outcome field. Fixed a few errors as
the result of main rebase. Regen go.sum. Fixed broken redirection tests.
Uncommented `LifecycleStateTimedout` and added supporting logic in
`closeTransactionHandleRootLifecycleChange`

## Why?
We need schedule-to-start, schedule-to-close, and start-to-close timeout
support for standalone activities. Also reusing existing activity retry
backoff code, so refactored it to a common package. We need to bring
back LifecycleStateTimedout
because when a timeout lifecycle occurs the workflow status needs to
update to `WORKFLOW_EXECUTION_STATUS_TIMED_OUT`, else we encounter an
invalidate state change: `unable to change workflow state from Created
to Completed, status Failed`

## How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [X] added new unit test(s)
- [ ] added new functional test(s)

## Potential risks
E2E timeout tests not added yet as we need complete workflow API plumbed
before we can test this.

---------

Co-authored-by: Roey Berman <roey.berman@gmail.com>
Co-authored-by: Dan Davison <dan.davison@temporal.io>
Co-authored-by: Roey Berman <roey@temporal.io>
)

## What changed?
Added request ID transition option when creating activity entity.

## Why?
The requestID is set by the matching service to a UUID, allowing safe
retries if the response is lost. Here we add the request ID transition
option so chasm engine handles it for us under the hood.

## How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)
- Add `history.ChasmNotifier` for subscribing to CHASM execution state transitions
- Implement `chasm.PollComponent`
- Add `PollActivityExecution` API handler

- Needed for standalone activity
- Needed for long-poll of other CHASM archetypes

- [x] built
- [x] added new unit test(s)
- [x] added new functional test(s)
Added standalone activity completion and failure handling. Refactored
existing timeout failure handling. Refactored existing check for retry
method.

Needed to support standalone activities full operation.

- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [X] added new unit test(s)
- [X] added new functional test(s)

---------

Co-authored-by: Dan Davison <dan.davison@temporal.io>
## What changed?
Added standalone activity termination handling.

## Why?
Needed to support standalone activities full operation.

## How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [X] added new unit test(s)
- [X] added new functional test(s)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Adds termination support for standalone activities across API,
frontend/handler, state machine, and tests.
> 
> - **API/Proto**:
> - Add `TerminateActivityExecution` RPC: request/response messages in
`proto/v1/request_response.proto` and service in
`proto/v1/service.proto`.
> - Regenerate clients/servers: `activitypb` request/response, service,
layered client, and gRPC stubs.
> - **Frontend**:
> - Implement `TerminateActivityExecution` in `frontend.go` to resolve
namespace and forward to activity service.
> - **Backend/Handler**:
> - Add handler `TerminateActivityExecution` to update component via
`(*Activity).handleTerminated`.
>   - Add `Activity.handleTerminated` to apply termination transition.
> - **State Machine**:
> - Enhance `TransitionTerminated` to finalize execution, set
`LastWorkerIdentity`, and record `TerminatedFailureInfo` in outcome.
> - **Tests**:
>   - Add unit test `TestTransitionTerminated`.
> - Add functional tests for terminating an activity and preventing
termination after completion.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
de5e4f2. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Dan Davison <dan.davison@temporal.io>
Added standalone activity cancellation request and respond handling.

Needed to support standalone activities full operation.

- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [X] added new unit test(s)
- [X] added new functional test(s)

Co-authored-by: Dan Davison <dan.davison@temporal.io>
## What changed?
Add handling of business ID policy for standalone activities. Refactored
standalone activity validations.

## Why?
Required so that the chasm engine will handle the business ID policies
based on the RPC request.

## How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [X] added new unit test(s)
- [X] added new functional test(s)

## Potential risks
Will need to add more tests once we rebase standalone-activity with
main.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Adds business ID reuse/conflict policy handling for standalone
activities, refactors validation to operate on the full request with
sane defaults, and updates tests accordingly.
> 
> - **Activity Backend**:
> - Map `enumspb` activity ID reuse/conflict policies to chasm policies
and pass via `chasm.WithBusinessIDPolicy` in `StartActivityExecution`.
>   - Propagate `RequestId` via `chasm.WithRequestID`.
> - **Validation**:
> - Refactor `ValidateStandaloneActivity` to accept the entire
`workflowservice.StartActivityExecutionRequest` and mutate in place
(request ID, ID policies, input size, search attributes).
> - Add `normalizeAndValidateIDPolicy` with defaults and incompatibility
checks; require request ID when attaching completion callbacks.
>   - Minor tweaks to input/search attributes validation paths.
> - **Frontend**:
> - Update `validateAndPopulateStartRequest` to use new validation
signatures and remove redundant vars.
> - **Tests**:
> - Add/extend unit tests for ID policy defaults/mismatch, input size
warning/error, and request ID length.
> - Add functional tests covering ID reuse (`REJECT_DUPLICATE`,
`ALLOW_DUPLICATE_FAILED_ONLY`) and conflict policy failure on existing
execution.
> - Adjust cancellation tests to use unique `RequestId`; remove default
`RequestId` from `startActivity` helper.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
1225ccc. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Dan Davison <dan.davison@temporal.io>
)

## What changed?
- Fix bug: schedule-to-close timer task validator was incorrectly
requiring activity attempt at task execution time to be equal to
activity attempt at task creation
- Add test of schedule-to-close timeout that fails with the bug fix
reverted
- Do not set empty struct as outcome failure on attempt failure when
retries are exhausted.
- Improve doc comments

## Why?
- Standalone activity schedule-to-close was incorrect: would not have
fired after attempt 1 without this fix
- Setting empty struct on attempt failure when retries are exhausted
should not be necessary and it is fragile to introduce special values
that code might start to rely on.

## How did you test it?
- [x] built
- [x] added new functional test(s)
…#8723)

## What changed?
Bug fix: with schedule-to-close timeout not set by the caller, we were
not scheduling retries

## Why?
Required for CHASM activity correctness (e.g. standalone activity)

## How did you test it?
- [x] built
- [x] added new functional test(s)
All changes needed to make tests compile and pass after
merging main into standalone-activity.
…onOutcome (#8771)

- Add new public gRPC methods `DescribeActivityExecution` and
`GetActivityExecutionOutcome`. See
temporalio/api#673
- These replace the previous `PollActivityExecution`
- Respond to additional API changes: inlining of `ActivityOptions`
- Configure quota for the new methods in their blocking and non-blocking
forms
- Update test suite

- Implements agreed Standalone Activity design

- [x] built
- [x] covered by existing tests
- [x] added new functional test(s)
## What changed?
Change dev environment to enable Chasm

## Why?
Chasm should be enabled for development by default so devs don't have to
explicitly set the config.

## How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)
Implement heartbeating for CHASM activities (standalone activity)

- [x] built
- [x] added new functional test(s)
## What changed?
Add standalone activity metrics

## Why?
Need standalone activity metrics for observability purposes

## How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [X] added new unit test(s)
- [ ] added new functional test(s)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Adds comprehensive metrics and payload-size recording for standalone
activities, introducing contextual request wrappers and wiring metrics
through state transitions, timeout executors, and history APIs.
> 
> - **Activity component/state machine**:
> - Introduces `RequestWithContext` carrying `Token`, `MetricsHandler`,
`NamespaceName`, and `BreakdownMetricsByTaskQueue` and replaces prior
request wrappers.
> - Records metrics on schedule, attempts, and terminal transitions
(success, fail, cancel, timeouts), including
`ActivityStartToCloseLatency`, `ActivityScheduleToCloseLatency`,
success/fail/cancel/timeout counters, and per-timeout tags.
> - Adds `recordPayloadSize(...)` to emit payload sizes for input,
heartbeat details, results, and failures.
> - **Timeout and dispatch executors
(`chasm/lib/activity/activity_tasks.go`)**:
> - Add `timeoutTaskExecutorOptions` (dynamic config, metrics, namespace
registry); resolve namespace and emit timeout metrics during
schedule/start/close/heartbeat timeouts and retries.
> - **Activity handler**:
> - Injects `metrics.Handler` and `namespace.Registry`;
`StartActivityExecution` emits input payload size on scheduling.
> - **History APIs (record/respond activity ops)**:
> - Pass `RequestWithContext` (token, metrics handler, namespace,
breakdown setting) into chasm component calls.
> - **Tests**:
> - Update/add unit tests to validate metric emissions and payload-size
recording across transitions and timeouts.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
02c6aba. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
## What changed?
Add dynamic config to toggle standalone activity functionality.
Refactored frontend dynamic config into activity config. Removed
standalone activity dc prefix `chasm`

## Why?
We need the ability to toggle standalone activity functionality as we go
to prod. We also agreed to remove the `chasm` prefix from any chasm
originated dynamic config keys

## How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)
## What changed?
After merging `main` into `standalone-activity` (see 88b0e52) a couple
of things were broken:
- The standalone activity protos needed rebasing on `main`'s protos
- `GetActivityExecutionOutcome` needed renaming to
`PollActivityExecution` (respond to upstream proto change)
- The way we obtain the namespace name in the worker API handlers shared
with workflow activities needed changing, to pass the correct business
ID to `api.GetActiveNamespace`

## How did you test it?
- [x] built
- [x] covered by existing tests
## What changed?
Add standalone activity terminate request ID handling. 

## Why?
On termination retry, it's nice if a duplicate req ID won't fail if the
termination happened on first attempt.

## How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [ ] added new unit test(s)
- [X] added new functional test(s)


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Introduce request ID–based idempotency for standalone activity
termination, validating/populating RequestId at the frontend and
persisting it in a new terminate_state, with accompanying proto changes
and tests.
> 
> - **Activity/state machine**:
> - Implement idempotent termination in `handleTerminated`: no-op on
same `request_id`, `FailedPrecondition` on different ID.
> - `TransitionTerminated` now sets
`ActivityState.terminate_state.request_id` and records terminated
failure.
> - **Frontend/API**:
> - `TerminateActivityExecution` clones request, validates `request_id`
length, auto-generates UUID when missing, and forwards to service.
> - **Proto/serialization**:
> - Add `message ActivityTerminateState { string request_id = 1; }` and
`ActivityState.terminate_state` field.
> - Regenerate Go code and helper methods; fix "cancelation" →
"cancellation".
> - **Tests**:
>   - Unit: assert `terminate_state.request_id` set on termination.
> - Functional: duplicate `RequestId` termination succeeds; different
`RequestId` fails with `FailedPrecondition`.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
1452ec2. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
## What changed?
Add standalone heartbeat by ID

## Why?
We want to support existing RPCs to activity heartbeat

## How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [ ] added new unit test(s)
- [X] added new functional test(s)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Enable `RecordActivityTaskHeartbeatById` to heartbeat standalone
activities (no `workflowId`) by embedding a `componentRef` in the
activity task token; add a functional test verifying heartbeats keep the
activity alive.
> 
> - **Frontend**
>   - `service/frontend/workflow_handler.go`
> - `RecordActivityTaskHeartbeatById`: accept empty `workflowId` for
standalone activities; construct and serialize `componentRef` via
`chasm.NewComponentRef[*activity.Activity]` and include it in the
activity `taskToken`.
> - **Tests**
>   - `tests/standalone_activity_test.go`
> - Add `RecordHeartbeatByIDStaysAlive` to verify heartbeating-by-ID
prevents timeout and activity completes successfully.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
c2a4310. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
Implement standalone activity ListActivityExecutions and CountActivityExecutions.
## What changed?
Added support for ID conflict use existing. Added error details on
conflict policy fail.

## Why?
We now have the use existing policy support on chasm engine, so we can
support it. Also need to pass back error details on conflict policy fail
in struct ActivityExecutionAlreadyStartedError

## How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [ ] added new unit test(s)
- [X] added new functional test(s)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> <sup>[Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) is
generating a summary for commit
f264bba. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
## What changed?
In `chasm.PollComponent`, if the passed component ref has empty run ID,
resolve this to the current run ID.

## Why?
The previous code state was broken: it was possible to submit
`PollActivityExecution` with no run ID, which led to `Subscribe(key)`
being called where key lacked a run ID. But the notifications were being
sent to a different key (one with the run ID) which led to the long-poll
never receiving any notification and hence always timing out.

Existing chasm APIs `ReadComponent` and `UpdateComponent` both accept
empty run ID and silently resolve it to current using
`getExecutionLease`. It was a bug that `PollComponent` resolved to
current using `getExecutionLease`, but then subscribed to notifications
using a key that did not contain the resolved run ID.

At the external API level, in several of our non-CHASM user-facing gRPC
APIs, empty run ID means "use latest at the time the request is
processed" (e.g. `GetWorkflowExecutionHistory`). And we want these
semantics for CHASM APIs such as `PollActivityExecution`. This change
gives those semantics.


## How did you test it?
- [x] built
- [x] added new unit test(s)
- [x] added new functional test(s)
## What changed?
Cleaner implementation of logic involved when obtaining activity outcome
for `DescribeActivityExecution` and `PollActivityExecution`.

## Why?
Code cleanup. Suggested at
#8771 (comment)

## How did you test it?
- [x] built
- [x] covered by existing tests
## What changed?
Filled remaining ActivityExecutionInfo fields. Enabled test that was
blocked by chasm bug. Added schedule-to-start-timeout test. Test
refactor.

## Why?
We need to return all the fields for GET RPCs. Test cleanup/completion
before merge to main.

## How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

---------

Co-authored-by: Dan Davison <dan.davison@temporal.io>
fretz12 and others added 2 commits December 19, 2025 11:01
## What changed?
Added alias to imports in activity.go

## Why?
When `make lint-code` is run locally, it is insistent on aliasing the
activity API import to `activitypb "go.temporal.io/api/activity/v1"`.
This is because the activity model
`"go.temporal.io/server/chasm/lib/activity/gen/activitypb/v1"` also has
the name. I've renamed both so lint-code won't keep modifying it.

## How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)
## What changed?
If the group by key is returned as `[]byte`, convert it to string in
`CountWorkflowExecutions` and `CountCHASMExecutions`.

## Why?
The Go driver for MySQL returns `VARCHAR` columns as `[]byte`. This is
apparently not true of our other persistence backends.

## How did you test it?
- [x] built
- [x] added new functional test(s)

## Potential risks
Could break workflow visibility queries if this change is not correct.
@dandavison dandavison marked this pull request as ready for review December 19, 2025 18:42
@dandavison dandavison requested review from a team as code owners December 19, 2025 18:42
state *activitypb.ActivityState,
parent ActivityStore,
) {
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Exported function has empty body and silently no-ops

The NewEmbeddedActivity function is exported but has an empty body. Any code calling this function to create an embedded activity would silently receive nothing - the function takes parameters (ctx, state, parent) but performs no operations and returns no value. If this is intentionally a placeholder for future implementation, it would be better to panic with "not implemented" or make it unexported to prevent accidental usage.

Fix in Cursor Fix in Web

&activitypb.HeartbeatTimeoutTask{
Attempt: a.LastAttempt.Get(ctx).GetCount(),
},
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Missing heartbeat timeout check causes spurious timeout tasks

RecordHeartbeat unconditionally schedules a HeartbeatTimeoutTask without checking if HeartbeatTimeout is configured (greater than 0). When HeartbeatTimeout is 0 (not set), the task is scheduled for immediate execution at ctx.Now(a).Add(0), which could trigger spurious heartbeat timeout failures. This is inconsistent with the state machine code in statemachine.go which correctly checks if heartbeatTimeout > 0 before scheduling the initial heartbeat timeout task.

Fix in Cursor Fix in Web

FrontendResponse: &workflowservice.DescribeActivityExecutionResponse{},
}, nil
}
return response, err
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Long-poll timeout returns nil response instead of empty response

The condition err != nil && ctx.Err() != nil fails to handle the documented PollComponent timeout behavior. According to the docstring in engine.go, PollComponent returns (nil, nil, nil) on server-imposed timeout as "an indication that the caller should continue long-polling." Since err is nil in this case, the condition is false and the handler returns (nil, nil) instead of the intended empty response. This causes callers to receive a nil FrontendResponse, potentially causing nil pointer issues downstream. The condition should check ctx.Err() != nil without requiring err != nil.

Additional Locations (1)

Fix in Cursor Fix in Web

@dandavison dandavison merged commit 5852b15 into main Dec 19, 2025
61 checks passed
@dandavison dandavison deleted the standalone-activity branch December 19, 2025 20:40
@dandavison dandavison changed the title [DO NOT MERGE] Standalone Activity Standalone Activity Dec 19, 2025
dandavison added a commit that referenced this pull request Dec 19, 2025
## What changed?
Update api protos to use api-go main

## Why?
#8875 accidentally left the proto dependency pointing at a non-main
commit in api-go.

## How did you test it?
- [x] built
- [x] covered by existing tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants