Contributor

@dandavison dandavison commented Oct 29, 2025

What changed?

  • Add history.ChasmNotifier for subscribing to CHASM execution state transitions
  • Implement chasm.PollComponent
  • Add PollActivityExecution API handler

Why?

  • Needed for standalone activity
  • Needed for other situations where we long-poll a CHASM execution/entity

How did you test it?

  • built
  • run locally and tested manually (TODO: we are set up to do this with the Python prototype)
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Note

Implements long‑polling for CHASM components and standalone activities via new PollComponent API, PollActivityExecution endpoint, and an execution notifier, with validation, ref handling, and tests.

  • CHASM Core:
    • Add Engine.PollComponent (with NotifyExecution) and helper ExecutionStateChanged; extend Context with structuredRef.
    • Introduce ChasmNotifier for execution-level subscriptions; wire into history engine; emit notifications on CHASM mutations.
    • Tighten ref deserialization with ErrMalformedComponentRef/ErrInvalidComponentRef and validation; expose structuredRef in tree.
  • Activity:
    • Implement PollActivityExecution frontend and history handler using PollComponent; build ActivityExecutionInfo and response assembly.
    • Add dynamic config for long-poll (LongPollTimeout, LongPollBuffer); provide via FX modules.
  • APIs/Protos:
    • Define PollActivityExecution{Request,Response} messages and service RPC; generate client/grpc helpers.
  • Tests:
    • Unit: notifier subscribe/notify, engine PollComponent (no-wait, wait, stale).
    • Functional: standalone activity polling (no-wait, wait any state change, deadline behavior, invalid args, not found); start-to-close timeout path.
  • Mocks/Interfaces:
    • Update engine mocks for new PollComponent signature and NotifyExecution; history engine interface adds NotifyChasmExecution.

Written by Cursor Bugbot for commit 1d59e66.
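For readers unfamiliar with the subscribe/notify pattern described above, here is a minimal sketch in Go. It is not the PR's ChasmNotifier: the names `executionKey`, `notifier`, `Subscribe`, and `Notify` are hypothetical, and the sketch omits the subscription tracking and cleanup that a real implementation needs. It only shows the core idea of handing each waiter a channel that is closed when the execution changes.

```go
package main

import (
	"fmt"
	"sync"
)

// executionKey is a hypothetical stand-in for a CHASM execution key.
type executionKey struct {
	NamespaceID string
	BusinessID  string
}

// notifier hands out one channel per execution; the channel is closed on the
// next notification for that execution.
type notifier struct {
	mu   sync.Mutex
	subs map[executionKey]chan struct{}
}

func newNotifier() *notifier {
	return &notifier{subs: make(map[executionKey]chan struct{})}
}

// Subscribe returns a channel that is closed the next time Notify is called
// for key. Concurrent subscribers to the same key share one channel.
func (n *notifier) Subscribe(key executionKey) <-chan struct{} {
	n.mu.Lock()
	defer n.mu.Unlock()
	ch, ok := n.subs[key]
	if !ok {
		ch = make(chan struct{})
		n.subs[key] = ch
	}
	return ch
}

// Notify wakes every current subscriber for key and forgets the channel, so
// later subscribers wait for the next change.
func (n *notifier) Notify(key executionKey) {
	n.mu.Lock()
	defer n.mu.Unlock()
	if ch, ok := n.subs[key]; ok {
		close(ch)
		delete(n.subs, key)
	}
}

func main() {
	n := newNotifier()
	key := executionKey{NamespaceID: "ns", BusinessID: "activity-1"}
	ch := n.Subscribe(key)
	n.Notify(key)
	<-ch // does not block: the channel was closed by Notify
	fmt.Println("notified")
}
```

A single mutex guards the whole map in this sketch, which is the lock-contention concern that a sharded map would reduce.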

Base automatically changed from update-protos to standalone-activity October 30, 2025 20:25
@dandavison dandavison force-pushed the poll-component branch 10 times, most recently from 3281080 to 9c65e47 Compare November 11, 2025 03:18
@dandavison dandavison force-pushed the poll-component branch 11 times, most recently from ce95cbe to 29a0bd9 Compare November 12, 2025 18:32
@dandavison dandavison marked this pull request as ready for review November 12, 2025 18:51
@dandavison dandavison requested review from a team as code owners November 12, 2025 18:51
Member

@bergundy bergundy left a comment

I didn't review the tests very closely.
There are still some open comments; please address them before merging, but I don't feel I need another pass here.

Comment on lines 83 to 84
// TODO(dan): include execution key in error message; we may do this at the CHASM
// framework level.
Member

TBH I don't think this is needed. Not blocking the PR.

Contributor Author

I've removed the comment

Comment on lines +113 to +115
if len(token) == 0 {
return chasm.ReadComponent(ctx, ref, (*Activity).buildPollActivityExecutionResponse, req, nil)
}
Member

This seems more like an invalid argument to me.

Contributor Author

OK, we will address when we split the API

type (
// ChasmNotifier allows subscribers to receive notifications relating to a CHASM execution.
ChasmNotifier struct {
executions map[chasm.EntityKey]*subscriptionTracker
Member

Not blocking, but maybe put a TODO here to use the sharded map which will guarantee less lock contention?

Contributor Author

Done, and internally tracking need for an audit of TODOs in code

e.eventNotifier.NotifyNewHistoryEvent(notification)
}

func (e *historyEngineImpl) ChasmEngine() chasm.Engine {
Member

Is this called anywhere? Maybe I missed it?

Contributor Author

Thanks, removed

)

var (
defaultInput = &commonpb.Payloads{
Member

I mentioned this before I believe, use payloads.EncodeString() from common

Contributor Author

Done (not specifically related to this PR so it may conflict, but good to do now in case we forget)


func (s *standaloneActivityTestSuite) SetupSuite() {
s.FunctionalTestBase.SetupSuite()
s.tv = testvars.New(s.T())
Member

I think this needs to be done in SetupTest

Contributor Author

Done

}
input := createDefaultInput()
taskQueue := uuid.New().String()
taskQueue := testcore.RandomizeStr(t.Name())
Member

If you're already using testvars, might as well fully use it. IMHO that utility doesn't give us much but leaving it up to you.

Contributor Author

I've switched them over

dandavison and others added 4 commits December 4, 2025 19:45
Co-authored-by: Roey Berman <roey@temporal.io>
Co-authored-by: Roey Berman <roey@temporal.io>
Co-authored-by: Roey Berman <roey@temporal.io>
Co-authored-by: Roey Berman <roey@temporal.io>
Member

@yycptt yycptt left a comment

Also need to change all EntityKey -> ExecutionKey

chasm/ref.go Outdated
}
var pRef persistencespb.ChasmComponentRef
if err := pRef.Unmarshal(data); err != nil {
return ComponentRef{}, err
Member

ErrMalformedComponentRef?

Contributor Author

Thanks, done

chasm/ref.go Outdated
Comment on lines 11 to 15
// ErrMalformedComponentRef is returned when component ref bytes cannot be deserialized.
var ErrMalformedComponentRef = errors.New("malformed component ref")

// ErrInvalidComponentRef is returned when component ref bytes deserialize to an invalid component ref.
var ErrInvalidComponentRef = errors.New("invalid component ref")
Member

I'd return invalidRequest here unless we are sure all api handlers have proper error conversion logic.

Contributor Author

Right, done.


NotifyNewHistoryEvent(event *events.Notification)
NotifyNewTasks(tasks map[tasks.Category][]tasks.Task)
ChasmEngine() chasm.Engine
Member

nit: is this one required/used?

Contributor Author

No, thanks!

// Notify for current workflow if it has CHASM updates
if len(currentWorkflowMutation.UpsertChasmNodes) > 0 ||
len(currentWorkflowMutation.DeleteChasmNodes) > 0 {
engine.NotifyChasmExecution(chasm.EntityKey{
Member

Let's do it in ConflictResolveExecution as well. Create execution is probably fine; I guess there won't be any poller before the execution is created.

Member

Hmm, I haven't thought through whether it needs to be in OperationPossiblySucceeded. Can you elaborate on your thoughts a bit here?

Contributor Author

Thanks, let's address this in one more PR to target standalone-activity. It will all arrive in main at the same time.

if ref != nil {
return ref, nil
}
case <-ctx.Done():
Member

do we have some tail room here to return an empty response and avoid a timeout error?

Contributor Author

In the design that @bergundy and I have settled on, the caller sets the tail room. See PollActivityExecution in chasm/lib/activity/handler.go.
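A rough illustration of caller-set tail room, for readers following this thread: the caller shortens the incoming deadline by a buffer before waiting, so the wait ends early and an empty response can be returned instead of a timeout error. This is a sketch under assumptions, not the code in chasm/lib/activity/handler.go; `handlePoll`, `pollOnce`, `demo`, and the durations are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pollOnce waits for a notification or for ctx to end. When the wait runs
// out it reports changed=false with a nil error, so the caller can send an
// empty response rather than a timeout error.
func pollOnce(ctx context.Context, notified <-chan struct{}) (changed bool, err error) {
	select {
	case <-notified:
		return true, nil
	case <-ctx.Done():
		return false, nil
	}
}

// handlePoll plays the role of the caller: it moves the deadline earlier by
// buffer (the tail room) before delegating to pollOnce.
func handlePoll(ctx context.Context, notified <-chan struct{}, buffer time.Duration) (bool, error) {
	if deadline, ok := ctx.Deadline(); ok {
		var cancel context.CancelFunc
		ctx, cancel = context.WithDeadline(ctx, deadline.Add(-buffer))
		defer cancel()
	}
	return pollOnce(ctx, notified)
}

// demo polls a channel that never fires, with a 200ms request deadline and
// 150ms of tail room, and reports whether the request deadline had expired
// by the time the poll returned.
func demo() (changed bool, outerExpired bool, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	changed, err = handlePoll(ctx, make(chan struct{}), 150*time.Millisecond)
	return changed, ctx.Err() != nil, err
}

func main() {
	changed, outerExpired, err := demo()
	fmt.Println(changed, outerExpired, err)
}
```

In this sketch the poll gives up after roughly 50ms, leaving the rest of the request deadline for building and sending the response.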

// behind the requested reference. However, getExecutionLease does not currently guarantee that
// execution VT >= ref VT, therefore we call IsStale() again here and return any error (which at
// this point must be ErrStaleState; ErrStaleReference has already been eliminated).
err := chasmTree.IsStale(ref)
Member

Isn't this already checked in getExecutionLease? Or is it a workaround for the bug we discussed before, where getExecutionLease needs to do another stale check after reload?

Contributor Author

Yes that's right. We can remove this when that bug is fixed.

@dandavison
Contributor Author

> Also need to change all EntityKey -> ExecutionKey

Right, we will do that shortly when we merge/rebase. It will be nice to have that done.

req *activitypb.PollActivityExecutionRequest,
) (*activitypb.PollActivityExecutionResponse, bool, error) {
// TODO(dan): check for terminal activity states
panic("pollActivityExecutionWaitCompletion is not implemented")

Bug: Panic in WaitCompletion handler crashes server

The PollActivityExecution handler contains a panic("pollActivityExecutionWaitCompletion is not implemented") statement in the WaitCompletion case branch. If a user sends a PollActivityExecutionRequest with a WaitCompletion wait policy, this will crash the server. This panic should be replaced with returning a proper error like serviceerror.NewUnimplemented("WaitCompletion is not yet implemented") to avoid server crashes.
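The suggested shape of the fix can be sketched as follows. This uses only the standard library: `errUnimplemented` is a hypothetical stand-in for an Unimplemented service error, and `waitPolicy` and `poll` are illustrative names rather than the PR's types.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnimplemented stands in for an Unimplemented service error.
var errUnimplemented = errors.New("WaitCompletion is not yet implemented")

// waitPolicy is a hypothetical enum of the supported wait policies.
type waitPolicy int

const (
	waitAnyStateChange waitPolicy = iota
	waitCompletion
)

// poll dispatches on the wait policy and reports unsupported policies as
// errors instead of panicking.
func poll(p waitPolicy) (string, error) {
	switch p {
	case waitAnyStateChange:
		return "state changed", nil
	case waitCompletion:
		// Return an error the caller can handle rather than panicking.
		return "", errUnimplemented
	default:
		return "", fmt.Errorf("unknown wait policy %d", p)
	}
}

func main() {
	_, err := poll(waitCompletion)
	fmt.Println(errors.Is(err, errUnimplemented))
}
```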

waitPolicy := req.GetFrontendRequest().GetWaitPolicy()

if waitPolicy == nil {
return chasm.ReadComponent(ctx, ref, (*Activity).buildPollActivityExecutionResponse, req, nil)

Bug: Deferred error transformation bypassed on early returns

The deferred function at lines 80-85 transforms NotFound errors into a user-friendly message by modifying the named return variable err. However, the direct return chasm.ReadComponent(...) statements on lines 90 and 109 bypass the named return variable entirely. In Go, when using return expr1, expr2 with named returns, the expressions go directly to the caller without updating the named variables. This means NotFound errors from the waitPolicy == nil and len(token) == 0 code paths won't be transformed to "activity execution not found".
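The claim about Go semantics in this report is worth checking before acting on it. Per the Go specification, a return statement that specifies results sets the result parameters before any deferred functions run, so a deferred function can still observe and rewrite a named `err` after `return f(...)`. The sketch below demonstrates that behavior in isolation; `readComponent`, `poll`, and `errNotFound` are hypothetical names, and it does not reproduce the handler under review (for example, a shadowed `err` in an inner scope would behave differently).

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("not found")

// readComponent is a hypothetical callee that always fails with errNotFound.
func readComponent() (string, error) {
	return "", errNotFound
}

// poll uses named results and a deferred function that rewrites errNotFound.
func poll() (resp string, err error) {
	defer func() {
		if errors.Is(err, errNotFound) {
			err = errors.New("activity execution not found")
		}
	}()
	// A return with explicit operands first assigns resp and err, then runs
	// deferred functions, and only then returns to the caller.
	return readComponent()
}

func main() {
	_, err := poll()
	fmt.Println(err) // prints "activity execution not found"
}
```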

@dandavison dandavison changed the title chasm.PollComponent and PollActivityExecution PollComponent and PollActivityExecution Dec 5, 2025
ref, err := DeserializeComponentRef(refBytes)
if err != nil {
return false, ErrMalformedComponentRef
}

Bug: ExecutionStateChanged always returns ErrMalformedComponentRef discarding original error type

The function ExecutionStateChanged unconditionally returns ErrMalformedComponentRef when DeserializeComponentRef fails, but DeserializeComponentRef can return either ErrMalformedComponentRef or ErrInvalidComponentRef (for empty data or missing fields). The function's doc comment claims it "may return ErrInvalidComponentRef or ErrMalformedComponentRef" but the implementation always substitutes ErrMalformedComponentRef on deserialization error. The original error should be returned directly (return false, err) instead of always returning ErrMalformedComponentRef.
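A small sketch of the difference between substituting a fixed error and propagating the original one. The sentinel errors mirror the two messages quoted earlier in this PR, but `deserialize` is a toy stand-in with made-up validation rules and `stateChanged` is a hypothetical name, so treat this as an illustration of the pattern only.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errMalformedRef = errors.New("malformed component ref")
	errInvalidRef   = errors.New("invalid component ref")
)

// deserialize is a toy stand-in: empty input is invalid, and input that does
// not start with '{' is malformed.
func deserialize(data []byte) (string, error) {
	if len(data) == 0 {
		return "", errInvalidRef
	}
	if data[0] != '{' {
		return "", errMalformedRef
	}
	return string(data), nil
}

// stateChanged propagates the deserialization error unchanged, so callers
// can still tell the two failure modes apart.
func stateChanged(refBytes []byte) (bool, error) {
	if _, err := deserialize(refBytes); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	_, err := stateChanged(nil)
	fmt.Println(errors.Is(err, errInvalidRef)) // prints "true"
}
```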

@dandavison dandavison merged commit ce5d186 into standalone-activity Dec 5, 2025
50 of 52 checks passed
@dandavison dandavison deleted the poll-component branch December 5, 2025 04:08
dandavison added a commit that referenced this pull request Dec 19, 2025
- Add `history.ChasmNotifier` for subscribing to CHASM execution state transitions
- Implement `chasm.PollComponent`
- Add `PollActivityExecution` API handler

- Needed for standalone activity
- Needed for long-poll of other CHASM archetypes

- [x] built
- [x] added new unit test(s)
- [x] added new functional test(s)
5 participants