
Conversation

@davidmrdavid (Collaborator) commented Sep 5, 2023:

Replaces: https://github.com/Azure/durabletask/pull/901/files
From this feedback: https://github.com/Azure/durabletask/pull/901/files#r1316325503

Currently, we throw an exception if we receive multiple ExecutionCompletedEvents. That behavior is a bit too strict because it is possible for multiple ExecutionCompletedEvents to be received at once. For example: when a termination signal arrives at the same time as an orchestrator is requested to continue-as-new.

This PR introduces two changes to this behavior:
(1) Instead of throwing an exception like we do today, we now log a warning notifying the reader that we've found multiple execution completed events and how DTFx is resolving this conflict.

(2) If the orchestrator is not already in the Terminated state and the new event aims to terminate it, the new event overrides the previously encountered ExecutionCompletedEvent (for example, one planning to continue-as-new) and we transition to the "Terminated" state; this is because termination should be forceful. In all other cases, we discard the new ExecutionCompletedEvent.
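The resolution rule in (2) can be sketched as a small predicate. This is a minimal sketch of the behavior described above with hypothetical names (`CompletedEventResolver`, `NewEventWins`), not the actual DTFx implementation:

```csharp
using System;

// Sketch of the conflict-resolution rule for duplicate ExecutionCompletedEvents
// (hypothetical names, not the actual DTFx code).
enum OrchestrationStatus { Running, Completed, ContinuedAsNew, Terminated }

static class CompletedEventResolver
{
    // Returns true when a newly received ExecutionCompletedEvent should
    // override the one already recorded: termination is forceful, so it wins
    // unless the orchestration is already terminated. Every other duplicate
    // is discarded (with a warning log).
    public static bool NewEventWins(OrchestrationStatus current, OrchestrationStatus incoming) =>
        incoming == OrchestrationStatus.Terminated
        && current != OrchestrationStatus.Terminated;
}
```

Under this rule, a termination arriving alongside a continue-as-new transitions the orchestration to Terminated, while any other duplicate ExecutionCompletedEvent is discarded.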

@davidmrdavid davidmrdavid requested a review from cgillum September 5, 2023 22:44
log += "Discarding new 'ExecutionCompletedEvent'.";
}

TraceHelper.Trace(TraceEventType.Warning, "OrchestrationRuntimeState-DuplicateEvent", log);
Member:

This uses the old unstructured logging mechanism, which most users will never see. I think we need to instead find a way to use the LogHelper class here, perhaps by adding overloads to the methods that call into here which accept a LogHelper object.

Also, we'll want to be sure to use a log event that includes an InstanceId field so that people filtering for events based on the instance ID can find it.

@davidmrdavid (Collaborator, Author) commented Sep 5, 2023:

Hmm, I see what you mean.
My only concern with this request is that the refactoring to bring LogHelper here isn't trivial from what I can tell (e.g. I can't just add a LogHelper parameter to OrchestrationRuntimeState.AddEvent() because AddEvent is public whereas LogHelper is internal), so I worry about growing the scope of this PR.

Would you be willing to track modernizing the logging in this codepath as a separate work item, or is this a blocker for you? If it's a blocker, I'll go ahead and start working on it, but first I'm trying to determine my options.

Member:

A simple, if slightly hacky, solution would be to add an internal method or constructor overload that sets a private LogHelper field. If that field is non-null, then we can write the log message. It's probably not worth blocking on this, but at a minimum we should use a TraceHelper method that accepts an orchestration instance ID parameter. Otherwise, the logging is going to be pretty unhelpful.
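The suggested workaround might look roughly like this. This is a hedged sketch with a stub LogHelper type and a hypothetical method name (`SetLogHelper`); the real DTFx LogHelper API differs:

```csharp
using System;

// Stub standing in for the internal DTFx LogHelper (real API differs).
class LogHelper
{
    public void DuplicateExecutionCompletedEvent(string instanceId, string details) =>
        Console.WriteLine($"{instanceId}: {details}");
}

public class OrchestrationRuntimeState
{
    // Private field set via an internal method, so the public AddEvent
    // signature stays unchanged and LogHelper itself can remain internal.
    LogHelper logHelper;

    internal void SetLogHelper(LogHelper logHelper) => this.logHelper = logHelper;

    public void AddEvent(/* HistoryEvent historyEvent */)
    {
        // ...existing event processing...
        // Only emit the structured log if a LogHelper was provided.
        this.logHelper?.DuplicateExecutionCompletedEvent("instanceId", "details");
    }
}
```

Because SetLogHelper is internal, callers inside the assembly can opt in to structured logging without changing any public surface area.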

Collaborator (Author):

I took your suggestion to create an internal method that sets a LogHelper field.

"Multiple ExecutionCompletedEvent found, potential corruption in state storage");
// It's not generally expected to receive multiple execution completed events for a given orchestrator, but it's possible under certain race conditions.
// For example: when an orchestrator is signaled to terminate at the same time as it attempts to continue-as-new.
var log = $"The orchestration '{this.OrchestrationInstance?.InstanceId ?? ""}' " +
Member:

This log message feels a little too verbose. Can we simplify it to something like $"Ignoring {completedEvent.GetType().Name} event since the orchestration is already in the {orchestrationStatus} state.".

Collaborator (Author):

Sure, incorporated this in my next commit.

Comment on lines +66 to +67

public const int OrchestrationDebugTrace = 73;
@davidmrdavid (Collaborator, Author) commented Sep 19, 2023:

It seemed to me that we were missing a general "debug trace" for logs like this one, so I decided to create one. It did not seem appropriate to create a very specific new category just for this log, so instead I created a more general one. I think this is a good idea considering that, as we recently learned with the entity logs, adding new EventIDs is expensive / can take a long time to deploy in Antares, so I wanted to make sure we have access to more general logging methods.

log += "Discarding new 'ExecutionCompletedEvent'.";
}

LogHelper?.OrchestrationDebugTrace(this.OrchestrationInstance?.InstanceId ?? "", this.OrchestrationInstance?.ExecutionId ?? "", log);
Collaborator (Author):

I did not find a pre-existing loghelper method that fit this case.

@cgillum (Member) left a comment:

:shipit:

// It's not generally expected to receive multiple execution completed events for a given orchestrator, but it's possible under certain race conditions.
// For example: when an orchestrator is signaled to terminate at the same time as it attempts to continue-as-new.
var log = $"Ignoring {completedEvent.GetType().Name} event since the orchestration is already in the {orchestrationStatus} state.";
var log = $"Received new {completedEvent.GetType().Name} event despite the orchestration being already in the {orchestrationStatus} state.";
Member:

I'm not a fan of this log change. You're saying we received an event but you're not saying what we're doing with it. The previous log made it clear that we're ignoring the new event.

Collaborator (Author):

@cgillum: I think there might be some confusion here. The log grows to include whether we decided to ignore the event later on. We don't always ignore the new event, as seen in the snippet below:

log += "Discarding previous 'ExecutionCompletedEvent' as termination is forceful.";
ExecutionCompletedEvent = completedEvent;
orchestrationStatus = completedEvent.OrchestrationStatus;
}
else
{
// otherwise, we ignore the new event.
log += "Discarding new 'ExecutionCompletedEvent'.";
}

So it's a log message that's created in two parts. First we capture that we're in a situation where we have to make a decision of which ExecutionCompletedEvent to preserve, and we include which one we decide to preserve (the new one or the previous one).

Member:

I see. In that case it's fine, but I think you need to add a space between the period ending the first half of the log and the start of the next sentence.

Collaborator (Author):

Addressed in latest commit: c23d3cc

@davidmrdavid davidmrdavid merged commit 009f2e5 into main Dec 6, 2023
@superLionn commented:
Hi team, may I ask when this will be officially released?
