Bring entity logic into DurableTask.Core#862
Bring entity logic into DurableTask.Core#862sebastianburckhardt wants to merge 8 commits intomainfrom
Conversation
cgillum
left a comment
There was a problem hiding this comment.
This is exciting to see! I haven't reviewed everything, but have a few initial thoughts below.
| int taskId = completedEvent.TaskScheduledId; | ||
| if (this.openTasks.ContainsKey(taskId)) | ||
| if (this.openTasks.TryGetValue(taskId, out var taskInfo) | ||
| && taskInfo is ActivityTaskInfo info) |
There was a problem hiding this comment.
If TryGetValue succeeds but the cast to ActivityTaskInfo fails, should we raise a non-determinism exception? Previously the code would have tried to play the task anyways, but now it would (incorrectly) log this as a duplicate TaskCompleted event.
There was a problem hiding this comment.
Raising a non-determinism exception would be my preferred solution here.
| try | ||
| { | ||
| var type = Type.GetType(failedEvent.Error, true); | ||
| var deserializedException = (Exception)this.ErrorDataConverter.Deserialize(failedEvent.Result, type); |
There was a problem hiding this comment.
We have two mechanisms for handling exceptions depending on the value of this.ErrorPropagationMode. What you have here works for the original mode but it turns out that in practice serializing and deserializing exceptions can be very problematic since many exception types aren't actually serializeable. For the newer out-of-proc languages (Java and .NET Isoalted) the intent is to use ErrorPropagationMode.UseFailureDetails, where exceptions/errors are converted into a specific runtime-neutral data contract. It would be great if entities could support that as well.
There was a problem hiding this comment.
I am well aware of this issue, and there is a mechanism to handle it automatically: note that the exception is being caught (both when trying to serialize and when trying to deserialize exceptions), with a fallback that propagates the error in a runtime-neutral data contract.
Essentially, what this mechanism does is to first try ErrorPropagationMode.SerializeExceptions, and if that does not work, it resorts to ErrorPropagationMode.UserFailureDetails. The advantage of this automatic fallback is that even if the user knows nothing about this configuration options, they still get reasonable behaviors in all situations.
I understand that it would be nicer to use exactly the same error propagation mechanism and configuration across entities, activities, and suborchestrations; but I had this automatic fallback-mechanism already in place for entities, before the ErrorPropagationMode feature was introduced for activities and suborchestrations, so that is why we have two different mechanisms.
Regardless of what we end up deciding, this PR is purposefully designed to make zero changes to the existing entity semantics so I think changing the errorPropagationMode behavior for entities and/or activities+suborchestrations would need to be a separate PR.
There was a problem hiding this comment.
I only just realize that this code path is never even exercised when running code via the old C# SDK, so it is not a problem at all to copy all the existing conventions.
| /// Gets the operation error (see <see cref="OperationResult.Error"/>) | ||
| /// </summary> | ||
| [DataMember] | ||
| public string? Error { get; private set; } |
There was a problem hiding this comment.
Consider using FailureDetails for error information, which is language agnostic and doesn't rely on error-prone exception serialization. We've also added this property to TaskFailedEvent.
There was a problem hiding this comment.
Error is already a language-agnostic encoding of the error as a string. The language SDK is solely responsible for interpreting the contents. For example, the Python OOProc runtime directly reads and produces these Error strings (I don't even know what it puts in there).
| /// <summary> | ||
| /// The serialized content of the original exception | ||
| /// </summary> | ||
| public string ExceptionContent { get; set; } |
There was a problem hiding this comment.
Consider using FailureDetails, which we've also added to the TaskFailureException exception class.
There was a problem hiding this comment.
I now realize that none of this code actually should be here. The handling of errors is an SDK-specific choice, and this PR does not contain a C# SDK. I will try to remove it. This also means it will be easy to change this method to whatever else we want for other SDKs.
| public override LogLevel Level => LogLevel.Debug; | ||
|
|
||
| protected override string CreateLogMessage() => | ||
| $"{this.InstanceId}: executing batch of {this.OperationCount} operations on state {this.EntityState}."; |
There was a problem hiding this comment.
Could EntityState contain sensitive information, like application state? If so, we should consider not logging it. We recently had the security team reach out to us when potentially sensitive information was discovered in our Kusto logs.
There was a problem hiding this comment.
Yes, it could contain sensitive information. In terms of logging, I would put it into the same category as inputs and outputs to activities and suborchestrations, and I will change it to use the same convention for those (which is to only print the size).
Note that already, this is printing the entity state in Debug build only. However, relying on customers to not run Debug build is probably not a good idea, so I will still change it.
davidmrdavid
left a comment
There was a problem hiding this comment.
Slowly making my way through this. I wasn't sure if you opened this for some early reviews, please let me know if that was not the case.
Left some tiny nitpicks. I also took a look at the sorter behavior and was a tad confused, so I will probably need to give that more detailed pass. In the meantime, any extra comments on that component would help my review. Thanks!
| /// Reducing this number can help to avoid timeouts on consumption plans. If set to 1, batching is disabled, and each operation | ||
| /// message executes and is billed as a separate function invocation. |
There was a problem hiding this comment.
nitpick: I suppose we probably shouldn't include references to the Functions service in DurableTask, right? Can we either generalize this remark, or somehow move it to the Extension layer?
| namespace DurableTask.Core.Entities.OperationFormat | ||
| { | ||
| /// <summary> | ||
| /// Enumeration of orchestrator actions. |
There was a problem hiding this comment.
should this be "entity" actions?
51ff766 to
62e2bb8
Compare
jviau
left a comment
There was a problem hiding this comment.
Some first round of comments as I am still working through the PR as it has a lot to it.
My most immediate concern is the dependency on Newtonsoft.Json here. This would be the first instance of relying on a specific serialization library in the DurableTask.Core. Are we okay with that? Can we remove that dependency? How will it behave with different IOrchestrationService implementations, as well as when customers change their serialization approach?
| /// <param name="input">The serialized input for the operation.</param> | ||
| /// <param name="scheduledTimeUtc">The time to schedule this signal, or null if not a scheduled signal</param> | ||
| /// <returns></returns> | ||
| public static (string eventName, object eventContent) EmitOperationSignal(Guid requestId, string operationName, string input, (DateTime original, DateTime capped)? scheduledTimeUtc) |
There was a problem hiding this comment.
Consider using record class or structs. I don't recommend using value tuples in public APIs as they are harder to adjust down the line without breaking changes.
Although not sure if records are available to us - support can be added via including some internal types. Otherwise, recommend just a class or struct then.
| /// <summary> | ||
| /// The name for this class of entities. | ||
| /// </summary> | ||
| [JsonProperty(PropertyName = "name", Required = Required.Always)] |
There was a problem hiding this comment.
DurableTask.Core has a convention of not taking on a dependency on any single serialization library, instead using data contract annotations. Can we do that here as well?
There was a problem hiding this comment.
I think I already did that in all cases where I am defining new classes.
However, some of the classes (including this one) need to remain compatible with existing task hubs (when DF users upgrade to a newer version of DF that depends on the DT-entity implementation). This is also why the Json field names do not always match the original field name. It is designed to be compatible. I am not sure if this is possible to achieve with generic serialization.
There was a problem hiding this comment.
I think Jacob is asking if you can use [DataMember(...)] instead of [JsonProperty(...)].
There was a problem hiding this comment.
Yes, I agree it makes sense to use the DataContract set of attributes to control serialization /deserialization. I believe Newtonsoft.Json does honor those so it should just work (fingers crossed).
Removing the dependency on Newtonsoft.Json altoghether would be a lot more work. Note that there is a bunch of other code in DurableTask.Core that depends on it already, so we are not introducing a new dependency. Also, I am never using Newtonsoft to serialize/deserialize application-defined data or exceptions; for those, we always use the DataConverter abstraction that is customizable by the user.
| /// </summary> | ||
| /// <param name="instanceId">The instance ID.</param> | ||
| /// <returns>the corresponding entity ID.</returns> | ||
| public static EntityId GetEntityIdFromInstanceId(string instanceId) |
There was a problem hiding this comment.
small nit: consider naming this Parse or FromString, as those names are the convention for this scenario.
There was a problem hiding this comment.
Makes sense. Some of the methods in this section seem superfluous and more complicated than necessary. I will use just ToString and FromString.
| /// </summary> | ||
| /// <param name="entityId">The entity ID.</param> | ||
| /// <returns>The corresponding instance ID.</returns> | ||
| public static string GetInstanceIdFromEntityId(EntityId entityId) |
There was a problem hiding this comment.
can this be an instance method? entityId.GetInstanceId()
There was a problem hiding this comment.
This is now redundant with the ToString() so I will remove it.
| /// <summary> | ||
| /// A unique identifier for an entity, consisting of entity name and entity key. | ||
| /// </summary> | ||
| public struct EntityId : IEquatable<EntityId>, IComparable |
There was a problem hiding this comment.
consider marking this struct readonly (and changing schedulerId to readonly as well). This yields performance benefits for structs as it avoids defensive copies.
There was a problem hiding this comment.
Sounds good. Also, I will remove schedulerId altogether. This was meant to be a caching optimization but I think the value is dubious.
| throw new ArgumentNullException(nameof(entityName), "Invalid entity id: entity name must not be a null or empty string."); | ||
| } | ||
|
|
||
| this.EntityName = entityName.ToLowerInvariant(); |
There was a problem hiding this comment.
Is ToLowerInvariant() necessary? I can't recall if we force instance ID casing in this way for orchestrations. Whatever we do for orchestrations, I'd prefer if we can keep this consistent with it.
There was a problem hiding this comment.
I am not a big fan of this either. The reason it is there is because it made sense for DF. This is because the entity name corresponds not to the instance id, but the function name, which must be case-insensitive for Durable Functions. I agree that it makes sense to remove ToLowerInvariant when using DTFx, where there is no reason to hard-code case-insensitivity for the names (entity names are managed by a INameVersionManager<TaskEntity>)
| /// <returns>the corresponding entity ID.</returns> | ||
| public static EntityId GetEntityIdFromInstanceId(string instanceId) | ||
| { | ||
| var pos = instanceId.IndexOf('@', 1); |
There was a problem hiding this comment.
Consider checking if pos == -1 and throwing an argument exception.
| using Microsoft.WindowsAzure.Storage.Table; | ||
| using System.Runtime.Serialization; | ||
| using System.Threading.Tasks; | ||
| using DurableTask.Core.Entities; |
src/DurableTask.Core/Common/Utils.cs
Outdated
| } | ||
|
|
||
| byte[] hashByteArray; | ||
| using (HashAlgorithm hashAlgorithm = (HashAlgorithm)SHA1.Create()) |
There was a problem hiding this comment.
Is SHA1 consider FIPS-compliant? We had a problem using MD5 in the past for this exact purpose because of FIPS compliance violations: #637.
There was a problem hiding this comment.
Interesting. Never heard about FIPS, but I get your point. In this situation, there is no need for cryptographic quality of the hash. So yes, there should be no problem with replacing the cryptographically-insecure MD5 with an even less secure (and possibly less vetted in terms of statistic qualities) hash algorithm. :)
(cherry picked from commit ca50451425895f79d6def07eb61ccf0608294fb2)
| /// <summary> | ||
| /// Interface for objects that provide entity backend information. | ||
| /// </summary> | ||
| public interface IInformationProvider |
There was a problem hiding this comment.
nit: since this is a public contract we expect others to implement, please make it top-level interface.
There was a problem hiding this comment.
Makes sense. I actually already changed this while revising the backend interface.
| /// <param name="orchestrationService">The orchestration service.</param> | ||
| /// <param name="entityBackendInformation">The options that the provider specifies.</param> | ||
| /// <returns>The entity options</returns> | ||
| public static bool BackendSupportsEntities(IOrchestrationService orchestrationService, out EntityBackendInformation entityBackendInformation) |
There was a problem hiding this comment.
optional suggestion: consider an extension method on IOrchestrationService, so you would be able to call IOrchestrationService.TryGetEntityBackendInformation(out EntityBackendInformation entityBackendInformation)
There was a problem hiding this comment.
While revising the backend interface I have removed this method altogether. There is now an interface IEntityOrchestrationService which is implemented by the backends that support entities.
| /// <param name="orchestrationService">The orchestration service.</param> | ||
| /// <param name="entityBackendInformation">The options that the provider specifies.</param> | ||
| /// <returns>The entity options</returns> | ||
| public static bool BackendSupportsEntities(IOrchestrationService orchestrationService, out EntityBackendInformation entityBackendInformation) |
There was a problem hiding this comment.
nit: please name this TryGet* to align with other bool / out param methods.
There was a problem hiding this comment.
See above, method is no longer there.
| entityBackendInformation = optionsProvider.GetEntityBackendInformation(); | ||
| return true; | ||
| } | ||
| else |
There was a problem hiding this comment.
tiny nit: technically do not need the else statement.
There was a problem hiding this comment.
See above.
| /// <summary> | ||
| /// Information about backend entity support, or null if the configured backend does not support entities. | ||
| /// </summary> | ||
| internal EntityBackendInformation EntityBackendInformation { get; set; } |
There was a problem hiding this comment.
Does this need to be on OrchestrationContext, or can it be on only TaskOrchestrationContext?
There was a problem hiding this comment.
It is called on an OrchestrationContext in at least one place. I could add a cast, but it does not seem preferrable to me.
| /// and for which there is not currently an operation call pending. | ||
| /// </summary> | ||
| /// <returns>An enumeration of all the currently available entities.</returns> | ||
| public abstract IEnumerable<EntityId> GetAvailableEntities(); |
There was a problem hiding this comment.
I am not sure if I am sold on the value of this. The caller knows what entities they passed to LockEntitiesAsync, so they theoretically have access to this already. If we must have this, can we make LockEntitiesAsync return a disposable struct which also contains the available entities on it?
There was a problem hiding this comment.
The list is not just which entities you locked, but the list of entities you locked MINUS all the entities to which you currently have calls pending. This changes over time. Tracking those changes on your own is kind of complicated, so I made it possible to look at the internal list (which is very easy).
| /// The custom status, if any, of the orchestrator. | ||
| /// </summary> | ||
| [JsonProperty("customStatus")] | ||
| [DataMember(Name = "customStatus")] |
There was a problem hiding this comment.
did not realize we had the Newtonsoft.Json properties here. I don't recommend changing this, who knows what it will impact.
There was a problem hiding this comment.
yep, that was by accident. I got overzealous changing all the occurrences and did not realize it. Will change this back.
| /// <summary> | ||
| /// Client used to manage and query entity instances | ||
| /// </summary> | ||
| public sealed class TaskHubEntityClient |
There was a problem hiding this comment.
I don't recommend sealing this. Infact go the opposite direction, make sure it is easily mockable for unit tests.
There was a problem hiding this comment.
This was copy-paste from TaskHubClient. I don't know what their reasoning is. That's why I kept it the same.
| /// <param name="operationInput">The input for the operation.</param> | ||
| /// <exception cref="EntityLockingRulesViolationException">if the orchestration is inside a critical section and the lock for this entity is not available.</exception> | ||
| /// <returns>A task representing the result of the operation.</returns> | ||
| public abstract Task<TResult> CallEntityAsync<TResult>(Entities.EntityId entityId, string operationName, object operationInput = null); |
There was a problem hiding this comment.
This is a breaking change adding abstract members to a class.
https://learn.microsoft.com/en-us/dotnet/core/compatibility/library-change-rules
There was a problem hiding this comment.
yep, needs to have an implementation. I marked this as virtual and added throwing of NotImplementedException.
|
I have pushed a commit which addresses all the feedback given so far.
|
cgillum
left a comment
There was a problem hiding this comment.
I'm much happier with the recently refactoring to simplify the TaskOrchestrationDispatcher changes. Thank you!
I haven't done a full in-depth pass over this yet but took a look at a few areas of interest and left some comments.
We talked about whether we can avoid some of the public surface area debate by making some of the new public surface area internal, or move it into test code. Do you still think that this will be practical? I guess there are at least two categories of "public" APIs: the "TaskHub" APIs for managing orchestrations/entities and the orchestration context APIs for invoking entities from orchestrations.
| /// <summary> | ||
| /// Entity processing characteristics that are controlled by the backend provider, i.e. the orchestration service. | ||
| /// </summary> | ||
| public class EntityBackendInformation |
There was a problem hiding this comment.
For better consistency with existing .NET conventions, consider using Options or Settings rather than Information as the suffix.
There was a problem hiding this comment.
I think the purpose of this class is a bit different from what Options and Settings are typically used for, so I would prefer to not call it that. It is used to describe properties of the backend, not to configure them. For example, whether a backend supports ordered message delivery is not really an option or a setting - it is true for Netherite and MSSQL but false for the Azure Storage provider. There are some properties that are also options, but not all of them, and they cannot be modified in this class. Maybe EntityBackendProperties would be a better name?
There was a problem hiding this comment.
I like Properties better. I reacted to Information partly because “information” sounds like something immutable, but this class contains mutable properties.
There was a problem hiding this comment.
Thanks. I will make this change.
| var pos = instanceId.IndexOf('@', 1); | ||
| if (pos <= 0) | ||
| { | ||
| throw new ArgumentException("instanceId is not a valid entityId", nameof(instanceId)); |
There was a problem hiding this comment.
Consider including the actual instance ID value in the exception to make it easier to debug if we ever encounter this.
| throw new ArgumentException("instanceId is not a valid entityId", nameof(instanceId)); | |
| throw new ArgumentException($"Instance ID '{instanceId}' is not a valid entity ID.", nameof(instanceId)); |
There was a problem hiding this comment.
Yes, makes sense. I will also add a check that the first character should be '@'.
| /// Exception used to describe various issues encountered by the entity scheduler. | ||
| /// </summary> | ||
| [Serializable] | ||
| public class EntitySchedulerException : Exception |
There was a problem hiding this comment.
I think Jacob's comment might be more about whether we need a new EntitySchedulerException class or whether we can use an existing exception class. I don't personally have a preference either way, though I can't immediately think of an existing exception type that would be an obvious choice to use instead.
| /// <summary> | ||
| /// The name of the event. | ||
| /// </summary> | ||
| public readonly string EventName; |
There was a problem hiding this comment.
I did a quick search on this and readonly structs are allowed to have properties. The caveat is that they must be read-only properties. More info here: https://www.educative.io/answers/what-are-readonly-structs-in-c-sharp. I would prefer we use read-only properties instead of read-only fields since that's the standard convention in C#.
| } | ||
| } | ||
|
|
||
| public override Task<TResult> CallEntityAsync<TResult>(EntityId entityId, string operationName, object input = null) |
There was a problem hiding this comment.
Do we need these new public APIs on TaskOrchestrationContext in order to support entities in .NET Isolated? As part of our public surface area discussion, I wonder if we can defer adding these, or hide them behind an interface until we feel ready to expose this functionality publicly in DTFx.
There was a problem hiding this comment.
see comment below. After addressing all the current feedback, I will create a separate PR that does not contain user-facing public SDK for entities.
|
This PR originally had no end-user-facing entity API - i.e. it was exposing only the functionality needed by the DF extension, not intended for direct consumption by DTFx end-users. Copying the user-facing entity API that we have in DF so it is available in its full glory inside DTFx was actually a lot of extra work which is one of the reasons this PR is so big (and it is not even complete, i.e. still missing the dispatch and proxy features). But perhaps it is something that can be delayed for the sake of making progress on the highest priority, which is getting the entity SDK into isolated. @cgillum / @jviau / @davidmrdavid , would it help your reviewing process if I create a second PR by copying this one and removing the DTFx user-facing entity SDK and the tests altogether? So it would contain only the basic entity-mechanics. We haven't discussed those much of that code in this PR yet, so we would not lose much of our discussion here - we can continue using this PR for discussing the user-facing DTFx entity API. We could possibly merge that second PR before this one, which would also unblock the follow-up work items. |
cgillum
left a comment
There was a problem hiding this comment.
A few more comments based on my reading of the most recent commit.
| } | ||
| var pos = instanceId.IndexOf('@', 1); | ||
| if (pos <= 0) | ||
| if ( pos <= 0 || instanceId[0] != '@') |
There was a problem hiding this comment.
nit: we should probably update line 71 above to check for string.IsNullOrEmpty(instanceId) to avoid the potential of index-out-of-range exceptions.
There was a problem hiding this comment.
Makes sense
| this.LockSet = message.LockSet; | ||
| this.Position = message.Position; | ||
|
|
||
| if (message.LockSet != null) |
There was a problem hiding this comment.
Rather than taking a new dependency on a JSON library in this code, I would prefer we simply do the following:
if (message.LockSet != null)
{
this.LockSet = string.Join(",", message.LockSet.Select(id => id.ToString());
}This should also be more efficient.
There was a problem hiding this comment.
I am fine with that. It may not always be parseable (e.g. if the entity ID contains commas) but that is not likely to be a major problem.
| if (orchestrationService is IEntityOrchestrationService entityOrchestrationService) | ||
| { | ||
| this.entityOrchestrationService = entityOrchestrationService; | ||
| entityOrchestrationService.ProcessEntitiesSeparately(); |
There was a problem hiding this comment.
I need to look at this more closely, but I'm wondering why we can't simply infer the desire to process entities separately by just checking if orchestrationService is an IEntityOrchestrationService in other parts of the code. It would be great if that would lead to fewer flags to keep track of.
There was a problem hiding this comment.
FYI, the reason why I call ProcessEntitiesSeparately is because I do not want to process entities separately if the backend is being called by something that does not know about this difference (I did not want to break clients of IOrchestrationService that are unaware of entities and expect things to work as they did before). Maybe that is not a real concern.
This sounds like a great idea to me. The new public surface area changes look a bit intimidating, so separating out that part would make it easier to make progress on cementing the runtime changes. |
|
The reduced PR is now here: #887 For further comments please use that PR. Also, I will address comments in this PR that have not been resolved yet there. |
|
We are no longer actively pursuing this direction (adding entities to the DurableTask SDK). |
Brings the basic logic (protocols and dispatching) used for durable entities into DurableTask.Core.
OperationBatchRequest, result isOperationBatchResultPlaces that require further consideration are marked
\\ DRAFT. This includes tracing. We need to consider how much tracing we want in DurableTask.Core (vs. tracing in the extension that is already in place).