Combines active ack and slot release when both are available. #4624
chetanmeh merged 13 commits into apache:master
Codecov Report
@@ Coverage Diff @@
## master #4624 +/- ##
==========================================
- Coverage 84.44% 78.76% -5.68%
==========================================
Files 183 183
Lines 8306 8346 +40
Branches 572 571 -1
==========================================
- Hits 7014 6574 -440
- Misses 1292 1772 +480
activationId: ActivationId,
isSystemError: Boolean,
response: Either[ActivationId, WhiskActivation],
result: Boolean, // true iff the message is a combined active ack and slot released
There was a problem hiding this comment.
Is this almost the same as:
response: Option[Either[ActivationId,WhiskActivation]]
? I'm not sure that is more or less clear, but want to make sure I'm not missing something.
In other words - is there any case where result==false, and response==Right[WhiskActivation]?
There was a problem hiding this comment.
When result is false:
1. Left: this is the current situation and can occur during split phase notification.
2. Right: should not happen, as this should be an active ack instead, but it does not change the semantics (treated the same as Left).
When result is true:
3. Right: this combines the active ack and slot release and occurs for system-generated activations in failure scenarios.
4. Left: like the previous case but occurs when the message is too large to cross the event bus and the sender converts the Right to a Left.
Since there are 3 possible scenarios, Either by itself is not enough, and so I chose to encode the extra bit with result as a boolean.
There was a problem hiding this comment.
I don't think response as option works - there is always Some value (the current case is the ActivationId). We need a tri-valued type.
An alternative would be Either[ActivationId, Either[ActivationId, WhiskActivation]] where Left represents the split phase and Right(ActivationId) and Right(WhiskActivation) represent the other two cases.
We can add a require in the constructor to also prevent the impossible case.
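The Boolean-plus-Either encoding and the `require` guard discussed above could be sketched roughly as follows. This is only an illustration, not the PR's code: `ActivationId` and `WhiskActivation` are simplified stand-ins, and `CompletionSketch` is a hypothetical name.

```scala
object EitherEncodingSketch {
  // Simplified stand-ins for the real OpenWhisk types (assumptions for illustration).
  final case class ActivationId(asString: String)
  final case class WhiskActivation(activationId: ActivationId, response: String)

  // Hypothetical message: `result` is true iff the message combines
  // the active ack with the slot release.
  final case class CompletionSketch(response: Either[ActivationId, WhiskActivation], result: Boolean) {
    // Case 2 (result == false together with a full activation) "should not happen",
    // so it is rejected at construction time.
    require(result || response.isLeft, "a split-phase completion must not carry an activation")

    // The activation id is recoverable from either side of the response.
    def activationId: ActivationId = response match {
      case Left(id)          => id
      case Right(activation) => activation.activationId
    }
  }
}
```

With this guard, only three of the four Boolean/Either combinations can be constructed, which is the "tri-valued" shape the discussion is after.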
There was a problem hiding this comment.
@tysonnorris I thought about this further - another option is to introduce a third type.
- Active Ack. (as today)
- Split phase completion. (what Completion message is today, "result is false" above case 1)
- Combined Completion. (combines active ack and split phase into one message as this PR does, "result is true" above cases 3 and 4).
There would be 1 active ack + 1 split phase (from the container proxy) and 3 combined completions otherwise (for the error cases).
There was a problem hiding this comment.
Would this change cause a compatibility issue during a gradual rollout of new invokers in an existing cluster?
So far we do not have any compatibility contract for messages exchanged via Kafka.
There was a problem hiding this comment.
Based on the discussion above, I would say that three different messages sound like the best option. From a data modelling perspective, it's asymmetric that a Right(WhiskActivation) "should not happen" in the CompletionMessage if result = false.
With the split of the active ack into result / completion ack, we somewhat started a renaming. So when introducing a concept with three messages, I suggest to further follow the new naming and use something like ResultMessage, CompletionMessage and CombinedResultCompletionMessage. To preserve consistency, I would update all comments touched by this PR and replace "active ack" with the new terms.
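The three message types suggested above might be modelled as a sealed hierarchy along these lines. This is a minimal sketch under assumptions: the class names follow the suggestion in this comment, while `AckSketch`, `InvokerId`, the field names, and the simplified `WhiskActivation` are hypothetical stand-ins rather than the actual OpenWhisk definitions.

```scala
object AckMessagesSketch {
  // Simplified stand-ins for the real types (assumptions for illustration only).
  final case class ActivationId(asString: String)
  final case class InvokerId(instance: Int)
  final case class WhiskActivation(activationId: ActivationId, response: String)

  // Left is used when the full activation cannot be sent (for example, it is too large).
  type Response = Either[ActivationId, WhiskActivation]

  private def idOf(r: Response): ActivationId = r match {
    case Left(id)          => id
    case Right(activation) => activation.activationId
  }

  sealed trait AckSketch {
    def activationId: ActivationId
    def isSlotFree: Option[InvokerId] // defined iff the invoker slot is released
    def result: Option[Response]      // defined iff the message carries a result
  }

  // Result only: the slot is still occupied.
  final case class ResultMessage(response: Response) extends AckSketch {
    def activationId: ActivationId = idOf(response)
    def isSlotFree: Option[InvokerId] = None
    def result: Option[Response] = Some(response)
  }

  // Slot release only: carries no result.
  final case class CompletionMessage(activationId: ActivationId, invoker: InvokerId) extends AckSketch {
    def isSlotFree: Option[InvokerId] = Some(invoker)
    def result: Option[Response] = None
  }

  // Result and slot release in a single message.
  final case class CombinedResultCompletionMessage(response: Response, invoker: InvokerId) extends AckSketch {
    def activationId: ActivationId = idOf(response)
    def isSlotFree: Option[InvokerId] = Some(invoker)
    def result: Option[Response] = Some(response)
  }
}
```

Because each case is its own type, the "should not happen" combination from the Boolean encoding simply cannot be expressed.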
There was a problem hiding this comment.
The ack method will have to change to support this. This is the type of the ack method:
type ActiveAck = (TransactionId, WhiskActivation, Boolean, ControllerInstanceId, UUID, Boolean) => Future[Any]
The last parameter in this function indicates whether the slot is free. From this signature alone, the method cannot tell whether the result is combined, since every call already receives a WhiskActivation.
So I'm OK with making the change to the message types (2 to 3) but note that the change will cause further refactoring in the invoker code.
There was a problem hiding this comment.
@rabbah thanks for your openness to adjust your work to our feedback.
There was a problem hiding this comment.
@rabbah As you change the current design I would also like to highlight a potential change I need for my activation persister service work (#4632)
Per my current understanding, the sendActiveAck flow is:
- isSlotFree false - active ack/ResultMessage
  - blocking - WhiskActivation
  - non-blocking - NONE
- isSlotFree true - result/CompletionMessage
  - blocking - CompletionMessage
  - non-blocking - CompletionMessage
Later I would like to have configurable support for sending the WhiskActivation for non-blocking calls as well, to a custom topic. Just wanted to make you aware of that.
There was a problem hiding this comment.
@chetanmeh I pushed a commit to implement the new type discussed earlier. I did not see your comment earlier. Do the changes help? I think it makes clearer where split-phase acks are being used and where they aren't.
942c0fb to 68df7c8
chetanmeh
left a comment
There was a problem hiding this comment.
Looks good! Added some minor feedback.
 * @param AcknowledegmentMessage the acknowledgement message to send
 */
type ActiveAck = (TransactionId, WhiskActivation, Boolean, ControllerInstanceId, UUID, Boolean) => Future[Any]
type ActiveAck =
There was a problem hiding this comment.
Minor observation (or rant!): I always struggle in the IDE whenever I want to see the implementation of a parameter that is specified as a function signature instead of a trait (like ActiveAck). Here things are a bit simpler, as we specify a type alias which narrows down the search. Otherwise one needs to check the whole call hierarchy to understand where the actual implementation is.
Having a trait makes it easier to find the possible implementations and understand the code flow. This is pre-existing stuff, but maybe later we can refactor it and use a proper trait for such critical flows.
There was a problem hiding this comment.
This ends up being a bit more extensive than at first appearances so we should defer it to a subsequent patch.
.getFields("invoker")
.headOption
.map(_ => json.convertTo[CompletionMessage])
.map(_ => {
There was a problem hiding this comment.
Just a thought - instead of applying such heuristics, should we also include the message type (aka name) in the JSON? This may be useful for ActivationPersisterService, which would only be interested in 2 out of 3 types (and could avoid deserialization to CompletionMessage).
There was a problem hiding this comment.
From a code maintainability and data modelling perspective, it would be great to also serialise a type field that can be used later on to easily de-serialise the JSON. At the same time, this extension increases message size.
I guess the increased size does not really matter whenever a WhiskActivation is embedded because a type field is pretty small compared to a WhiskActivation.
I think the increased size would only matter for the CompletionMessage. Today, a small JSON for a CompletionMessage would look like this (146 bytes in compact form):
{"transid":"5808a97c269295220a6d8e78508f118b","activationId":"0f3763366aba46a0b763366abae6a0a4","invoker":{"instance":0,"userMemory":"16384 MB"}}
When adding a type field, we get (166 bytes in compact form):
{"type":"completion","transid":"5808a97c269295220a6d8e78508f118b","activationId":"0f3763366aba46a0b763366abae6a0a4","invoker":{"instance":0,"userMemory":"16384 MB"}}
This is a size increase of around 14%, which could be compensated for by refactoring user memory handling such that it's no longer serialised in the CompletionMessage, but only in pings.
So I support @chetanmeh's proposal.
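The overhead of the proposed type field can be checked with a few lines. This is a sketch only: it reuses the two JSON samples quoted in this comment, and the object and value names are hypothetical.

```scala
object TypeFieldOverheadSketch {
  // Compact CompletionMessage sample quoted in the comment above.
  val compact: String =
    """{"transid":"5808a97c269295220a6d8e78508f118b","activationId":"0f3763366aba46a0b763366abae6a0a4","invoker":{"instance":0,"userMemory":"16384 MB"}}"""

  // Same message with the proposed type field placed first.
  val typed: String = """{"type":"completion",""" + compact.drop(1)

  private def bytes(s: String): Int = s.getBytes("UTF-8").length

  // Number of additional bytes introduced by the type field.
  val extraBytes: Int = bytes(typed) - bytes(compact)

  // Relative growth of the message, in percent.
  val growthPercent: Double = 100.0 * extraBytes / bytes(compact)
}
```

The added field costs a fixed number of bytes per message, so its relative weight shrinks as soon as a WhiskActivation is embedded.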
There was a problem hiding this comment.
We can reduce the size a bit further by using {"t":1} if needed, i.e. encode it as an enum. However, as the overhead is not that big, we can keep the more readable string form.
refactoring user memory handling such that it's no more serialised in the CompletionMessage - but only in pings. ->
"invoker":{"instance":0,"userMemory":"16384 MB"}
It would be good to have an issue for this.
There was a problem hiding this comment.
I considered these, as well as making sure the field names were unique so that we can disambiguate with one field lookup. Will think about it again.
There was a problem hiding this comment.
I'd rather not change the JSON representation to include the type. I think it's not necessary at this point, but I did simplify the implementation to something more compact.
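To make the two options in this thread concrete, here is a rough sketch of disambiguating by field presence versus by an explicit type field. It is not the PR's parser: a `Set[String]` of field names stands in for a parsed JSON object, `invoker` appears in the samples above, and the `response` field name and the `type` values are assumptions for illustration.

```scala
object AckParsingSketch {
  sealed trait Kind
  case object Result extends Kind
  case object Completion extends Kind
  case object Combined extends Kind

  // Heuristic: infer the message kind from which fields are present.
  // `fields` stands in for the top-level keys of a parsed JSON object.
  def classifyByFields(fields: Set[String]): Option[Kind] =
    (fields.contains("invoker"), fields.contains("response")) match {
      case (true, true)   => Some(Combined)   // slot release and result together
      case (true, false)  => Some(Completion) // slot release only
      case (false, true)  => Some(Result)     // result only
      case (false, false) => None             // not an acknowledgement message
    }

  // Alternative: an explicit (hypothetical) "type" field makes the kind a direct lookup.
  def classifyByType(typeField: String): Option[Kind] = typeField match {
    case "result"     => Some(Result)
    case "completion" => Some(Completion)
    case "combined"   => Some(Combined)
    case _            => None
  }
}
```

The field-based variant keeps messages small but couples the parser to the field layout; the type-based variant costs a few bytes and lets a consumer skip kinds it does not care about.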
sven-lange-last
left a comment
There was a problem hiding this comment.
Thanks a lot for providing this change which improves performance for activations in error. Only some minor comments.
This patch is semantic preserving.
@chetanmeh I refactored the active acker's signature/type.
I removed WIP label as I think all comments are addressed at this point.
 */
type ActiveAck = (TransactionId, WhiskActivation, Boolean, ControllerInstanceId, UUID, Boolean) => Future[Any]
trait ActiveAck {
def apply(tid: TransactionId,
There was a problem hiding this comment.
Nice use of apply! Minimizes the impact of switch
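Why `apply` keeps the switch from a function type to a trait small can be illustrated with a short sketch. The parameter list here is deliberately reduced and the names `Ack`, `LoggingAck`, and `complete` are hypothetical; the real `ActiveAck` takes more arguments.

```scala
import scala.concurrent.Future

object ActiveAckSketch {
  // Simplified stand-ins (assumptions for illustration only).
  final case class TransactionId(id: String)
  final case class Ack(activationId: String)

  // Before the refactoring this would have been a function type alias, e.g.
  //   type ActiveAck = (TransactionId, Ack) => Future[Any]
  trait ActiveAck {
    def apply(tid: TransactionId, ack: Ack): Future[Any]
  }

  // A named implementation, which an IDE can list as an implementation of the trait.
  object LoggingAck extends ActiveAck {
    def apply(tid: TransactionId, ack: Ack): Future[Any] =
      Future.successful(s"${tid.id}: acked ${ack.activationId}")
  }

  // Call sites keep the function-call syntax, because `ack(...)` desugars to `ack.apply(...)`.
  def complete(ack: ActiveAck): Future[Any] = ack(TransactionId("tid"), Ack("a1"))
}
```

Existing callers that invoked the alias as a function compile unchanged, while implementations become discoverable by type.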
sven-lange-last
left a comment
There was a problem hiding this comment.
Recent changes look good from my side. Thanks a lot for addressing all review feedback. Ready to merge from my point of view.
s"posted ${if (recovery) "recovery" else "completion"} of activation ${activationResult.activationId}")
// UserMetrics are sent, when the slot is free again. This ensures, that all metrics are sent.
if (UserEvents.enabled && acknowledegment.isSlotFree.nonEmpty) {
acknowledegment.result match {
There was a problem hiding this comment.
This change is actually wrong: for the completion message, it will return None.
Instead, the method should revert to using the passed-in activation.
case Failure(t) => logging.error(this, s"activation event was not sent: $t")
}
case _ =>
// all acknowledegment messages should have a result
There was a problem hiding this comment.
This comment is also wrong: completion messages have no result.
…#4624) Combine active ack and slot release when both are available.
This commit changes the types of AcknowledegmentMessage exchanged on `completedxxx` topics to 3:
- CombinedCompletionAndResultMessage - Sent when the resource slot and the action result are available at the same time
- ResultMessage - Sent once an action result is available for blocking actions
- CompletionMessage - Sent once the resource slot in the invoker is free again
This ensures that the controller can quickly clean up resources for completed invocations when they result in an error (instead of performing slow db polling).
Combines active ack and slot release when both are available. If completion message carries a result, process the active ack.
There are two commits in this patch. The first is semantic preserving/refactoring. The fix is in the second commit.
Closes #4614.
Closes #4636.