Skip to content

Combines active ack and slot release when both are available.#4624

Merged
chetanmeh merged 13 commits intoapache:masterfrom
rabbah:completion
Sep 21, 2019
Merged

Combines active ack and slot release when both are available.#4624
chetanmeh merged 13 commits intoapache:masterfrom
rabbah:completion

Conversation

@rabbah
Copy link
Member

@rabbah rabbah commented Sep 13, 2019

Combines active ack and slot release when both are available. If completion message carries a result, process the active ack.

There are two commits in this patch. The first is semantic preserving/refactoring. The fix is in the second commit.

Closes #4614.
Closes #4636.

Description

Related issue and scope

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@codecov-io
Copy link

codecov-io commented Sep 13, 2019

Codecov Report

Merging #4624 into master will decrease coverage by 5.67%.
The diff coverage is 80%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4624      +/-   ##
==========================================
- Coverage   84.44%   78.76%   -5.68%     
==========================================
  Files         183      183              
  Lines        8306     8346      +40     
  Branches      572      571       -1     
==========================================
- Hits         7014     6574     -440     
- Misses       1292     1772     +480
Impacted Files Coverage Δ
.../scala/org/apache/openwhisk/core/entity/Exec.scala 90.62% <ø> (ø) ⬆️
...enwhisk/core/loadBalancer/CommonLoadBalancer.scala 81.08% <100%> (-0.17%) ⬇️
.../openwhisk/core/containerpool/ContainerProxy.scala 92.88% <100%> (+0.1%) ⬆️
...pache/openwhisk/core/invoker/InvokerReactive.scala 75.6% <51.85%> (-4.4%) ⬇️
.../org/apache/openwhisk/core/connector/Message.scala 73.78% <88.46%> (+8.92%) ⬆️
...core/database/cosmosdb/RxObservableImplicits.scala 0% <0%> (-100%) ⬇️
...ore/database/cosmosdb/cache/CacheInvalidator.scala 0% <0%> (-100%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0% <0%> (-95.89%) ⬇️
...tabase/cosmosdb/cache/CacheInvalidatorConfig.scala 0% <0%> (-94.74%) ⬇️
...sk/core/database/cosmosdb/CosmosDBViewMapper.scala 0% <0%> (-92.6%) ⬇️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 400a791...052dbaf. Read the comment docs.

activationId: ActivationId,
isSystemError: Boolean,
response: Either[ActivationId, WhiskActivation],
result: Boolean, // true iff the message is a combined active ack and slot released
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this almost the same as:
response: Option[Either[ActivationId,WhiskActivation]]
? I'm not sure that is more or less clear, but want to make sure I'm not missing something.
In other words - is there any case where result==false, and response==Right[WhiskActivation]?

Copy link
Member Author

@rabbah rabbah Sep 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When result is false:

  1. Left: this is the current situation and can occur during split phase notification.
  2. Right: should not happen as this should be an active ack instead but does not change the semantics (treated the same as Left).

When result is true:
3. Right: this combines the active ack and slot release and occurs during system generate activations for failure scenarios.
4. Left: like the previous case but occurs when the message is too large to cross the event bus and the sender converts the Right to a Left.

Since there are 3 possible scenarios, Either by itself is not enough, and so I chose to encode the extra bit with result as a boolean.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think response as option works - there is always Some value (the current case is the ActivationId). We need a tri-valued typed.

An alternative would be Either[ActivationId, Either[ActivationId, WhiskActivation] where Left represents the split phase and Right(ActivationId) and Right(WhiskActivation) represents the other two cases.

We can add a require in the constructor to also prevent the impossible case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tysonnorris I thought about this further - another option is to introduce a third type.

  1. Active Ack. (as today)
  2. Split phase completion. (what Completion message is today, "result is false" above case 1)
  3. Combined Completion. (combines active ack and split phase into one message as this PR does, "result is true" above cases 3 and 4).

There would be 1 active ack + 1 split phase (from the container proxy) and 3 combined completions otherwise (for the error cases).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this change cause some compatibility issue when there would be a gradual roll out of new invokers in an existing cluster.

So for we do not have any comparability contract for messages exchange via Kafka

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the discussion above, I would say that three different messages sound like the best option. From a data modelling perspective, it's asymmetric that a Right( WhiskActivation) "should not happen" in the CompletionMessage if result = false.

With the split of the active ack into result / completion ack, we somewhat started a renaming. So when introducing a concept with three messages, I suggest to further follow the new naming and use something like ResultMessage, CompletionMessage and CombinedResultCompletionMessage. To preserve consistency, I would update all comments touched by this PR and replace "active ack" with the new terms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ack method will have to change to support this. This is the type of the ack method:

  type ActiveAck = (TransactionId, WhiskActivation, Boolean, ControllerInstanceId, UUID, Boolean) => Future[Any]

The last parameter in this function indicates if the slot is free or not. So it is ambiguous to know inside the method if the result is combined or not from this signature alone since every call already receives a WhiskActivation.

So I'm OK with making the change to the message types (2 to 3) but note that the change will cause further refactoring in the invoker code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rabbah thanks for your openness to adjust your work to our feedback.

Copy link
Member

@chetanmeh chetanmeh Sep 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rabbah As you change the current design I would also like to highlight a potential change I need for my activation persister service work (#4632)

Per my current understand the sendActiveAck flow is like

  • isSlotFree false - active ack/ResultMessage
    • blocking - WhiskActivation
    • non-blocking - NONE
  • isSlotFree true - result/CompletionMessage
    • blocking - CompletionMessage
    • non-blocking - CompletionMessage

Later I would like to have a configurable support for sending WhiskActivation for non blocking call as well to a custom topic. Just wanted to make you aware on that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chetanmeh I pushed a commit to implement the new type discussed earlier. I did not see your comment earlier. Do the changes help? I think it makes clearer where split-phase acks are being used and where they aren't.

@rabbah rabbah force-pushed the completion branch 2 times, most recently from 942c0fb to 68df7c8 Compare September 14, 2019 15:30
Copy link
Member

@chetanmeh chetanmeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!. Added some minor feedback

* @param AcknowledegmentMessage the acknowledgement message to send
*/
type ActiveAck = (TransactionId, WhiskActivation, Boolean, ControllerInstanceId, UUID, Boolean) => Future[Any]
type ActiveAck =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor observation (or Rant!) - I always struggle in IDE whenever I want to see impl of parameters which are specified as function signature instead of trait (like in ActiveAck). Here things are bit simpler as we specify a type alias which narrows down the search. Otherwise one need to check the whole call hierarchy to understand where is. the actual impl

Having a trait enables easier checking of possible implementations to understand the code flow. This is pre existing stuff ... but may be later we refactor it and use a proper trait for such critical flows

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

echo rant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ends up being a bit more extensive than at first appearances so we should defer it to a subsequent patch.

.getFields("invoker")
.headOption
.map(_ => json.convertTo[CompletionMessage])
.map(_ => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought - Instead of applying such heuristics should we also include the message type (aka name) in json. This may be useful for ActivationPersisterService where it would only be interested in 2 out of 3 types (can avoid deserialization to CompletionMessage).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a code maintainability and data modelling perspective, it would be great to also serialise a type field that can be used later on to easily de-serialise the JSON. At the same time, this extension increases message size.

I guess the increased size does not really matter whenever a WhiskActivation is embedded because a type field is pretty small compared to a WhiskActivation.

I think the increased size would only matter for the CompletionMessage. Today, small JSONs for CompletionMessage would look like (146 Byte in compact form):

{"transid":"5808a97c269295220a6d8e78508f118b","activationId":"0f3763366aba46a0b763366abae6a0a4","invoker":{"instance":0,"userMemory":"16384 MB"}}

When adding a type field, we get (166 Byte in compact form):

{"type":"completion","transid":"5808a97c269295220a6d8e78508f118b","activationId":"0f3763366aba46a0b763366abae6a0a4","invoker":{"instance":0,"userMemory":"16384 MB"}}

This is a size increase of around 14% - which could be compensated by refactoring user memory handling such that it's no more serialised in the CompletionMessage - but only in pings.

So I support @chetanmeh's proposal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can reduce the size bit further by using {"t":1} if needed i.e. encode it as enum. However as overhead is not that big we can keep the more readable string form

refactoring user memory handling such that it's no more serialised in the CompletionMessage - but only in pings. -> "invoker":{"instance":0,"userMemory":"16384 MB"}

Would be good to have a issue for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered these - as well making sure the field names were unique so that we can disambiguate with one field lookup. Will think about it again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not change the JSON represntation to include the type. I think it's not necessary at this point but I did simplify the implementation to something more compact.

@chetanmeh chetanmeh mentioned this pull request Sep 19, 2019
21 tasks
Copy link
Member

@sven-lange-last sven-lange-last left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for providing this change which improves performance for activations in error. Only some minor comments.

* @param AcknowledegmentMessage the acknowledgement message to send
*/
type ActiveAck = (TransactionId, WhiskActivation, Boolean, ControllerInstanceId, UUID, Boolean) => Future[Any]
type ActiveAck =
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ends up being a bit more extensive than at first appearances so we should defer it to a subsequent patch.

@rabbah rabbah removed the wip label Sep 19, 2019
@rabbah
Copy link
Member Author

rabbah commented Sep 19, 2019

@chetanmeh I refactored the active acker's signature/type.
@sven-lange-last we can't remove WhiskActivation - the tests in container proxy rely on it being available and removing it causes a number of issues in testing the proxy (specifically, the tests use non-blocking activations which only generate CompletionMessages and as a result there is no Activation to check as the tests do now.)

@rabbah
Copy link
Member Author

rabbah commented Sep 19, 2019

I removed WIP label as I think all comments are addressed at this point.

*/
type ActiveAck = (TransactionId, WhiskActivation, Boolean, ControllerInstanceId, UUID, Boolean) => Future[Any]
trait ActiveAck {
def apply(tid: TransactionId,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice use of apply! Minimizes the impact of switch

Copy link
Member

@sven-lange-last sven-lange-last left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recent changes look good from my side. Thanks a lot for addressing all review feedback. Ready to merge from my point of view.

@rabbah rabbah added the ready label Sep 20, 2019
@chetanmeh chetanmeh merged commit fac309c into apache:master Sep 21, 2019
@chetanmeh chetanmeh removed the ready label Sep 21, 2019
@rabbah rabbah deleted the completion branch September 22, 2019 13:10
s"posted ${if (recovery) "recovery" else "completion"} of activation ${activationResult.activationId}")
// UserMetrics are sent, when the slot is free again. This ensures, that all metrics are sent.
if (UserEvents.enabled && acknowledegment.isSlotFree.nonEmpty) {
acknowledegment.result match {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is actually wrong - for the completion message, it will return None.
instead, the method should revert to using the passed in activation.

@sven-lange-last

case Failure(t) => logging.error(this, s"activation event was not sent: $t")
}
case _ =>
// all acknowledegment messages should have a result
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is also wrong - completion messages have no result.

BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019
…#4624)

Combine active ack and slot release when both are available. This commit changes the types of AcknowledegmentMessage exchanged on `completedxxx` topics to 3

- CombinedCompletionAndResultMessage - Sent when the resource slot and the action result are available at the same time
- ResultMessage - Sent once an action result is available for blocking actions
- CompletionMessage - Sent once the resource slot in the invoker is free again

This would ensure that the controller can quickly cleanup resources for comleted invocation when they result in error 
(instead of performing slow db polling)
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 80.00000% with 19 lines in your changes missing coverage. Please review.

Project coverage is 78.76%. Comparing base (400a791) to head (052dbaf).
Report is 504 commits behind head on master.

Files with missing lines Patch % Lines
...pache/openwhisk/core/invoker/InvokerReactive.scala 51.85% 13 Missing ⚠️
.../org/apache/openwhisk/core/connector/Message.scala 88.46% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4624      +/-   ##
==========================================
- Coverage   84.44%   78.76%   -5.68%     
==========================================
  Files         183      183              
  Lines        8306     8346      +40     
  Branches      572      571       -1     
==========================================
- Hits         7014     6574     -440     
- Misses       1292     1772     +480     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor active ack method signature to be an interface missing active ack from invoker

6 participants