
Move batching to CouchDbRestStore #2835

Merged: dubee merged 1 commit into apache:master from cbickel:bulk2 on Oct 30, 2017
Conversation

@cbickel (Contributor) commented on Oct 5, 2017:

Move the activation-batching mechanism into the CouchDbRestStore, so that it is also in place in the Controller.
Currently, batching is only enabled for activations.

Depends on #2812
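As a rough illustration of the batching mechanism this PR moves into the store layer, here is a minimal, synchronous stand-in: documents accumulate until a batch is full, then are flushed with a single bulk operation. This is a sketch only; `SimpleBatcher` and its method names are hypothetical, and the real `Batcher` in this PR is asynchronous.

```scala
// Hypothetical, simplified stand-in for the Batcher introduced in this PR.
// It groups incoming documents and flushes them via one bulk operation
// once a batch reaches maxBatchSize. The real implementation is async.
class SimpleBatcher[T, R](maxBatchSize: Int)(bulkPut: Seq[T] => Seq[R]) {
  private val pending = scala.collection.mutable.ArrayBuffer.empty[T]
  private val results = scala.collection.mutable.ArrayBuffer.empty[R]

  // Enqueue one document; flush automatically when the batch is full.
  def put(doc: T): Unit = {
    pending += doc
    if (pending.size >= maxBatchSize) flush()
  }

  // Send all pending documents in one bulk request.
  def flush(): Unit =
    if (pending.nonEmpty) {
      results ++= bulkPut(pending.toSeq)
      pending.clear()
    }

  def completed: Seq[R] = results.toSeq
}
```

With a batch size of 3 and seven documents, only three bulk calls are made instead of seven individual puts, which is the point of moving batching into the store.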

@cbickel cbickel requested a review from rabbah October 5, 2017 10:06
response.convertTo[Seq[BulkEntityResult]].map { singleResult =>
singleResult.error
.map {
case "conflict" => Left(DocumentConflictException("conflict on 'bulk_put'"))
@dubee (Member) commented on Oct 5, 2017:

If there is one conflicting entity in the bulk request do all the entities fail? If so, is it possible to remove that conflict and retry?

Contributor:

Nope, only the specific conflicting entity fails. Behavior is then just like any single request today.

@cbickel (Author):

No, all other entries will have no errors, so there is no need to retry if only one document has a conflict.
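The per-document semantics discussed here follow from CouchDB's `_bulk_docs` response, which carries one result entry per document. A simplified sketch of that mapping (the `BulkEntityResult`, `DocInfo`, and `DocumentConflictException` shapes are modeled loosely on the snippet quoted above, not copied from the PR):

```scala
// Each entry in a _bulk_docs response is evaluated independently:
// only entries carrying "conflict" fail; the rest succeed.
case class BulkEntityResult(id: String, error: Option[String])
case class DocInfo(id: String)
case class DocumentConflictException(msg: String) extends Exception(msg)

def toResults(resp: Seq[BulkEntityResult]): Seq[Either[Throwable, DocInfo]] =
  resp.map { r =>
    r.error match {
      case Some("conflict") => Left(DocumentConflictException("conflict on 'bulk_put'"))
      case _                => Right(DocInfo(r.id))
    }
  }
```

So a conflict on one document never poisons its neighbors in the same batch, which is why no retry of the whole bulk request is needed.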


// This is the number of allowed parallel requests per entity before batching starts. If there are already maxOpenDbRequests
// outstanding and more documents need to be stored, all arriving documents will be put into batches (if enabled) to avoid a long queue.
private val maxOpenDbRequests = system.settings.config.getInt("akka.http.host-connection-pool.max-open-requests") / 2
@dubee (Member) commented on Oct 5, 2017:

Why is maxOpenDbRequests half of max-open-requests?

Contributor:

To prevent other puts/reads from contending with activation puts.

@cbickel (Author):

Due to the current design, we have a datastore (and a batcher) for each entity, which means the controller has 5 batchers. All of them share the 128 connections configured in Akka (max-open-requests).
If the controller has to write a lot of activations, setting maxOpenDbRequests to half of max-open-requests means only 64 of those connections will be used for activation writes.
So if someone else wants to invoke an action that is not in the cache, they do not have to wait until all the activations are written to the database.
In the invoker, this also means that writing activations will not affect getting actions from the database.
In addition, @markusthoemmes and I did some performance measurements, and there was no big difference between using 64 or 128 parallel requests to write activations.
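The halving described above can be sketched in plain Scala (no Akka needed for the arithmetic): with Akka HTTP's default max-open-requests of 128, activation writes are capped at 64 connections, leaving the other half free for interactive reads and writes.

```scala
// Cap activation writes at half of the connection pool, so other
// puts/reads always have headroom on the remaining connections.
def maxOpenDbRequests(maxOpenRequests: Int): Int = maxOpenRequests / 2

def remainingForOtherRequests(maxOpenRequests: Int): Int =
  maxOpenRequests - maxOpenDbRequests(maxOpenRequests)
```

With the default pool of 128, both halves come out to 64, matching the figures in the comment above.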

private val maxOpenDbRequests = system.settings.config.getInt("akka.http.host-connection-pool.max-open-requests") / 2

private val batcher: Batcher[JsObject, Either[ArtifactStoreException, DocInfo]] =
new Batcher(500, maxOpenDbRequests)(put(_)(TransactionId.unknown))
Member:

Seems like the batch size should be configurable instead of hardcoded.

@cbickel (Author):

Why do you think it should be configurable? Do you have a scenario in mind where you would want to configure this size?
Btw: it is only the maximum allowed batch size.

When I was chatting with Cloudant developers a while ago, we talked about such bulk requests. The scenario was a bit different but very similar (writing documents into a database as fast as possible with _bulk). They suggested a batch size of 500.

Member:

Was thinking the value might change per deployment. Should at least add a comment stating why it's set to 500 here.
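One way the per-deployment configurability suggested here could look is a config lookup with 500 as the fallback. This is purely a hypothetical sketch: the key name `whisk.activation.batch-size` is an assumption for illustration, and the merged PR keeps 500 hardcoded.

```scala
// Hypothetical: read the maximum batch size from deployment config,
// falling back to the 500 suggested by Cloudant developers.
// The key "whisk.activation.batch-size" is illustrative, not real.
def batchSizeFrom(config: Map[String, String], default: Int = 500): Int =
  config.get("whisk.activation.batch-size").map(_.toInt).getOrElse(default)
```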


def datastore(config: WhiskConfig)(implicit system: ActorSystem, logging: Logging) =
SpiLoader.get[ArtifactStoreProvider].makeStore[WhiskActivation](config, _.dbActivations)
def datastore(config: WhiskConfig)(implicit system: ActorSystem, logging: Logging, materializer: ActorMaterializer) =
Member:

Why not batch WhiskEntityStore as well?


@markusthoemmes (Contributor) commented:
PG2 2248 is all good.

@markusthoemmes added the labels review (Review for this PR has been requested and yet needs to be done.) and pgapproved (Pipeline has approved this change.) on Oct 30, 2017
@dubee dubee merged commit ea11ad4 into apache:master Oct 30, 2017
@cbickel cbickel deleted the bulk2 branch November 7, 2017 07:26
houshengbo pushed a commit to houshengbo/openwhisk that referenced this pull request Nov 13, 2017
houshengbo pushed a commit to houshengbo/openwhisk that referenced this pull request Nov 14, 2017
houshengbo pushed a commit to houshengbo/openwhisk that referenced this pull request Nov 15, 2017
houshengbo pushed a commit to houshengbo/openwhisk that referenced this pull request Nov 17, 2017
BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019