
KAFKA-7719: Improve fairness in SocketServer processors (KIP-402) #6022

Merged
ijuma merged 4 commits into apache:trunk from rajinisivaram:KAFKA-7719-socketserver-fairness on Feb 1, 2019

@rajinisivaram
Contributor

Limit the number of new connections processed by each SocketServer Processor in each iteration. Block the Acceptor if the new-connection queues of all Processors are full. Added a metric to track accept idle time percent. See KIP-402 for details.
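The two mechanisms described above (a per-iteration cap on new connections in each Processor, and an Acceptor that blocks only when every Processor's queue is full) can be sketched as follows. This is a minimal illustration, not the PR's code: `SketchProcessor`, `SketchAcceptor`, and their methods are hypothetical, connections are plain strings instead of `SocketChannel`s, and the bounded queue is assumed to be a `java.util.concurrent.ArrayBlockingQueue`.

```scala
import java.util.concurrent.ArrayBlockingQueue
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a network Processor with a bounded queue of
// newly accepted connections (plain strings here, not SocketChannels).
final class SketchProcessor(val id: Int, queueSize: Int) {
  private val newConnections = new ArrayBlockingQueue[String](queueSize)

  // Non-blocking handoff: returns false if this processor's queue is full.
  def tryAccept(conn: String): Boolean = newConnections.offer(conn)

  // Blocking handoff: waits until this processor's queue has room.
  def acceptBlocking(conn: String): Unit = newConnections.put(conn)

  // Takes at most maxPerIteration queued connections in one iteration, so a
  // burst of new connections cannot starve traffic on existing ones.
  def configureNewConnections(maxPerIteration: Int): List[String] = {
    val configured = ListBuffer.empty[String]
    var next = if (maxPerIteration > 0) Option(newConnections.poll()) else None
    while (next.isDefined) {
      configured += next.get
      next = if (configured.size < maxPerIteration) Option(newConnections.poll()) else None
    }
    configured.toList
  }
}

// Hypothetical Acceptor: tries each processor once in round-robin order
// without blocking. Only if the last processor tried is also full does it
// block on that processor's queue, recording how long it waited.
final class SketchAcceptor(processors: IndexedSeq[SketchProcessor]) {
  require(processors.nonEmpty, "at least one processor is required")
  private var current = 0
  var blockedNanos = 0L

  def assign(conn: String): Int = {
    var retriesLeft = processors.size
    var assignedTo = -1
    while (assignedTo < 0) {
      retriesLeft -= 1
      val processor = processors(current)
      current = (current + 1) % processors.size
      if (processor.tryAccept(conn)) assignedTo = processor.id
      else if (retriesLeft == 0) {
        val start = System.nanoTime()
        processor.acceptBlocking(conn)
        blockedNanos += System.nanoTime() - start
        assignedTo = processor.id
      }
    }
    assignedTo
  }
}
```

Trying every queue with a non-blocking `offer` before falling back to a blocking `put` means one busy Processor cannot stall accepts while others still have room; blocking only when all are full applies back-pressure to new clients instead of letting queues grow without bound.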

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Member

Nit: we can take the chance and change this to use string interpolation.
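For readers less familiar with Scala, the nit is about replacing `+` concatenation with the `s` interpolator. A minimal sketch using a hypothetical log message (not one of the actual SocketServer strings):

```scala
object InterpolationExample {
  // Hypothetical message built with string concatenation.
  def concatenated(id: Int, listener: String): String =
    "Processor " + id + " listening on " + listener + " started"

  // The same message using Scala's s-interpolator.
  def interpolated(id: Int, listener: String): String =
    s"Processor $id listening on $listener started"
}
```

Both produce identical strings; the interpolated form is shorter and harder to get wrong when spaces are missing around `+`.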

Contributor Author

@ijuma Thanks for the review. Updated all the strings in SocketServer.

Member

@ijuma left a comment

Thanks for the PR. One question.

Member

Do we really want a name with a $?

Contributor Author

@ijuma Thanks for the review. It is using string interpolation; the metric name is `AcceptorIdlePercent`. I hope I haven't missed something.

Member

My bad, it just looked a bit odd in the diff.
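The `$` in the diff comes from interpolating the metric prefix into the metric name. A minimal sketch with a hypothetical helper `acceptorMetricName`, assuming an empty prefix for the data plane and `"ControlPlane"` for the control plane:

```scala
object MetricNameExample {
  // Hypothetical helper: an empty prefix yields the plain base name,
  // a non-empty prefix is prepended to it.
  // The braces are required: without them the interpolator would look for a
  // single identifier named metricPrefixAcceptorIdlePercent.
  def acceptorMetricName(metricPrefix: String): String =
    s"${metricPrefix}AcceptorIdlePercent"
}
```

With an empty prefix the `${...}` expands to nothing, which is why the final name is simply `AcceptorIdlePercent` even though the source line contains a `$`.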

@rajinisivaram force-pushed the KAFKA-7719-socketserver-fairness branch from 240b5c6 to 2d08883 on January 21, 2019 13:26
@rajinisivaram force-pushed the KAFKA-7719-socketserver-fairness branch from 2d08883 to 82c3168 on January 21, 2019 15:36
Member

@ijuma left a comment

Thanks for the PR, left a couple of minor comments.

```scala
val IdlePercentMetricName = "IdlePercent"
val NetworkProcessorMetricTag = "networkProcessor"
val ListenerMetricTag = "listener"
val SocketServerMetricsGroup = "socket-server-metrics"
```
Member

Should this be in the SocketServer companion object? It could then just be called MetricsGroup perhaps.

Contributor Author

@ijuma Thanks for the review, updated.
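Placing the constants in a companion object lets the class use them unqualified and makes the `SocketServer` part of `SocketServerMetricsGroup` redundant. A minimal sketch with a hypothetical, heavily simplified `SocketServerSketch` class; the constant values are the ones shown in the diff:

```scala
// Hypothetical, simplified class; the real SocketServer takes many more arguments.
class SocketServerSketch(brokerId: Int) {
  import SocketServerSketch._

  // Constants from the companion object are usable without qualification.
  def describe: String = s"broker $brokerId reports to $MetricsGroup"
}

object SocketServerSketch {
  // Shared metric constants (values as in the diff).
  val IdlePercentMetricName = "IdlePercent"
  val NetworkProcessorMetricTag = "networkProcessor"
  val ListenerMetricTag = "listener"
  // Inside the companion object the class name no longer needs repeating.
  val MetricsGroup = "socket-server-metrics"
}
```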

```diff
 brokerId: Int,
-connectionQuotas: ConnectionQuotas) extends AbstractServerThread(connectionQuotas) with KafkaMetricsGroup {
+connectionQuotas: ConnectionQuotas,
+metricPrefix: String = "") extends AbstractServerThread(connectionQuotas) with KafkaMetricsGroup {
```
Member

I think it would be better not to have a default for metricPrefix.
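One reason to drop the default: with `metricPrefix: String = ""`, a call site that forgets the argument compiles and silently uses the unprefixed (data-plane) metric names, which could clash with the data-plane acceptor's metrics. Without a default, the compiler forces each caller to choose. A sketch with a hypothetical `AcceptorSketch` class:

```scala
// Hypothetical, simplified Acceptor. metricPrefix has no default value, so
// `new AcceptorSketch(1)` does not compile: every caller must pick a prefix.
final class AcceptorSketch(val brokerId: Int, val metricPrefix: String) {
  // Metric names are the prefix followed by the base name.
  def metricName(base: String): String = s"$metricPrefix$base"
}
```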

```diff
 private def createControlPlaneAcceptorAndProcessor(endpointOpt: Option[EndPoint]): Unit = synchronized {
   endpointOpt.foreach { endpoint =>
-    val controlPlaneAcceptor = createAcceptor(endpoint)
+    val controlPlaneAcceptor = createAcceptor(endpoint, "ControlPlane")
```
Member

One more suggestion: we should probably create a constant for this (probably in the SocketServer companion object), move the following two constants to the same place, and use them in metric names. If we had used constants when the control plane PR was introduced, it would have made this review easier and avoided bugs where people mistype names. We are also hardcoding the prefix in the thread name `control-plane-kafka-socket-acceptor` even though we pass the constant to `KafkaRequestHandlerPool`. Quite inconsistent.

```scala
val DataPlanePrefix = "data-plane"
val ControlPlanePrefix = "control-plane"
```
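A sketch of what such a set of constants could look like, with separate prefixes for thread names and metric names. The names `DataPlaneThreadPrefix` and `DataPlaneMetricPrefix` appear later in this review; the control-plane counterparts, the metric-prefix values (empty for the data plane, `"ControlPlane"` as passed to `createAcceptor` in the diff), and the `acceptorThreadName` helper and its format are assumptions for illustration:

```scala
object SocketServerPrefixes {
  // Thread-name prefixes (values as in the existing constants).
  val DataPlaneThreadPrefix = "data-plane"
  val ControlPlaneThreadPrefix = "control-plane"

  // Metric-name prefixes (assumed values): the data plane keeps the
  // unprefixed metric names, the control plane prepends "ControlPlane".
  val DataPlaneMetricPrefix = ""
  val ControlPlaneMetricPrefix = "ControlPlane"

  // Hypothetical helper: builds the acceptor thread name from the constant
  // instead of hardcoding "control-plane-kafka-socket-acceptor".
  def acceptorThreadName(threadPrefix: String, port: Int): String =
    s"$threadPrefix-kafka-socket-acceptor-$port"
}
```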

Contributor Author

Updated.

```scala
val blockedPercentMetric = blockedPercentMetrics.head.asInstanceOf[Meter]
val blockedPercent = blockedPercentMetric.meanRate
if (expectBlocked) {
  assertTrue(s"Acceptor idle percent not recorded: $blockedPercent", blockedPercent > 0.0)
```
Member

Seems like we need to fix the text here and the following line.
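The metric behind this assertion measures time the Acceptor spends blocked, not idle. As a simplified illustration of what a blocked-time percentage means, here is a hypothetical `BlockedTimeTracker` that divides accumulated blocked nanoseconds by elapsed time. It is not the Yammer `Meter` the test actually reads:

```scala
// Hypothetical tracker: accumulates nanoseconds spent blocked and reports
// them as a percentage of the wall-clock time elapsed since startNanos.
final class BlockedTimeTracker(startNanos: Long) {
  private var blockedNanos = 0L

  def recordBlocked(nanos: Long): Unit = blockedNanos += nanos

  def blockedPercent(nowNanos: Long): Double = {
    val elapsed = nowNanos - startNanos
    if (elapsed <= 0) 0.0 else 100.0 * blockedNanos / elapsed
  }
}
```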

```scala
override def poll(timeout: Long): Unit = {
  try {
    if (pollBlockMs > 0)
      Thread.sleep(pollBlockMs)
```
Member

A bit unfortunate that we have to do this in what we class as a unit test. Should this be an integration test?

Contributor Author

@ijuma Thanks for the reviews. I have rewritten the test to avoid the sleep.
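A common general technique for removing a fixed sleep from a test is to let the test itself decide when a blocked call may proceed, for example with a `CountDownLatch`. The sketch below only illustrates that idea; `ControllablePoller` is hypothetical and is not how the rewritten test in this PR works:

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical test double: instead of sleeping for a fixed pollBlockMs,
// poll() waits until the test calls unblock(), or the timeout expires.
final class ControllablePoller {
  private val release = new CountDownLatch(1)
  @volatile var polls = 0

  // Returns true if released by the test, false if the timeout expired first.
  def poll(timeoutMs: Long): Boolean = {
    val released = release.await(timeoutMs, TimeUnit.MILLISECONDS)
    polls += 1
    released
  }

  def unblock(): Unit = release.countDown()
}
```

The test controls exactly when the blocked poll returns, so assertions do not depend on guessing how long a sleep must be on a slow CI machine.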

Member

@ijuma left a comment

Thanks for the updates @rajinisivaram, a couple more comments/questions.

```diff
-config.numIoThreads, "RequestHandlerAvgIdlePercent", socketServer.DataPlanePrefix)
+config.numIoThreads, s"${SocketServer.DataPlaneMetricPrefix}RequestHandlerAvgIdlePercent", SocketServer.DataPlaneThreadPrefix)
 
 config.controlPlaneListener.foreach { _ =>
```
Member

Should we change this to do `socketServer.controlPlaneRequestChannelOpt.foreach` instead?

Contributor Author

I saw that yesterday and wasn't sure why it was that way, so had left it. Updated.
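Driving the setup from the request channel's `Option` rather than from the config ties the handler pool's existence to the channel's existence, so the two cannot get out of sync. A minimal sketch with hypothetical, simplified `RequestChannel` and `HandlerPool` types:

```scala
import scala.collection.mutable.ListBuffer

object ControlPlaneSetup {
  // Hypothetical, simplified types for illustration only.
  final case class RequestChannel(name: String)
  final class HandlerPool(val channel: RequestChannel)

  // A handler pool is started exactly when a control-plane channel exists;
  // the channel itself is passed in, so no second lookup can fail.
  def startHandlers(controlPlaneRequestChannelOpt: Option[RequestChannel]): List[HandlerPool] = {
    val started = ListBuffer.empty[HandlerPool]
    controlPlaneRequestChannelOpt.foreach { channel =>
      started += new HandlerPool(channel)
    }
    started.toList
  }
}
```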

```scala
Thread.getAllStackTraces.asScala.exists { case (thread, stackTrace) =>
  thread.getName.contains("kafka-socket-acceptor") &&
  thread.getState == Thread.State.WAITING &&
  stackTrace.toList.toString.contains("ArrayBlockingQueue")
```
Member

We are relying on some details that could change under us. Would the test fail if the thread name changed or if the queue implementation changed? And if so, would it output enough debugging information?

Contributor Author

Yes, the test will fail in those cases. The relevant exceptions would have been processed by Processor and logged at ERROR level. I have changed the code to use assertions containing the relevant debugging info in the main test thread to make it more obvious.
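A sketch of the kind of assertion that addresses the concern: check the same conditions, but on failure report every thread whose name matches, with its state and stack, so a renamed thread or a different queue class is obvious from the message. `AcceptorThreadCheck` and its methods are hypothetical, and this matches on each frame's class name rather than on the stack's `toString`:

```scala
import scala.jdk.CollectionConverters._

object AcceptorThreadCheck {
  // Describes every live thread whose name contains namePart.
  def describeThreads(namePart: String): String =
    Thread.getAllStackTraces.asScala.collect {
      case (t, stack) if t.getName.contains(namePart) =>
        s"${t.getName} state=${t.getState} stack=${stack.mkString(", ")}"
    }.mkString("\n")

  // Fails with the matching threads' details if no thread named like
  // namePart is WAITING inside a frame of a class matching queueClass.
  def assertBlockedOn(namePart: String, queueClass: String): Unit = {
    val blocked = Thread.getAllStackTraces.asScala.exists { case (t, stack) =>
      t.getName.contains(namePart) &&
      t.getState == Thread.State.WAITING &&
      stack.exists(_.getClassName.contains(queueClass))
    }
    assert(blocked,
      s"No thread matching '$namePart' is blocked on $queueClass. Matching threads:\n${describeThreads(namePart)}")
  }
}
```

This assumes Scala 2.13 or later for `scala.jdk.CollectionConverters`; Kafka at the time used `scala.collection.JavaConverters`.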

Member

@ijuma left a comment

Thanks for the PR, LGTM.

@ijuma merged commit 4b29487 into apache:trunk on Feb 1, 2019
gwenshap pushed a commit that referenced this pull request Mar 14, 2019
…s (KIP-402)

Adds a new listener config `max.connections` to limit the number of active connections on each listener. The config may be prefixed with a listener prefix. This limit may be dynamically reconfigured without restarting the broker.

This is one of the PRs for KIP-402 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-402%3A+Improve+fairness+in+SocketServer+processors). Note that this is currently built on top of PR #6022.

Author: Rajini Sivaram <rajinisivaram@googlemail.com>

Reviewers: Gwen Shapira <cshapi@gmail.com>

Closes #6034 from rajinisivaram/KAFKA-7730-max-connections
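To illustrate how a listener-prefixed `max.connections` would take precedence over the broker-wide value, here is a sketch of the lookup order. The `listener.name.<listener>.` key format, the lower-casing of the listener name, and treating an unset limit as `Int.MaxValue` are assumptions for illustration, and `MaxConnectionsConfig` is hypothetical, not broker code:

```scala
import java.util.Locale

object MaxConnectionsConfig {
  // Assumed prefix format, e.g. "listener.name.internal." for INTERNAL.
  def listenerPrefix(listener: String): String =
    s"listener.name.${listener.toLowerCase(Locale.ROOT)}."

  // Listener-specific value first, then the broker-wide value, else unlimited.
  def maxConnections(props: Map[String, String], listener: String): Int =
    props.get(listenerPrefix(listener) + "max.connections")
      .orElse(props.get("max.connections"))
      .map(_.toInt)
      .getOrElse(Int.MaxValue)
}
```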
jarekr pushed a commit to confluentinc/kafka that referenced this pull request Apr 18, 2019
* AK/trunk:
  fix typo (apache#5150)
  MINOR: Reduce replica.fetch.backoff.ms in ReassignPartitionsClusterTest (apache#5887)
  KAFKA-7766: Fail fast PR builds (apache#6059)
  KAFKA-7798: Expose embedded clientIds (apache#6107)
  KAFKA-7641; Introduce "group.max.size" config to limit group sizes (apache#6163)
  KAFKA-7433; Introduce broker options in TopicCommand to use AdminClient (KIP-377)
  MINOR: Fix some field definitions for ListOffsetReponse (apache#6214)
  KAFKA-7873; Always seek to beginning in KafkaBasedLog (apache#6203)
  KAFKA-7719: Improve fairness in SocketServer processors (KIP-402) (apache#6022)
  MINOR: fix checkstyle suppressions for generated RPC code to work on Windows
  KAFKA-7859: Use automatic RPC generation in LeaveGroups (apache#6188)
  KAFKA-7652: Part II; Add single-point query for SessionStore and use for flushing / getter (apache#6161)
  KAFKA-3522: Add RocksDBTimestampedStore (apache#6149)
  KAFKA-3522: Replace RecordConverter with TimestampedBytesStore (apache#6204)
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
…ache#6022)

Limit the number of new connections processed in each iteration of each
Processor. Block Acceptor if the connection queue is full on all Processors.
Added a metric to track accept blocked time percent. See KIP-402 for details.

Reviewers: Ismael Juma <ismael@juma.me.uk>
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
…s (KIP-402)

Adds a new listener config `max.connections` to limit the number of active connections on each listener. The config may be prefixed with a listener prefix. This limit may be dynamically reconfigured without restarting the broker.

This is one of the PRs for KIP-402 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-402%3A+Improve+fairness+in+SocketServer+processors). Note that this is currently built on top of PR apache#6022.

Author: Rajini Sivaram <rajinisivaram@googlemail.com>

Reviewers: Gwen Shapira <cshapi@gmail.com>

Closes apache#6034 from rajinisivaram/KAFKA-7730-max-connections