@jojochuang
Contributor

@jojochuang jojochuang commented Feb 29, 2024

What changes were proposed in this pull request?

Add a new Freon tool, similar to ozone freon om-echo, that can be used to benchmark client-to-DataNode round-trip latency.

Usage:
ozone freon dn-echo --container-id=1 -n 100000 -t 32 --payload-req=0 --payload-resp=1024

This tool requires SCM superuser privilege.

It sends an echo request to the DataNodes associated with the container specified by --container-id and receives the response, repeating this 100000 times using 32 threads. The request has a 0 KB payload and the response has a 1024 KB payload.
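
For readers unfamiliar with this style of benchmark, the loop described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the code in this PR: EchoBenchSketch, run, and percentile are hypothetical names, and the echo function is a local stand-in for the real DataNode RPC.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

/** Hypothetical sketch of an echo benchmark loop; not the Freon implementation. */
final class EchoBenchSketch {

  /**
   * Issues totalCalls echo calls across the given number of threads and
   * returns each call's round-trip latency in nanoseconds, sorted ascending.
   * The echo function (an assumption) maps a requested response size in
   * bytes to a response payload.
   */
  static long[] run(int totalCalls, int threads, int respBytes,
      IntFunction<byte[]> echo) {
    long[] latencies = new long[totalCalls];
    AtomicInteger next = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> workers = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        workers.add(pool.submit(() -> {
          int i;
          // Each worker claims the next call index until all calls are done.
          while ((i = next.getAndIncrement()) < totalCalls) {
            long start = System.nanoTime();
            echo.apply(respBytes);
            latencies[i] = System.nanoTime() - start;
          }
        }));
      }
      // Waiting on every future also surfaces any failure from a worker.
      for (Future<?> worker : workers) {
        worker.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("benchmark interrupted", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("echo call failed", e.getCause());
    } finally {
      pool.shutdownNow();
    }
    Arrays.sort(latencies);
    return latencies;
  }

  /** Nearest-rank percentile (0 < p <= 100) over a sorted array. */
  static long percentile(long[] sorted, double p) {
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }
}
```

Calling EchoBenchSketch.run(100000, 32, 1024 * 1024, n -> new byte[n]) would mimic the shape of the usage above against an in-process stub; meaningful numbers still require the actual tool and a cluster.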

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10442

How was this patch tested?

Two unit tests, also tested in a real cluster.

@kerneltime
Contributor

cc @tanvipenumudy @duongkame

@kerneltime
Contributor

A useful additional capability would probably be a response delay: the DataNode would wait for a configured duration before sending its response.

@kerneltime kerneltime closed this Feb 29, 2024
@kerneltime kerneltime reopened this Feb 29, 2024
@jojochuang
Contributor Author

I was able to do 11k echoes per second using one thread, with no payload in the request or response:

sudo -u hdfs ozone freon dn-echo --containerID=1 -n 100000

         count = 100000
     mean rate = 11068.82 calls/second
 1-minute rate = 8345.00 calls/second
 5-minute rate = 8345.00 calls/second
15-minute rate = 8345.00 calls/second
           min = 0.30 milliseconds
           max = 17.54 milliseconds
          mean = 0.81 milliseconds
        stddev = 0.91 milliseconds
        median = 0.59 milliseconds
          75% <= 0.83 milliseconds
          95% <= 1.67 milliseconds
          98% <= 2.59 milliseconds
          99% <= 4.34 milliseconds
        99.9% <= 9.80 milliseconds

8 threads:

sudo -u hdfs ozone freon dn-echo --containerID=1 -n 100000 -t 8

         count = 100000
     mean rate = 12448.57 calls/second
 1-minute rate = 13015.80 calls/second
 5-minute rate = 13015.80 calls/second
15-minute rate = 13015.80 calls/second
           min = 0.17 milliseconds
           max = 14.01 milliseconds
          mean = 0.53 milliseconds
        stddev = 0.69 milliseconds
        median = 0.41 milliseconds
          75% <= 0.50 milliseconds
          95% <= 1.07 milliseconds
          98% <= 1.94 milliseconds
          99% <= 3.02 milliseconds
        99.9% <= 9.83 milliseconds

32 threads, 1 MB per response:

sudo -u hdfs ozone freon dn-echo --containerID=1 -n 100000 -t 32 --payload-req=0 --payload-resp=1024

         count = 100000
     mean rate = 11059.79 calls/second
 1-minute rate = 10148.60 calls/second
 5-minute rate = 10148.60 calls/second
15-minute rate = 10148.60 calls/second
           min = 1.02 milliseconds
           max = 18.49 milliseconds
          mean = 2.70 milliseconds
        stddev = 1.81 milliseconds
        median = 2.12 milliseconds
          75% <= 2.56 milliseconds
          95% <= 7.32 milliseconds
          98% <= 8.53 milliseconds
          99% <= 9.14 milliseconds
        99.9% <= 12.66 milliseconds

32 threads, 10 MB per response:

sudo -u hdfs ozone freon dn-echo --containerID=1 -n 100000 -t 32 --payload-req=0 --payload-resp=10240

         count = 100000
     mean rate = 7665.10 calls/second
 1-minute rate = 7479.75 calls/second
 5-minute rate = 7395.86 calls/second
15-minute rate = 7381.33 calls/second
           min = 1.32 milliseconds
           max = 24.93 milliseconds
          mean = 3.69 milliseconds
        stddev = 2.25 milliseconds
        median = 3.02 milliseconds
          75% <= 3.60 milliseconds
          95% <= 9.23 milliseconds
          98% <= 11.00 milliseconds
          99% <= 12.31 milliseconds
        99.9% <= 24.93 milliseconds
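
One way to sanity-check figures like these is Little's law, which relates throughput, mean latency, and the average number of calls in flight. The sketch below is only an illustration under stated assumptions: it treats the reported mean as the per-call round-trip time and the mean rate as the sustained throughput, and LittlesLawCheck and its methods are hypothetical names, not part of Freon.

```java
/** Hypothetical helper applying Little's law to benchmark output. */
final class LittlesLawCheck {

  /** Calls per second implied by `threads` workers each taking meanMillis per call. */
  static double impliedRate(int threads, double meanMillis) {
    return threads / (meanMillis / 1000.0);
  }

  /** Average number of in-flight calls implied by a rate and a mean latency. */
  static double impliedConcurrency(double callsPerSecond, double meanMillis) {
    return callsPerSecond * meanMillis / 1000.0;
  }

  public static void main(String[] args) {
    // Mean rate and mean latency copied from the four runs above.
    double[][] runs = {
        {11068.82, 0.81},
        {12448.57, 0.53},
        {11059.79, 2.70},
        {7665.10, 3.69},
    };
    for (double[] run : runs) {
      System.out.printf("rate=%.2f/s mean=%.2f ms -> ~%.1f calls in flight%n",
          run[0], run[1], impliedConcurrency(run[0], run[1]));
    }
  }
}
```

With the figures above, the two 32-thread runs work out to a little under 32 calls in flight, which is consistent with the thread count. The first run works out to roughly nine calls in flight on average, more than a single thread could sustain at a 0.81 ms mean; that may simply reflect a default thread count when -t is omitted, but the thread does not say.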

Contributor

@tanvipenumudy tanvipenumudy left a comment

Thank you @jojochuang for the patch. Please find a small comment below.

return null;
}

private int calculateMaxPayloadSize(int payloadSizeKB) {
Contributor

I think we can reuse the methods under org.apache.hadoop.ozone.common.PayloadUtils for calculating the max payload size here.
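
For context, a cap of this kind typically exists because a payload size given in KB has to fit in a Java int once converted to bytes; the 2097151 KB figure quoted in the --payload-req description later in this thread equals Integer.MAX_VALUE / 1024. The sketch below shows that arithmetic only. PayloadSizeSketch and payloadBytes are hypothetical names, and this is not the PayloadUtils API, whose actual methods are not shown in this conversation.

```java
/** Hypothetical sketch of a KB-to-bytes conversion that cannot overflow an int. */
final class PayloadSizeSketch {
  static final int BYTES_PER_KB = 1024;

  /** Largest KB count whose size in bytes still fits in an int. */
  static final int MAX_PAYLOAD_KB = Integer.MAX_VALUE / BYTES_PER_KB;

  /** Converts KB to bytes, clamping the input to [0, MAX_PAYLOAD_KB]. */
  static int payloadBytes(int payloadSizeKB) {
    int clampedKB = Math.max(0, Math.min(payloadSizeKB, MAX_PAYLOAD_KB));
    return clampedKB * BYTES_PER_KB;
  }
}
```

Clamping is one design choice; a tool could equally reject out-of-range values or pass them through and let the request fail.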

description = "Write to Ratis log, skip flag for read-only EchoRPC " +
"request")
private boolean writeToRatis = false;
@Option(names = {"--containerID"},
Contributor

Nit: let's keep all flags in kebab case for consistency.

@jojochuang jojochuang marked this pull request as ready for review March 4, 2024 16:37
Contributor

@ashishkumar50 ashishkumar50 left a comment

@jojochuang Thanks for the patch. Overall, the change LGTM.
I left some minor suggestions; also, the usage in the PR description can be updated to use --container-id.

/**
* Send an echo to DataNode.
*
* @return GetSmallFileResponseProto
Contributor

Suggested change
- * @return GetSmallFileResponseProto
+ * @return EchoResponseProto

private OzoneConfiguration configuration;
private ByteString payloadReqBytes;
private int payloadRespSize;
private ContainerInfo containerInfo;
Contributor

Move containerInfo inside the call() method.

Contributor

@ashishkumar50 ashishkumar50 left a comment

@jojochuang Thanks for updating the patch; the change mostly LGTM.


int sleepTimeMs = echoRequest.getSleepTimeMs();
try {
Thread.sleep(sleepTimeMs);
Contributor

We can skip the sleep if sleepTimeMs is 0.
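
A guard along these lines could look like the sketch below. It is a standalone illustration rather than the handler code from this PR: EchoDelaySketch and maybeSleep are hypothetical names. Checking for values at or below zero also avoids the IllegalArgumentException that Thread.sleep throws for a negative duration.

```java
/** Hypothetical sketch of an optional per-request delay. */
final class EchoDelaySketch {

  /**
   * Pauses for sleepTimeMs milliseconds when the value is positive.
   * Returns true only if a sleep was performed and ran to completion.
   */
  static boolean maybeSleep(int sleepTimeMs) {
    if (sleepTimeMs <= 0) {
      return false; // no delay requested, so skip the call entirely
    }
    try {
      Thread.sleep(sleepTimeMs);
      return true;
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers can still observe it.
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```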


@Option(names = {"--sleep-time-ms"},
description = "Let DataNode to pause for a duration (in milliseconds) for each request")
private int sleepTimeMs;
Contributor

Can we initialize this to 0, or is it a mandatory input?

Contributor

@ashishkumar50 ashishkumar50 left a comment

Change LGTM.

@kerneltime
Contributor

The change looks good. We should add a robot test for this to avoid silent breakage.

@jojochuang jojochuang added the hbase HBase on Ozone support label Mar 24, 2024
@Option(names = {"--payload-req"},
description =
"Specifies the size of payload in KB in RPC request. " +
"Max size is 2097151 KB",
Contributor

Why have a max size? This is a test tool and it is ok to measure the error handling rate when the data payload is more than the container size.

Contributor Author

Removing the max size.

@kerneltime
Contributor

The change looks good to me. Just a minor nit: I am OK with freon testing the max size limits or going over them.

@jojochuang
Contributor Author

There's a problem with the block token in secure mode. Looking into that now.

(cherry picked from commit 29b534d)

Add freon benchmark dn-echo and its test.

(cherry picked from commit 26c72a1)

Support request payload and response payload.

(cherry picked from commit 3951228)

Updating to address review comments.

Address review comments.

Change-Id: Id686e2bc5a373e17515dcd66534b30b12f1b970e

Fix compilation error

Change-Id: Ifda423ae5867a5df94c16ff31854811efa5aeaa5

Address review comments.

Added an option to let the DataNode handler pause for a given duration on each request.

Change-Id: Id5e6c8551da3b42ebd720ceb8ed86b539aa7e9c8

Add a default value for sleepTimeMs.

Change-Id: Iab3f5e4b3dee3b49bc9432d66a7e02f3499e3e6d

Skip sleep if the duration is less than or equal to zero.

Change-Id: I3fe21bbd5a76335a34c0db7e73eb67c108a9fb40

Add --clients parameter to specify the number of xceiver clients.

In addition, support secure cluster.

Change-Id: Ib7074e45b4874d544636df741f265e9aed9c886f
(cherry picked from commit 3b2e1e1cd43022a3ff8adf8aceb146ec951205da)

Update test

Change-Id: I92e1c63d94a1fc1300a6908aba8f0eec87fb5d70

Fix findbugs

Change-Id: Ie849551820f755a585999c10a1502cfb00de5298

Add robot test.

Change-Id: I8866254a09029f48cf55df3e211b02e6ba9d96a6

Add the missing space in ozone freon robot test.

Change-Id: I36e75c06bae193d56619e49428066251359ab3df
Change-Id: I7a02071fae54532407cca8b6c250845a666f0aa5
Change-Id: Ica18ddb0738cb91fe58f2d1246b73d28a10a9703
Change-Id: I8b4effdb2c89ab4be7a8f2987cfe8037eff0b8aa
Change-Id: I6366ac01cdf9d5d3cb4892acfadb4cc0107a4736
@kerneltime kerneltime merged commit 509c970 into apache:HDDS-7593 Mar 27, 2024
jojochuang added a commit to jojochuang/ozone that referenced this pull request Apr 19, 2024
…und-trip latency (apache#6297)

(cherry picked from commit 509c970)

 Conflicts:
	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/ContainerCommandResponseBuilders.java
	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/storage/ContainerProtocolCalls.java
	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/audit/DNAction.java
	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java
	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
	hadoop-hdds/interface-client/src/main/proto/DatanodeClientProtocol.proto
	hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/OmRPCLoadGenerator.java

Change-Id: Icbd75e38f6ea39604390627398750a3218e51937
kerneltime pushed a commit that referenced this pull request Apr 23, 2024
…und-trip latency (#6297) (#6562)

* HDDS-10442. [hsync] Add a Freon tool to measure client to DataNode round-trip latency (#6297)

(cherry picked from commit 509c970)
chungen0126 pushed a commit to chungen0126/ozone that referenced this pull request May 3, 2024
chungen0126 pushed a commit to chungen0126/ozone that referenced this pull request May 3, 2024
jojochuang added a commit to jojochuang/ozone that referenced this pull request May 29, 2024
…und-trip latency (apache#6297) (apache#6562)

* HDDS-10442. [hsync] Add a Freon tool to measure client to DataNode round-trip latency (apache#6297)

(cherry picked from commit 509c970)
(cherry picked from commit dec977b)

 Conflicts:
	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/ContainerCommandResponseBuilders.java
	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/storage/ContainerProtocolCalls.java
	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/audit/DNAction.java
	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java
	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
	hadoop-hdds/interface-client/src/main/proto/DatanodeClientProtocol.proto
	hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/DNRPCLoadGenerator.java

Change-Id: I07755d356dc5ce8f87c62476f49e7c91549bd93b
Labels

hbase HBase on Ozone support

5 participants