HDDS-10442. [hsync] Add a Freon tool to measure client to DataNode round-trip latency #6297
hadoop-hdds/interface-client/src/main/proto/DatanodeClientProtocol.proto
Another useful capability would be a configurable response delay, so that the DataNode waits for the configured duration before sending a response.
I was able to do 11k echoes per second using one thread, with no payload in the request or response:
8 threads:
32 threads, one MB each response:
32 threads, 10MB each response:
tanvipenumudy
left a comment
Thank you @jojochuang for the patch, please find a small comment.
        return null;
      }

      private int calculateMaxPayloadSize(int payloadSizeKB) {
I think we can reuse the methods under org.apache.hadoop.ozone.common.PayloadUtils for calculating the max payload size here.
          description = "Write to Ratis log, skip flag for read-only EchoRPC " +
              "request")
      private boolean writeToRatis = false;
      @Option(names = {"--containerID"},
nit: Let's keep all flags in kebab-case for consistency.
ashishkumar50
left a comment
@jojochuang Thanks for the patch. Overall, the change LGTM.
I left some minor suggestions. Also, the usage description in the PR can be updated to use --container-id.
      /**
       * Send an echo to DataNode.
       *
       * @return GetSmallFileResponseProto
Suggested change:
    -  * @return GetSmallFileResponseProto
    +  * @return EchoResponseProto
      private OzoneConfiguration configuration;
      private ByteString payloadReqBytes;
      private int payloadRespSize;
      private ContainerInfo containerInfo;
Move containerInfo inside call() method
ashishkumar50
left a comment
@jojochuang Thanks for updating the patch. The change mostly LGTM.
      int sleepTimeMs = echoRequest.getSleepTimeMs();
      try {
        Thread.sleep(sleepTimeMs);
We can ignore sleeping if sleepTimeMs is 0.
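The suggestion above amounts to a guard around the sleep call. The following is a minimal sketch of that idea, not the handler code from the patch: the class and method names (EchoDelay, pauseIfRequested) are hypothetical. It also treats negative values as "no delay", since Thread.sleep throws IllegalArgumentException for a negative argument.

```java
final class EchoDelay {
  private EchoDelay() { }

  /**
   * Hypothetical helper: pauses the calling thread only when a positive
   * delay was requested. Returns true if the thread actually slept.
   */
  static boolean pauseIfRequested(int sleepTimeMs) {
    if (sleepTimeMs <= 0) {
      // Nothing to do: avoids a needless call for 0 and an
      // IllegalArgumentException for negative values.
      return false;
    }
    try {
      Thread.sleep(sleepTimeMs);
      return true;
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers can still observe it.
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(pauseIfRequested(0));
    System.out.println(pauseIfRequested(5));
  }
}
```

Whether an interrupted sleep should be reported as a failed request is a design choice for the real handler; the sketch only preserves the interrupt status.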
      @Option(names = {"--sleep-time-ms"},
          description = "Let DataNode to pause for a duration (in milliseconds) for each request")
      private int sleepTimeMs;
Can we initialize this with 0, or is the option mandatory?
ashishkumar50
left a comment
Change LGTM.
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/DNRPCLoadGenerator.java
The change looks good. We should add a robot test for this to avoid silent breakage.
      @Option(names = {"--payload-req"},
          description =
              "Specifies the size of payload in KB in RPC request. " +
              "Max size is 2097151 KB",
Why have a max size? This is a test tool and it is ok to measure the error handling rate when the data payload is more than the container size.
removing the max size.
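For context on the 2097151 KB figure discussed in this thread: it equals Integer.MAX_VALUE / 1024, so it is presumably the largest KB count whose size in bytes still fits in a Java int. The sketch below only illustrates that arithmetic. The class and method names (PayloadSize, toBytes) are hypothetical and are not the PayloadUtils API mentioned earlier in the review.

```java
final class PayloadSize {
  private PayloadSize() { }

  /** Largest KB value whose byte count fits in an int: 2147483647 / 1024. */
  static final int MAX_PAYLOAD_KB = Integer.MAX_VALUE / 1024;

  /**
   * Hypothetical helper: converts a size in KB to bytes, failing loudly
   * with ArithmeticException instead of silently overflowing.
   */
  static int toBytes(int payloadSizeKB) {
    if (payloadSizeKB < 0) {
      throw new IllegalArgumentException("payload size must be >= 0");
    }
    return Math.multiplyExact(payloadSizeKB, 1024);
  }

  public static void main(String[] args) {
    System.out.println(MAX_PAYLOAD_KB);
    System.out.println(toBytes(1024));
  }
}
```

With a check like this, a test tool can accept any requested size and report an error for values that cannot be represented, rather than documenting a hard limit in the option text.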
The change looks good to me. Just a minor nit: I am OK with freon either enforcing the max size limit or going over it.
There's a problem with the block token in secure mode. Looking into that now.
(cherry picked from commit 29b534d)
Add freon benchmark dn-echo and its test.
(cherry picked from commit 26c72a1)
Support request payload and response payload.
(cherry picked from commit 3951228)
Updating to address review comments.
Address review comments.
Change-Id: Id686e2bc5a373e17515dcd66534b30b12f1b970e
Fix compilation error
Change-Id: Ifda423ae5867a5df94c16ff31854811efa5aeaa5
Address review comments. Added an option to let DataNode handler to pause for a certain duration for each request.
Change-Id: Id5e6c8551da3b42ebd720ceb8ed86b539aa7e9c8
Add a default value for sleepTimeMs.
Change-Id: Iab3f5e4b3dee3b49bc9432d66a7e02f3499e3e6d
Skip sleep if duration is equal or less than zero.
Change-Id: I3fe21bbd5a76335a34c0db7e73eb67c108a9fb40
Add --clients parameter to specify the number of xceiver clients. In addition, support secure cluster.
Change-Id: Ib7074e45b4874d544636df741f265e9aed9c886f
(cherry picked from commit 3b2e1e1cd43022a3ff8adf8aceb146ec951205da)
Update test
Change-Id: I92e1c63d94a1fc1300a6908aba8f0eec87fb5d70
Fix findbugs
Change-Id: Ie849551820f755a585999c10a1502cfb00de5298
Add robot test.
Change-Id: I8866254a09029f48cf55df3e211b02e6ba9d96a6
Add the missing space in ozone freon robot test.
Change-Id: I36e75c06bae193d56619e49428066251359ab3df
Change-Id: I7a02071fae54532407cca8b6c250845a666f0aa5
Change-Id: I8b4effdb2c89ab4be7a8f2987cfe8037eff0b8aa
Change-Id: I6366ac01cdf9d5d3cb4892acfadb4cc0107a4736
…und-trip latency (apache#6297)
(cherry picked from commit 509c970)
Conflicts:
    hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/ContainerCommandResponseBuilders.java
    hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/storage/ContainerProtocolCalls.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/audit/DNAction.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
    hadoop-hdds/interface-client/src/main/proto/DatanodeClientProtocol.proto
    hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/OmRPCLoadGenerator.java
Change-Id: Icbd75e38f6ea39604390627398750a3218e51937
…und-trip latency (#6297) (#6562) * HDDS-10442. [hsync] Add a Freon tool to measure client to DataNode round-trip latency (#6297) (cherry picked from commit 509c970)
…und-trip latency (apache#6297)
…und-trip latency (apache#6297)
…und-trip latency (apache#6297) (apache#6562)
* HDDS-10442. [hsync] Add a Freon tool to measure client to DataNode round-trip latency (apache#6297)
(cherry picked from commit 509c970)
(cherry picked from commit dec977b)
Conflicts:
    hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/ContainerCommandResponseBuilders.java
    hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/storage/ContainerProtocolCalls.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/audit/DNAction.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
    hadoop-hdds/interface-client/src/main/proto/DatanodeClientProtocol.proto
    hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/DNRPCLoadGenerator.java
Change-Id: I07755d356dc5ce8f87c62476f49e7c91549bd93b
What changes were proposed in this pull request?
Add a new freon tool, similar to ozone freon om-echo, that can be used to benchmark client to DataNode round-trip latency.
Usage:
    ozone freon dn-echo --container-id=1 -n 100000 -t 32 --payload-req=0 --payload-resp=1024
This tool requires SCM superuser privilege.
It sends an echo request to the DataNodes associated with the container specified with --container-id and receives the response, repeating this 100000 times using 32 threads. The request has a 0 KB payload and the response has a 1024 KB payload.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10442
How was this patch tested?
Two unit tests, also tested in a real cluster.
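To illustrate the kind of measurement described in this pull request (a fixed number of requests spread across a fixed number of threads, each timed individually), here is a minimal, self-contained sketch. It is not the Freon implementation: the class name EchoBenchmark, the method meanLatencyNanos, and the use of a plain Runnable in place of a real echo RPC are all assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class EchoBenchmark {
  private EchoBenchmark() { }

  /**
   * Runs echoCall `requests` times on a pool of `threads` workers and
   * returns the mean per-call latency in nanoseconds.
   */
  static double meanLatencyNanos(int requests, int threads, Runnable echoCall) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    LongAdder totalNanos = new LongAdder();
    LongAdder completed = new LongAdder();
    for (int i = 0; i < requests; i++) {
      pool.execute(() -> {
        long start = System.nanoTime();
        echoCall.run();               // stand-in for one echo round trip
        totalNanos.add(System.nanoTime() - start);
        completed.increment();
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        pool.shutdownNow();
        throw new IllegalStateException("benchmark timed out");
      }
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt();
      throw new IllegalStateException("benchmark interrupted", e);
    }
    long n = completed.sum();
    return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
  }

  public static void main(String[] args) {
    // A no-op call measures only the harness overhead.
    double mean = meanLatencyNanos(1000, 8, () -> { });
    System.out.println("mean latency (ns): " + mean);
  }
}
```

A real benchmark would typically also report throughput and latency percentiles, since a mean alone hides tail latency.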