Skip to content

Conversation

@adoroszlai
Copy link
Contributor

@adoroszlai adoroszlai commented Jul 15, 2020

What changes were proposed in this pull request?

  1. Fix config key mismatch: raft.server.notification.no-leader.timeout vs. raft.server.Notification.no-leader.timeout.
  2. Use type-safe config for Raft client settings in tests. This ensures that we use the correct key (raft.client.rpc.watch.request.timeout) instead of similar but wrong one (raft.client.watch.request.timeout). Get rid of constants to discourage setting raw config.
  3. Fix prefix for RatisClientConfig.RaftConfig. @ConfigGroup defined in inner class (RaftConfig) does not inherit prefix (hdds.ratis) from outer class (RatisClientConfig), so we need to explicitly specify it. Without the patch we got:
    $ grep -A1 raft.client hadoop-hdds/common/target/classes/ozone-default-generated.xml
        <name>raft.client.async.outstanding-requests.max</name>
        <value>32</value>
    --
        <name>raft.client.rpc.request.timeout</name>
        <value>60s</value>
    --
        <name>raft.client.rpc.watch.request.timeout</name>
        <value>180s</value>
    
    so RatisHelper would not apply these settings to RaftProperties.

https://issues.apache.org/jira/browse/HDDS-3964

How was this patch tested?

Verified ozone-default-generated.xml has hdds.ratis prefix:

$ grep -A1 raft.client hadoop-hdds/common/target/classes/ozone-default-generated.xml
    <name>hdds.ratis.raft.client.async.outstanding-requests.max</name>
    <value>32</value>
--
    <name>hdds.ratis.raft.client.rpc.request.timeout</name>
    <value>60s</value>
--
    <name>hdds.ratis.raft.client.rpc.watch.request.timeout</name>
    <value>180s</value>

Ran some affected integration tests locally, verified that custom Ratis settings were applied:

[pool-63-thread-1] INFO  server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) - raft.server.notification.no-leader.timeout = 10000ms (custom)

(client property printed by temporary log statement):

[Command processor thread] INFO  ratis.RatisHelper (RatisHelper.java:createRaftClientProperties(241)) - ZZZ raft.client.rpc.watch.request.timeout = 3000ms

https://github.com/adoroszlai/hadoop-ozone/runs/873847680

@adoroszlai
Copy link
Contributor Author

@cku328 thanks for finding the watch.request.timeout problem, please review

Copy link
Contributor

@cku328 cku328 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adoroszlai for fixing this. Overall LGTM.

Would it be better to change clientConfig to ratisClientConfig so that it matches ratisServerConfig (#1106)?

@adoroszlai
Copy link
Contributor Author

Would it be better to change clientConfig to ratisClientConfig so that it matches ratisServerConfig (#1106)?

Thanks @cku328 for the suggestion. I'm OK with both names. I'll update the patch based on your suggestion together with any other changes that may be requested by @bshashikant and/or @lokeshj1703, or if they approve it as it is. (I'd like to minimize number of CI checks due to too much flakiness.)

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adoroszlai Thanks for working on this! The changes look good to me. +1.
Minor comment - We can add suffix like "InMs" for time based configs in classes like RatisClientConfig.
We can do it as part of separate jira though.

@adoroszlai
Copy link
Contributor Author

Thanks @lokeshj1703 for the review.

Minor comment - We can add suffix like "InMs" for time based configs in classes like RatisClientConfig.
We can do it as part of separate jira though.

I was considering changing the parameter to Duration, similar to

https://github.com/apache/hadoop-ozone/blob/de027855798bf3b891b8d3c00dc8e59531f98781/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/ReconTaskConfig.java#L41-L49

What do you think about that?

@lokeshj1703
Copy link
Contributor

I was considering changing the parameter to Duration, similar to

https://github.com/apache/hadoop-ozone/blob/de027855798bf3b891b8d3c00dc8e59531f98781/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/ReconTaskConfig.java#L41-L49

What do you think about that?

yeah...that seems like a better approach.

@adoroszlai
Copy link
Contributor Author

I was considering changing the parameter to Duration
yeah...that seems like a better approach.

Created HDDS-3975 for this.

@adoroszlai adoroszlai added 0.6.0 bug Something isn't working labels Jul 17, 2020
@adoroszlai adoroszlai merged commit f15b011 into apache:master Jul 17, 2020
@adoroszlai adoroszlai deleted the HDDS-3964 branch July 17, 2020 06:20
@adoroszlai
Copy link
Contributor Author

Thanks @cku328 and @lokeshj1703 for the review.

errose28 added a commit to errose28/ozone that referenced this pull request Jul 17, 2020
* master:
  HDDS-3855. Add upgrade smoketest (apache#1142)
  HDDS-3964. Ratis config key mismatch (apache#1204)
  HDDS-3612. Allow mounting bucket under other volume (apache#1104)
  HDDS-3926. OM Token Identifier table should use in-house serialization. (apache#1182)
  HDDS-3824: OM read requests should make SCM#refreshPipeline outside BUCKET_LOCK (apache#1164)
errose28 added a commit to errose28/ozone that referenced this pull request Jul 17, 2020
…erface

* upstream/master:
  HDDS-3855. Add upgrade smoketest (apache#1142)
  HDDS-3964. Ratis config key mismatch (apache#1204)
  HDDS-3612. Allow mounting bucket under other volume (apache#1104)
  HDDS-3926. OM Token Identifier table should use in-house serialization. (apache#1182)
  HDDS-3824: OM read requests should make SCM#refreshPipeline outside BUCKET_LOCK (apache#1164)
  HDDS-3966. Disable flaky TestOMRatisSnapshots
errose28 added a commit to errose28/ozone that referenced this pull request Jul 20, 2020
* master:
  HDDS-3984. Support filter and search the columns in recon UI (apache#1218)
  HDDS-3806. Support recognize aws v2 Authorization header. (apache#1098)
  HDDS-3955. Unable to list intermediate paths on keys created using S3G. (apache#1196)
  HDDS-3741. Reload old OM state if Install Snapshot from Leader fails (apache#1129)
  HDDS-3965. SCM failed to start up for duplicated pipeline detected. (apache#1210)
  HDDS-3855. Add upgrade smoketest (apache#1142)
  HDDS-3964. Ratis config key mismatch (apache#1204)
  HDDS-3612. Allow mounting bucket under other volume (apache#1104)
  HDDS-3926. OM Token Identifier table should use in-house serialization. (apache#1182)
  HDDS-3824: OM read requests should make SCM#refreshPipeline outside BUCKET_LOCK (apache#1164)
  HDDS-3966. Disable flaky TestOMRatisSnapshots
errose28 pushed a commit to errose28/ozone that referenced this pull request Jul 21, 2020
errose28 added a commit to errose28/ozone that referenced this pull request Jul 21, 2020
* add-deleted-block-table: (63 commits)
  Make block iterator tests use deleted blocks table, and remove the now unused #deleted#
  Replace uses of #deleted# key prefix with access to new deleted blocks table
  Add deleted blocks table to base level DB wrappers
  Have block deleting service test look for #deleted# keys in metadata table
  Move block delete to correct table and remove debugging print statement
  Import schema version when importing container data from export
  HDDS-3984. Support filter and search the columns in recon UI (apache#1218)
  HDDS-3806. Support recognize aws v2 Authorization header. (apache#1098)
  HDDS-3955. Unable to list intermediate paths on keys created using S3G. (apache#1196)
  HDDS-3741. Reload old OM state if Install Snapshot from Leader fails (apache#1129)
  Move new key value block iterator implementation and tests to new interface
  Fix checkstyle violations
  HDDS-3965. SCM failed to start up for duplicated pipeline detected. (apache#1210)
  Update comments
  Add comments on added helper method
  Remove seekToLast() from iterator interface, implementation, and tests
  Add more robust unit test with alternating key matches
  All unit tests pass after allowing keys with deleted and deleting prefixes to be made
  HDDS-3855. Add upgrade smoketest (apache#1142)
  HDDS-3964. Ratis config key mismatch (apache#1204)
  ...
ChenSammi pushed a commit that referenced this pull request Jul 22, 2020
rakeshadr pushed a commit to rakeshadr/hadoop-ozone that referenced this pull request Sep 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants