Conversation

@adoroszlai
Contributor

What changes were proposed in this pull request?

KeyManagerImpl#listStatus and the sortDatanodeInPipeline helper method sort datanodes using an individual RPC call for each key location info.

Improvements in this change:

  1. Process only the "latest" location version instead of all versions. If I understand correctly, only the "latest" version is used for reads.
  2. Keep track of processed pipelines. If another key or location refers to the same pipeline, avoid repeating the query (a minimal sketch of this follows below).

Possible improvement left for later: send a single sortDatanodes request for all datanodes in all relevant pipelines, then create the per-pipeline lists locally.
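To make the second improvement concrete, here is a minimal sketch using simplified stand-in types (not the actual OM/SCM classes) of how deduplicating by pipeline avoids repeated sort requests while only the latest location version is processed:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch with stand-in types: sort the datanodes of each distinct
// pipeline at most once while walking the latest location version of the keys.
public class SortDatanodesSketch {

  // Stand-in for a pipeline: an ID plus the datanodes it spans.
  static final class PipelineStub {
    final String id;
    final List<String> nodes;
    PipelineStub(String id, List<String> nodes) {
      this.id = id;
      this.nodes = nodes;
    }
  }

  // Stand-in for the per-pipeline sortDatanodes RPC to SCM.
  interface DatanodeSorter {
    List<String> sort(List<String> nodes, String clientMachine);
  }

  // One sort call per distinct pipeline ID, instead of one per key location.
  static Map<String, List<String>> sortLatestVersionPipelines(
      List<PipelineStub> latestVersionPipelines, String clientMachine,
      DatanodeSorter sorter) {
    Map<String, List<String>> sortedByPipeline = new HashMap<>();
    for (PipelineStub pipeline : latestVersionPipelines) {
      // computeIfAbsent skips the sort call when this pipeline was seen before.
      sortedByPipeline.computeIfAbsent(pipeline.id,
          id -> sorter.sort(new ArrayList<>(pipeline.nodes), clientMachine));
    }
    return sortedByPipeline;
  }
}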

https://issues.apache.org/jira/browse/HDDS-4473

How was this patch tested?

Added a unit test for sortDatanodes and added a check in an existing test case for listStatus.

@adoroszlai adoroszlai self-assigned this Nov 23, 2020
@sodonnel
Contributor

I noticed this code before: we have a batch call to get the pipelines, but then for a key with multiple blocks we must make a call per block to sort the datanodes. Am I correct in thinking this is mostly a problem with large keys which have many blocks?

Do you think it would make sense to add the sort flag to the getContainerWithPipeline and getContainerWithPipelineBatch so we can have the pipeline returned sorted with a single RPC call, rather than having to make at least two calls?
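A rough, hypothetical sketch of that idea, with simplified stand-in types; the real getContainerWithPipelineBatch does not take such a flag today, and the names and signatures below are assumptions for illustration only:

import java.util.List;

// Hypothetical sketch of the "sort flag" idea with stand-in types; not the
// real StorageContainerLocationProtocol or ContainerWithPipeline classes.
interface ContainerLocationSketch {

  // Stand-in for a container plus the datanodes of its pipeline.
  final class ContainerWithNodes {
    long containerId;
    List<String> datanodes;
  }

  // Existing style of batch lookup: pipelines come back unsorted, and the
  // client must issue a separate sortDatanodes call afterwards.
  List<ContainerWithNodes> getContainersWithPipelines(List<Long> containerIds);

  // Hypothetical variant: when sortDatanodes is true, SCM sorts each
  // pipeline's datanodes relative to clientMachine before returning,
  // so a single RPC is enough.
  List<ContainerWithNodes> getContainersWithPipelines(
      List<Long> containerIds, boolean sortDatanodes, String clientMachine);
}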

@adoroszlai
Contributor Author

Am I correct in thinking this is mostly a problem with large keys which have many blocks?

I think so, and also for listStatus with many entries in the result.

Do you think it would make sense to add the sort flag to the getContainerWithPipeline and getContainerWithPipelineBatch so we can have the pipeline returned sorted with a single RPC call, rather than having to make at least two calls?

Actually that's a great idea, better than the third (and not yet implemented) improvement I proposed, which would still require 2 batch requests.

I think we have to pay attention to compatibility, though. The client would need to check whether SCM supports this new flag, so that it can get a sorted pipeline from both new and old SCMs.

@swagle
Contributor

swagle commented Nov 25, 2020

cc: @avijayanhwx for the compatibility check through upgrades.

@adoroszlai
Contributor Author

Filed HDDS-4523 for the further improvement suggested by @sodonnel.

@sodonnel
Contributor

This change looks good to me and I am +1 to commit it. However, I would like to highlight one issue that may make this change less effective.

In SCMClientProtocolServer, this is the code where the refreshPipelinesBatch call lands:

  private ContainerWithPipeline getContainerWithPipelineCommon(
      long containerID) throws IOException {
    final ContainerID cid = ContainerID.valueof(containerID);
    final ContainerInfo container = scm.getContainerManager()
        .getContainer(cid);

    if (safeModePrecheck.isInSafeMode()) {
      if (container.isOpen()) {
        if (!hasRequiredReplicas(container)) {
          throw new SCMException("Open container " + containerID + " doesn't"
              + " have enough replicas to service this operation in "
              + "Safe mode.", ResultCodes.SAFE_MODE_EXCEPTION);
        }
      }
    }

    Pipeline pipeline;
    try {
      pipeline = container.isOpen() ? scm.getPipelineManager()
          .getPipeline(container.getPipelineID()) : null;
    } catch (PipelineNotFoundException ex) {
      // The pipeline is destroyed.
      pipeline = null;
    }

    if (pipeline == null) {
      pipeline = scm.getPipelineManager().createPipeline(
          HddsProtos.ReplicationType.STAND_ALONE,
          container.getReplicationFactor(),
          scm.getContainerManager()
              .getContainerReplicas(cid).stream()
              .map(ContainerReplica::getDatanodeDetails)
              .collect(Collectors.toList()));
    }

    return new ContainerWithPipeline(container, pipeline);
  }

For open containers the current pipeline is returned, but for a closed container a new "pipeline" is created. This is not a real Ratis pipeline - it's just a list of nodes. If you follow the logic to SimplePipelineProvider, the pipeline ID is a random ID:

  public Pipeline create(ReplicationFactor factor,
      List<DatanodeDetails> nodes) {
    return Pipeline.newBuilder()
        .setId(PipelineID.randomId())
        .setState(PipelineState.OPEN)
        .setType(ReplicationType.STAND_ALONE)
        .setFactor(factor)
        .setNodes(nodes)
        .build();
  }

This means that even if several blocks have the same set of DNs, they will still be distinct pipelines, and hence the caching will not work. The caching will only be effective for open containers.

There are two places this logic is used: listStatus and lookupKey / getOzoneFileStatus.

I guess the place where this could be a win is in listStatus, but if most keys are part of closed containers (which would be the usual case) the sorted pipeline cache introduced here may cause more overhead than it saves.

I also wonder: why does listStatus need to return the pipeline and optionally sort the pipeline for all keys, all the time?

I guess there are two uses. First, simply listing the directory and displaying the contents; locations are usually not needed here. Second, an application lists the folders with the intention of reading each key; in this second case the locations are needed.

@adoroszlai
Contributor Author

Thanks @sodonnel for digging into the details of pipeline refresh, and for pointing out that the pipeline ID is random. In this case I think we should use the set of node IDs, instead of the pipeline ID, as the cache key.
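A minimal sketch of that cache key, again with simplified stand-in types rather than the real Ozone classes: keying on the unordered set of node IDs lets two randomly-generated "pipelines" over the same replicas share a single sorted result.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Sketch: cache sorted node lists keyed by the set of node IDs rather than by
// the (possibly random) pipeline ID. Stand-in types, not the real Ozone classes.
class SortedNodeCache {

  // Key: unordered set of datanode IDs; value: those nodes in sorted order.
  private final Map<Set<String>, List<String>> cache = new HashMap<>();

  // sortCall stands in for the sortDatanodes RPC; it is invoked at most once
  // per distinct set of nodes, even if the pipeline IDs differ.
  List<String> getSorted(List<String> nodeIds,
      Function<List<String>, List<String>> sortCall) {
    return cache.computeIfAbsent(new HashSet<>(nodeIds),
        key -> sortCall.apply(nodeIds));
  }
}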

@sodonnel
Contributor

sodonnel commented Dec 1, 2020

we should use the set of node IDs, instead of the pipeline ID, as the cache key.

Good idea - I think that will work much better. I checked the latest version and it looks good to me. I just have one small comment you can consider if you think it makes sense.

@sodonnel sodonnel left a comment
Contributor

LGTM +1.

@adoroszlai
Contributor Author

Thanks @sodonnel for the reviews.

@adoroszlai adoroszlai merged commit 9094463 into apache:master Dec 1, 2020
@adoroszlai adoroszlai deleted the HDDS-4473 branch December 1, 2020 15:57
errose28 added a commit to errose28/ozone that referenced this pull request Jan 5, 2021
* master: (40 commits)
  HDDS-4473. Reduce number of sortDatanodes RPC calls (apache#1610)
  HDDS-4485. [DOC] add the authentication rules of the Ozone Ranger. (apache#1603)
  HDDS-4528. Upgrade slf4j to 1.7.30 (apache#1639)
  HDDS-4424. Update README with information how to report security issues (apache#1548)
  HDDS-4484. Use RaftServerImpl isLeader instead of periodic leader update logic in OM and isLeaderReady for read/write requests (apache#1638)
  HDDS-4429. Create unit test for SimpleContainerDownloader. (apache#1551)
  HDDS-4461. Reuse compiled binaries in acceptance test (apache#1588)
  HDDS-4511: Avoiding StaleNodeHandler to take effect in TestDeleteWithSlowFollower. (apache#1625)
  HDDS-4510. SCM can avoid creating RetriableDatanodeEventWatcher for deletion command ACK (apache#1626)
  HDDS-3363. Intermittent failure in testContainerImportExport (apache#1618)
  HDDS-4370. Datanode deletion service can avoid storing deleted blocks. (apache#1620)
  HDDS-4512. Remove unused netty3 transitive dependency (apache#1627)
  HDDS-4481. With HA OM can send deletion blocks to SCM multiple times. (apache#1608)
  HDDS-4487. SCM can avoid using RETRIABLE_DATANODE_COMMAND for datanode deletion commands. (apache#1621)
  HDDS-4471. GrpcOutputStream length can overflow (apache#1617)
  HDDS-4308. Fix issue with quota update (apache#1489)
  HDDS-4392. [DOC] Add Recon architecture to docs (apache#1602)
  HDDS-4501. Reload OM State fail should terminate OM for any exceptions. (apache#1622)
  HDDS-4492. CLI flag --quota should default to 'spaceQuota' to preserve backward compatibility. (apache#1609)
  HDDS-3689. Add various profiles to MiniOzoneChaosCluster to run different modes. (apache#1420)
  ...