Conversation

@adoroszlai
Contributor

What changes were proposed in this pull request?

KeyManagerImpl#listStatus and the sortDatanodeInPipeline helper method sort datanodes using an individual RPC call for each key location info.

Improvements in this change:

  1. Process only the "latest" location version instead of all versions. If I understand correctly, only the "latest" version is used for reads.
  2. Keep track of processed pipelines. If another key or location refers to the same pipeline, avoid repeating the query (a minimal sketch of this follows below).

Possible improvement left for later: send a single sortDatanodes request for all datanodes in all relevant pipelines, then create the per-pipeline lists locally.
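To make the second improvement concrete, here is a minimal sketch using simplified stand-in types (not the actual OM/SCM classes) of how deduplicating by pipeline avoids repeated sort requests while only the latest location version is processed:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch with stand-in types: sort the datanodes of each distinct
// pipeline at most once while walking the latest location version of the keys.
public class SortDatanodesSketch {

  // Stand-in for a pipeline: an ID plus the datanodes it spans.
  static final class PipelineStub {
    final String id;
    final List<String> nodes;
    PipelineStub(String id, List<String> nodes) {
      this.id = id;
      this.nodes = nodes;
    }
  }

  // Stand-in for the per-pipeline sortDatanodes RPC to SCM.
  interface DatanodeSorter {
    List<String> sort(List<String> nodes, String clientMachine);
  }

  // One sort call per distinct pipeline ID, instead of one per key location.
  static Map<String, List<String>> sortLatestVersionPipelines(
      List<PipelineStub> latestVersionPipelines, String clientMachine,
      DatanodeSorter sorter) {
    Map<String, List<String>> sortedByPipeline = new HashMap<>();
    for (PipelineStub pipeline : latestVersionPipelines) {
      // computeIfAbsent skips the sort call when this pipeline was seen before.
      sortedByPipeline.computeIfAbsent(pipeline.id,
          id -> sorter.sort(new ArrayList<>(pipeline.nodes), clientMachine));
    }
    return sortedByPipeline;
  }
}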

https://issues.apache.org/jira/browse/HDDS-4473

How was this patch tested?

Added a unit test for sortDatanodes and added a check in an existing test case for listStatus.

@adoroszlai adoroszlai self-assigned this Nov 23, 2020
@sodonnel
Contributor

I noticed this code before: we have a batch call to get the pipelines, but then for a key with multiple blocks we must make a call per block to sort the datanodes. Am I correct in thinking this is mostly a problem with large keys which have many blocks?

Do you think it would make sense to add the sort flag to the getContainerWithPipeline and getContainerWithPipelineBatch so we can have the pipeline returned sorted with a single RPC call, rather than having to make at least two calls?
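A rough, hypothetical sketch of that idea, with simplified stand-in types; the real getContainerWithPipelineBatch does not take such a flag today, and the names and signatures below are assumptions for illustration only:

import java.util.List;

// Hypothetical sketch of the "sort flag" idea with stand-in types; not the
// real StorageContainerLocationProtocol or ContainerWithPipeline classes.
interface ContainerLocationSketch {

  // Stand-in for a container plus the datanodes of its pipeline.
  final class ContainerWithNodes {
    long containerId;
    List<String> datanodes;
  }

  // Existing style of batch lookup: pipelines come back unsorted, and the
  // client must issue a separate sortDatanodes call afterwards.
  List<ContainerWithNodes> getContainersWithPipelines(List<Long> containerIds);

  // Hypothetical variant: when sortDatanodes is true, SCM sorts each
  // pipeline's datanodes relative to clientMachine before returning,
  // so a single RPC is enough.
  List<ContainerWithNodes> getContainersWithPipelines(
      List<Long> containerIds, boolean sortDatanodes, String clientMachine);
}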

@adoroszlai
Contributor Author

Am I correct in thinking this is mostly a problem with large keys which have many blocks?

I think so, and also for listStatus with many entries in the result.

Do you think it would make sense to add the sort flag to the getContainerWithPipeline and getContainerWithPipelineBatch so we can have the pipeline returned sorted with a single RPC call, rather than having to make at least two calls?

Actually that's a great idea, better than the third (and not yet implemented) improvement I proposed, which would still require 2 batch requests.

I think we have to pay attention to compatibility, though. The client would need to check whether SCM supports this new flag, so that it can get a sorted pipeline from both new and old SCMs.

@swagle
Contributor

swagle commented Nov 25, 2020

cc: @avijayanhwx for the compatibility check through upgrades.

@adoroszlai
Contributor Author

Filed HDDS-4523 for the further improvement suggested by @sodonnel.

@sodonnel
Contributor

This change looks good to me and I am +1 to commit it. However, I would like to highlight one issue that may make this change less effective.

In SCMClientProtocolServer, this is the code where the refreshPipelinesBatch call lands:

  private ContainerWithPipeline getContainerWithPipelineCommon(
      long containerID) throws IOException {
    final ContainerID cid = ContainerID.valueof(containerID);
    final ContainerInfo container = scm.getContainerManager()
        .getContainer(cid);

    if (safeModePrecheck.isInSafeMode()) {
      if (container.isOpen()) {
        if (!hasRequiredReplicas(container)) {
          throw new SCMException("Open container " + containerID + " doesn't"
              + " have enough replicas to service this operation in "
              + "Safe mode.", ResultCodes.SAFE_MODE_EXCEPTION);
        }
      }
    }

    Pipeline pipeline;
    try {
      pipeline = container.isOpen() ? scm.getPipelineManager()
          .getPipeline(container.getPipelineID()) : null;
    } catch (PipelineNotFoundException ex) {
      // The pipeline is destroyed.
      pipeline = null;
    }

    if (pipeline == null) {
      pipeline = scm.getPipelineManager().createPipeline(
          HddsProtos.ReplicationType.STAND_ALONE,
          container.getReplicationFactor(),
          scm.getContainerManager()
              .getContainerReplicas(cid).stream()
              .map(ContainerReplica::getDatanodeDetails)
              .collect(Collectors.toList()));
    }

    return new ContainerWithPipeline(container, pipeline);
  }

For open containers the current pipeline is returned, but for a closed container a new "pipeline" is created. This is not a real Ratis pipeline - it's just a list of nodes. If you follow the logic to SimplePipelineProvider, the pipeline ID is a random ID:

  public Pipeline create(ReplicationFactor factor,
      List<DatanodeDetails> nodes) {
    return Pipeline.newBuilder()
        .setId(PipelineID.randomId())
        .setState(PipelineState.OPEN)
        .setType(ReplicationType.STAND_ALONE)
        .setFactor(factor)
        .setNodes(nodes)
        .build();
  }

This means that even if several blocks have the same set of DNs, they will still be distinct pipelines, and hence the caching will not work. The caching will only be effective for open containers.

There are two places this logic is used: listStatus and lookupKey / getOzoneFileStatus.

I guess the place where this could be a win is in listStatus, but if most keys are part of closed containers (which would be the usual case) the sorted pipeline cache introduced here may cause more overhead than it saves.

I also wonder: why does listStatus need to return the pipeline and optionally sort the pipeline for all keys, all the time?

I guess there are two uses. First, simply listing the directory and displaying the contents; locations are usually not needed here. Second, an application lists the folders with the intention of reading each key; in this second case the locations are needed.

@adoroszlai
Contributor Author

Thanks @sodonnel for digging into the details of pipeline refresh, and for pointing out that the pipeline ID is random. In this case I think we should use the set of node IDs, instead of the pipeline ID, as the cache key.
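A minimal sketch of that cache key, again with simplified stand-in types rather than the real Ozone classes: keying on the unordered set of node IDs lets two randomly-generated "pipelines" over the same replicas share a single sorted result.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Sketch: cache sorted node lists keyed by the set of node IDs rather than by
// the (possibly random) pipeline ID. Stand-in types, not the real Ozone classes.
class SortedNodeCache {

  // Key: unordered set of datanode IDs; value: those nodes in sorted order.
  private final Map<Set<String>, List<String>> cache = new HashMap<>();

  // sortCall stands in for the sortDatanodes RPC; it is invoked at most once
  // per distinct set of nodes, even if the pipeline IDs differ.
  List<String> getSorted(List<String> nodeIds,
      Function<List<String>, List<String>> sortCall) {
    return cache.computeIfAbsent(new HashSet<>(nodeIds),
        key -> sortCall.apply(nodeIds));
  }
}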

@sodonnel
Contributor

sodonnel commented Dec 1, 2020

we should use the set of node IDs, instead of the pipeline ID, as the cache key.

Good idea - I think that will work much better. I checked the latest version and it looks good to me. I just have one small comment you can consider if you think it makes sense.

@sodonnel sodonnel left a comment
Contributor

LGTM +1.

@adoroszlai
Contributor Author

Thanks @sodonnel for the reviews.

@adoroszlai adoroszlai merged commit 9094463 into apache:master Dec 1, 2020
@adoroszlai adoroszlai deleted the HDDS-4473 branch December 1, 2020 15:57
errose28 added a commit to errose28/ozone that referenced this pull request Jan 5, 2021
* master: (40 commits)
  HDDS-4473. Reduce number of sortDatanodes RPC calls (apache#1610)
  HDDS-4485. [DOC] add the authentication rules of the Ozone Ranger. (apache#1603)
  HDDS-4528. Upgrade slf4j to 1.7.30 (apache#1639)
  HDDS-4424. Update README with information how to report security issues (apache#1548)
  HDDS-4484. Use RaftServerImpl isLeader instead of periodic leader update logic in OM and isLeaderReady for read/write requests (apache#1638)
  HDDS-4429. Create unit test for SimpleContainerDownloader. (apache#1551)
  HDDS-4461. Reuse compiled binaries in acceptance test (apache#1588)
  HDDS-4511: Avoiding StaleNodeHandler to take effect in TestDeleteWithSlowFollower. (apache#1625)
  HDDS-4510. SCM can avoid creating RetriableDatanodeEventWatcher for deletion command ACK (apache#1626)
  HDDS-3363. Intermittent failure in testContainerImportExport (apache#1618)
  HDDS-4370. Datanode deletion service can avoid storing deleted blocks. (apache#1620)
  HDDS-4512. Remove unused netty3 transitive dependency (apache#1627)
  HDDS-4481. With HA OM can send deletion blocks to SCM multiple times. (apache#1608)
  HDDS-4487. SCM can avoid using RETRIABLE_DATANODE_COMMAND for datanode deletion commands. (apache#1621)
  HDDS-4471. GrpcOutputStream length can overflow (apache#1617)
  HDDS-4308. Fix issue with quota update (apache#1489)
  HDDS-4392. [DOC] Add Recon architecture to docs (apache#1602)
  HDDS-4501. Reload OM State fail should terminate OM for any exceptions. (apache#1622)
  HDDS-4492. CLI flag --quota should default to 'spaceQuota' to preserve backward compatibility. (apache#1609)
  HDDS-3689. Add various profiles to MiniOzoneChaosCluster to run different modes. (apache#1420)
  ...