HDDS-7618. Replication Commands should timeout if not processed on datanodes in time #4069
Conversation
@swamirishi @duongkame can you all please take a look.
adoroszlai
left a comment
Thanks @sodonnel for the patch. Looks good overall, few minor items suggested.
```java
ReplicationTask task = new ReplicationTask(containerID, sourceDatanodes);
task.setDeadline(deadline);
supervisor.addTask(task);
```
For HDDS-7620 I will likely need to add term to ReplicationTask to be able to check it before the task is executed. I was wondering if we should pass the entire command to ReplicationTask instead of duplicating more and more fields.
Yea let me look at this. There is a similar pattern in the ECReconstruction command, where we basically duplicate the entire command into another object which really only has getters the same as the command, so we might be able to make that go away too.
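A minimal sketch of the refactor discussed above, where the task delegates to the command instead of copying its fields. All class and method names here are illustrative, not the actual Ozone API:

```java
import java.util.List;

// Hypothetical stand-in for the SCM command object; in the real code
// this would be the existing ReplicateContainerCommand.
class ReplicateContainerCommand {
    private final long containerID;
    private final List<String> sourceDatanodes;
    private long deadlineMsSinceEpoch; // 0 means no deadline set

    ReplicateContainerCommand(long containerID, List<String> sources) {
        this.containerID = containerID;
        this.sourceDatanodes = sources;
    }

    long getContainerID() { return containerID; }
    List<String> getSourceDatanodes() { return sourceDatanodes; }
    void setDeadline(long ms) { this.deadlineMsSinceEpoch = ms; }
    long getDeadline() { return deadlineMsSinceEpoch; }
}

// The task wraps the command and delegates. Adding a new field later
// (e.g. term for HDDS-7620) then only touches the command class.
class ReplicationTask {
    private final ReplicateContainerCommand cmd;

    ReplicationTask(ReplicateContainerCommand cmd) { this.cmd = cmd; }

    long getContainerID() { return cmd.getContainerID(); }
    long getDeadline() { return cmd.getDeadline(); }
}

public class TaskSketch {
    public static void main(String[] args) {
        ReplicateContainerCommand cmd =
            new ReplicateContainerCommand(42L, List.of("dn1", "dn2"));
        cmd.setDeadline(123456789L);
        ReplicationTask task = new ReplicationTask(cmd);
        // Getters read through to the wrapped command.
        System.out.println(task.getContainerID() + " " + task.getDeadline());
    }
}
```

The design trade-off is that the task's API surface stays small while the command remains the single source of truth for replication parameters.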
```diff
-    supervisor.getReplicationRequestCount());
+    supervisor.getReplicationRequestCount())
+        .addGauge(Interns.info("numTimeoutReplications",
+            "Number of timeout replications in queue"),
```
Possible clarification:

```diff
-            "Number of timeout replications in queue"),
+            "Number of replication requests timed out before being processed"),
```
```java
if (deadlineMsSinceEpoch > 0 &&
    currentEpochMs > deadlineMsSinceEpoch) {
  return true;
}
return false;
```
Nit:

```diff
-if (deadlineMsSinceEpoch > 0 &&
-    currentEpochMs > deadlineMsSinceEpoch) {
-  return true;
-}
-return false;
+return deadlineMsSinceEpoch > 0 &&
+    currentEpochMs > deadlineMsSinceEpoch;
```
```java
if (!(val > 0) || (val > 1)) {
  throw new IllegalArgumentException(val
      + " must be greater than 0 and less than equal to 1");
}
```
Validation is better added to a separate method, annotated with @PostConstruct. This ensures validation is performed even if the instance is created and initialized via reflection (and also allows multiple config items to be cross-validated).
Some examples:
ozone/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/OzoneClientConfig.java
Lines 194 to 220 in d0e6824
```java
@PostConstruct
private void validate() {
  Preconditions.checkState(streamBufferSize > 0);
  Preconditions.checkState(streamBufferFlushSize > 0);
  Preconditions.checkState(streamBufferMaxSize > 0);
  Preconditions.checkArgument(bufferIncrement < streamBufferSize,
      "Buffer increment should be smaller than the size of the stream "
          + "buffer");
  Preconditions.checkState(streamBufferMaxSize % streamBufferFlushSize == 0,
      "expected max. buffer size (%s) to be a multiple of flush size (%s)",
      streamBufferMaxSize, streamBufferFlushSize);
  Preconditions.checkState(streamBufferFlushSize % streamBufferSize == 0,
      "expected flush size (%s) to be a multiple of buffer size (%s)",
      streamBufferFlushSize, streamBufferSize);
  if (bytesPerChecksum <
      OzoneConfigKeys.OZONE_CLIENT_BYTES_PER_CHECKSUM_MIN_SIZE) {
    LOG.warn("The checksum size ({}) is not allowed to be less than the " +
        "minimum size ({}), resetting to the minimum size.",
        bytesPerChecksum,
        OzoneConfigKeys.OZONE_CLIENT_BYTES_PER_CHECKSUM_MIN_SIZE);
    bytesPerChecksum =
        OzoneConfigKeys.OZONE_CLIENT_BYTES_PER_CHECKSUM_MIN_SIZE;
  }
}
```
Lines 171 to 179 in d0e6824
```java
@PostConstruct
public void validate() {
  if (replicationMaxStreams < 1) {
    LOG.warn(REPLICATION_STREAMS_LIMIT_KEY + " must be greater than zero " +
        "and was set to {}. Defaulting to {}",
        replicationMaxStreams, REPLICATION_MAX_STREAMS_DEFAULT);
    replicationMaxStreams = REPLICATION_MAX_STREAMS_DEFAULT;
  }
}
```
Ah, I had a feeling there should be something like this. I will add this in.
adoroszlai
left a comment
Thanks @sodonnel for updating the patch, LGTM.
What changes were proposed in this pull request?
Both the new and old replication managers send commands to the datanodes. If a command has not been processed on the datanode within the ReplicationManager event.timeout, RM assumes the command has failed for some reason and may send another one to the same or a different host.
It makes sense to drop any command not processed on the datanode slightly before ReplicationManager gives up on it. Especially with delete container commands, we don't want two or more deletes pending in the system for the same container when RM thinks there is only one.
To facilitate dropping the commands, we can add a deadline to all commands. For commands we want to enforce a deadline on, we set the deadline in SCM and check it on the DN side. The deadline is the epoch time by which the command should be processed; after that, the DN should ignore it. A deadline of zero is the default and means no deadline is set.
This change ensures that all commands sent to a datanode from RM have a deadline set to 0.9 * event.timeout. On the datanode side, we only enforce the deadline on ReplicateContainer, DeleteContainer and ECReconstruction commands.
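A rough sketch of the scheme described above (method and constant names are assumed for illustration, not the actual RM code): SCM stamps each command with now + 0.9 * event.timeout, and the datanode drops the command if that deadline has passed, treating zero as "no deadline".

```java
public class DeadlineSketch {
    // SCM side: deadline = now + factor * event.timeout (0.9 in this PR).
    static long computeDeadline(long nowMs, long eventTimeoutMs,
                                double factor) {
        return nowMs + (long) (eventTimeoutMs * factor);
    }

    // DN side: drop the command if a deadline is set and has passed.
    // A deadline of zero means no deadline was set, so never drop.
    static boolean shouldDrop(long deadlineMsSinceEpoch, long currentEpochMs) {
        return deadlineMsSinceEpoch > 0
            && currentEpochMs > deadlineMsSinceEpoch;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        // With a 10-minute event.timeout: 1_000_000 + 0.9 * 600_000.
        long deadline = computeDeadline(now, 10 * 60 * 1000L, 0.9);
        System.out.println(deadline);                           // 1540000
        System.out.println(shouldDrop(deadline, deadline + 1)); // true
        System.out.println(shouldDrop(0L, Long.MAX_VALUE));     // false
    }
}
```

Enforcing the deadline at 0.9 of the timeout (rather than 1.0) leaves a margin so a command is never being executed on the DN at the same moment RM re-issues it.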
This has turned into quite a large change, but it is split into individual commits for each stage so they can be reviewed in isolation.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7618
How was this patch tested?
Various new unit tests added.