Conversation

@sodonnel (Contributor) commented Dec 12, 2022

What changes were proposed in this pull request?

Both the new and the old ReplicationManager send commands to the datanodes. If a command has not been processed on the datanode within the ReplicationManager event.timeout, RM assumes the command has failed for some reason and may send another one to the same or a different host.

It therefore makes sense to drop any command not yet processed on the datanode slightly before ReplicationManager gives up on it. Especially with delete container commands, we don't want two or more deletes pending in the system for the same container when RM thinks there is only one.

To facilitate dropping the commands, we can add a deadline to all commands. SCM sets the deadline only on the commands we want to enforce it for, and the DN checks it before processing. The deadline is the epoch time by which the command must be processed; after it passes, the DN should ignore the command. A deadline of zero is the default and means no deadline is set.

This change ensures that all commands sent to a datanode from RM have a deadline set to 0.9 * event.timeout. On the datanode side, the deadline is only enforced for the ReplicateContainer, DeleteContainer and ECReconstruction commands.
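
As a rough sketch of that deadline calculation (illustrative only; the class and method names below are assumptions, not the actual Ozone API):

    import java.time.Clock;

    // Minimal sketch of the deadline calculation; class and method names
    // here are illustrative, not the actual Ozone classes.
    class DeadlineStamper {
      private final Clock clock = Clock.systemUTC();

      // eventTimeoutMs mirrors the ReplicationManager event.timeout config.
      long deadlineFor(long eventTimeoutMs) {
        // 0.9 * event.timeout: the DN gives up slightly before RM does,
        // so a retried command cannot overlap with the original one.
        return clock.millis() + (long) (0.9 * eventTimeoutMs);
      }
    }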

This has turned into quite a large change, but it is split into individual commits for each stage so they can be reviewed in isolation.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7618

How was this patch tested?

Various new unit tests added.

@kerneltime (Contributor) commented:

@swamirishi @duongkame can you both please take a look?


@adoroszlai (Contributor) left a comment:

Thanks @sodonnel for the patch. Looks good overall; a few minor items suggested.

Comment on lines 75 to 77
    ReplicationTask task = new ReplicationTask(containerID, sourceDatanodes);
    task.setDeadline(deadline);
    supervisor.addTask(task);

Contributor:

For HDDS-7620 I will likely need to add term to ReplicationTask to be able to check it before the task is executed. I was wondering if we should pass the entire command to ReplicationTask instead of duplicating more and more fields.

@sodonnel (Author):

Yeah, let me look at this. There is a similar pattern in the ECReconstruction command, where we basically duplicate the entire command into another object that only has the same getters as the command, so we might be able to make that go away too.
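
Something along these lines might work for that refactor (a sketch only; ReplicateContainerCommand is Ozone's existing command class, but the constructor and getters shown here are assumptions, not the final API):

    // Sketch: the task delegates to the wrapped command instead of copying
    // its fields one by one. Getter names are assumptions for illustration.
    class ReplicationTask {
      private final ReplicateContainerCommand command;

      ReplicationTask(ReplicateContainerCommand command) {
        this.command = command;
      }

      long getContainerId() {
        return command.getContainerID();
      }

      long getDeadline() {
        return command.getDeadline();
      }

      long getTerm() {
        return command.getTerm(); // the field HDDS-7620 would need
      }
    }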

    -        supervisor.getReplicationRequestCount());
    +        supervisor.getReplicationRequestCount())
    +        .addGauge(Interns.info("numTimeoutReplications",
    +            "Number of timeout replications in queue"),

Contributor:

Possible clarification:

Suggested change:

    -            "Number of timeout replications in queue"),
    +            "Number of replication requests timed out before being processed"),

Comment on lines 123 to 127
    if (deadlineMsSinceEpoch > 0 &&
        currentEpochMs > deadlineMsSinceEpoch) {
      return true;
    }
    return false;

Contributor:

Nit:

Suggested change:

    -    if (deadlineMsSinceEpoch > 0 &&
    -        currentEpochMs > deadlineMsSinceEpoch) {
    -      return true;
    -    }
    -    return false;
    +    return deadlineMsSinceEpoch > 0 &&
    +        currentEpochMs > deadlineMsSinceEpoch;

Comment on lines 791 to 794
    if (!(val > 0) || (val > 1)) {
      throw new IllegalArgumentException(val
          + " must be greater than 0 and less than or equal to 1");
    }

Contributor:

Validation is better added to a separate method annotated with @PostConstruct. This ensures validation is performed even if the instance is created and initialized via reflection (it also allows multiple config items to be cross-validated).

Some examples:

    @PostConstruct
    private void validate() {
      Preconditions.checkState(streamBufferSize > 0);
      Preconditions.checkState(streamBufferFlushSize > 0);
      Preconditions.checkState(streamBufferMaxSize > 0);
      Preconditions.checkArgument(bufferIncrement < streamBufferSize,
          "Buffer increment should be smaller than the size of the stream "
              + "buffer");
      Preconditions.checkState(streamBufferMaxSize % streamBufferFlushSize == 0,
          "expected max. buffer size (%s) to be a multiple of flush size (%s)",
          streamBufferMaxSize, streamBufferFlushSize);
      Preconditions.checkState(streamBufferFlushSize % streamBufferSize == 0,
          "expected flush size (%s) to be a multiple of buffer size (%s)",
          streamBufferFlushSize, streamBufferSize);
      if (bytesPerChecksum <
          OzoneConfigKeys.OZONE_CLIENT_BYTES_PER_CHECKSUM_MIN_SIZE) {
        LOG.warn("The checksum size ({}) is not allowed to be less than the " +
                "minimum size ({}), resetting to the minimum size.",
            bytesPerChecksum,
            OzoneConfigKeys.OZONE_CLIENT_BYTES_PER_CHECKSUM_MIN_SIZE);
        bytesPerChecksum =
            OzoneConfigKeys.OZONE_CLIENT_BYTES_PER_CHECKSUM_MIN_SIZE;
      }
    }

    @PostConstruct
    public void validate() {
      if (replicationMaxStreams < 1) {
        LOG.warn(REPLICATION_STREAMS_LIMIT_KEY + " must be greater than zero " +
                "and was set to {}. Defaulting to {}",
            replicationMaxStreams, REPLICATION_MAX_STREAMS_DEFAULT);
        replicationMaxStreams = REPLICATION_MAX_STREAMS_DEFAULT;
      }
    }

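Applied to the check on lines 791 to 794 above, the validation might look roughly like this (the field name is a placeholder, since the actual config field is not shown in the snippet):

    @PostConstruct
    public void validate() {
      // 'datanodeTimeoutOffset' is a placeholder for the ratio being
      // validated; the check mirrors the original inline validation.
      if (datanodeTimeoutOffset <= 0 || datanodeTimeoutOffset > 1) {
        throw new IllegalArgumentException(datanodeTimeoutOffset
            + " must be greater than 0 and less than or equal to 1");
      }
    }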

@sodonnel (Author):

Ah, I had a feeling there should be something like this. I will add this in.

@adoroszlai (Contributor) left a comment:

Thanks @sodonnel for updating the patch, LGTM.
