-
Notifications
You must be signed in to change notification settings - Fork 331
SAMZA-2605: Make Standby Container Requests Rack Aware #1446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SAMZA-2605: Make Standby Container Requests Rack Aware #1446
Conversation
lakshmi-manasa-g
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first pass half way through. will come back to the other half in a bit.
samza-core/src/main/java/org/apache/samza/clustermanager/ClusterResourceManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/ContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/ContainerProcessManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/FaultDomain.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/FaultDomain.java
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceRequest.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceRequest.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceRequest.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
lakshmi-manasa-g
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finished one pass.
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/test/java/org/apache/samza/clustermanager/MockFaultDomainManager.java
Outdated
Show resolved
Hide resolved
samza-yarn/src/main/java/org/apache/samza/job/yarn/RackManager.java
Outdated
Show resolved
Hide resolved
samza-yarn/src/main/java/org/apache/samza/job/yarn/RackManager.java
Outdated
Show resolved
Hide resolved
mynameborat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly agree to Manasa's comments. Added few major ones around FaultDomainManager interface and RackManager.
It is definitely looking much better than the previous version :)
samza-core/src/main/java/org/apache/samza/clustermanager/ContainerProcessManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/ContainerProcessManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/FaultDomainManager.java
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/FaultDomainManager.java
Outdated
Show resolved
Hide resolved
| * @param host the host | ||
| * @return the {@link FaultDomain} | ||
| */ | ||
| FaultDomain getFaultDomainOfNode(String host); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 use consistent terminologies.
samza-core/src/main/java/org/apache/samza/clustermanager/FaultDomainManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/FaultDomainManagerFactory.java
Outdated
Show resolved
Hide resolved
samza-yarn/src/main/java/org/apache/samza/job/yarn/RackManager.java
Outdated
Show resolved
Hide resolved
samza-yarn/src/main/java/org/apache/samza/job/yarn/RackManagerFactory.java
Outdated
Show resolved
Hide resolved
lakshmi-manasa-g
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for addressing all the feedback.. just one clarification.
samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceRequest.java
Outdated
Show resolved
Hide resolved
mynameborat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took another pass. Any reason why the new parameters do in the middle of the signature instead of end?
samza-core/src/main/java/org/apache/samza/clustermanager/FaultDomain.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/FaultDomain.java
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/FaultDomainManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/FaultDomainType.java
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/SamzaApplicationState.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/config/ClusterManagerConfig.java
Outdated
Show resolved
Hide resolved
samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnFaultDomainManager.java
Outdated
Show resolved
Hide resolved
samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnFaultDomainManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/FaultDomainManager.java
Outdated
Show resolved
Hide resolved
mynameborat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few comments around the configuration. Once you resolve them, can we update the configuration table docs to reflect details about these?
Specifically, which set of configurations are to be turned on in tandem for the feature to work and what are the defaults provided.
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java
Outdated
Show resolved
Hide resolved
mynameborat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pushing through this. Few follow ups.
Looks good otherwise.
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnFaultDomainManager.java
Outdated
Show resolved
Hide resolved
samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnFaultDomainManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
lakshmi-manasa-g
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few minor comments.
samza-core/src/main/java/org/apache/samza/clustermanager/ContainerAllocator.java
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/SamzaApplicationState.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/SamzaApplicationState.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/config/ClusterManagerConfig.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnFaultDomainManager.java
Outdated
Show resolved
Hide resolved
|
A few consistency related comments. Please resolve the conflicts as well. |
c5e52cf to
b871543
Compare
mynameborat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM. Can you update where you landed on adding the invariant of non-empty activeContainerId within issueStandByAwareAllocation method?
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-core/src/main/java/org/apache/samza/clustermanager/StandbyContainerManager.java
Outdated
Show resolved
Hide resolved
samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java
Outdated
Show resolved
Hide resolved
c0960b3 to
8ed4575
Compare
Feature: The aim of this feature is to make all standby container requests rack aware such that all active containers and their corresponding standby containers are always on different racks. This helps with decreased downtime of applications during rack failures. One of the requirements of this feature is that the value of job.standbytasks.replication.factor is at max 2 for the rack awareness functionality to be honored. Changes: This PR uses the FaultDomainManager interface for Yarn to request for rack aware nodes while making standby container requests. Usage Instructions: For a job with host affinity and standby containers, set the config cluster-manager.fault-domain-aware.standby.enabled to true to enable this feature.
Feature: The aim of this feature is to make all standby container requests rack aware such that all active containers and their corresponding standby containers are always on different racks. This helps with decreased downtime of applications during rack failures. One of the requirements of this feature is that the value of job.standbytasks.replication.factor is at max 2 for the rack awareness functionality to be honored. Changes: This PR uses the FaultDomainManager interface for Yarn to request for rack aware nodes while making standby container requests. Usage Instructions: For a job with host affinity and standby containers, set the config cluster-manager.fault-domain-aware.standby.enabled to true to enable this feature.
Feature: The aim of this feature is to make all standby container requests rack aware such that all active containers and their corresponding standby containers are always on different racks. This helps with decreased downtime of applications during rack failures. One of the requirements of this feature is that the value of job.standbytasks.replication.factor is at max 2 for the rack awareness functionality to be honored. Changes: This PR uses the FaultDomainManager interface for Yarn to request for rack aware nodes while making standby container requests. Usage Instructions: For a job with host affinity and standby containers, set the config cluster-manager.fault-domain-aware.standby.enabled to true to enable this feature.
Feature: The aim of this feature is to make all standby container requests rack aware such that all active containers and their corresponding standby containers are always on different racks. This helps with decreased downtime of applications during rack failures.
Changes: This PR uses the FaultDomainManager interface for Yarn to request for rack aware nodes while making standby container requests.
API Changes: None
Tests: Added
Upgrade instructions: TBD
Usage Instructions: For a job with host affinity and standby containers, set the config cluster-manager.fault-domain-aware.standby.enabled to true to enable this feature.