aws: Increase default master disk size to 120GB for IO #737
With colocated masters we are seeing ~120 IOPS sustained, and a 30GB gp2 volume is limited to 100 IOPS. etcd is ~75% of the write workload, and we are seeing some syncs take ~1s and occasional heartbeat latency. Increase the master disk by 4x to get slightly more headroom. In practice this adds about $30-40/month of disk on top of the ~$200/month the instances cost.
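As a rough check of the numbers above, here is a minimal sketch. It assumes the gp2 baseline rule of 3 IOPS per GiB with a 100 IOPS floor (taken from the EBS documentation, not from this PR) and ignores burst credits and the per-volume upper cap. The function and constant names are illustrative only.

```python
# Sketch only: approximate gp2 baseline IOPS for a given volume size.
# Assumption: baseline = 3 IOPS per GiB, never below 100 IOPS.
# Burst credits and the per-volume upper cap are deliberately ignored.
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100

# Sustained load reported in the PR description.
OBSERVED_IOPS = 120


def gp2_baseline_iops(size_gib: int) -> int:
    """Return the assumed gp2 baseline IOPS for a volume of size_gib."""
    return max(GP2_MIN_IOPS, size_gib * GP2_IOPS_PER_GIB)


if __name__ == "__main__":
    for size in (30, 120):
        baseline = gp2_baseline_iops(size)
        headroom = baseline - OBSERVED_IOPS
        print(f"{size} GiB: baseline {baseline} IOPS, headroom {headroom:+d}")
```

Under these assumptions the 30 GiB volume sits 20 IOPS below the observed load, while the 120 GiB volume has a 360 IOPS baseline, roughly 3x the observed load.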
/hold look at the ec2 numbers to verify this has an impact
/retest
/retest
/retest
/retest may be seeing higher CPU use due to io throttling
/test e2e-aws
  variable "tectonic_aws_master_root_volume_size" {
    type = "string"
-   default = "30"
+   default = "120"
We don't want to be driving these defaults from the Terraform config (because that doesn't work for folks who want to install our assets themselves without going through Terraform). Ideally we'd set it in this structure here and then push that into Terraform here. But the cluster API doesn't seem to support root volume configs at the moment (I didn't see any open issues about that, but maybe we have someone in sig-cluster-lifecycle that can ask about getting it added). In the meantime, we're pulling this straight from the install-config, although folks that do not get this added via the install-config may not have it set in the Terraform variables at all (in which case your default here will matter). So this probably works as you have it, but only as long as the cluster-API operator doesn't have to get involved.
> We don't want to be driving these defaults from the Terraform config (because that doesn't work for folks who want to install our assets themselves without going through Terraform).
A good start would be to not have defaults in Terraform and to drive all options through the install-config or cluster-api, then slowly move most of them out of the install-config.
https://jira.coreos.com/browse/CORS-888
/lgtm We are going to merge this as-is so we can hopefully cut down on CI flakes. @enxebre the Machine API will need knobs for adjusting this (if they don't already exist).
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: crawford, smarterclayton
/unhold
/retest Please review the full test history for this PR and help us cut down flakes.
/retest Please review the full test history for this PR and help us cut down flakes.
/retest Please review the full test history for this PR and help us cut down flakes.
/retest
/retest
/retest
/retest
just the one failure. Big money 🎲 /retest
which I think is the OpenShift API flake. /retest
e2e-aws had more of the /retest
Maybe openshift/origin@5fa8ee7 has a fix... /retest
Hrm, maybe not. e2e-aws: Let's try again with [edit, actually the origin commit hadn't changed]:
$ oc adm release info registry.svc.ci.openshift.org/openshift/origin-release:v4.0 --commits | grep origin | head -n1
cli https://github.com/openshift/origin 5fa8ee77312ed76b49e896122ef64208216d32e1
/retest
openshift/origin#21552 may be related