hunting ResourceQuota and capture the life of a secret#24778

Closed
p0lyn0mial wants to merge 3 commits into openshift:master from p0lyn0mial:investigate-quota-e2e-issue

@p0lyn0mial
Contributor

No description provided.

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: p0lyn0mial
To complete the pull request process, please assign eparis
You can assign the PR to them by writing /assign @eparis in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the vendor-update Touching vendor dir or related files label Mar 27, 2020
@p0lyn0mial p0lyn0mial force-pushed the investigate-quota-e2e-issue branch from 8cd84cd to bed0aa3 Compare March 27, 2020 10:59
@p0lyn0mial
Contributor Author

p0lyn0mial commented Mar 27, 2020

@mfojtik @sttts

I think that the way the test counts the expected number of secrets is not deterministic, especially under moderate load.

I wrote a script, rage.sh, that runs the test 50 times in parallel on my local machine. The output (the log file) is attached as a commit.

It shows that the number of expected vs actual secrets diverges quickly. For example "expected 6, actual 7", "expected 6, actual 8", "expected 3, actual 6".

It's worth noting that the default number of secrets in a 4.5 cluster is 9.

The secrets are generated by various controllers, and I suspect those controllers are simply being throttled, given the number of namespaces we create during e2e tests. To fix the test properly, I think we would need a way of knowing that a namespace has been fully initialized and is ready to be consumed.

Additionally, we could inspect the individual controllers for glaring errors that would slow them down, for example https://github.com/openshift/openshift-controller-manager/blob/master/pkg/serviceaccounts/controllers/create_dockercfg_secrets.go#L68

@p0lyn0mial
Contributor Author

I spoke with @mfojtik. It seems that this PR, https://github.com/openshift/openshift-controller-manager/pull/72/files#diff-bc9c443ebaeefc1c4319cb1018e62b8cR17, substantially increased the number of times the test fails.

@openshift-ci-robot

@p0lyn0mial: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-gcp | bed0aa3 | link | /test e2e-gcp |
| ci/prow/e2e-gcp-builds | bed0aa3 | link | /test e2e-gcp-builds |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Labels

vendor-update Touching vendor dir or related files

2 participants