steps: add internal for GCP and Azure #9553
Conversation
Force-pushed 09d03f3 to 11c9cf5.
/retest
Force-pushed f02dc1f to 296b2f0.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed 3ffb142 to e76d7f7.
jstuever left a comment:
I'm not a huge fan of forking all of these -commands.sh files. When we did this with templates, we ended up with huge variance between the templates. I would be more comfortable if we could somehow wrap this into the original -commands.sh to avoid drift.

I also have concerns about how the cat calls handle variables, doubly so if CI is replacing the variables prior to running the scripts (like it did in templates).
Let's make sure we have a tech debt card to handle moving these.
Maybe have this card also look for other modifications to install-config.yaml that could benefit from yq.
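For instance, assuming mikefarah's yq v4 is available in the step image (an assumption, not something this PR adds), an install-config.yaml tweak like setting the publish strategy could become a one-liner instead of templated text:

```bash
# Edit install-config.yaml in place rather than regenerating it with cat/sed.
# The file location under SHARED_DIR is illustrative.
yq -i '.publish = "Internal"' "${SHARED_DIR}/install-config.yaml"
```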
Are the variables in this block being translated during the cat call prior to being written to disk? If so, how does that affect them running on the jump host? The same thing applies to the other -commands.sh files in this PR.
> Are the variables in this block being translated during the cat call prior to being written to disk?

https://stackoverflow.com/a/22698106

`'EOF'` makes it so that NO expansion will happen.
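For reference, a quick illustration of that behavior (standard bash semantics, not code from this PR):

```bash
NAME="world"

# Unquoted delimiter: the shell expands variables while writing the heredoc,
# so the file on disk contains: echo "hello world"
cat > expanded.sh <<EOF
echo "hello ${NAME}"
EOF

# Quoted delimiter ('EOF'): no expansion happens, so the file on disk contains
# the literal text echo "hello ${NAME}", which resolves later on the host
# that actually runs the script.
cat > literal.sh <<'EOF'
echo "hello ${NAME}"
EOF
```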
Perfect. I wasn't aware of the functionality change when adding the quotes.
Gate with `if [[ ! -s "${SHARED_DIR}/jump-host.txt" ]]`.
Also in the other files.
Any reason why we would do this? I don't want to skip silently.
This will prevent ssh from failing with a possibly misleading error message when ${REMOTE} is an empty string. Better to gate here with a clear error explaining why.
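A minimal sketch of the kind of gate being suggested, reusing the jump-host.txt file discussed above (the error text is illustrative):

```bash
# Fail fast with a clear reason instead of letting ssh produce a
# misleading error when the jump host address is empty or missing.
if [[ ! -s "${SHARED_DIR}/jump-host.txt" ]]; then
  echo "ERROR: ${SHARED_DIR}/jump-host.txt is missing or empty; cannot reach the internal cluster" >&2
  exit 1
fi
REMOTE="$(cat "${SHARED_DIR}/jump-host.txt")"
```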
Added with eec6cc0.

See the PR description for the reasoning why this is being done this way. DPTP-1259 would allow us to symlink the script to this step and use an input, and then we wouldn't need the local copy. Updating the original install step just makes that step too complicated.
I didn't intend to block on this; I was voicing my concern to help prioritize the work to fix it.
Currently the CI pod that runs the installer/openshift-test/oc (gathering) needs access to the internet for various things:

- the cluster that was created: the kube API and the openshift-ingress
- the cloud APIs
- DNS resolution of both types of endpoints mentioned above

The first one is about getting the CI pod access to internal network IPs. The second is not necessary for internal clusters specifically, but for certain testing, like AWS GovCloud emulation testing, we need to make sure we are originating all the requests from inside the network itself to use PrivateLink service endpoints. The last one is important because the internal cluster does not publish its DNS names to the Internet; DNS resolution of api.clusterdomain, for example, is only possible from inside the cluster, and this also affects the usage of PrivateLink service endpoints.

So I started with 3 ideas and tried to implement them using what we have.

1. Complete transparent VPNs like sshuttle (https://github.com/sshuttle/sshuttle) or OpenVPN

Using these should allow all the traffic to be forced through the VPN into the internal network and also provide DNS resolution. But when trying to use them in a container, especially on an OpenShift cluster:

- Both of them require sudo/root privileges plus NET_CAP-like capabilities in the pod. With OpenShift this is very difficult to achieve, as far as I have read.
- They also require modifying network devices, and I could not find a way to even test this on an OpenShift cluster.

2. ssh + rsync

For a given step we take the commands.sh and use rsync to copy it, plus any binaries we need, to the bastion host's temporary workspace. We also sync the shared_dir, the artifacts_dir, and any other environment variables to ensure we can run the step properly. This is the approach I have taken here, but there are a lot of downsides:

- It requires a lot of wrapping, and today it needs an alternate step with jump handling for each step. https://issues.redhat.com/browse/DPTP-1259 can help reduce one type of duplication, but there are still the questions of which binaries to copy, which environment variables to transfer, and what to do if there are other files that need to be copied.
- The shared_dir/artifacts_dir syncing is a little brittle; the implementation should be good enough, but there is a little more to be desired in terms of retries etc.
- The tests require 3 CPUs and 4 GB of RAM, which means we can't share bastions and might have to bring one up per CI job.

3. SOCKS5 proxy + pod dnsPolicy forced to the DNS resolver of the internal network

Since most of our binaries are Go binaries, and Go supports HTTP_PROXY="socks5://<>", this would allow us to access the internal IPs over the proxy as long as we wrap the commands.sh with a local ssh proxy. The next requirement is to force the DNS to that of the internal network. K8s pods allow defining dnsPolicy and nameservers for the pods:

```yaml
dnsPolicy: None
dnsConfig:
  nameservers:
  - <the chosen one, probably needs to be dynamic per the steps>
```

This should technically allow the pod to use the bastion DNS resolvers already set up to allow DNS resolution of internal services on our pre-existing networks. This requires that ci-operator can read these values from, for example, SHARED_DIR and then set up the DNS for the pod running the step accordingly. There is one downside: no containers in that pod will have access to the CI cluster's services, so the current implementation of shared-secrets-copier might have to be updated.
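For the SOCKS5 idea, a minimal sketch of what the wrapper around a step's commands.sh could look like, assuming a jump host address recorded in SHARED_DIR and Go binaries that honor proxy environment variables; the key path, user, port, and the final command are illustrative, not from this PR:

```bash
#!/bin/bash
set -o nounset -o errexit -o pipefail

# Hypothetical input: a jump host recorded by an earlier step.
BASTION="$(cat "${SHARED_DIR}/jump-host.txt")"

# Open a local dynamic (SOCKS5) forward via the bastion; -N runs no remote
# command, and we keep the PID so the tunnel is torn down on exit.
ssh -i "${CLUSTER_PROFILE_DIR}/ssh-privatekey" \
    -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
    -N -D 1080 "core@${BASTION}" &
PROXY_PID=$!
trap 'kill "${PROXY_PID}"' EXIT

# Give the tunnel a moment to establish (a real step would poll).
sleep 5

# Go's proxy handling accepts a socks5:// scheme, so traffic to internal
# IPs now goes through the bastion.
export HTTP_PROXY="socks5://127.0.0.1:1080"
export HTTPS_PROXY="socks5://127.0.0.1:1080"

# Run the real step command under the proxy (illustrative).
oc --kubeconfig "${SHARED_DIR}/kubeconfig" get clusterversion
```

DNS would still need the dnsPolicy/dnsConfig piece above, since the proxy only helps clients that honor the proxy variables.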
Force-pushed e76d7f7 to eec6cc0.
@abhinavdahiya: The following tests failed, say `/retest` to rerun all failed tests.

Full PR test history. Your PR dashboard.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@abhinavdahiya: The following tests failed, say `/retest` to rerun all failed tests. (Full PR test history. Your PR dashboard.)
@abhinavdahiya: The following test failed, say `/retest` to rerun all failed tests. (Full PR test history. Your PR dashboard.)
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle stale`. If this issue is safe to close now please do so with `/close`.

/lifecycle stale
@abhinavdahiya: PR needs rebase.
/uncc
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle rotten`. If this issue is safe to close now please do so with `/close`.

/lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting `/reopen`.

/close
@openshift-bot: Closed this PR.
Depends on #9640.

Long-term improvements can be tracked in https://issues.redhat.com/browse/DPTP-1260.