Skip to content

Refactor git ssh url parsing#4142

Merged
jedevc merged 4 commits intomoby:masterfrom
jedevc:refactor-git-ssh-url-parsing
Aug 22, 2023
Merged

Refactor git ssh url parsing#4142
jedevc merged 4 commits intomoby:masterfrom
jedevc:refactor-git-ssh-url-parsing

Conversation

@jedevc
Copy link
Copy Markdown
Member

@jedevc jedevc commented Aug 13, 2023

This is an alternative to the solution proposed in #4069, see #4069 (comment) (cc @aaronlehmann)

Wherever possible, we should parse SSH urls using url.Parse. See the updated tests for the exact changes in behavior:

  • Don't strip the username during parsing (I'm not 100% sure why we did this in the first place - maybe I'm missing something obvious?)
  • Normalize SCP-style git urls (username@host:org/repo) into ssh:// URLs.

@jedevc jedevc marked this pull request as ready for review August 13, 2023 15:18
Comment thread util/gitutil/git_protocol.go Outdated
Comment thread util/gitutil/git_protocol.go Outdated
Comment thread util/gitutil/git_protocol.go Outdated
@jedevc jedevc force-pushed the refactor-git-ssh-url-parsing branch from e4df209 to 6c78917 Compare August 14, 2023 13:58
@jedevc jedevc requested a review from AkihiroSuda August 14, 2023 14:02
Comment thread util/gitutil/git_protocol.go Outdated
Comment thread util/gitutil/git_protocol.go Outdated
Comment thread util/gitutil/git_protocol.go Outdated
@jedevc jedevc force-pushed the refactor-git-ssh-url-parsing branch 2 times, most recently from 00a04cb to 1eec8a9 Compare August 14, 2023 15:58
@alexcb
Copy link
Copy Markdown
Collaborator

alexcb commented Aug 14, 2023

Don't strip the username during parsing (I'm not 100% sure why we did this in the first place - maybe I'm missing something obvious?)

This was done in order to construct a consistent ID (that's passed to NewState(...) that didn't depend on the protocol or username, e.g.

"https://github.com/moby/buildkit", "git@github.com/moby/buildkit", and "https://username:token@github.com/moby/buildkit" all refer to the same repo, and in order to re-use the cache they should all have the same ID.

This reason was ever-so-briefly described in the old code's comment:

			// keep remote consistent with http(s) version
			remote = parts[0] + "/" + parts[1]

There might additionally be a security concern now that remote contains a username/password (or in the case of github, they actually allow a token to be passed as the username) -- credentials shouldn't be stored in the cache ID.

@jedevc
Copy link
Copy Markdown
Member Author

jedevc commented Aug 15, 2023

This was done in order to construct a consistent ID (that's passed to NewState(...) that didn't depend on the protocol or username

There might additionally be a security concern now that remote contains a username/password (or in the case of github, they actually allow a token to be passed as the username) -- credentials shouldn't be stored in the cache ID.

This should be ok I think - the cache key for git sources is computed from the git sha (see #2091)

func (gs *gitSourceHandler) CacheKey(ctx context.Context, g session.Group, index int) (string, string, solver.CacheOpts, bool, error) {

The identifier itself isn't used to compute the cache, so it should never end up in the cache id, it's only ever recorded in the op. Even then, we wouldn't be able to execute two different urls concurrently, since the vertex digests used for this are a checksum of the entire source definition (which will include the full URL).

I think we can keep the consistent ID, but I'm fairly sure that it won't actually change the caching/concurrent execution behavior.

@jedevc jedevc force-pushed the refactor-git-ssh-url-parsing branch from 1eec8a9 to 390b183 Compare August 15, 2023 11:21
@jedevc jedevc requested a review from crazy-max August 16, 2023 09:24
@jedevc jedevc force-pushed the refactor-git-ssh-url-parsing branch from 84b510f to 38b8565 Compare August 16, 2023 09:45
@jedevc jedevc force-pushed the refactor-git-ssh-url-parsing branch 2 times, most recently from 8b56a4b to a8d926a Compare August 22, 2023 10:36
jedevc and others added 4 commits August 22, 2023 11:36
Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
This should also resolve the ports parsing issue.

Co-authored-by: Aaron Lehmann <alehmann@netflix.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants