importinto: require S3-like auth for nextgen import (#68231)#68234
Conversation
📝 WalkthroughWalkthroughThis PR implements authentication enforcement for IMPORT INTO on S3-like storage in NextGen clusters. It adds S3 credential constants, normalizes query parameter parsing, rewrites the S3 validation logic to require either access-key/secret-access-key or role-arn, and updates tests across executor, planner, and SEM integration layers. ChangesNextGen S3 Authentication Requirements
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/planner/core/planbuilder_test.go`:
- Around line 1147-1157: The test is missing S3 role-ARN cases: update the
supported-cases loop in planbuilder_test.go to include
"s3://bucket?role-arn=arn" and "s3://bucket?role_arn=arn" alongside the existing
S3 AK/SK and OSS role-ARN entries so checkNextGenS3PathWithSem is exercised for
S3 role ARN authentication; locate the loop that parses URLs and calls
checkNextGenS3PathWithSem and add those two S3 strings to the slice.
In `@pkg/planner/core/planbuilder.go`:
- Around line 6323-6341: The code treats whitespace-only auth parameters as
present by checking values.Get(k) != ""; update the checks inside the loop that
set hasAccessKey, hasSecretAccessKey, and hasRoleARN to trim whitespace before
testing non-empty (e.g., use strings.TrimSpace(values.Get(k)) != "") so that
only non-blank values count as provided; keep using
objstore.NormalizeQueryParameterKey(k) and the same s3like constants
(s3like.S3AccessKey, s3like.S3SecretAccessKey, s3like.S3RoleARN) and leave the
final validation logic (the if !hasRoleARN && !(hasAccessKey &&
hasSecretAccessKey) ...) unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: af02fd39-e950-4a75-a0f1-3be9aa57900b
📒 Files selected for processing (7)
pkg/executor/import_into_test.gopkg/objstore/parse.gopkg/objstore/s3like/store.gopkg/planner/core/planbuilder.gopkg/planner/core/planbuilder_test.gopkg/util/sem/compat/BUILD.bazelpkg/util/sem/compat/sem_integration_test.go
| for _, str := range []string{ | ||
| "s3://bucket?access-key=ak&secret-access-key=sk", | ||
| "s3://bucket?access_key=ak&secret_access_key=sk", | ||
| "oss://bucket?role-arn=arn", | ||
| "oss://bucket?role_arn=arn", | ||
| } { | ||
| u, err := url.Parse(str) | ||
| require.NoError(t, err) | ||
| err = checkNextGenS3PathWithSem(u) | ||
| require.NoError(t, err) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the implementation of checkNextGenS3PathWithSem to see whether role-arn
# is checked per-scheme or scheme-agnostically.
ast-grep --pattern 'func checkNextGenS3PathWithSem($$$) {
$$$
}'Repository: pingcap/tidb
Length of output: 38
🏁 Script executed:
#!/bin/bash
# Search for the function definition using rg
rg "func checkNextGenS3PathWithSem" -A 30Repository: pingcap/tidb
Length of output: 2200
🏁 Script executed:
#!/bin/bash
# Also search for where this function is defined/called
rg "checkNextGenS3PathWithSem" -B 2 -A 10Repository: pingcap/tidb
Length of output: 4638
Add missing test cases for S3 with role ARN authentication.
The implementation of checkNextGenS3PathWithSem checks query parameters scheme-agnostically. Since it accepts role ARN as valid authentication (the logic is if !hasRoleARN && !(hasAccessKey && hasSecretAccessKey) returns error), the "supported" test block should include "s3://bucket?role-arn=arn" and "s3://bucket?role_arn=arn" alongside the existing S3 AK+SK cases and OSS role ARN cases.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/planner/core/planbuilder_test.go` around lines 1147 - 1157, The test is
missing S3 role-ARN cases: update the supported-cases loop in
planbuilder_test.go to include "s3://bucket?role-arn=arn" and
"s3://bucket?role_arn=arn" alongside the existing S3 AK/SK and OSS role-ARN
entries so checkNextGenS3PathWithSem is exercised for S3 role ARN
authentication; locate the loop that parses URLs and calls
checkNextGenS3PathWithSem and add those two S3 strings to the slice.
| hasAccessKey := false | ||
| hasSecretAccessKey := false | ||
| hasRoleARN := false | ||
| for k := range values { | ||
| lowerK := strings.ToLower(k) | ||
| if lowerK == s3like.S3ExternalID { | ||
| normalizedK := objstore.NormalizeQueryParameterKey(k) | ||
| switch normalizedK { | ||
| case s3like.S3ExternalID: | ||
| return plannererrors.ErrNotSupportedWithSem.GenWithStackByArgs("IMPORT INTO with explicit external ID") | ||
| case s3like.S3AccessKey: | ||
| hasAccessKey = hasAccessKey || values.Get(k) != "" | ||
| case s3like.S3SecretAccessKey: | ||
| hasSecretAccessKey = hasSecretAccessKey || values.Get(k) != "" | ||
| case s3like.S3RoleARN: | ||
| hasRoleARN = hasRoleARN || values.Get(k) != "" | ||
| } | ||
| } | ||
|
|
||
| if !hasRoleARN && !(hasAccessKey && hasSecretAccessKey) { | ||
| return plannererrors.ErrNotSupportedWithSem.GenWithStackByArgs("IMPORT INTO from S3-like storage without access key/secret access key or role ARN") |
There was a problem hiding this comment.
Trim auth values before treating them as present.
values.Get(k) != "" accepts whitespace-only access-key, secret-access-key, and role-arn, so a URL like ...?role-arn=%20 currently passes this SEM gate even though the new contract requires non-empty explicit auth.
Suggested fix
for k := range values {
normalizedK := objstore.NormalizeQueryParameterKey(k)
switch normalizedK {
case s3like.S3ExternalID:
return plannererrors.ErrNotSupportedWithSem.GenWithStackByArgs("IMPORT INTO with explicit external ID")
case s3like.S3AccessKey:
- hasAccessKey = hasAccessKey || values.Get(k) != ""
+ hasAccessKey = hasAccessKey || strings.TrimSpace(values.Get(k)) != ""
case s3like.S3SecretAccessKey:
- hasSecretAccessKey = hasSecretAccessKey || values.Get(k) != ""
+ hasSecretAccessKey = hasSecretAccessKey || strings.TrimSpace(values.Get(k)) != ""
case s3like.S3RoleARN:
- hasRoleARN = hasRoleARN || values.Get(k) != ""
+ hasRoleARN = hasRoleARN || strings.TrimSpace(values.Get(k)) != ""
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| hasAccessKey := false | |
| hasSecretAccessKey := false | |
| hasRoleARN := false | |
| for k := range values { | |
| lowerK := strings.ToLower(k) | |
| if lowerK == s3like.S3ExternalID { | |
| normalizedK := objstore.NormalizeQueryParameterKey(k) | |
| switch normalizedK { | |
| case s3like.S3ExternalID: | |
| return plannererrors.ErrNotSupportedWithSem.GenWithStackByArgs("IMPORT INTO with explicit external ID") | |
| case s3like.S3AccessKey: | |
| hasAccessKey = hasAccessKey || values.Get(k) != "" | |
| case s3like.S3SecretAccessKey: | |
| hasSecretAccessKey = hasSecretAccessKey || values.Get(k) != "" | |
| case s3like.S3RoleARN: | |
| hasRoleARN = hasRoleARN || values.Get(k) != "" | |
| } | |
| } | |
| if !hasRoleARN && !(hasAccessKey && hasSecretAccessKey) { | |
| return plannererrors.ErrNotSupportedWithSem.GenWithStackByArgs("IMPORT INTO from S3-like storage without access key/secret access key or role ARN") | |
| hasAccessKey := false | |
| hasSecretAccessKey := false | |
| hasRoleARN := false | |
| for k := range values { | |
| normalizedK := objstore.NormalizeQueryParameterKey(k) | |
| switch normalizedK { | |
| case s3like.S3ExternalID: | |
| return plannererrors.ErrNotSupportedWithSem.GenWithStackByArgs("IMPORT INTO with explicit external ID") | |
| case s3like.S3AccessKey: | |
| hasAccessKey = hasAccessKey || strings.TrimSpace(values.Get(k)) != "" | |
| case s3like.S3SecretAccessKey: | |
| hasSecretAccessKey = hasSecretAccessKey || strings.TrimSpace(values.Get(k)) != "" | |
| case s3like.S3RoleARN: | |
| hasRoleARN = hasRoleARN || strings.TrimSpace(values.Get(k)) != "" | |
| } | |
| } | |
| if !hasRoleARN && !(hasAccessKey && hasSecretAccessKey) { | |
| return plannererrors.ErrNotSupportedWithSem.GenWithStackByArgs("IMPORT INTO from S3-like storage without access key/secret access key or role ARN") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/planner/core/planbuilder.go` around lines 6323 - 6341, The code treats
whitespace-only auth parameters as present by checking values.Get(k) != "";
update the checks inside the loop that set hasAccessKey, hasSecretAccessKey, and
hasRoleARN to trim whitespace before testing non-empty (e.g., use
strings.TrimSpace(values.Get(k)) != "") so that only non-blank values count as
provided; keep using objstore.NormalizeQueryParameterKey(k) and the same s3like
constants (s3like.S3AccessKey, s3like.S3SecretAccessKey, s3like.S3RoleARN) and
leave the final validation logic (the if !hasRoleARN && !(hasAccessKey &&
hasSecretAccessKey) ...) unchanged.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## release-nextgen-202603 #68234 +/- ##
===========================================================
Coverage ? 77.5687%
===========================================================
Files ? 1962
Lines ? 544099
Branches ? 0
===========================================================
Hits ? 422051
Misses ? 121196
Partials ? 852
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
/retest |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: D3Hunter, hawkingrei The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
77ac297
into
pingcap:release-nextgen-202603
This is an automated cherry-pick of #68231
What problem does this PR solve?
Issue Number: close #68226
Problem Summary:
In NextGen security enhanced mode,
IMPORT INTOaccepted S3-like storage URIs without explicit user-provided credentials. That allowed the object-store client to fall back to TiDB node-role credentials, which weakens the expected boundary for user-specified import sources.What changed and how does it work?
This PR requires explicit authentication for S3-like
IMPORT INTOsources when NextGen and SEM are enabled.Check List
Tests
Unit tests:
./tools/check/failpoint-go-test.sh pkg/planner/core -tags=intest,deadlock,nextgen -run TestProcessNextGenS3Path -count=1./tools/check/failpoint-go-test.sh pkg/executor -tags=intest,deadlock,nextgen -run TestNextGenS3ExternalID -count=1make lintSide effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
IMPORT INTOoperations on S3-like cloud storage to enforce explicit authentication requirements. Statements without valid access credentials (access key/secret key) or role ARN are now rejected in next-gen kernel mode.