
importinto: require S3-like auth for nextgen import (#68231) #68234

Merged
ti-chi-bot[bot] merged 3 commits into pingcap:release-nextgen-202603 from ti-chi-bot:cherry-pick-68231-to-release-nextgen-202603
May 9, 2026

@ti-chi-bot ti-chi-bot commented May 8, 2026

This is an automated cherry-pick of #68231

What problem does this PR solve?

Issue Number: close #68226

Problem Summary:

In NextGen security enhanced mode, IMPORT INTO accepted S3-like storage URIs without explicit user-provided credentials. That allowed the object-store client to fall back to TiDB node-role credentials, which weakens the expected boundary for user-specified import sources.

What changed and how does it work?

This PR requires explicit authentication for S3-like IMPORT INTO sources when NextGen and SEM are enabled.

  • Adds normalized object-store query parameter matching so both dash and underscore spellings are handled consistently.
  • Defines shared S3-like query keys for access key, secret access key, and role ARN.
  • Rejects S3-like import paths unless they provide either a non-empty access key/secret access key pair or a non-empty role ARN.
  • Preserves the existing NextGen SEM behavior that rejects explicit external ID and injects the keyspace name as the external ID for allowed paths.
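
The rule in the bullets above can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the code from the PR: `normalizeKey`, `checkExplicitAuth`, `checkURI`, the key constants, and the error messages are hypothetical stand-ins, and keyspace-based external ID injection is left out.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// Hypothetical key names in their normalized (lowercase, hyphenated) spelling.
const (
	keyAccessKey       = "access-key"
	keySecretAccessKey = "secret-access-key"
	keyRoleARN         = "role-arn"
	keyExternalID      = "external-id"
)

// normalizeKey lowercases a query key and maps underscores to hyphens,
// so "access_key" and "access-key" compare equal.
func normalizeKey(k string) string {
	return strings.ReplaceAll(strings.ToLower(k), "_", "-")
}

// checkExplicitAuth rejects an explicit external ID and requires either a
// non-empty access key/secret access key pair or a non-empty role ARN.
func checkExplicitAuth(u *url.URL) error {
	values := u.Query()
	var hasAK, hasSK, hasRole bool
	for k := range values {
		switch normalizeKey(k) {
		case keyExternalID:
			return errors.New("explicit external ID is not supported")
		case keyAccessKey:
			hasAK = hasAK || values.Get(k) != ""
		case keySecretAccessKey:
			hasSK = hasSK || values.Get(k) != ""
		case keyRoleARN:
			hasRole = hasRole || values.Get(k) != ""
		}
	}
	if !hasRole && !(hasAK && hasSK) {
		return errors.New("missing access key/secret access key or role ARN")
	}
	return nil
}

// checkURI parses a raw storage URI and applies checkExplicitAuth.
func checkURI(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	return checkExplicitAuth(u)
}

func main() {
	for _, raw := range []string{
		"s3://bucket?access-key=ak&secret-access-key=sk",
		"s3://bucket?access_key=ak&secret_access_key=sk",
		"oss://bucket?role-arn=arn",
		"s3://bucket",
		"s3://bucket?access-key=ak",
		"s3://bucket?role-arn=arn&external-id=x",
	} {
		fmt.Printf("%s -> %v\n", raw, checkURI(raw))
	}
}
```

The first three URIs are accepted; the last three are rejected because they carry no credentials, only half of the key pair, or an explicit external ID.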

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Unit tests:

  • ./tools/check/failpoint-go-test.sh pkg/planner/core -tags=intest,deadlock,nextgen -run TestProcessNextGenS3Path -count=1
  • ./tools/check/failpoint-go-test.sh pkg/executor -tags=intest,deadlock,nextgen -run TestNextGenS3ExternalID -count=1
  • make lint

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

In NextGen security enhanced mode, IMPORT INTO from S3-like storage now requires access key/secret access key credentials or a role ARN.

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced validation for IMPORT INTO operations on S3-like cloud storage to enforce explicit authentication requirements. Statements without valid access credentials (access key/secret key) or role ARN are now rejected in next-gen kernel mode.

@ti-chi-bot ti-chi-bot added the release-note, sig/planner, size/L, and type/cherry-pick-for-release-nextgen-202603 labels May 8, 2026

coderabbitai Bot commented May 8, 2026

Walkthrough

This PR implements authentication enforcement for IMPORT INTO on S3-like storage in NextGen clusters. It adds S3 credential constants, normalizes query parameter parsing, rewrites the S3 validation logic to require either access-key/secret-access-key or role-arn, and updates tests across executor, planner, and SEM integration layers.

Changes

NextGen S3 Authentication Requirements

| Layer / File(s) | Summary |
| --- | --- |
| S3 Credential Constants: pkg/objstore/s3like/store.go | Exports S3AccessKey, S3SecretAccessKey, S3RoleARN constants for credential parameter keys. |
| Query Parameter Normalization: pkg/objstore/parse.go | Adds NormalizeQueryParameterKey helper to lowercase and convert underscores to hyphens; ExtractQueryParameters uses it instead of inline normalization. |
| NextGen S3 Validation: pkg/planner/core/planbuilder.go | checkNextGenS3PathWithSem now parses normalized S3 parameters, rejects explicit external_id, and requires either both access_key and secret_access_key or role_arn; buildImportInto invokes validation unconditionally for NextGen SEM S3-like imports. |
| Planner Unit Tests: pkg/planner/core/planbuilder_test.go | TestProcessNextGenS3Path adds unsupported cases for external_id variants, supported cases for credential parameters (including underscore aliases), and error cases for missing credentials. |
| Executor Integration Tests: pkg/executor/import_into_test.go | Adds TestNextGenS3ExternalID asserting SEM rejection of credentials-less S3-like URIs; modifies "local sort" and "unsupported options" test URIs to include access-key/secret-access-key. |
| SEM Conditional Integration Tests: pkg/util/sem/compat/sem_integration_test.go | TestRestrictedSQL branches on kerneltype.IsNextGen(): NextGen rejects explicit EXTERNAL-ID, legacy mode preserves failpoint verification; adds import and Bazel dependency for kerneltype. |
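
The key normalization named in the table (lowercase, underscores converted to hyphens) could look roughly like the sketch below. `normalizeQueryKey` is a hypothetical re-creation written only from that one-line description; the real `NormalizeQueryParameterKey` in pkg/objstore/parse.go may differ in details.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeQueryKey lowercases a query parameter key and replaces every
// underscore with a hyphen. Hypothetical re-creation, not the PR's helper.
func normalizeQueryKey(k string) string {
	return strings.ReplaceAll(strings.ToLower(k), "_", "-")
}

func main() {
	// All spellings of the same logical key collapse to one canonical form.
	for _, k := range []string{"access-key", "access_key", "Access_Key", "ROLE_ARN"} {
		fmt.Printf("%q -> %q\n", k, normalizeQueryKey(k))
	}
}
```

Matching on the canonical form lets a single `switch` over shared constants cover both the dash and the underscore spelling.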

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • pingcap/tidb#68231: Makes parallel changes to normalize S3 query keys, add credential constants, and enforce NextGen SEM authentication requirements for IMPORT INTO.

Suggested labels

component/import, lgtm

Suggested reviewers

  • GMHDBJD
  • joechenrh
  • hawkingrei

Poem

🐰 S3 credentials now required with care,
No node-role fallback—authenticate with flair!
NextGen enforces: AK/SK or role ARN must be there,
Query parameters normalized, S3 is finally fair. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: requiring S3-like authentication for NextGen imports, which is the primary objective of this PR. |
| Description check | ✅ Passed | The description covers all required sections: issue number, problem summary, what changed, test coverage, breaking compatibility, and release notes. Content is complete and addresses the linked issue. |
| Linked Issues check | ✅ Passed | The PR implements all key requirements from #68226: normalized query parameter handling, S3-like auth key constants, enforcement of AK/SK or role ARN, rejection of credentials-less URIs, and preservation of external-ID handling. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to enforcing S3-like authentication for NextGen IMPORT INTO. No unrelated modifications to other systems or functionality were introduced. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/planner/core/planbuilder_test.go`:
- Around line 1147-1157: The test is missing S3 role-ARN cases: update the
supported-cases loop in planbuilder_test.go to include
"s3://bucket?role-arn=arn" and "s3://bucket?role_arn=arn" alongside the existing
S3 AK/SK and OSS role-ARN entries so checkNextGenS3PathWithSem is exercised for
S3 role ARN authentication; locate the loop that parses URLs and calls
checkNextGenS3PathWithSem and add those two S3 strings to the slice.

In `@pkg/planner/core/planbuilder.go`:
- Around line 6323-6341: The code treats whitespace-only auth parameters as
present by checking values.Get(k) != ""; update the checks inside the loop that
set hasAccessKey, hasSecretAccessKey, and hasRoleARN to trim whitespace before
testing non-empty (e.g., use strings.TrimSpace(values.Get(k)) != "") so that
only non-blank values count as provided; keep using
objstore.NormalizeQueryParameterKey(k) and the same s3like constants
(s3like.S3AccessKey, s3like.S3SecretAccessKey, s3like.S3RoleARN) and leave the
final validation logic (the if !hasRoleARN && !(hasAccessKey &&
hasSecretAccessKey) ...) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: af02fd39-e950-4a75-a0f1-3be9aa57900b

📥 Commits

Reviewing files that changed from the base of the PR and between 678fb23 and de7515c.

📒 Files selected for processing (7)
  • pkg/executor/import_into_test.go
  • pkg/objstore/parse.go
  • pkg/objstore/s3like/store.go
  • pkg/planner/core/planbuilder.go
  • pkg/planner/core/planbuilder_test.go
  • pkg/util/sem/compat/BUILD.bazel
  • pkg/util/sem/compat/sem_integration_test.go

Comment on lines +1147 to +1157
	for _, str := range []string{
		"s3://bucket?access-key=ak&secret-access-key=sk",
		"s3://bucket?access_key=ak&secret_access_key=sk",
		"oss://bucket?role-arn=arn",
		"oss://bucket?role_arn=arn",
	} {
		u, err := url.Parse(str)
		require.NoError(t, err)
		err = checkNextGenS3PathWithSem(u)
		require.NoError(t, err)
	}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the implementation of checkNextGenS3PathWithSem to see whether role-arn
# is checked per-scheme or scheme-agnostically.
ast-grep --pattern 'func checkNextGenS3PathWithSem($$$) {
  $$$
}'

Repository: pingcap/tidb

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Search for the function definition using rg
rg "func checkNextGenS3PathWithSem" -A 30

Repository: pingcap/tidb

Length of output: 2200


🏁 Script executed:

#!/bin/bash
# Also search for where this function is defined/called
rg "checkNextGenS3PathWithSem" -B 2 -A 10

Repository: pingcap/tidb

Length of output: 4638


Add missing test cases for S3 with role ARN authentication.

The implementation of checkNextGenS3PathWithSem checks query parameters scheme-agnostically. Since it accepts role ARN as valid authentication (the logic is if !hasRoleARN && !(hasAccessKey && hasSecretAccessKey) returns error), the "supported" test block should include "s3://bucket?role-arn=arn" and "s3://bucket?role_arn=arn" alongside the existing S3 AK+SK cases and OSS role ARN cases.
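
To make the scheme-agnostic point concrete, here is a small sketch in which the role ARN lookup reads only the query string and never consults the URL scheme, so `s3://` and `oss://` URIs behave the same. `roleARNPresent` is a hypothetical helper for illustration and is not part of the PR.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// roleARNPresent reports whether the URI carries a non-empty role ARN under
// either spelling (role-arn or role_arn). It deliberately ignores u.Scheme.
func roleARNPresent(raw string) (bool, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return false, err
	}
	values := u.Query()
	for k := range values {
		normalized := strings.ReplaceAll(strings.ToLower(k), "_", "-")
		if normalized == "role-arn" && values.Get(k) != "" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, raw := range []string{
		"s3://bucket?role-arn=arn",
		"s3://bucket?role_arn=arn",
		"oss://bucket?role-arn=arn",
		"oss://bucket?role_arn=arn",
	} {
		ok, err := roleARNPresent(raw)
		fmt.Println(raw, ok, err)
	}
}
```

All four URIs report `true`, which is why the two S3 spellings are natural additions to the supported-cases slice.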

Comment on lines +6323 to +6341
	hasAccessKey := false
	hasSecretAccessKey := false
	hasRoleARN := false
	for k := range values {
		lowerK := strings.ToLower(k)
		if lowerK == s3like.S3ExternalID {
		normalizedK := objstore.NormalizeQueryParameterKey(k)
		switch normalizedK {
		case s3like.S3ExternalID:
			return plannererrors.ErrNotSupportedWithSem.GenWithStackByArgs("IMPORT INTO with explicit external ID")
		case s3like.S3AccessKey:
			hasAccessKey = hasAccessKey || values.Get(k) != ""
		case s3like.S3SecretAccessKey:
			hasSecretAccessKey = hasSecretAccessKey || values.Get(k) != ""
		case s3like.S3RoleARN:
			hasRoleARN = hasRoleARN || values.Get(k) != ""
		}
	}

	if !hasRoleARN && !(hasAccessKey && hasSecretAccessKey) {
		return plannererrors.ErrNotSupportedWithSem.GenWithStackByArgs("IMPORT INTO from S3-like storage without access key/secret access key or role ARN")

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Trim auth values before treating them as present.

values.Get(k) != "" accepts whitespace-only access-key, secret-access-key, and role-arn, so a URL like ...?role-arn=%20 currently passes this SEM gate even though the new contract requires non-empty explicit auth.
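
The behavior described here can be reproduced with the standard library alone. In the sketch below, `roleARNValue` and `isBlank` are hypothetical helpers: `%20` decodes to a single space, which passes a plain non-empty check and is caught only once the value is trimmed.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// roleARNValue returns the decoded value of the role-arn query parameter.
func roleARNValue(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Query().Get("role-arn"), nil
}

// isBlank reports whether v is empty after trimming surrounding whitespace.
func isBlank(v string) bool {
	return strings.TrimSpace(v) == ""
}

func main() {
	v, err := roleARNValue("s3://bucket?role-arn=%20")
	if err != nil {
		panic(err)
	}
	fmt.Printf("decoded value: %q\n", v)            // decoded value: " "
	fmt.Println("passes != \"\" check:", v != "")   // true
	fmt.Println("passes trimmed check:", !isBlank(v)) // false
}
```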

Suggested fix
 	for k := range values {
 		normalizedK := objstore.NormalizeQueryParameterKey(k)
 		switch normalizedK {
 		case s3like.S3ExternalID:
 			return plannererrors.ErrNotSupportedWithSem.GenWithStackByArgs("IMPORT INTO with explicit external ID")
 		case s3like.S3AccessKey:
-			hasAccessKey = hasAccessKey || values.Get(k) != ""
+			hasAccessKey = hasAccessKey || strings.TrimSpace(values.Get(k)) != ""
 		case s3like.S3SecretAccessKey:
-			hasSecretAccessKey = hasSecretAccessKey || values.Get(k) != ""
+			hasSecretAccessKey = hasSecretAccessKey || strings.TrimSpace(values.Get(k)) != ""
 		case s3like.S3RoleARN:
-			hasRoleARN = hasRoleARN || values.Get(k) != ""
+			hasRoleARN = hasRoleARN || strings.TrimSpace(values.Get(k)) != ""
 		}
 	}

@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm label May 8, 2026

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (release-nextgen-202603@678fb23). Learn more about missing BASE report.

Additional details and impacted files
@@                     Coverage Diff                     @@
##             release-nextgen-202603     #68234   +/-   ##
===========================================================
  Coverage                          ?   77.5687%           
===========================================================
  Files                             ?       1962           
  Lines                             ?     544099           
  Branches                          ?          0           
===========================================================
  Hits                              ?     422051           
  Misses                            ?     121196           
  Partials                          ?        852           
Flag Coverage Δ
unit 76.1749% <100.0000%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 61.5065% <0.0000%> (?)
parser ∅ <0.0000%> (?)
br 60.9801% <0.0000%> (?)


D3Hunter commented May 8, 2026

/retest


ti-chi-bot Bot commented May 9, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, hawkingrei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the approved and lgtm labels and removed the needs-1-more-lgtm label May 9, 2026

ti-chi-bot Bot commented May 9, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-08 12:32:21.101789191 +0000 UTC m=+443813.975139153: ☑️ agreed by D3Hunter.
  • 2026-05-09 02:04:01.272111903 +0000 UTC m=+492514.145461885: ☑️ agreed by hawkingrei.

@ti-chi-bot ti-chi-bot Bot merged commit 77ac297 into pingcap:release-nextgen-202603 May 9, 2026
18 checks passed
@ti-chi-bot ti-chi-bot Bot deleted the cherry-pick-68231-to-release-nextgen-202603 branch May 9, 2026 02:08
@coderabbitai coderabbitai Bot mentioned this pull request May 12, 2026
13 tasks