fix(spark): align parse_url empty FILE path #21969

Open
kumarUjjawal wants to merge 2 commits into apache:main from kumarUjjawal:fix/spark_compatibility_parse_url
@kumarUjjawal kumarUjjawal commented May 1, 2026

Which issue does this PR close?

Rationale for this change

parse_url in the Spark function library did not match Spark for some empty-path URL cases. In particular, absolute URLs with no explicit path (for example, http://example.com) were treated as if their path were / because the URL parser normalizes an empty path to /.

This PR also handles:

  • parse_url(..., 'PATH') now returns / for an explicit root path like http://example.com/.
  • parse_url(..., 'QUERY', key) now returns the raw query value, so percent-encoded values like x%20y are not decoded to x y.
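
To illustrate the raw QUERY behavior described in the second bullet, here is a minimal sketch. It assumes the raw query string has already been isolated from the URL; `raw_query_value` is a hypothetical helper written for this example, not the function used in this PR.

```rust
/// Hypothetical helper: look up `key` in a raw query string such as "q=x%20y&z=1"
/// and return its value exactly as written, without percent-decoding.
fn raw_query_value<'a>(raw_query: &'a str, key: &str) -> Option<&'a str> {
    raw_query.split('&').find_map(|pair| {
        // Pairs without '=' are skipped in this sketch.
        let (k, v) = pair.split_once('=')?;
        // Return the value slice untouched when the key matches.
        (k == key).then_some(v)
    })
}
```

With this sketch, looking up `q` in `q=x%20y&z=1` yields `x%20y` rather than `x y`, because the value is borrowed straight from the input and never decoded.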

What changes are included in this PR?

This PR keeps the existing URL parser, but adjusts FILE extraction so parser-normalized / is treated as empty only when the original URL had no path after the authority.
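
The adjustment can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `path_is_missing` and `file_part` are hypothetical names, and the sketch assumes the parser has already produced a normalized path and an optional raw query.

```rust
/// Hypothetical helper: true when the raw URL has an authority ("scheme://host")
/// that is not followed by any path.
fn path_is_missing(raw_url: &str) -> bool {
    // Locate the start of the authority, just past "://".
    let Some(scheme_end) = raw_url.find("://") else {
        return false; // no authority found; leave the parser's path untouched
    };
    let after_scheme = &raw_url[scheme_end + 3..];
    // The authority ends at the first '/', '?' or '#'.
    match after_scheme.find(|c: char| matches!(c, '/' | '?' | '#')) {
        // A path is present only if the authority is followed by '/'.
        Some(i) => !after_scheme[i..].starts_with('/'),
        None => true,
    }
}

/// Hypothetical FILE extraction: the path plus "?query" when a query exists.
/// A parser-normalized "/" is treated as empty only when the raw URL had no path.
fn file_part(raw_url: &str, normalized_path: &str, query: Option<&str>) -> String {
    let path = if normalized_path == "/" && path_is_missing(raw_url) {
        ""
    } else {
        normalized_path
    };
    match query {
        Some(q) => format!("{path}?{q}"),
        None => path.to_string(),
    }
}
```

In this sketch, `http://example.com` produces an empty FILE value while `http://example.com/` still produces `/`, which is the boundary between a missing path and an explicit root path.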

It also adds regression coverage for parse_url and try_parse_url, including the boundary between a missing path and an explicit root path.

Are these changes tested?

Yes

Are there any user-facing changes?

parse_url and try_parse_url now match Spark more closely for empty path FILE results. There is no public API change.

github-actions bot added the sqllogictest (SQL Logic Tests, .slt) and spark labels on May 1, 2026
kumarUjjawal commented May 1, 2026

There is another issue that should be part of this PR; I'm working on the fix.

Development

Successfully merging this pull request may close these issues.

[datafusion-spark] parse_url incompatibilities
