fix: make column name lookups case-insensitive#5465

Merged
wjones127 merged 7 commits into lance-format:main from wjones127:fix-case-insensitive-column-names
Dec 18, 2025
Conversation

@wjones127
Contributor

Summary

  • SQL parsers lowercase unquoted identifiers, but Arrow schema lookups are case-sensitive
  • This caused mixed-case column names (e.g., userId) to fail in filter expressions, scalar index creation, and merge insert
  • Added case-insensitive field lookup that tries exact match first, then falls back to case-insensitive match
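
The lookup order described above can be sketched in a few lines of plain Python. This is an illustration only, not the code from this PR: `resolve_column` and the list-of-names schema are hypothetical, and returning the first case-insensitive match when several fields differ only by case is an assumption.

```python
from typing import Optional, Sequence


def resolve_column(name: str, schema_names: Sequence[str]) -> Optional[str]:
    """Return the schema's spelling of `name`, or None if nothing matches.

    Hypothetical helper: mirrors the "exact first, then case-insensitive"
    order from the summary, not the actual Rust implementation.
    """
    # 1. An exact match always wins, so correctly cased names keep
    #    resolving to the field they name.
    if name in schema_names:
        return name
    # 2. Otherwise fall back to the first field whose lowercased name
    #    matches (assumption: first match wins on ambiguity).
    lowered = name.lower()
    for candidate in schema_names:
        if candidate.lower() == lowered:
            return candidate
    return None


schema = ["userId", "item_name"]
# A SQL parser hands over "userid" for the unquoted identifier userId.
print(resolve_column("userid", schema))
print(resolve_column("userId", schema))
print(resolve_column("missing", schema))
```

Trying the exact match first means a schema that contains both `userId` and `userid` still resolves each spelling to its own field; only names with no exact match take the fallback path.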

Test plan

  • Added Python tests for mixed-case column names (python/tests/test_column_names.py)
  • All 14 new tests pass
  • Existing Rust tests pass (68 merge_insert tests, 45 lance-datafusion tests)

Fixes #3424

🤖 Generated with Claude Code

wjones127 and others added 2 commits December 12, 2025 12:52
SQL parsers lowercase unquoted identifiers, but Arrow schema lookups are
case-sensitive. This caused mixed-case column names (e.g., `userId`) to
fail in filter expressions, scalar index creation, and merge insert.

Added case-insensitive field lookup that tries exact match first, then
falls back to case-insensitive match. Applied this to:
- SQL filter/expression parsing in Planner
- Scalar index column validation
- Merge insert key column resolution and join expressions

Fixes lance-format#3424

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move case-insensitive column name resolution directly into Planner::column()
instead of having a separate post-processing step. This is cleaner since
Planner already has access to the schema.

Removed the resolve_column_names function from logical_expr.rs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions added the bug and python labels on Dec 12, 2025
The case-insensitive column resolution was incorrectly replacing nested
field paths (like `meta.lang`) with just the leaf field name (`lang`).
Now nested paths are kept intact while simple column names still get
case-insensitive resolution.

Also adds tests for mixed-case and special character names in nested
fields.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
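
To illustrate the nested-path issue from the commit above (a path such as `meta.lang` must not collapse to its leaf `lang`), here is a rough sketch of a resolver that walks a dotted path one segment at a time and returns the whole path. The nested-dict schema and `resolve_path` are hypothetical stand-ins, not Lance APIs, and splitting on `.` ignores quoted segments for brevity.

```python
from typing import Optional


def resolve_path(path: str, schema: dict) -> Optional[str]:
    """Resolve each segment of a dotted path, exact match first.

    `schema` maps a field name to None (leaf) or to a nested dict (struct).
    Returns the full resolved path, never just the leaf name.
    """
    resolved = []
    fields = schema
    for segment in path.split("."):
        if not isinstance(fields, dict):
            return None  # tried to descend into a leaf field
        if segment in fields:
            match = segment
        else:
            match = next(
                (n for n in fields if n.lower() == segment.lower()), None
            )
        if match is None:
            return None
        resolved.append(match)
        fields = fields[match]
    return ".".join(resolved)


schema = {"id": None, "meta": {"lang": None, "userId": None}}
print(resolve_path("meta.lang", schema))
print(resolve_path("META.userid", schema))
```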
Added explain_plan() checks to all scalar index tests to verify the index
is actually being used in the query plan by checking for "ScalarIndexQuery".

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wjones127 wjones127 marked this pull request as ready for review December 13, 2025 00:11
Member

@westonpace westonpace left a comment

I'm unsure whether this gives us consistent "try exact match first, then case-insensitive" behavior everywhere. Also, it would be nice to have test cases for multiple columns whose names differ only by case.

I think case-insensitive but inconsistent behavior when multiple columns share a name is better than what we have today, but having consistent behavior everywhere would be ideal.

Comment on lines +154 to +156
"user-id": range(100),
"order:id": range(100, 200),
"item_name": [f"item_{i}" for i in range(100)],
Member

This one is scary 😆 I wonder if we should limit the available character set for column names at all?

Contributor Author

I actually did have a user who wanted this. I don't think there's any reason we shouldn't support it.

Comment thread rust/lance-datafusion/src/planner.rs Outdated
Comment thread rust/lance/src/dataset/write/merge_insert.rs
Comment thread rust/lance/src/dataset/write/merge_insert/assign_action.rs
Comment thread rust/lance/src/index/create.rs Outdated
wjones127 and others added 3 commits December 18, 2025 11:38
- Add Field::resolve_case_insensitive for nested field path resolution
- Expose field_case_insensitive to Python bindings
- Update Python SDK to use case-insensitive field lookup in index APIs
- Fix format_field_path to quote fields with special characters
- Add Rust test for merge_insert with mixed-case key column
- Add Python test for lowercased nested path index creation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
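
The quoting fix mentioned in the commit above (field names such as `user-id` or `order:id` need quoting when a path is formatted) can be sketched as follows. This is a guess at the general shape, not the PR's `format_field_path`: the helper name `quote_field_path`, the use of backticks as the quote character, and escaping by doubling are all assumptions.

```python
import re
from typing import Sequence

# Segments made only of letters, digits, and underscores (not starting with
# a digit) are emitted as-is; anything else is quoted.
_PLAIN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_field_path(segments: Sequence[str]) -> str:
    """Join path segments with '.', quoting those with special characters.

    Hypothetical helper: backtick quoting and doubling of embedded
    backticks are assumptions, not confirmed Lance behavior.
    """
    parts = []
    for seg in segments:
        if _PLAIN.fullmatch(seg):
            parts.append(seg)
        else:
            parts.append("`" + seg.replace("`", "``") + "`")
    return ".".join(parts)


print(quote_field_path(["meta", "user-id"]))
print(quote_field_path(["order:id"]))
print(quote_field_path(["item_name"]))
```

Without quoting, a segment containing `.` would be indistinguishable from two nested fields once the path is joined.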
Use non-correlated column values so tests fail if wrong column is resolved:
- camelCase: 0-99 ascending
- CamelCase: 99-0 descending
- CAMELCASE: 50-99,0-49 rotated

Each test now verifies values from multiple columns to ensure correct
resolution.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
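
The column values listed in the commit above can be reproduced in plain Python to see why they catch a wrongly resolved column: at every row the three columns hold three different values. The helper `rows_are_distinct` is hypothetical and is not part of the PR's test file.

```python
n = 100
columns = {
    "camelCase": list(range(n)),                        # 0-99 ascending
    "CamelCase": list(range(n - 1, -1, -1)),            # 99-0 descending
    "CAMELCASE": list(range(50, n)) + list(range(50)),  # 50-99, 0-49 rotated
}


def rows_are_distinct(cols: dict) -> bool:
    """True if no two columns share a value at the same row index."""
    for row in zip(*cols.values()):
        if len(set(row)) != len(row):
            return False
    return True


# If a lookup for "CamelCase" silently resolved to "camelCase", any
# row-level comparison would fail, because the values never coincide.
print(rows_are_distinct(columns))
```

With correlated data (for example, all three columns set to `range(100)`), a test would pass even when the wrong column was read.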
@wjones127 wjones127 merged commit 8f880a5 into lance-format:main Dec 18, 2025
28 of 30 checks passed
@wjones127 wjones127 deleted the fix-case-insensitive-column-names branch December 18, 2025 22:54
wjones127 added a commit to wjones127/lance that referenced this pull request Dec 19, 2025
wjones127 added a commit that referenced this pull request Dec 19, 2025
wjones127 added a commit to wjones127/lance that referenced this pull request Dec 30, 2025
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
Labels

bug Something isn't working python

Development

Successfully merging this pull request may close these issues.

Make column references in SQL strings and column lists case insensitive

2 participants