fix: make column name lookups case-insensitive#5465
fix: make column name lookups case-insensitive#5465wjones127 merged 7 commits intolance-format:mainfrom
Conversation
SQL parsers lowercase unquoted identifiers, but Arrow schema lookups are case-sensitive. This caused mixed-case column names (e.g., `userId`) to fail in filter expressions, scalar index creation, and merge insert. Added case-insensitive field lookup that tries exact match first, then falls back to case-insensitive match. Applied this to: - SQL filter/expression parsing in Planner - Scalar index column validation - Merge insert key column resolution and join expressions Fixes lance-format#3424 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move case-insensitive column name resolution directly into Planner::column() instead of having a separate post-processing step. This is cleaner since Planner already has access to the schema. Removed the resolve_column_names function from logical_expr.rs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The case-insensitive column resolution was incorrectly replacing nested field paths (like `meta.lang`) with just the leaf field name (`lang`). Now nested paths are kept intact while simple column names still get case-insensitive resolution. Also adds tests for mixed-case and special character names in nested fields. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Added explain_plan() checks to all scalar index tests to verify the index is actually being used in the query plan by checking for "ScalarIndexQuery". 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
westonpace
left a comment
There was a problem hiding this comment.
I'm unsure if this gives us consistent "try exact first and the case insensitive" behavior everywhere. Also, it might be nice to have test cases for multiple columns with different cases.
I think case-insensitive but inconsistent if multiple columns with same name is better than what we have today but having consistent behavior everywhere would be idea.
| "user-id": range(100), | ||
| "order:id": range(100, 200), | ||
| "item_name": [f"item_{i}" for i in range(100)], |
There was a problem hiding this comment.
This one is scary 😆 I wonder if we should limit the available character set for column names at all?
There was a problem hiding this comment.
I actually did have a user who wanted this. I don't think there's any reason we shouldn't support it.
- Add Field::resolve_case_insensitive for nested field path resolution - Expose field_case_insensitive to Python bindings - Update Python SDK to use case-insensitive field lookup in index APIs - Fix format_field_path to quote fields with special characters - Add Rust test for merge_insert with mixed-case key column - Add Python test for lowercased nested path index creation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use non-correlated column values so tests fail if wrong column is resolved: - camelCase: 0-99 ascending - CamelCase: 99-0 descending - CAMELCASE: 50-99,0-49 rotated Each test now verifies values from multiple columns to ensure correct resolution. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- SQL parsers lowercase unquoted identifiers, but Arrow schema lookups are case-sensitive - This caused mixed-case column names (e.g., `userId`) to fail in filter expressions, scalar index creation, and merge insert - Added case-insensitive field lookup that tries exact match first, then falls back to case-insensitive match - [x] Added Python tests for mixed-case column names (`python/tests/test_column_names.py`) - [x] All 14 new tests pass - [x] Existing Rust tests pass (68 merge_insert tests, 45 lance-datafusion tests) Fixes lance-format#3424 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- SQL parsers lowercase unquoted identifiers, but Arrow schema lookups are case-sensitive - This caused mixed-case column names (e.g., `userId`) to fail in filter expressions, scalar index creation, and merge insert - Added case-insensitive field lookup that tries exact match first, then falls back to case-insensitive match - [x] Added Python tests for mixed-case column names (`python/tests/test_column_names.py`) - [x] All 14 new tests pass - [x] Existing Rust tests pass (68 merge_insert tests, 45 lance-datafusion tests) Fixes #3424 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Summary - SQL parsers lowercase unquoted identifiers, but Arrow schema lookups are case-sensitive - This caused mixed-case column names (e.g., `userId`) to fail in filter expressions, scalar index creation, and merge insert - Added case-insensitive field lookup that tries exact match first, then falls back to case-insensitive match ## Test plan - [x] Added Python tests for mixed-case column names (`python/tests/test_column_names.py`) - [x] All 14 new tests pass - [x] Existing Rust tests pass (68 merge_insert tests, 45 lance-datafusion tests) Fixes lance-format#3424 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Summary - SQL parsers lowercase unquoted identifiers, but Arrow schema lookups are case-sensitive - This caused mixed-case column names (e.g., `userId`) to fail in filter expressions, scalar index creation, and merge insert - Added case-insensitive field lookup that tries exact match first, then falls back to case-insensitive match ## Test plan - [x] Added Python tests for mixed-case column names (`python/tests/test_column_names.py`) - [x] All 14 new tests pass - [x] Existing Rust tests pass (68 merge_insert tests, 45 lance-datafusion tests) Fixes lance-format#3424 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Summary
userId) to fail in filter expressions, scalar index creation, and merge insertTest plan
python/tests/test_column_names.py)Fixes #3424
🤖 Generated with Claude Code