
feat: add RTree index spec in table format #5360

Merged
jackye1995 merged 4 commits into lance-format:main from ddupg:feat/rtree-spec
Dec 20, 2025

Conversation

@ddupg
Contributor

@ddupg ddupg commented Nov 27, 2025

This PR proposes adding the R-Tree index specification to the Lance table format. For implementation details, please see #5034.

Feel free to leave comments or share feedback.

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@ddupg changed the title from "spec: add RTree index in table format" to "feat: add RTree index in table format" on Nov 27, 2025
@github-actions bot added the `enhancement` (New feature or request) label on Nov 27, 2025
@ddupg changed the title from "feat: add RTree index in table format" to "feat: add RTree index spec in table format" on Nov 27, 2025
@ddupg
Contributor Author

ddupg commented Dec 3, 2025

@jackye1995 This PR is ready for review, PTAL when you have time.

Comment thread docs/src/format/table/index/scalar/rtree.md
@codecov

codecov Bot commented Dec 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


Contributor

@jackye1995 jackye1995 left a comment


added some comments


The R-Tree index is a static, immutable 2D spatial index. It is built on bounding boxes to organize the data. This index is intended to accelerate rectangle-based pruning.

It is designed a multi-level hierarchical structure: leaf pages store tuples `(bbox, id=rowid)` for indexed geometries; branch pages aggregate child bounding boxes and store `id=pageid` pointing to child pages; a single root page encloses the entire dataset. Conceptually, it can be thought of as an extension of the B+-tree to multidimensional objects, where bounding boxes act as keys for spatial pruning.
Contributor


nit: "It is designed as a multi-level hierarchical structure"
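
To illustrate the page layout quoted above, here is a minimal in-memory sketch of rectangle-based pruning over leaf and branch pages. The `Page` class, the `pages` dictionary keyed by page id, and the `search` function are hypothetical names chosen for this example; they are not taken from the spec or from the Lance implementation, and the on-disk layout is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: BBox, b: BBox) -> bool:
    """True when the two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(boxes: List[BBox]) -> BBox:
    """Smallest rectangle enclosing every box in the list."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


@dataclass
class Page:
    # Hypothetical in-memory page: leaf entries carry (bbox, rowid),
    # branch entries carry (bbox, pageid) pointing at a child page.
    is_leaf: bool
    entries: List[Tuple[BBox, int]]


def search(pages: Dict[int, Page], root_id: int, query: BBox) -> List[int]:
    """Collect row ids whose bbox intersects the query, skipping pruned subtrees."""
    hits: List[int] = []
    stack = [root_id]
    while stack:
        page = pages[stack.pop()]
        for bbox, ident in page.entries:
            if not intersects(bbox, query):
                continue  # the whole entry (and its subtree, if any) is pruned
            if page.is_leaf:
                hits.append(ident)
            else:
                stack.append(ident)
    return sorted(hits)


# Two leaf pages and one root page that aggregates their bounding boxes.
LEAF_A = Page(True, [((0, 0, 1, 1), 10), ((2, 2, 3, 3), 11)])
LEAF_B = Page(True, [((10, 10, 11, 11), 12), ((12, 12, 13, 13), 13)])
ROOT = Page(False, [
    (union([b for b, _ in LEAF_A.entries]), 0),
    (union([b for b, _ in LEAF_B.entries]), 1),
])
PAGES = {0: LEAF_A, 1: LEAF_B, 2: ROOT}

print(search(PAGES, 2, (0.5, 0.5, 2.5, 2.5)))
```

With the query rectangle `(0.5, 0.5, 2.5, 2.5)`, the second leaf page is never visited because its aggregated bounding box does not intersect the query.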


| Column | Type | Nullable | Description |
|:-------|:----------|:---------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `bbox` | RectType | false | Type is Rect defined by [geoarrow-rs](https://github.com/geoarrow/geoarrow-rs) RectType; physical storage is Struct<xmin: {FloatType}, ymin: {FloatType}, xmax: {FloatType}, ymax: {FloatType}>. Represents the node bounding box (leaf: item bbox; branch: child aggregation). |
Contributor


should make it clear if it is float32 or float64
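
As a small illustration of why the float width matters for the `bbox` struct, the sketch below round-trips a coordinate through 32-bit and 64-bit IEEE 754 encodings using only the standard library. The sample longitude and the helper names are assumptions made for this example; the snippet says nothing about which width the spec ultimately chose.

```python
import struct


def roundtrip_f32(value: float) -> float:
    """Encode as little-endian float32 and decode back to a Python float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def roundtrip_f64(value: float) -> float:
    """Encode as little-endian float64 and decode back to a Python float."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


lon = 116.397428  # hypothetical longitude in degrees

err32 = abs(roundtrip_f32(lon) - lon)
err64 = abs(roundtrip_f64(lon) - lon)

# A four-field bbox occupies 16 bytes as float32 and 32 bytes as float64.
size32 = struct.calcsize("<4f")
size64 = struct.calcsize("<4d")

print(f"float32 error: {err32:.3e}, bbox bytes: {size32}")
print(f"float64 error: {err64:.3e}, bbox bytes: {size64}")
```

The float64 round trip is exact for a Python float, while the float32 round trip moves the value by up to half a unit in the last place, so two implementations that disagree on the width would disagree on both the byte layout and the stored bounds.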

Comment thread docs/src/format/table/index/scalar/rtree.md
Comment thread docs/src/format/table/index/scalar/rtree.md

| Column | Type | Nullable | Description |
|:--------|:-------|:---------|:-----------------------------------------------------------|
| `nulls` | Binary | true | Serialized RowIdTreeMap of rows with null/invalid geometry |
Contributor


this is nullable, what does it mean to have a null value?

Contributor Author


My mistake, this should be non-nullable

Comment thread docs/src/format/table/index/scalar/rtree.md
@ddupg
Contributor Author

ddupg commented Dec 9, 2025

Thanks @jackye1995 for the review. I've made the changes based on your comments.

@ddupg ddupg requested a review from jackye1995 December 9, 2025 09:03

Hilbert sorting imposes a linear order on 2D items using a space-filling Hilbert curve to maximize locality in both axes. This improves leaf clustering, which benefits query pruning.

Items are mapped to a 16‑bit grid per axis (0..65535) by proportionally normalizing bbox centers within the global bounding box. Using a fast 2D Hilbert algorithm, we compute a 32‑bit `u32` key for each item and sort items in ascending order by this Hilbert value.
Contributor


I think we should also describe the "fast 2D Hilbert algorithm" similar to how you describe the traversal. The goal of this document is that there is enough information such that someone can look at the doc and implement another version of this same index.

Contributor Author


Thanks @jackye1995 for the further explanation. I have added more details about Hilbert sorting along with pseudocode.
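
For readers following this thread without the spec open, the Hilbert sorting step discussed above can be sketched roughly as follows. This uses the classic iterative coordinate-to-index conversion, not the "fast 2D Hilbert algorithm" the spec refers to, so the resulting keys may differ from the ones Lance produces. The truncating normalization, the handling of a zero-extent global bounding box, and all function names are assumptions made for this example.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
HILBERT_MAX = (1 << 16) - 1  # grid coordinates run over 0..65535


def hilbert_xy2d(x: int, y: int, order: int = 16) -> int:
    """Classic iterative Hilbert index of cell (x, y) on a 2^order square grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def grid_coord(value: float, lo: float, hi: float) -> int:
    """Map value in [lo, hi] proportionally onto 0..65535 (truncation is an assumption)."""
    if hi <= lo:
        return 0  # assumed fallback for a degenerate global extent
    return int(HILBERT_MAX * (value - lo) / (hi - lo))


def hilbert_key(bbox: BBox, global_bbox: BBox) -> int:
    """32-bit key computed from the bbox center normalized within the global bbox."""
    cx = (bbox[0] + bbox[2]) / 2
    cy = (bbox[1] + bbox[3]) / 2
    gx = grid_coord(cx, global_bbox[0], global_bbox[2])
    gy = grid_coord(cy, global_bbox[1], global_bbox[3])
    return hilbert_xy2d(gx, gy)


def hilbert_sort(items: List[Tuple[BBox, int]], global_bbox: BBox) -> List[Tuple[BBox, int]]:
    """Sort (bbox, rowid) items in ascending order of their Hilbert key."""
    return sorted(items, key=lambda item: hilbert_key(item[0], global_bbox))


GLOBAL = (0.0, 0.0, 10.0, 10.0)
ITEMS = [
    ((0.0, 0.0, 1.0, 1.0), 1),    # bottom-left
    ((9.0, 0.0, 10.0, 1.0), 2),   # bottom-right
    ((0.0, 9.0, 1.0, 10.0), 3),   # top-left
    ((9.0, 9.0, 10.0, 10.0), 4),  # top-right
]
print([rowid for _, rowid in hilbert_sort(ITEMS, GLOBAL)])
```

In this orientation the curve visits the bottom-left, top-left, top-right, and bottom-right quadrants in that order, so items that are close in 2D tend to end up in the same or adjacent leaf pages after sorting.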

Change-Id: I60a33514d4d8e6914ae7df50eeb01e888ffb8e51
Change-Id: Iecdf94f858be5deb1e2ef0f9077f624b9fd755f5
Change-Id: I98174a8f5575681d97d1fbedbe57b234addcbe11
@ddupg ddupg force-pushed the feat/rtree-spec branch 3 times, most recently from c34606a to 0d23c8f Compare December 15, 2025 08:58
@ddupg ddupg requested a review from jackye1995 December 15, 2025 08:59
Change-Id: I84fba8e9e4f4569eb582ead663201ab0aaaf2bb6
Contributor

@jackye1995 jackye1995 left a comment


Approving as the vote has passed

@jackye1995 jackye1995 merged commit d7504c2 into lance-format:main Dec 20, 2025
26 checks passed
wjones127 pushed a commit to wjones127/lance that referenced this pull request Dec 30, 2025
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026