feat: add RTree index spec in table format #5360
jackye1995 merged 4 commits into lance-format:main
@jackye1995 This PR is ready for review, PTAL when you have time.
The R-Tree index is a static, immutable 2D spatial index. It is built on bounding boxes to organize the data. This index is intended to accelerate rectangle-based pruning.
It is designed a multi-level hierarchical structure: leaf pages store tuples `(bbox, id=rowid)` for indexed geometries; branch pages aggregate child bounding boxes and store `id=pageid` pointing to child pages; a single root page encloses the entire dataset. Conceptually, it can be thought of as an extension of the B+-tree to multidimensional objects, where bounding boxes act as keys for spatial pruning.
nit: "It is designed as a multi-level hierarchical structure"
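To illustrate the page layout described in the quoted paragraph, here is a minimal Python sketch of rectangle-based pruning over leaf and branch pages. The `Page` class, the `pages` dictionary keyed by page id, and the `search` function are hypothetical names introduced for illustration only; they are not taken from the spec or from the Lance implementation, and the in-memory layout is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: BBox, b: BBox) -> bool:
    """True when two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Page:
    # Hypothetical in-memory page: leaf entries are (bbox, rowid),
    # branch entries are (bbox, pageid).
    is_leaf: bool
    entries: List[Tuple[BBox, int]] = field(default_factory=list)


def search(pages: Dict[int, Page], root_id: int, query: BBox) -> List[int]:
    """Collect row ids whose bbox intersects the query rectangle."""
    hits: List[int] = []
    stack = [root_id]
    while stack:
        page = pages[stack.pop()]
        for bbox, ident in page.entries:
            if not intersects(bbox, query):
                continue  # prune: nothing under this bbox can match
            if page.is_leaf:
                hits.append(ident)   # ident is a rowid
            else:
                stack.append(ident)  # ident is a child pageid
    return sorted(hits)


# Hypothetical two-level tree: page 0 is the root, pages 1 and 2 are leaves.
pages = {
    1: Page(True, [((0, 0, 1, 1), 10), ((2, 2, 3, 3), 11)]),
    2: Page(True, [((10, 10, 11, 11), 12), ((12, 12, 13, 13), 13)]),
    0: Page(False, [((0, 0, 3, 3), 1), ((10, 10, 13, 13), 2)]),
}
```

Calling `search(pages, 0, query)` descends only into child pages whose aggregated bbox intersects `query`, which is the pruning behaviour the paragraph refers to. The result is a candidate set based on bounding boxes only.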
| Column | Type | Nullable | Description |
|:-------|:-----|:---------|:------------|
| `bbox` | RectType | false | Type is Rect defined by [geoarrow-rs](https://github.com/geoarrow/geoarrow-rs) RectType; physical storage is Struct<xmin: {FloatType}, ymin: {FloatType}, xmax: {FloatType}, ymax: {FloatType}>. Represents the node bounding box (leaf: item bbox; branch: child aggregation). |
should make it clear if it is float32 or float64
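As a rough illustration of why the float width matters for the `bbox` column, the Python sketch below aggregates child boxes into a branch bbox and round-trips a coordinate through float32 storage. The helper names `union` and `to_f32` are hypothetical and not part of the spec; this shows only generic IEEE 754 behaviour, not what the Lance implementation does.

```python
import struct
from typing import Iterable, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def union(boxes: Iterable[BBox]) -> BBox:
    """Aggregate child boxes into the smallest enclosing box (branch bbox)."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("cannot aggregate an empty set of boxes")
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def to_f32(value: float) -> float:
    """Round-trip a Python float (float64) through 4-byte float32 storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0]
```

For example, `to_f32(0.1)` is slightly larger than `0.1`, so an `xmin` stored as float32 can land just inside the original float64 geometry unless the writer rounds outward. Whether the spec intends outward rounding is not stated in this thread.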
| Column | Type | Nullable | Description |
|:--------|:-------|:---------|:-----------------------------------------------------------|
| `nulls` | Binary | true | Serialized RowIdTreeMap of rows with null/invalid geometry |
this is nullable, what does it mean to have a null value?
My mistake, this should be non-nullable
Thanks @jackye1995 for the review. I've made the changes based on your comments.
Hilbert sorting imposes a linear order on 2D items using a space-filling Hilbert curve to maximize locality in both axes. This improves leaf clustering, which benefits query pruning.
Items are mapped to a 16‑bit grid per axis (0..65535) by proportionally normalizing bbox centers within the global bounding box. Using a fast 2D Hilbert algorithm, we compute a 32‑bit `u32` key for each item and sort items in ascending order by this Hilbert value.
I think we should also describe the "fast 2D Hilbert algorithm" similar to how you describe the traversal. The goal of this document is that there is enough information such that someone can look at the doc and implement another version of this same index.
Thanks @jackye1995 for the further explanation. I have added more details about Hilbert sorting along with pseudocode.
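For readers following the thread without the updated spec, the Hilbert sorting step can be sketched in Python as below. This is not the pseudocode added to the PR: it swaps in the classic iterative `xy2d` Hilbert mapping, so the curve orientation and the resulting order may differ from the "fast 2D Hilbert algorithm" in the spec. The function names, the truncating grid rounding, and the handling of a zero-width global box are all assumptions made for illustration.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def hilbert_key(x: int, y: int, order: int = 16) -> int:
    """Map grid cell (x, y) to its distance along a Hilbert curve.

    Classic iterative xy2d mapping; with order=16 the key fits in 32 bits.
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the sub-curve lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def grid_coord(center: float, lo: float, hi: float, max_cell: int = 65535) -> int:
    """Proportionally normalize a coordinate into 0..max_cell (assumed truncation)."""
    if hi <= lo:
        return 0  # assumption: a degenerate extent maps to cell 0
    return min(max_cell, int((center - lo) / (hi - lo) * max_cell))


def hilbert_sort(items: List[Tuple[BBox, int]]) -> List[Tuple[BBox, int]]:
    """Sort (bbox, rowid) items ascending by the Hilbert key of the bbox center."""
    gx0 = min(b[0] for b, _ in items)
    gy0 = min(b[1] for b, _ in items)
    gx1 = max(b[2] for b, _ in items)
    gy1 = max(b[3] for b, _ in items)

    def key(item: Tuple[BBox, int]) -> int:
        (xmin, ymin, xmax, ymax), _ = item
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        return hilbert_key(grid_coord(cx, gx0, gx1), grid_coord(cy, gy0, gy1))

    return sorted(items, key=key)
```

With four point-like items at the corners of the global box, this variant visits them in the order (0, 0), (0, max), (max, max), (max, 0), which shows the locality-preserving sweep that the sorting step relies on.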
Change-Id: I60a33514d4d8e6914ae7df50eeb01e888ffb8e51
Change-Id: Iecdf94f858be5deb1e2ef0f9077f624b9fd755f5
Change-Id: I98174a8f5575681d97d1fbedbe57b234addcbe11
Change-Id: I84fba8e9e4f4569eb582ead663201ab0aaaf2bb6
jackye1995 left a comment
Approving as the vote has passed
This PR proposes adding the R-Tree index specification to the Lance table format. For implementation details, please see lance-format#5034. Feel free to leave comments or share feedback.