13 changes: 7 additions & 6 deletions docs/src/format/file/encoding.md
@@ -184,9 +184,9 @@ must be loaded at initialization time and placed in the search cache.
| 12 | Number of 8-byte words in block N |
| 4 | Log2 of number of values in block N |

-The last 4 bits are special and we just store 0 today. This is because the protobuf contains the number of
-values (not required to be a power of 2) in the entire disk page. We can subtract the values in the other blocks
-to get the number of values in the last block.
+For all chunks except the last, the lower 4 bits store `log2(num_values)` and `num_values` must be a power of two.
+For the last chunk, these bits are set to `0`. The protobuf stores the total number of values in the page, so readers
+can derive the final chunk size by subtracting the values from earlier chunks.
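
The rule described in the added paragraph can be sketched as follows. This is an illustrative reading of the text, not Lance's reader code: it assumes each chunk's metadata is one 16-bit word with the 12-bit word count in the upper bits and the 4-bit log2 field in the lower bits, and the helper names `pack_chunk_word` and `chunk_value_counts` are hypothetical.

```python
def pack_chunk_word(num_words: int, log2_values: int) -> int:
    """Pack one chunk's metadata (assumed layout: 12 bits | 4 bits)."""
    assert 0 <= num_words < (1 << 12) and 0 <= log2_values < (1 << 4)
    return (num_words << 4) | log2_values


def chunk_value_counts(meta_words: list[int], total_values: int) -> list[int]:
    """Return the number of values in each chunk of a page.

    total_values is the page-level count that the protobuf stores.
    """
    if not meta_words:
        raise ValueError("a page needs at least one chunk")
    # Every chunk except the last holds a power-of-two number of values.
    counts = [1 << (word & 0xF) for word in meta_words[:-1]]
    # The last chunk stores 0 in the low bits; derive its size by subtraction.
    last = total_values - sum(counts)
    if last <= 0:
        raise ValueError("total_values is too small for the earlier chunks")
    counts.append(last)
    return counts


words = [pack_chunk_word(100, 10), pack_chunk_word(100, 10), pack_chunk_word(37, 0)]
counts = chunk_value_counts(words, total_values=2500)
```

Here the first two chunks decode to 1024 values each, and the last chunk's 452 values come from `2500 - 2048` rather than from its own metadata word.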

#### Buffer 2 (Dictionary, optional)

@@ -402,7 +402,8 @@ are always accessed together.

Packed struct is always opt-in (see section on configuration below).

-Currently packed struct is limited to fixed-width data.
+In Lance 2.1, packed struct is limited to fixed-width children (`PackedStruct`).
+Starting with Lance 2.2, variable-width children are also supported via `VariablePackedStruct`.

### Fixed Size List

@@ -445,9 +446,9 @@ on a per-value basis. We use ☑️ to mark a technique that is applied on a per
| Constant | ✅ (2.1) | ❓ | ❓ |
| Bitpacking | ✅ (2.1) | ❓ | ✅ (2.1) |
| Fsst | ❓ | ✅ (2.1) | ✅ (2.1) |
-| Rle | | ❌ | ✅ (2.1) |
+| Rle | ✅ (2.2) | ❌ | ✅ (2.1) |
| ByteStreamSplit | ❓ | ❌ | ✅ (2.1) |
-| General | | ☑️ (2.1) | ✅ (2.1) |
+| General | ✅ (2.2) | ☑️ (2.1) | ✅ (2.1) |

In the following sections we will describe each technique in a bit more detail and explain how it is utilized
in various contexts.
5 changes: 5 additions & 0 deletions docs/src/guide/data_evolution.md
@@ -48,6 +48,11 @@ assert dataset.schema == pa.schema([

This operation is very fast, as it only updates the metadata of the dataset.

+For Lance file format `<= 2.1`, adding sub-columns under an existing `struct` is not supported.
+Starting with Lance file format `2.2`, schema-only add can also extend nested `struct` fields
+(including `struct` fields nested inside list types), for example by adding
+`people.item.location` under `list<struct<...>>`.

### With data backfill

New columns can be added and populated within a single operation using the