From 95cf8b6dbbead052dd70a556c9f44fb47b9fc6ee Mon Sep 17 00:00:00 2001
From: Xuanwo <github@xuanwo.io>
Date: Tue, 3 Feb 2026 16:51:06 +0800
Subject: [PATCH 1/6] docs: add lance skills as user guide

---
 skills/README.md                              |  13 +
 skills/lance-user-guide/SKILL.md              | 227 ++++++++++++++++++
 .../references/index-selection.md             |  69 ++++++
 .../references/io-cheatsheet.md               |  69 ++++++
 .../scripts/python_end_to_end.py              |  79 ++++++
 5 files changed, 457 insertions(+)
 create mode 100644 skills/README.md
 create mode 100644 skills/lance-user-guide/SKILL.md
 create mode 100644 skills/lance-user-guide/references/index-selection.md
 create mode 100644 skills/lance-user-guide/references/io-cheatsheet.md
 create mode 100644 skills/lance-user-guide/scripts/python_end_to_end.py

diff --git a/skills/README.md b/skills/README.md
new file mode 100644
index 00000000000..3bc81d019f8
--- /dev/null
+++ b/skills/README.md
@@ -0,0 +1,13 @@
+# Skills
+
+This directory contains code agent skills for the Lance project.
+
+Each skill is a folder that contains a required `SKILL.md` (with YAML frontmatter) and optional `scripts/`, `references/`, and `assets/`.
+
+## Install
+
+```bash
+npx skills add lance-format/lance
+```
+
+Restart code agents after installing.
diff --git a/skills/lance-user-guide/SKILL.md b/skills/lance-user-guide/SKILL.md
new file mode 100644
index 00000000000..3855ae86467
--- /dev/null
+++ b/skills/lance-user-guide/SKILL.md
@@ -0,0 +1,227 @@
+---
+name: lance-user-guide
+description: Guide Code Agents to help Lance users write/read datasets and build/choose indices. Use when a user asks how to use Lance (Python/Rust/CLI), how to write_dataset/open/scan, how to build vector indexes (IVF_PQ, IVF_HNSW_*), how to build scalar indexes (BTREE, BITMAP, INVERTED, FTS, etc.), how to combine filters with vector search, or how to debug indexing and scan performance.
+---
+
+# Lance User Guide
+
+## Scope
+
+Use this skill to answer questions about:
+
+- Writing datasets (create/append/overwrite) and reading/scanning datasets
+- Vector search (nearest-neighbor queries) and vector index creation/tuning
+- Scalar index creation and choosing a scalar index type for a filter workload
+- Combining filters (metadata predicates) with vector search
+
+Do not use this skill for:
+
+- Contributing to Lance itself (repo development, internal architecture)
+- File format internals beyond what is required to use the API correctly
+
+## Installation (quick)
+
+Python:
+
+```bash
+pip install pylance
+```
+
+Verify:
+
+```bash
+python -c "import lance; print(lance.__version__)"
+```
+
+Rust:
+
+```bash
+cargo add lance
+```
+
+Or add it to `Cargo.toml` (choose an appropriate version for your project):
+
+```toml
+[dependencies]
+lance = "x.y"
+```
+
+From source (Python bindings, from the repository root inside an active virtualenv):
+
+```bash
+maturin develop -m python/Cargo.toml
+```
+
+## Minimal intake (ask only what you need)
+
+Collect the minimum information required to avoid wrong guidance:
+
+- Language/API surface: Python / Rust / CLI
+- Storage: local filesystem / S3 / other object store
+- Workload: scan-only / filter-heavy / vector search / hybrid (vector + filter)
+- Vector details (if applicable): dimension, metric (L2/cosine/dot), latency target, recall target
+- Update pattern: mostly append / frequent overwrite / frequent deletes/updates
+- Data scale: approximate row count and whether there are many small files
+
+If the user does not specify a language, default to Python examples and provide a short mapping to Rust concepts.
+
+## Workflow decision tree
+
+1. If the question is "How do I write or update data?": use the **Write** playbook.
+2. If the question is "How do I read / scan / filter?": use the **Read** playbook.
+3. If the question is "How do I do kNN / vector search?": use the **Vector search** playbook.
+4. If the question is "Which index should I use?": consult `references/index-selection.md` and confirm constraints.
+5. If the question is "Why is this slow / why are results missing?": use **Troubleshooting** and ask for a minimal reproduction.
+
+## Primary playbooks (Python)
+
+### Write
+
+Prefer `lance.write_dataset` for most user workflows.
+
+```python
+import lance
+import pyarrow as pa
+
+vectors = pa.array(
+    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
+    type=pa.list_(pa.float32(), 3),
+)
+table = pa.table({"id": [1, 2], "vector": vectors, "category": ["a", "b"]})
+
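+ds = lance.write_dataset(table, "my-data.lance", mode="create")  # errors if the dataset already exists
+ds = lance.write_dataset(table, "my-data.lance", mode="append")  # adds rows as a new version
+ds = lance.write_dataset(table, "my-data.lance", mode="overwrite")  # replaces all rows in a new version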
+```
+
+Validation checklist:
+
+- Re-open and count rows: `lance.dataset(uri).count_rows()`
+- Confirm schema: `lance.dataset(uri).schema`
+
+Notes:
+
+- Use `storage_options={...}` when writing to an object store URI.
+- If the user's object store does not support atomic commits, mention the `commit_lock` parameter and point them to the Lance user guide.
+
+### Read
+
+Use `lance.dataset` + `scanner(...)` for pushdowns (projection, filter, limit, nearest).
+
+```python
+import lance
+
+ds = lance.dataset("my-data.lance")
+tbl = ds.scanner(
+    columns=["id", "category"],
+    filter="category = 'a' and id >= 10",
+    limit=100,
+).to_table()
+```
+
+Validation checklist:
+
+- If performance is the concern, ask for a minimal `scanner(...)` call that reproduces it.
+- If correctness is the concern, ask for the exact `filter` string and whether `prefilter` is enabled (when using `nearest`).
+
+### Vector search (nearest)
+
+Run vector search with `scanner(nearest=...)` or `to_table(nearest=...)`.
+
+```python
+import lance
+import numpy as np
+
+ds = lance.dataset("my-data.lance")
+q = np.array([1.0, 2.0, 3.0], dtype=np.float32)
+tbl = ds.to_table(nearest={"column": "vector", "q": q, "k": 10})
+```
+
+If combining a filter with vector search, decide whether the filter must run before the vector query:
+
+- Use `prefilter=True` when the filter is highly selective and correctness (top-k among filtered rows) matters.
+- Use `prefilter=False` when the filter is not very selective and speed matters, and accept that results can be fewer than `k`.
+
+```python
+tbl = ds.scanner(
+    nearest={"column": "vector", "q": q, "k": 10},
+    filter="category = 'a'",
+    prefilter=True,
+).to_table()
+```
+
+### Build a vector index
+
+Create a vector index with `LanceDataset.create_index(...)`.
+
+Start with a minimal working configuration:
+
+```python
+ds = lance.dataset("my-data.lance")
+ds = ds.create_index(
+    "vector",
+    index_type="IVF_PQ",
+    num_partitions=256,
+    num_sub_vectors=16,
+)
+```
+
+Then verify:
+
+- `ds.describe_indices()` (preferred) or `ds.list_indices()` (can be expensive)
+- A small `nearest` query; `scanner(...).explain_plan()` should show the vector index in the plan
+
+For parameter selection and tuning, consult `references/index-selection.md`.
+
+### Build a scalar index
+
+Scalar indices speed up scans with filters. Use `create_scalar_index` for a stable entry point.
+
+```python
+ds = lance.dataset("my-data.lance")
+ds.create_scalar_index("category", "BTREE", replace=True)
+```
+
+Then verify:
+
+- `ds.describe_indices()`
+- A representative `scanner(filter=...)` query
+
+To choose a scalar index type (BTREE vs BITMAP vs INVERTED/FTS/NGRAM, etc.), consult `references/index-selection.md`.
+
+## Troubleshooting patterns
+
+### "Vector search + filter returns fewer than k rows"
+
+- Explain the difference between post-filtering and pre-filtering.
+- Suggest `prefilter=True` if the user expects top-k among filtered rows.
+
+### "Index creation is slow"
+
+- Confirm vector dimension and `num_sub_vectors`.
+- For IVF_PQ, check the common pitfall: `dimension / num_sub_vectors` should be a multiple of 8 (see `references/index-selection.md`).
+
+### "Scan is slow even with a scalar index"
+
+- Ask whether the filter is compatible with the index (equality vs range vs text search).
+- Check whether scalar index usage is disabled (`use_scalar_index=False`), and use `scanner(...).explain_plan()` to confirm the index appears in the plan.
+
+## Local verification (when a repo checkout is available)
+
+When answering API questions, confirm the exact signature and docstrings locally:
+
+- Python I/O entry points: `python/python/lance/dataset.py` (`write_dataset`, `LanceDataset.scanner`)
+- Vector indexing: `python/python/lance/dataset.py` (`create_index`)
+- Scalar indexing: `python/python/lance/dataset.py` (`create_scalar_index`)
+
+Use targeted search:
+
+```bash
+rg -n "def write_dataset\\b|def create_index\\b|def create_scalar_index\\b|def scanner\\b" python/python/lance/dataset.py
+```
+
+## Bundled resources
+
+- Index selection and tuning: `references/index-selection.md`
+- I/O and versioning cheat sheet: `references/io-cheatsheet.md`
+- Runnable minimal example: `scripts/python_end_to_end.py`
diff --git a/skills/lance-user-guide/references/index-selection.md b/skills/lance-user-guide/references/index-selection.md
new file mode 100644
index 00000000000..aee43816641
--- /dev/null
+++ b/skills/lance-user-guide/references/index-selection.md
@@ -0,0 +1,69 @@
+# Index selection (quick)
+
+Use this file when the user asks "which index should I use" or "how do I tune it".
+
+Always confirm:
+
+- The query pattern (filter-only, vector-only, hybrid)
+- Data scale (rows, vector dimension)
+- Update pattern (append vs frequent updates/deletes)
+- Correctness needs (must return top-k within a filtered subset vs best-effort)
+
+## Decision table
+
+| Workload | Recommended starting point | Notes |
+| --- | --- | --- |
+| Filter-only scans (`scanner(filter=...)`) | Create a scalar index on the filtered column | Choose scalar index type based on predicate shape and cardinality |
+| Vector search only (`nearest=...`) on large data | Build a vector index | Start with `IVF_PQ` if you need compression; tune `nprobes` / `refine_factor` |
+| Vector search + selective filter | Scalar index for filter + vector index for search | Use `prefilter=True` when you need true top-k among filtered rows |
+| Vector search + non-selective filter | Vector index only (or scalar index optional) | Consider `prefilter=False` for speed; accept fewer than k results |
+| Text search | Create a text-oriented scalar index | Use `full_text_query=...` when available; verify the supported index type in the current Lance version |
+
+## Vector index types (user-facing summary)
+
+Vector index names typically follow the pattern `{clustering}_{sub_index}_{quantization}`, where the sub-index part is optional (for example, `IVF_PQ` has none).
+
+Common combinations:
+
+- `IVF_PQ`: IVF clustering + PQ compression
+- `IVF_HNSW_SQ`: IVF clustering + an HNSW graph within each partition + SQ compression
+- `IVF_SQ`: IVF clustering + SQ
+- `IVF_RQ`: IVF clustering + RQ
+- `IVF_FLAT`: IVF clustering + no quantization (exact vectors within clusters)
+
+If you are unsure which types are supported in the user's environment, recommend starting with `IVF_PQ` and otherwise trying the desired type directly (the API raises an error for unsupported types).
+
+## Vector index creation defaults
+
+Start with:
+
+- `index_type="IVF_PQ"`
+- `num_partitions`: 64 to 1024 (higher for larger datasets)
+- `num_sub_vectors`: choose a value that divides the vector dimension
+
+Practical warning (performance):
+
+- Prefer a `num_sub_vectors` for which `dimension / num_sub_vectors` is a multiple of 8 (`(dimension / num_sub_vectors) % 8 == 0`); misaligned values can make index creation noticeably slower.
+
+## Vector search tuning defaults
+
+Tune recall vs latency with:
+
+- `nprobes`: how many IVF partitions to search
+- `refine_factor`: re-rank `k * refine_factor` candidates using full-precision vectors to improve accuracy
+
+When a user reports "too slow" or "bad recall", ask for:
+
+- Current `nprobes`, `refine_factor`, and index type
+- Whether the query is using `prefilter`
+
+## Scalar index selection (starting guidance)
+
+Choose scalar index type based on the filter expression:
+
+- Equality filters on high-cardinality columns: start with `BTREE`
+- Equality / IN-list filters on low-cardinality columns: start with `BITMAP`
+- Text search: start with `FTS` (or other text index types supported by the version)
+- Range filters: start with range-friendly options (for example `ZONEMAP` when appropriate)
+
+If you cannot confidently map the filter to an index type, recommend `BTREE` as a safe baseline and confirm via a small benchmark on representative queries.
diff --git a/skills/lance-user-guide/references/io-cheatsheet.md b/skills/lance-user-guide/references/io-cheatsheet.md
new file mode 100644
index 00000000000..acb34ac233a
--- /dev/null
+++ b/skills/lance-user-guide/references/io-cheatsheet.md
@@ -0,0 +1,69 @@
+# I/O cheat sheet (Python)
+
+Use this file when the user asks how to write/read Lance datasets, manage versions, or work with object stores.
+
+## Write a dataset
+
+Use `lance.write_dataset(data, uri, mode=...)`.
+
+Modes:
+
+- `mode="create"`: create new dataset (error if exists)
+- `mode="overwrite"`: create a new version that replaces the latest snapshot (or create the dataset if missing)
+- `mode="append"`: append data as a new version (or create if missing)
+
+Inputs:
+
+- `pyarrow.Table`
+- `pyarrow.RecordBatchReader`
+- pandas DataFrame
+- other reader-like sources supported by the installed Lance version
+
+## Open a dataset
+
+Use `lance.dataset(uri, version=..., asof=..., storage_options=...)`.
+
+Notes:
+
+- `version` accepts a version number or, in recent releases, a tag name; `asof` accepts a timestamp and opens the latest version created at or before it.
+- Use `storage_options` for object stores (credentials, endpoint, etc.).
+
+## Read / scan
+
+Use `ds.scanner(...)` for pushdowns:
+
+- `columns=[...]` for projection
+- `filter="..."` for predicate pushdown
+- `limit=...` for limit pushdown
+- `nearest={...}` for vector search
+- `prefilter=True/False` to control filter ordering when combined with `nearest`
+- `use_scalar_index=True/False` to control scalar index usage
+
+Then materialize:
+
+- `scanner(...).to_table()`
+- `scanner(...).to_batches()`
+
+## Hybrid query: vector + filter
+
+Create a scalar index on the filter column when the filter is selective and you set `prefilter=True`; the prefilter step can then use the index instead of scanning the column.
+
+Example:
+
+```python
+tbl = ds.scanner(
+    nearest={"column": "vector", "q": q, "k": 10},
+    filter="category = 'a'",
+    prefilter=True,
+).to_table()
+```
+
+## Inspect indices
+
+Prefer:
+
+- `ds.describe_indices()`
+
+Use with care:
+
+- `ds.list_indices()` can be expensive because it may load index statistics.
diff --git a/skills/lance-user-guide/scripts/python_end_to_end.py b/skills/lance-user-guide/scripts/python_end_to_end.py
new file mode 100644
index 00000000000..0d7e70aa6ed
--- /dev/null
+++ b/skills/lance-user-guide/scripts/python_end_to_end.py
@@ -0,0 +1,79 @@
+#!/usr/bin/env python3
+
+from __future__ import annotations
+
+import argparse
+from pathlib import Path
+
+import numpy as np
+import pyarrow as pa
+
+import lance
+
+
+def _build_fixed_size_vectors(num_rows: int, dim: int) -> tuple[pa.FixedSizeListArray, np.ndarray]:
+    vectors = np.random.default_rng(0).random((num_rows, dim), dtype=np.float32)  # seeded for reproducible runs
+    flat = pa.array(vectors.reshape(-1), type=pa.float32())
+    return pa.FixedSizeListArray.from_arrays(flat, dim), vectors
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Minimal Lance write/index/query example")
+    parser.add_argument("--uri", default="example.lance", help="Dataset URI (directory)")
+    parser.add_argument("--mode", default="overwrite", choices=["create", "append", "overwrite"])
+    parser.add_argument("--rows", type=int, default=1000)
+    parser.add_argument("--dim", type=int, default=32)
+
+    parser.add_argument("--build-scalar-index", action="store_true")
+    parser.add_argument("--build-vector-index", action="store_true")
+
+    parser.add_argument("--vector-index-type", default="IVF_PQ")
+    parser.add_argument("--num-partitions", type=int, default=64)
+    parser.add_argument("--num-sub-vectors", type=int, default=8)
+
+    parser.add_argument("--k", type=int, default=10)
+    parser.add_argument("--filter", default="category = 'a'")
+    parser.add_argument("--prefilter", action="store_true")
+
+    args = parser.parse_args()
+
+    uri = str(Path(args.uri))
+    vec_arr, vec_np = _build_fixed_size_vectors(args.rows, args.dim)
+    categories = pa.array(["a" if i % 2 == 0 else "b" for i in range(args.rows)])
+    table = pa.table({"id": pa.array(range(args.rows), pa.int64()), "category": categories, "vector": vec_arr})
+
+    ds = lance.write_dataset(table, uri, mode=args.mode)
+    ds = lance.dataset(uri)  # re-open from the URI to confirm the written dataset is readable
+
+    if args.build_scalar_index:
+        ds.create_scalar_index("category", "BTREE", replace=True)
+
+    if args.build_vector_index:
+        ds = ds.create_index(
+            "vector",
+            index_type=args.vector_index_type,
+            num_partitions=args.num_partitions,
+            num_sub_vectors=args.num_sub_vectors,
+        )
+
+    print(f"uri={ds.uri}")
+    print(f"rows={ds.count_rows()}")
+    print("indices=")
+    for idx in ds.describe_indices():
+        print(f"  - {idx}")
+
+    q = vec_np[0]  # query with row 0's own vector; row 0 has category 'a'
+    scan = ds.scanner(
+        nearest={"column": "vector", "q": q, "k": args.k},
+        filter=args.filter or None,
+        prefilter=args.prefilter,
+    )
+    result = scan.to_table()
+    print("result_schema=")
+    print(result.schema)
+    print("result_preview=")
+    print(result.slice(0, 5).to_pydict())
+
+
+if __name__ == "__main__":
+    main()

From cd00ef67d85deb6abcfbf55fe41653f5f69db37e Mon Sep 17 00:00:00 2001
From: Xuanwo <github@xuanwo.io>
Date: Tue, 3 Feb 2026 16:57:43 +0800
Subject: [PATCH 2/6] docs(skills): use target_partition_size for vector index

---
 skills/lance-user-guide/SKILL.md                      | 2 +-
 skills/lance-user-guide/references/index-selection.md | 2 +-
 skills/lance-user-guide/scripts/python_end_to_end.py  | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/skills/lance-user-guide/SKILL.md b/skills/lance-user-guide/SKILL.md
index 3855ae86467..d1bf7d38128 100644
--- a/skills/lance-user-guide/SKILL.md
+++ b/skills/lance-user-guide/SKILL.md
@@ -161,7 +161,7 @@ ds = lance.dataset("my-data.lance")
 ds = ds.create_index(
     "vector",
     index_type="IVF_PQ",
-    num_partitions=256,
+    target_partition_size=8192,
     num_sub_vectors=16,
 )
 ```
diff --git a/skills/lance-user-guide/references/index-selection.md b/skills/lance-user-guide/references/index-selection.md
index aee43816641..b225d333b1d 100644
--- a/skills/lance-user-guide/references/index-selection.md
+++ b/skills/lance-user-guide/references/index-selection.md
@@ -38,7 +38,7 @@ If you are unsure which types are supported in the user's environment, recommend
 Start with:
 
 - `index_type="IVF_PQ"`
-- `num_partitions`: 64 to 1024 (higher for larger datasets)
+- `target_partition_size`: the target number of rows per IVF partition; start with 8192 and adjust based on dataset size and latency/recall needs
 - `num_sub_vectors`: choose a value that divides the vector dimension
 
 Practical warning (performance):
diff --git a/skills/lance-user-guide/scripts/python_end_to_end.py b/skills/lance-user-guide/scripts/python_end_to_end.py
index 0d7e70aa6ed..ec2d02713c9 100644
--- a/skills/lance-user-guide/scripts/python_end_to_end.py
+++ b/skills/lance-user-guide/scripts/python_end_to_end.py
@@ -28,7 +28,7 @@ def main() -> None:
     parser.add_argument("--build-vector-index", action="store_true")
 
     parser.add_argument("--vector-index-type", default="IVF_PQ")
-    parser.add_argument("--num-partitions", type=int, default=64)
+    parser.add_argument("--target-partition-size", type=int, default=8192)
     parser.add_argument("--num-sub-vectors", type=int, default=8)
 
     parser.add_argument("--k", type=int, default=10)
@@ -52,7 +52,7 @@ def main() -> None:
         ds = ds.create_index(
             "vector",
             index_type=args.vector_index_type,
-            num_partitions=args.num_partitions,
+            target_partition_size=args.target_partition_size,
             num_sub_vectors=args.num_sub_vectors,
         )
 

From fb7da11233f660da6a209928c7f733a748b82640 Mon Sep 17 00:00:00 2001
From: Xuanwo <github@xuanwo.io>
Date: Thu, 5 Feb 2026 17:30:10 +0800
Subject: [PATCH 3/6] docs(skills): clarify installation and compatibility

---
 skills/README.md                              | 43 +++++++++++++++++--
 skills/lance-user-guide/SKILL.md              |  4 ++
 .../scripts/python_end_to_end.py              |  5 +++
 3 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/skills/README.md b/skills/README.md
index 3bc81d019f8..16ca9e508bd 100644
--- a/skills/README.md
+++ b/skills/README.md
@@ -1,13 +1,48 @@
 # Skills
 
-This directory contains code agent skills for the Lance project.
+This directory contains Codex-compatible skills for the Lance project.
 
 Each skill is a folder that contains a required `SKILL.md` (with YAML frontmatter) and optional `scripts/`, `references/`, and `assets/`.
 
-## Install
+## Install (npx skills)
+
+If you use `skills.sh`, install from GitHub:
+
+```bash
+npx skills add lance-format/lance --skill lance-user-guide
+```
+
+Install globally (user-level):
+
+```bash
+npx skills add lance-format/lance --skill lance-user-guide -g
+```
+
+List available skills in this repository:
+
+```bash
+npx skills add lance-format/lance --list
+```
+
+## Install (manual copy)
+
+Codex typically loads skills from:
+
+- Project: `.codex/skills/<skill-name>/`
+- Global: `~/.codex/skills/<skill-name>/`
+
+Install into the current repository:
+
+```bash
+mkdir -p .codex/skills
+cp -R skills/lance-user-guide .codex/skills/
+```
+
+Install globally:
 
 ```bash
-npx skills add lance-format/lance
+mkdir -p ~/.codex/skills
+cp -R skills/lance-user-guide ~/.codex/skills/
 ```
 
-Restart code agents after installing.
+Restart Codex after installing or updating skills.
diff --git a/skills/lance-user-guide/SKILL.md b/skills/lance-user-guide/SKILL.md
index d1bf7d38128..e227a2d83f2 100644
--- a/skills/lance-user-guide/SKILL.md
+++ b/skills/lance-user-guide/SKILL.md
@@ -173,6 +173,10 @@ Then verify:
 
 For parameter selection and tuning, consult `references/index-selection.md`.
 
+Compatibility note:
+
+- `target_partition_size` is preferred for new code. If your installed Lance Python SDK does not support it, fall back to `num_partitions` (deprecated).
+
 ### Build a scalar index
 
 Scalar indices speed up scans with filters. Use `create_scalar_index` for a stable entry point.
diff --git a/skills/lance-user-guide/scripts/python_end_to_end.py b/skills/lance-user-guide/scripts/python_end_to_end.py
index ec2d02713c9..e2bc07654ee 100644
--- a/skills/lance-user-guide/scripts/python_end_to_end.py
+++ b/skills/lance-user-guide/scripts/python_end_to_end.py
@@ -37,6 +37,11 @@ def main() -> None:
 
     args = parser.parse_args()
 
+    if args.num_sub_vectors <= 0:
+        raise ValueError("--num-sub-vectors must be positive")
+    if args.dim % args.num_sub_vectors != 0:
+        raise ValueError("--dim must be divisible by --num-sub-vectors")
+
     uri = str(Path(args.uri))
     vec_arr, vec_np = _build_fixed_size_vectors(args.rows, args.dim)
     categories = pa.array(["a" if i % 2 == 0 else "b" for i in range(args.rows)])

From 7edcc66be986945a58f20c4707155de112abfae2 Mon Sep 17 00:00:00 2001
From: Xuanwo <github@xuanwo.io>
Date: Thu, 5 Feb 2026 17:32:40 +0800
Subject: [PATCH 4/6] Revert "docs(skills): clarify installation and
 compatibility"

This reverts commit fb7da11233f660da6a209928c7f733a748b82640.
---
 skills/README.md                              | 43 ++-----------------
 skills/lance-user-guide/SKILL.md              |  4 --
 .../scripts/python_end_to_end.py              |  5 ---
 3 files changed, 4 insertions(+), 48 deletions(-)

diff --git a/skills/README.md b/skills/README.md
index 16ca9e508bd..3bc81d019f8 100644
--- a/skills/README.md
+++ b/skills/README.md
@@ -1,48 +1,13 @@
 # Skills
 
-This directory contains Codex-compatible skills for the Lance project.
+This directory contains code agent skills for the Lance project.
 
 Each skill is a folder that contains a required `SKILL.md` (with YAML frontmatter) and optional `scripts/`, `references/`, and `assets/`.
 
-## Install (npx skills)
-
-If you use `skills.sh`, install from GitHub:
-
-```bash
-npx skills add lance-format/lance --skill lance-user-guide
-```
-
-Install globally (user-level):
-
-```bash
-npx skills add lance-format/lance --skill lance-user-guide -g
-```
-
-List available skills in this repository:
-
-```bash
-npx skills add lance-format/lance --list
-```
-
-## Install (manual copy)
-
-Codex typically loads skills from:
-
-- Project: `.codex/skills/<skill-name>/`
-- Global: `~/.codex/skills/<skill-name>/`
-
-Install into the current repository:
-
-```bash
-mkdir -p .codex/skills
-cp -R skills/lance-user-guide .codex/skills/
-```
-
-Install globally:
+## Install
 
 ```bash
-mkdir -p ~/.codex/skills
-cp -R skills/lance-user-guide ~/.codex/skills/
+npx skills add lance-format/lance
 ```
 
-Restart Codex after installing or updating skills.
+Restart code agents after installing.
diff --git a/skills/lance-user-guide/SKILL.md b/skills/lance-user-guide/SKILL.md
index e227a2d83f2..d1bf7d38128 100644
--- a/skills/lance-user-guide/SKILL.md
+++ b/skills/lance-user-guide/SKILL.md
@@ -173,10 +173,6 @@ Then verify:
 
 For parameter selection and tuning, consult `references/index-selection.md`.
 
-Compatibility note:
-
-- `target_partition_size` is preferred for new code. If your installed Lance Python SDK does not support it, fall back to `num_partitions` (deprecated).
-
 ### Build a scalar index
 
 Scalar indices speed up scans with filters. Use `create_scalar_index` for a stable entry point.
diff --git a/skills/lance-user-guide/scripts/python_end_to_end.py b/skills/lance-user-guide/scripts/python_end_to_end.py
index e2bc07654ee..ec2d02713c9 100644
--- a/skills/lance-user-guide/scripts/python_end_to_end.py
+++ b/skills/lance-user-guide/scripts/python_end_to_end.py
@@ -37,11 +37,6 @@ def main() -> None:
 
     args = parser.parse_args()
 
-    if args.num_sub_vectors <= 0:
-        raise ValueError("--num-sub-vectors must be positive")
-    if args.dim % args.num_sub_vectors != 0:
-        raise ValueError("--dim must be divisible by --num-sub-vectors")
-
     uri = str(Path(args.uri))
     vec_arr, vec_np = _build_fixed_size_vectors(args.rows, args.dim)
     categories = pa.array(["a" if i % 2 == 0 else "b" for i in range(args.rows)])

From b3ee727c4fc66c6009c5c2e95051c7e9876ffd56 Mon Sep 17 00:00:00 2001
From: Xuanwo <github@xuanwo.io>
Date: Thu, 5 Feb 2026 17:45:13 +0800
Subject: [PATCH 5/6] Address comments

---
 skills/lance-user-guide/SKILL.md              |  4 ++--
 .../references/index-selection.md             | 23 +++++++++++++++++--
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/skills/lance-user-guide/SKILL.md b/skills/lance-user-guide/SKILL.md
index d1bf7d38128..4bf7eb515c5 100644
--- a/skills/lance-user-guide/SKILL.md
+++ b/skills/lance-user-guide/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: lance-user-guide
-description: Guide Code Agents to help Lance users write/read datasets and build/choose indices. Use when a user asks how to use Lance (Python/Rust/CLI), how to write_dataset/open/scan, how to build vector indexes (IVF_PQ, IVF_HNSW_*), how to build scalar indexes (BTREE, BITMAP, INVERTED, FTS, etc.), how to combine filters with vector search, or how to debug indexing and scan performance.
+description: Guide Code Agents to help Lance users write/read datasets and build/choose indices. Use when a user asks how to use Lance (Python/Rust/CLI), how to write_dataset/open/scan, how to build vector indexes (IVF_PQ, IVF_HNSW_*), how to build scalar indexes (BTREE, BITMAP, LABEL_LIST, NGRAM, INVERTED, BLOOMFILTER, RTREE, etc.), how to combine filters with vector search, or how to debug indexing and scan performance.
 ---
 
 # Lance User Guide
@@ -187,7 +187,7 @@ Then verify:
 - `ds.describe_indices()`
 - A representative `scanner(filter=...)` query
 
-To choose a scalar index type (BTREE vs BITMAP vs INVERTED/FTS/NGRAM, etc.), consult `references/index-selection.md`.
+To choose a scalar index type (BTREE vs BITMAP vs LABEL_LIST vs NGRAM vs INVERTED, etc.), consult `references/index-selection.md`.
 
 ## Troubleshooting patterns
 
diff --git a/skills/lance-user-guide/references/index-selection.md b/skills/lance-user-guide/references/index-selection.md
index b225d333b1d..7f6d926c138 100644
--- a/skills/lance-user-guide/references/index-selection.md
+++ b/skills/lance-user-guide/references/index-selection.md
@@ -17,7 +17,7 @@ Always confirm:
 | Vector search only (`nearest=...`) on large data | Build a vector index | Start with `IVF_PQ` if you need compression; tune `nprobes` / `refine_factor` |
 | Vector search + selective filter | Scalar index for filter + vector index for search | Use `prefilter=True` when you need true top-k among filtered rows |
 | Vector search + non-selective filter | Vector index only (or scalar index optional) | Consider `prefilter=False` for speed; accept fewer than k results |
-| Text search | Create a text-oriented scalar index | Use `full_text_query=...` when available; verify the supported index type in the current Lance version |
+| Text search | Create an `INVERTED` scalar index | Use `full_text_query=...` when available; note that `FTS` is not a universal alias in all SDK versions |
 
 ## Vector index types (user-facing summary)
 
@@ -63,7 +63,26 @@ Choose scalar index type based on the filter expression:
 
 - Equality filters on high-cardinality columns: start with `BTREE`
 - Equality / IN-list filters on low-cardinality columns: start with `BITMAP`
-- Text search: start with `FTS` (or other text index types supported by the version)
+- List membership filters on list-like columns: start with `LABEL_LIST`
+- Substring / `contains(...)` filters on strings: start with `NGRAM`
+- Text search: start with `INVERTED`
 - Range filters: start with range-friendly options (for example `ZONEMAP` when appropriate)
+- Highly selective negative membership / presence checks: consider `BLOOMFILTER` (inexact)
+- Geospatial queries (if present in your build): use `RTREE`
+
+## JSON fields
+
+Lance scalar indices are created on physical columns. If you want to index a JSON sub-field:
+
+1. Materialize the extracted value into a new column (for example with `add_columns`)
+2. Create a scalar index on that new column
+
+Example (Python, using SQL expressions):
+
+```python
+ds = lance.dataset(uri)
+ds.add_columns({"country": "json_extract(payload, '$.country')"})
+ds.create_scalar_index("country", "BTREE", replace=True)
+```
 
 If you cannot confidently map the filter to an index type, recommend `BTREE` as a safe baseline and confirm via a small benchmark on representative queries.

From 494065fdfa44ae581b46475d2100875cffacc0b0 Mon Sep 17 00:00:00 2001
From: Xuanwo <github@xuanwo.io>
Date: Mon, 23 Feb 2026 17:05:49 +0800
Subject: [PATCH 6/6] Update
 skills/lance-user-guide/references/index-selection.md

Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
---
 skills/lance-user-guide/references/index-selection.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/lance-user-guide/references/index-selection.md b/skills/lance-user-guide/references/index-selection.md
index 7f6d926c138..f83764f1a67 100644
--- a/skills/lance-user-guide/references/index-selection.md
+++ b/skills/lance-user-guide/references/index-selection.md
@@ -65,7 +65,7 @@ Choose scalar index type based on the filter expression:
 - Equality / IN-list filters on low-cardinality columns: start with `BITMAP`
 - List membership filters on list-like columns: start with `LABEL_LIST`
 - Substring / `contains(...)` filters on strings: start with `NGRAM`
-- Text search: start with `INVERTED`
+- Full-text search (FTS): start with `INVERTED`
 - Range filters: start with range-friendly options (for example `ZONEMAP` when appropriate)
 - Highly selective negative membership / presence checks: consider `BLOOMFILTER` (inexact)
 - Geospatial queries (if present in your build): use `RTREE`