test: introduce query integration tests #4745
Conversation
```yaml
    flags: unittests
    fail_ci_if_error: false

integration_test:
```
does it make sense to call this query_integration_tests? Since we may add other kinds of integration test in the future to sophon.
I'll add other integration tests for sure. But they will be in the same binary and I think run in the same CI job.
Sorry, this is probably a dumb question, but why do these take much more time than our other tests? Is it because of the greater volume of data here?
Two motivations for saying this:
```yaml
    ALL_FEATURES=`cargo metadata --format-version=1 --no-deps | jq -r '.packages[] | .features | keys | .[]' | grep -v -e protoc -e slow_tests | sort | uniq | paste -s -d "," -`
    cargo test --locked --features ${ALL_FEATURES}

query-integration-tests:
  runs-on: ubuntu-2404-8x-arm64
```
Suggested change:

```diff
- runs-on: ubuntu-2404-8x-arm64
+ runs-on: ubuntu-24.04-arm64-8x
```
Implement fill_deleted_rows function to simulate deleted data by interleaving filler rows with id=-1. Add dynamic index parameter generation supporting scalar (BTree, Bitmap), vector (IvfFlat, IvfPq), and FTS (Inverted) indices.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
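The interleaving idea behind fill_deleted_rows can be sketched as follows. This is a hypothetical simplification: plain `i64` ids stand in for Arrow record batches, and the signature is an assumption, not the actual implementation in this PR.

```rust
// Hypothetical sketch: interleave a filler row (id = -1) after each real row,
// so a later `delete("id = -1")` leaves deletion "holes" in the fragment layout.
fn fill_deleted_rows(ids: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(ids.len() * 2);
    for &id in ids {
        out.push(id);
        out.push(-1); // filler row, removed later to simulate deletions
    }
    out
}

fn main() {
    let filled = fill_deleted_rows(&[0, 1, 2]);
    assert_eq!(filled, vec![0, -1, 1, -1, 2, -1]);
    // Dropping the fillers recovers the original rows.
    let kept: Vec<i64> = filled.into_iter().filter(|&id| id != -1).collect();
    assert_eq!(kept, vec![0, 1, 2]);
    println!("ok");
}
```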
Implement comprehensive test functions for dataset queries:

- test_scan: Verifies scanning with ordering by id column
- test_take: Tests taking specific rows by indices, validates against Arrow's take_record_batch
- test_filter: Tests filtering with SQL predicates using DataFusion for comparison

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
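The take-style check can be sketched as below. This is a minimal stand-in using plain vectors; the real test compares dataset output against Arrow's `take_record_batch`, which is not reproduced here.

```rust
// Hypothetical sketch of take semantics: select rows by index, in index order.
// Duplicate indices are allowed and must repeat the row.
fn take(rows: &[i64], indices: &[usize]) -> Vec<i64> {
    indices.iter().map(|&i| rows[i]).collect()
}

fn main() {
    let rows = vec![10, 20, 30, 40];
    assert_eq!(take(&rows, &[3, 1, 1]), vec![40, 20, 20]);
    println!("ok");
}
```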
- Fix generate_index_combinations to create cartesian product of all index options across columns, enabling tests with multiple indices
- Add better error messages with expect() at key test setup points
- Add test cases for duplicate indices in test_take
- Document why _distance column isn't validated in ANN tests
- Document that index parameters are for deterministic small test datasets

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
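The cartesian-product fix can be sketched as follows. Strings stand in for real index configurations, and the signature is a hypothetical simplification of generate_index_combinations, not the code from this PR.

```rust
// Hypothetical sketch: given per-column index options, produce the cartesian
// product so tests can run against every combination of indices across columns.
fn generate_index_combinations(options: &[Vec<&'static str>]) -> Vec<Vec<&'static str>> {
    let mut combos: Vec<Vec<&'static str>> = vec![vec![]];
    for column_options in options {
        let mut next_combos = Vec::new();
        for combo in &combos {
            for &opt in column_options {
                let mut next = combo.clone();
                next.push(opt);
                next_combos.push(next);
            }
        }
        combos = next_combos;
    }
    combos
}

fn main() {
    // e.g. a scalar column with two index choices and a vector column with two.
    let combos = generate_index_combinations(&[
        vec!["btree", "bitmap"],
        vec!["ivf_flat", "ivf_pq"],
    ]);
    assert_eq!(combos.len(), 4);
    assert!(combos.contains(&vec!["btree", "ivf_pq"]));
    println!("ok");
}
```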
Closes lance-format#5148

This adds a new integration test suite for queries. I decided to split up the test cases based on data type, as each one has different supported indices and query types. With this, it should be easy to add new test cases and have them automatically tested for many different dataset states.

Each test is repeated for the combination of index types supported as well as table states. Right now, there are three dimensions of the table states: Fragmentation, Deletions, and File version. We can add more in the future. I think directly creating a particular state will be a more useful way to design tests than doing sequences of operations.

## Why run tests with optimizations

Some of the tests are slow, and we plan to add a lot more of them. Here are timings for a clean build (no caching):

| Level | Build time | Test time |
|-------|------------|-----------|
| O2    | 10m 15s    | 3.4s      |
| O1    | 7m 11s     | 3.5s      |
| debug | 3m 15s     | 18.79s    |

With caching and fast iteration in mind, I think O1 is a good balance.

## Feedback request

The goal of this PR isn't to finish this test suite. I just want to focus on making sure we are happy with this test setup. We can easily add more test cases later. Let me know what you think of the framework.

---------

Co-authored-by: Claude <noreply@anthropic.com>
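The three table-state dimensions can be enumerated as a cartesian product so every query test runs against every state. A minimal sketch, assuming hypothetical variant names (the suite's actual state types are not shown in this PR description):

```rust
// Hypothetical sketch of the test matrix: 2 x 2 x 2 = 8 dataset states,
// one per combination of Fragmentation, Deletions, and File version.
#[derive(Clone, Copy)]
enum Fragmentation { Single, Multiple }
#[derive(Clone, Copy)]
enum Deletions { None, Some }
#[derive(Clone, Copy)]
enum FileVersion { Legacy, V2 }

fn all_states() -> Vec<(Fragmentation, Deletions, FileVersion)> {
    let mut states = Vec::new();
    for &f in &[Fragmentation::Single, Fragmentation::Multiple] {
        for &d in &[Deletions::None, Deletions::Some] {
            for &v in &[FileVersion::Legacy, FileVersion::V2] {
                states.push((f, d, v));
            }
        }
    }
    states
}

fn main() {
    // Adding a fourth dimension later just adds another nested loop.
    assert_eq!(all_states().len(), 8);
    println!("ok");
}
```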