
test: introduce query integration tests#4745

Merged
wjones127 merged 19 commits into lance-format:main from wjones127:broad-test
Dec 10, 2025
Conversation

@wjones127
Contributor

@wjones127 wjones127 commented Sep 16, 2025

Closes #5148

This adds a new integration test suite for queries. I decided to split up the test cases based on data type, as each one has different supported indices and query types. With this, it should be easy to add new test cases and have them automatically tested for many different dataset states.

Each test is repeated for the combination of index types supported as well as table states. Right now, there are three dimensions of the table states: Fragmentation, Deletions, and File version. We can add more in the future. I think directly creating a particular state will be a more useful way to design tests than doing sequences of operations.
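
As a rough illustration of the "table state" idea above, here is a minimal sketch of enumerating every combination of the three dimensions and running one test body per combination. All type, variant, and function names are hypothetical, and the number of variants per dimension is an assumption made for illustration; the real suite may model this differently.

```rust
// Hypothetical sketch: these names are NOT taken from the Lance test suite.

/// How rows are spread across fragments (assumed variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Fragmentation {
    SingleFragment,
    MultiFragment,
}

/// Whether the table contains deleted rows (assumed variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Deletions {
    NoDeletions,
    WithDeletions,
}

/// File format version; placeholder variants, not real Lance version names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileVersion {
    VersionA,
    VersionB,
    VersionC,
}

/// One point in the table-state matrix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TableState {
    pub fragmentation: Fragmentation,
    pub deletions: Deletions,
    pub file_version: FileVersion,
}

/// Build the full cartesian product of the three dimensions.
pub fn all_table_states() -> Vec<TableState> {
    let fragmentation_options = [Fragmentation::SingleFragment, Fragmentation::MultiFragment];
    let deletion_options = [Deletions::NoDeletions, Deletions::WithDeletions];
    let version_options = [FileVersion::VersionA, FileVersion::VersionB, FileVersion::VersionC];

    let mut states = Vec::new();
    for &fragmentation in &fragmentation_options {
        for &deletions in &deletion_options {
            for &file_version in &version_options {
                states.push(TableState { fragmentation, deletions, file_version });
            }
        }
    }
    states
}

/// Run the same test body once per table state.
pub fn run_for_all_states<F: FnMut(TableState)>(mut test_case: F) {
    for state in all_table_states() {
        test_case(state);
    }
}
```

A test author would write the body once and pass it to `run_for_all_states`; adding a new dimension only touches `TableState` and `all_table_states`, which is the property the description is after.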

## Why run tests with optimizations

Some of the tests are slow, and we plan to add a lot more of them. Here are timings for a clean build (no caching):

| Level | build time | test time |
|--------|--------|--------|
| O2 | 10m 15s | 3.4s |
| O1 | 7m 11s | 3.5s |
| debug | 3m 15s | 18.79s |

With caching and fast iteration in mind, I think O1 is a good balance.

## Feedback request

The goal of this PR isn't to finish this test suite. I just want to focus on making sure we are happy with this test setup. We can easily add more test cases later. Let me know what you think of the framework.

@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@wjones127 wjones127 changed the title from "Expand testing of various queries" to "Introduce query integration tests" Sep 17, 2025
@wjones127 wjones127 changed the title from "Introduce query integration tests" to "test: introduce query integration tests" Sep 17, 2025
@github-actions github-actions Bot added the chore label Sep 17, 2025
@codecov-commenter

codecov-commenter commented Sep 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


@wjones127 wjones127 marked this pull request as ready for review September 22, 2025 15:12
Comment thread .github/workflows/rust.yml Outdated
flags: unittests
fail_ci_if_error: false

integration_test:
Contributor


does it make sense to call this query_integration_tests? Since we may add other kinds of integration test in the future to sophon.

Contributor Author


I'll add other integration tests for sure. But they will be in the same binary and I think run in the same CI job.

@cmccabe
Contributor

cmccabe commented Sep 22, 2025

sorry, this is probably a dumb question, but why do these take much more time than our other tests? is it because of the greater volume of data here?

@wjones127
Contributor Author

> sorry, this is probably a dumb question, but why do these take much more time than our other tests? is it because of the greater volume of data here?

Two reasons:

  1. We run each test for a combination of table states. So, right now, each test is run 3 x 2 x 2 = 12 ways, and that list of states might keep growing as we add more versions. That makes each individual test take longer.
  2. I'd also like people to add compute-intensive tests (for example, ones building bigger ANN indices) here, rather than in the unit tests. I think those benefit the most from being run in release mode.

Comment thread rust/lance/tests/query/mod.rs Outdated
Comment thread rust/lance/tests/query/mod.rs Outdated
Comment thread rust/lance/tests/query/mod.rs Outdated
Comment thread .github/workflows/rust.yml Outdated
ALL_FEATURES=`cargo metadata --format-version=1 --no-deps | jq -r '.packages[] | .features | keys | .[]' | grep -v -e protoc -e slow_tests | sort | uniq | paste -s -d "," -`
cargo test --locked --features ${ALL_FEATURES}
query-integration-tests:
runs-on: ubuntu-2404-8x-arm64
Collaborator


Suggested change
- runs-on: ubuntu-2404-8x-arm64
+ runs-on: ubuntu-24.04-arm64-8x

wjones127 and others added 12 commits December 8, 2025 16:38
Implement fill_deleted_rows function to simulate deleted data by interleaving
filler rows with id=-1. Add dynamic index parameter generation supporting scalar
(BTree, Bitmap), vector (IvfFlat, IvfPq), and FTS (Inverted) indices.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
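
The commit message above describes simulating deletions by interleaving filler rows with id=-1. A minimal sketch of that idea follows, using plain structs instead of Arrow record batches. The `Row` type and both function names are hypothetical and are not the PR's actual `fill_deleted_rows` signature; placing exactly one filler after every real row is also an assumption.

```rust
// Hypothetical sketch using plain structs; the real helper presumably works
// on Arrow record batches and a Lance dataset.

/// Sentinel id marking a filler row that will later be deleted.
pub const FILLER_ID: i64 = -1;

#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: i64,
    pub value: String,
}

/// Interleave one filler row after every real row (assumed layout).
pub fn interleave_filler_rows(rows: &[Row]) -> Vec<Row> {
    let mut padded = Vec::with_capacity(rows.len() * 2);
    for row in rows {
        padded.push(row.clone());
        padded.push(Row { id: FILLER_ID, value: String::new() });
    }
    padded
}

/// Stand-in for the delete step: drop every row carrying the sentinel id.
pub fn delete_filler_rows(rows: Vec<Row>) -> Vec<Row> {
    rows.into_iter().filter(|row| row.id != FILLER_ID).collect()
}
```

After writing the padded rows and deleting the fillers, the visible data equals the original input, but the stored table now contains deleted rows scattered between live ones, which is the state the tests want to exercise.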
Implement comprehensive test functions for dataset queries:
- test_scan: Verifies scanning with ordering by id column
- test_take: Tests taking specific rows by indices, validates against Arrow's take_record_batch
- test_filter: Tests filtering with SQL predicates using DataFusion for comparison

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
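
The test functions listed above validate query results against an independent reference (Arrow's take_record_batch for takes, DataFusion for filters). The dependency-free sketch below shows the same pattern with plain slices standing in for record batches. All function names here are hypothetical, and the reference implementations are deliberately naive.

```rust
use std::fmt::Debug;

/// Naive reference "take": returns rows at the given positions, in order.
/// Duplicate indices are allowed and yield duplicate rows.
/// Panics if an index is out of bounds.
pub fn reference_take<T: Clone>(rows: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| rows[i].clone()).collect()
}

/// Naive reference "filter": keeps rows for which the predicate holds,
/// preserving the input order.
pub fn reference_filter<T: Clone>(rows: &[T], predicate: impl Fn(&T) -> bool) -> Vec<T> {
    rows.iter().filter(|row| predicate(row)).cloned().collect()
}

/// Compare the output of the system under test with the reference output,
/// returning a readable message on mismatch instead of panicking.
pub fn check_against_reference<T: PartialEq + Debug>(
    actual: &[T],
    expected: &[T],
) -> Result<(), String> {
    if actual == expected {
        Ok(())
    } else {
        Err(format!("mismatch: expected {:?}, got {:?}", expected, actual))
    }
}
```

In a real test, `actual` would come from the dataset query and `expected` from the reference path over the same in-memory data, so any table state that changes the result is caught.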
wjones127 and others added 5 commits December 8, 2025 16:40
- Fix generate_index_combinations to create cartesian product of all index
  options across columns, enabling tests with multiple indices
- Add better error messages with expect() at key test setup points
- Add test cases for duplicate indices in test_take
- Document why _distance column isn't validated in ANN tests
- Document that index parameters are for deterministic small test datasets

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
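
The first bullet in the commit message above mentions a cartesian product of index options across columns. A minimal sketch of that technique is shown here. The function name `index_combinations`, the `IndexChoice` alias, and the use of string labels for index types are all assumptions for illustration, not the signature of the PR's `generate_index_combinations`.

```rust
// Hypothetical sketch; index types are plain labels here, and `None`
// means "leave this column unindexed".
pub type IndexChoice = Option<&'static str>;

/// Build every combination that picks exactly one choice per column.
/// If any column has no choices at all, the result is empty.
pub fn index_combinations(
    columns: &[(&'static str, Vec<IndexChoice>)],
) -> Vec<Vec<(&'static str, IndexChoice)>> {
    // Start with a single empty combination and extend it column by column.
    let mut combos: Vec<Vec<(&'static str, IndexChoice)>> = vec![Vec::new()];
    for (column, choices) in columns {
        let mut next = Vec::with_capacity(combos.len() * choices.len());
        for combo in &combos {
            for choice in choices {
                let mut extended = combo.clone();
                extended.push((*column, *choice));
                next.push(extended);
            }
        }
        combos = next;
    }
    combos
}
```

With three choices for one column and two for another, this yields six combinations, including ones where both columns are indexed at once, which is what enables tests with multiple indices.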
@wjones127 wjones127 requested a review from Xuanwo December 9, 2025 03:06
Collaborator

@Xuanwo Xuanwo left a comment


Thank you!

@wjones127 wjones127 merged commit a6b6abb into lance-format:main Dec 10, 2025
28 checks passed
@wjones127 wjones127 deleted the broad-test branch December 10, 2025 01:01
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026

Development

Successfully merging this pull request may close these issues.

Query tests: Setup testing framework

4 participants