ci: create a new library compatibility test suite #5178
wjones127 merged 19 commits into lance-format:main
Previously all compatibility tests lived in a monolithic 751-line index_tests.py file mixed with infrastructure code. This made tests hard to find and maintain.

Split into focused modules:

- compat_decorator.py: infrastructure and @compat_test decorator
- test_file_formats.py: file format compatibility tests
- test_scalar_indices.py: scalar index compatibility tests
- test_vector_indices.py: vector index compatibility tests

Removed deprecated datagen.py and test_compat.py.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Performance improvements:

- Removed pip upgrade step (saves ~1s per version, 7% faster)
- Added --quiet flag to pip install for cleaner output

Instrumentation:

- Added detailed timing instrumentation for performance analysis
- Timing output controlled by DEBUG=1 environment variable
- Tracks venv creation, package install, Lance import, and execution time
- Added PERFORMANCE.md documenting bottlenecks and optimization strategies

Key findings from analysis:

- Package installation: 4.9s (29% of total time)
- First Lance import: ~5.0s (29% of total time)
- Venv creation: 2.2s (13% of total time)
- Persistent subprocess provides 500x speedup on subsequent calls

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
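The DEBUG-gated timing described in this commit could look roughly like the sketch below. The `timed` helper, its parameters, and the output format are hypothetical names chosen for illustration, not the PR's actual code; only the `DEBUG=1` switch and the tracked phases come from the commit message.

```python
import os
import sys
import time
from contextlib import contextmanager


@contextmanager
def timed(label, env=None, out=None):
    """Time the enclosed block and report it only when DEBUG=1 is set.

    `env` and `out` are injectable so the helper can be exercised without
    touching the real environment or stderr (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    out = sys.stderr if out is None else out
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if env.get("DEBUG") == "1":
            print(f"[timing] {label}: {elapsed:.3f}s", file=out, flush=True)


# Hypothetical usage: wrap one of the phases the commit message says is tracked.
with timed("venv creation"):
    pass  # the virtual environment would be created here
```

Keeping the check inside the `finally` clause means a phase that raises is still timed, which is useful when the slow step is also the failing one.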
Virtual environments are now persistent by default, stored in ~/.cache/lance-compat-venvs/. This provides a 5x speedup for interactive development after the first run.

Changes:

- Venvs persist across pytest sessions by default
- Validation checks ensure correct Lance version is installed
- Set COMPAT_TEMP_VENV=1 to use temporary venvs (old behavior)
- Added cleanup instructions to PERFORMANCE.md

Performance impact:

- First run: ~13-16s per version (creates venv)
- Subsequent runs: ~2-6s per version (reuses venv)
- Example: 2 tests that took 26s now take 6s

This makes iterative test development much more pleasant!

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
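A minimal sketch of how the persistent-versus-temporary choice might be made. The cache directory and the `COMPAT_TEMP_VENV` variable come from the commit message; the function name `venv_dir_for` and the `venv_<version>` subdirectory naming are assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path

# Cache location named in the commit message.
PERSISTENT_ROOT = Path.home() / ".cache" / "lance-compat-venvs"


def venv_dir_for(version, env=None):
    """Return (path, is_temporary) for the venv of one Lance version.

    Assumption: one subdirectory per version, named "venv_<version>".
    """
    env = os.environ if env is None else env
    if env.get("COMPAT_TEMP_VENV") == "1":
        # Old behavior: a throwaway directory the caller removes afterwards.
        tmp = tempfile.mkdtemp(prefix=f"lance-compat-{version}-")
        return Path(tmp), True
    # Default: a stable path that later pytest sessions can reuse.
    return PERSISTENT_ROOT / f"venv_{version}", False
```

A reused venv would still need the validation step mentioned above (checking that the installed Lance version matches) before it is trusted; that check is omitted here.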
Adds a new CI job that runs compatibility tests across multiple Lance versions to verify forward/backward compatibility.

The job:

- Runs on Ubuntu 24.04 with Python 3.13
- Uses temporary venvs (COMPAT_TEMP_VENV=1) for clean CI environments
- Has 60-minute timeout to account for venv creation
- Tests compatibility with versions: 0.16.0, 0.30.0, 0.36.0, latest stable, and latest beta

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Adds compatibility tests for two additional vector index types:

- IVF_HNSW_PQ: Hierarchical Navigable Small World with Product Quantization
- IVF_HNSW_SQ: Hierarchical Navigable Small World with Scalar Quantization

These tests only run against versions >= 0.39.0 because earlier versions do not support remapping for HNSW indices, which is required for optimize operations like compact_files().

Adds 12 new test cases (6 per index type: 3 versions × 2 test types).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
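The minimum-version gate described above could be expressed as a small filter over version strings. This is a sketch only: `version_key` and `versions_at_least` are hypothetical helpers, and the `X.Y.Z` / `X.Y.Z.betaN` format is assumed from the version strings that appear elsewhere in this PR (for example `0.29.1.beta2`).

```python
import re


def version_key(version):
    """Sort key for strings such as "0.39.0" or "0.29.1.beta2".

    A beta sorts before the final release with the same numbers.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:\.beta(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, beta = match.groups()
    # (1, 0) marks a final release; (0, n) marks beta n.
    stage = (1, 0) if beta is None else (0, int(beta))
    return (int(major), int(minor), int(patch), stage)


def versions_at_least(versions, minimum):
    """Keep only the versions that are >= minimum, preserving input order."""
    floor = version_key(minimum)
    return [v for v in versions if version_key(v) >= floor]


# Hypothetical usage for the HNSW tests, which need >= 0.39.0.
hnsw_versions = versions_at_least(["0.30.0", "0.36.0", "0.39.0"], "0.39.0")
```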
westonpace left a comment
A few minor nits. I suppose one question would be why is this approach better than the old CI-based one?
    flush=True,
    )

    method = getattr(obj, method_name)
Why wrap the method in an object? Why not just pickle the method itself? Then you could use `__name__` to get the method name for logging.
I suppose there are two ways you could go here:

1. (What I did) Put all the parameters (such as the temp directory) in the class attributes. Then you can just pickle the object and call its method without having to pass them.
2. Pickle the function and the parameters separately, and pass the parameters in on the other side.

It doesn't seem like pickle works with closures, so I couldn't just create a closure with the parameters passed in, pickle it, and run it on the other side.

I felt like option (1) was a simpler way to implement this. Plus I kind of liked using the class to organize the methods into a logical grouping.
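To illustrate the point about closures, here is a small self-contained sketch. `WriteMarker` and `make_closure` are hypothetical stand-ins, not code from the PR. It shows option (1) round-tripping through pickle while a closure over the same parameter is rejected.

```python
import pickle
from pathlib import Path


class WriteMarker:
    """Hypothetical test object: parameters live in attributes, not arguments."""

    def __init__(self, path):
        self.path = path

    def create(self):
        return f"would write to {self.path}"


def make_closure(path):
    """Build a local function that captures `path` (the rejected alternative)."""
    def create():
        return f"would write to {path}"
    return create


# Option 1: pickle the object plus a method name, then look the method up
# on the receiving side with getattr.
payload = pickle.dumps((WriteMarker(Path("/tmp/data")), "create"))
obj, method_name = pickle.loads(payload)
result = getattr(obj, method_name)()

# pickle serializes functions by qualified name, so a function defined
# inside another function cannot be pickled.
try:
    pickle.dumps(make_closure(Path("/tmp/data")))
    closure_pickled = True
except (pickle.PicklingError, AttributeError):
    closure_pickled = False
```

The exception type raised for the closure differs between Python versions, which is why the sketch catches both.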
    "0.29.1.beta2",
    "0.30.0",
    "0.36.0",
Looks like we still have a hard-coded list of versions. At one point we had discussed using "the latest X major releases". Is that still a follow-up, or do we want this approach for some reason?
We do the last stable and last beta release. We could expand that further.
Separately, there's the issue of whether we want to hard-code versions. Some of the hard-coded ones here just represent the first version where compatibility works for that feature. I think at least that version is worth hard-coding in.
We might be able to take some of these versions out in the middle. I think some are there for no particular reason. But it's possible we might want to record some specific version in there if there was a risky compatibility change that happened in that version.
Thinking about it, I think what I'd like to do is have every test run against:
- Last stable release
- Last preview release
- Last 2 major releases
I can make that change.
The main advantage is that anyone can run this locally with one command.
This adds a new `compat` test suite that replaces the `forwards_compat` suite. Key changes:

1. It tests both forwards **and** backwards compatibility.
2. It adds the most recent stable and beta release as comparison targets. Individual tests can also choose which versions to test against.
3. It can be run easily locally via a single command `make compattest`

Closes lance-format#4416

## How it works

Tests are written like:

```python
@compat_test()
class MyTest(UpgradeDowngradeTest):
    def __init__(self, path: Path):
        self.path = path

    def create(self):
        # write initial data
        ...

    def check_read(self):
        # test reading data
        ...

    def check_write(self):
        # test writing data
        ...
```

Then two tests will be generated:

1. **downgrade**: We call `create` in the current version, then `check_read` and `check_write` in the old version.
2. **upgrade_downgrade**: We call `create` in the old version, then `check_read` and `check_write` in the current version, then go back and call `check_read` and `check_write` in the old version again.

We execute on old versions of the library using a persistent executor process within a special virtual environment. We create one virtual env per old version, and one executor process per virtual env. The executor process receives a tuple `(TestObject, method_name)` (for example, `(MyTest, 'create')`), which is serialized via pickle. The executor process runs the method and returns a status indicating whether there were any errors.

---------

Co-authored-by: Claude <noreply@anthropic.com>
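The call order of the two generated tests can be sketched as plain data plus a tiny driver. All names here (`downgrade_steps`, `upgrade_downgrade_steps`, `run_steps`, `call_in_process`) are hypothetical and not the suite's API; the step order is taken from the description above.

```python
def downgrade_steps():
    """(library version, method) pairs for the downgrade test."""
    return [
        ("current", "create"),
        ("old", "check_read"),
        ("old", "check_write"),
    ]


def upgrade_downgrade_steps():
    """(library version, method) pairs for the upgrade_downgrade test."""
    return [
        ("old", "create"),
        ("current", "check_read"),
        ("current", "check_write"),
        ("old", "check_read"),
        ("old", "check_write"),
    ]


def call_in_process(obj, method_name):
    """Runner for the current version: call the method directly."""
    getattr(obj, method_name)()


def run_steps(steps, runners, test_obj):
    """Run each step with the runner registered for that version.

    `runners` maps "current" and "old" to callables taking (obj, method_name).
    In the real suite the "old" runner would forward the pickled pair to the
    executor process for that version; this sketch leaves that part out.
    """
    for version, method_name in steps:
        runners[version](test_obj, method_name)
```

Separating the step lists from the runners keeps the ordering easy to read and lets the same driver be exercised with in-process runners before any virtual environment exists.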