fix: forward incompatibility of prerelease in writer version #5116

Merged
jackye1995 merged 2 commits into lance-format:main from jackye1995:semver-imcompat
Oct 31, 2025

@jackye1995
Contributor

@jackye1995 jackye1995 commented Oct 31, 2025

With the previous CI change, we now use the actual prerelease version as the writer version. However, old versions cannot parse such a version string and panic.

This PR makes sure that the version in WriterVersion is always just major.minor.patch. Any prerelease and build metadata are stored separately and not visible to old clients.
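
A minimal sketch of the writer-side split described above, assuming a hypothetical `VersionParts` struct and `split_version` helper (neither name comes from the PR, and the actual implementation in `manifest.rs` may differ). It uses only the standard library and a simplified check for `major.minor.patch`, so it does not enforce every semver rule (for example, leading zeros are accepted):

```rust
/// Hypothetical container mirroring the three fields discussed in this PR.
#[derive(Debug, PartialEq)]
pub struct VersionParts {
    pub version: String,
    pub prerelease: Option<String>,
    pub build_metadata: Option<String>,
}

/// Returns true if `s` is exactly three non-empty numeric components
/// joined by dots (a simplified `major.minor.patch` check).
fn is_core_version(s: &str) -> bool {
    let parts: Vec<&str> = s.split('.').collect();
    parts.len() == 3
        && parts
            .iter()
            .all(|p| !p.is_empty() && p.chars().all(|c| c.is_ascii_digit()))
}

/// Splits a full version string so that `version` holds only
/// `major.minor.patch`. Strings that do not look like semver are kept
/// whole in `version`, with no prerelease or build metadata.
pub fn split_version(full: &str) -> VersionParts {
    // Build metadata is everything after the first '+'.
    let (rest, build) = match full.split_once('+') {
        Some((r, b)) => (r, Some(b)),
        None => (full, None),
    };
    // Prerelease is everything after the first '-' in the remainder.
    let (core, pre) = match rest.split_once('-') {
        Some((c, p)) => (c, Some(p)),
        None => (rest, None),
    };
    let extras_ok = pre.map_or(true, |p| !p.is_empty())
        && build.map_or(true, |b| !b.is_empty());

    if is_core_version(core) && extras_ok {
        VersionParts {
            version: core.to_string(),
            prerelease: pre.map(str::to_string),
            build_metadata: build.map(str::to_string),
        }
    } else {
        // Not semver-shaped: store everything in `version` as-is.
        VersionParts {
            version: full.to_string(),
            prerelease: None,
            build_metadata: None,
        }
    }
}
```

With this sketch, `split_version("1.2.3-beta.2")` keeps `1.2.3` in `version` and moves `beta.2` to `prerelease`, so a reader that only looks at `version` never sees the prerelease suffix.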

@github-actions github-actions Bot added bug Something isn't working java labels Oct 31, 2025
Comment thread java/lance-jni/Cargo.lock
Contributor Author

@jackye1995 jackye1995 Oct 31, 2025


This is the same old problem as #5044.

Comment thread protos/table.proto
// full semantic version by combining version, prerelease, and build_metadata.
//
// If absent, the version field is used as-is.
optional string prerelease = 3;
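
For illustration, a reader could rebuild the full version from these fields roughly as follows. This is a sketch, not the code in this PR: `full_version` is a hypothetical helper, and the `-` and `+` separators are assumed from the semver spec's `version-prerelease+build` layout.

```rust
/// Rebuilds a full version string from the separately stored parts.
/// When both optional parts are absent, `version` is returned as-is.
pub fn full_version(
    version: &str,
    prerelease: Option<&str>,
    build_metadata: Option<&str>,
) -> String {
    let mut full = String::from(version);
    if let Some(pre) = prerelease {
        // Semver joins the prerelease with a hyphen.
        full.push('-');
        full.push_str(pre);
    }
    if let Some(build) = build_metadata {
        // Semver joins build metadata with a plus sign.
        full.push('+');
        full.push_str(build);
    }
    full
}
```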
Contributor Author


The original proposal was to use a single classifier string, but that makes it hard to leverage the semver parser, so I aligned the fields with the semver spec.

Member

@westonpace westonpace left a comment


Thanks for the fix. Does this supersede #5113?

Comment thread protos/table.proto
Contributor


Are we baking in semver as a requirement for writers? Seems unnecessary for the format to be opinionated about that?

Contributor Author

@jackye1995 jackye1995 Oct 31, 2025


It still works with an arbitrary string. But in general, recording the full semver seems beneficial: we can tell exactly which version the writer is, and whether it is a main release or a specific beta version.

Contributor


Yeah, I don't disagree with recording the full semver. I'm more concerned that by providing specific prerelease and build_metadata fields we are baking semver concepts into the format. It seems like we could just have a single field, version_extra or something like that, where we put that segment of the version.

Contributor Author

@jackye1995 jackye1995 Oct 31, 2025


Yeah, I went back and forth on it, tbh.

Originally I just had a classifier field (that was what was originally proposed on Slack). My internal debate was: given a string like 1.2.3-beta.2, I can store beta.2 in the classifier and split on - to get it. But if a string like 1.2.3+build.abcde shows up in the future, which is still valid semver, that split no longer works.

I also thought about storing -beta.2 in the classifier, but then we need a parser that seeks to the first non-version position in the string and splits there. That feels like overkill, given that for the Lance library the version is basically always semver. Even if we have internal versions in the future, it probably still makes sense to follow semver and use something like -internal.2, or leverage build metadata to store internal info, so we can easily infer the ordering of different versions.

So those were my thoughts in arriving at this state.

I guess having a single classifier/version_extra field would also work; let me know which you prefer.

Contributor


Yeah I was thinking you could just parse semver as str_concat(version, version_extra).

Contributor Author

@jackye1995 jackye1995 Oct 31, 2025


I think the problem is the reverse: how do you go from a version string to version and version_extra? For example, if I have 1.0-beta.1, which is not standard semver, it is not clear to me whether I need to put everything in version or split it into 1.0 and -beta.1.

The current approach provides a clear rule: (1) if it is semver, we can leverage those additional fields; (2) if it is not semver, everything is still stored in the version string.

If we want to move to just a version_extra, it looks like we would define something like: if the string starts with 3 numbers connected by 2 dots, those go to version, and the rest goes to the extra. Does that sound good?
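
For comparison, the version_extra rule floated above ("3 numbers connected by 2 dots go to version, the rest goes to the extra") could be sketched like this. `split_version_extra` is a hypothetical name, and this alternative is not what the PR implements; it only shows what such a prefix scanner would involve:

```rust
/// Splits `full` into (version, extra), where `version` is the leading
/// `N.N.N` prefix and `extra` is whatever follows it. If the string does
/// not start with three dot-separated numbers, everything stays in
/// `version` and `extra` is empty.
pub fn split_version_extra(full: &str) -> (&str, &str) {
    let bytes = full.as_bytes();
    let mut pos = 0;
    for component in 0..3 {
        let start = pos;
        // Consume one run of ASCII digits.
        while pos < bytes.len() && bytes[pos].is_ascii_digit() {
            pos += 1;
        }
        if pos == start {
            return (full, ""); // missing numeric component
        }
        // The first two components must be followed by a dot.
        if component < 2 {
            if pos < bytes.len() && bytes[pos] == b'.' {
                pos += 1;
            } else {
                return (full, "");
            }
        }
    }
    // Everything before `pos` is ASCII, so this is a valid split point.
    full.split_at(pos)
}
```

Concatenating the two halves always reproduces the input, which matches the `str_concat(version, version_extra)` idea, but the extra keeps its leading `-` or `+` and the format itself has to define the prefix rule.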

Contributor


> The current approach provides a clear rule that (1) if it is semver, then can leverage those additional fields, (2) if it is not semver, everything is still stored in the version string.

Okay, I guess that is fine.

Contributor

@wjones127 wjones127 left a comment


Sorry about the collision with #5113. Could we bring over the forward compat tests? I want to make sure we have tests to catch issues like this.

Comment thread protos/table.proto
@codecov-commenter

codecov-commenter commented Oct 31, 2025

Codecov Report

❌ Patch coverage is 97.18310% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.80%. Comparing base (9ed9ee2) to head (29bc5b6).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance-table/src/format/manifest.rs | 97.18% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5116      +/-   ##
==========================================
+ Coverage   81.77%   81.80%   +0.02%     
==========================================
  Files         340      340              
  Lines      140102   140237     +135     
  Branches   140102   140237     +135     
==========================================
+ Hits       114568   114714     +146     
+ Misses      21729    21716      -13     
- Partials     3805     3807       +2     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 81.80% <97.18%> (+0.02%) ⬆️ |

@jackye1995
Contributor Author

Failure due to flaky tests, merging

@jackye1995 jackye1995 merged commit 4cfb99b into lance-format:main Oct 31, 2025
25 of 27 checks passed
jackye1995 added a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
…ormat#5116)

With the previous CI change, we now use the actual prerelease version as
the writer version. However, old versions cannot parse such a version string
and panic.

This PR makes sure that the version in WriterVersion is always just
major.minor.patch. Any prerelease and build metadata are stored
separately and not visible to old clients.

Labels

bug Something isn't working java python

4 participants