feat: support manifest summary and get it from Version #4754

Merged
yanghua merged 19 commits into lance-format:main from majin1102:version-summary
Sep 24, 2025

Conversation

@majin1102
Contributor

Discussion: #4337

We still can't get transaction information directly from the manifest. That will be enabled after feature #4308.

@github-actions github-actions bot added the "enhancement" (New feature or request) label Sep 17, 2025
@codecov-commenter
codecov-commenter commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.90%. Comparing base (3678c5d) to head (2aab38f).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4754      +/-   ##
==========================================
+ Coverage   80.88%   80.90%   +0.02%     
==========================================
  Files         323      323              
  Lines      127559   127695     +136     
  Branches   127559   127695     +136     
==========================================
+ Hits       103171   103308     +137     
  Misses      20754    20754              
+ Partials     3634     3633       -1     
Flag Coverage Δ
unittests 80.90% <100.00%> (+0.02%) ⬆️

@majin1102 majin1102 force-pushed the version-summary branch 3 times, most recently from 390ee29 to 0dabbf7 Compare September 17, 2025 14:50
Comment thread rust/lance-table/src/format/manifest.rs Outdated
/// - total-data-files: Total number of data files across all fragments
/// - total-deletions: Total number of deleted records
/// - total-deletion-files: Number of fragments with deletion files
pub fn summary(&self) -> BTreeMap<String, String> {
Contributor

Why output as a map of strings? Then the caller has to parse the integers. Seems like it would be better to just expose the ManifestStats as public, right?

Contributor Author

@majin1102 majin1102 Sep 17, 2025

I was planning to extend the summary to cover transactions, as #4337 mentioned. For that part we may need strings. After that I will probably use the stats in the Version struct, which already has a map of strings (I think the final usage is friendlier to strings).

I think you are right: we could expose ManifestStats and then transform it to map<string, string> for the Version struct. That should be more flexible.
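
For illustration, the approach agreed on here (a public typed stats struct, converted to a string map only where the Version struct needs it) could look roughly like the sketch below. The field set is an assumption made for this example and is not taken from the merged code; only the `ManifestStats` name, the `From` conversion shape, and the snake_case keys come from this thread.

```rust
use std::collections::BTreeMap;

/// Hypothetical typed summary. The field set is illustrative only;
/// the merged struct may differ.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ManifestStats {
    pub total_fragments: u64,
    pub total_data_files: u64,
    pub total_deletion_files: u64,
    pub total_rows: u64,
}

/// Lossy, display-oriented view: every value is rendered as a string so it
/// can be stored in a map<string, string>.
impl From<ManifestStats> for BTreeMap<String, String> {
    fn from(val: ManifestStats) -> Self {
        let mut stats_map = Self::new();
        stats_map.insert(
            "total_fragments".to_string(),
            val.total_fragments.to_string(),
        );
        stats_map.insert(
            "total_data_files".to_string(),
            val.total_data_files.to_string(),
        );
        stats_map.insert(
            "total_deletion_files".to_string(),
            val.total_deletion_files.to_string(),
        );
        stats_map.insert("total_rows".to_string(), val.total_rows.to_string());
        stats_map
    }
}
```

With this split, callers that want numbers read the struct fields directly, and only the string-map consumer pays for formatting.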

Contributor Author

@wjones127 PTAL when you have time

Contributor

@wjones127 wjones127 left a comment

Seems good. I was thinking we'd want these kinds of stats easily available anyways. Thanks!

Comment thread rust/lance-table/src/format/manifest.rs Outdated
fn from(val: ManifestStats) -> Self {
let mut stats_map = Self::new();
stats_map.insert(
"total-fragments".to_string(),
Contributor

Kebab case does seem like an odd choice. I'm guessing this is to match the Iceberg statistics?

Contributor Author

> Kebab case does seem like an odd choice. I'm guessing this is to match the Iceberg statistics?

The initial idea was inspired by Iceberg, and we utilized the summary for management functions. However, matching the naming convention wasn't intentional; I just didn't give it much thought.

I've switched it to snake_case. Appreciate you catching that!

Comment thread rust/lance-table/src/format/manifest.rs Outdated
summary.total_data_files += f.files.len() as u64;
// Sum the number of rows for the current fragment (if available)
if let Some(num_rows) = f.num_rows() {
summary.total_records += num_rows as u64;
Contributor

I think it's better to say total-data-file-rows, total-deletion-file-rows instead, and we can have a total-rows that is total-data-file-rows - total-deletion-file-rows
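
As a rough sketch of the naming suggested above, the three row counters could be accumulated in one pass over the fragments. `FragmentInfo`, `RowSummary`, and `summarize` are hypothetical stand-ins written for this example, not Lance types, and the snake_case field names follow the convention settled on elsewhere in this review.

```rust
/// Hypothetical, simplified stand-in for a fragment (not the Lance type).
pub struct FragmentInfo {
    /// Number of data files in the fragment.
    pub data_files: usize,
    /// Rows physically stored in the data files, if known.
    pub physical_rows: Option<usize>,
    /// Rows marked deleted; `None` means the fragment has no deletion file.
    pub deleted_rows: Option<usize>,
}

/// Hypothetical summary using the suggested counter names.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct RowSummary {
    pub total_fragments: u64,
    pub total_data_files: u64,
    pub total_deletion_files: u64,
    pub total_data_file_rows: u64,
    pub total_deletion_file_rows: u64,
    pub total_rows: u64,
}

pub fn summarize(fragments: &[FragmentInfo]) -> RowSummary {
    let mut summary = RowSummary::default();
    for f in fragments {
        summary.total_fragments += 1;
        summary.total_data_files += f.data_files as u64;
        // Count stored rows only when the fragment reports them.
        if let Some(rows) = f.physical_rows {
            summary.total_data_file_rows += rows as u64;
        }
        // A fragment with a deletion file contributes one file and its deleted rows.
        if let Some(deleted) = f.deleted_rows {
            summary.total_deletion_files += 1;
            summary.total_deletion_file_rows += deleted as u64;
        }
    }
    // Live rows = stored rows minus deleted rows (saturating, to stay non-negative
    // if some fragment does not report its stored row count).
    summary.total_rows = summary
        .total_data_file_rows
        .saturating_sub(summary.total_deletion_file_rows);
    summary
}
```

For example, two fragments holding 100 and 50 stored rows, with 10 rows deleted in the first, would yield 150 data-file rows, 10 deletion-file rows, and 140 total rows.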

Contributor Author

Oh, do you prefer kebab case or snake case? I'm OK with either.

Contributor

Oh, I just copied it from the docstring. Looks like you need to update the docstrings to be consistent.

Contributor Author

It's done.

Feel free to let me know if you have any other thoughts.

Comment thread rust/lance-table/src/format/manifest.rs Outdated
/// Get the summary information of a manifest.
///
/// This function calculates various statistics about the manifest, including:
/// - total-records: Total number of records in the dataset
Contributor

Looks like these docs are still not updated to snake case?

Contributor Author

Sorry, my mistake.

Contributor Author

PTAL, and feel free to merge if there are no other comments.

@yanghua yanghua merged commit ea16e39 into lance-format:main Sep 24, 2025
29 checks passed
@majin1102 majin1102 deleted the version-summary branch September 24, 2025 03:01
westonpace pushed a commit that referenced this pull request Oct 31, 2025
This PR follows #4754:
ManifestSummary is useful in many scenarios. For example, users can get
statistical information about their datasets, and computing engines such
as Flink and Spark can dynamically decide source parallelism and
resource specs.

---------

Co-authored-by: 喆宇 <wxy407679@antgroup.com>
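
To make the parallelism use case in that commit message concrete, here is a minimal sketch of how an engine-side connector might turn a row count from the summary into a reader count. The function name, the rows-per-task target, and the cap are all assumptions made for this example; nothing here is part of Lance, Flink, or Spark.

```rust
/// Hypothetical helper: pick a source parallelism from a dataset's total row count.
/// `rows_per_task` and `max_parallelism` are tuning knobs assumed for this example.
pub fn suggest_parallelism(total_rows: u64, rows_per_task: u64, max_parallelism: u64) -> u64 {
    // Guard against zero so the division and the clamp below are always valid.
    let rows_per_task = rows_per_task.max(1);
    let max_parallelism = max_parallelism.max(1);
    // Ceiling division without risking overflow on large row counts.
    let tasks = total_rows / rows_per_task + u64::from(total_rows % rows_per_task != 0);
    // Always use at least one reader, and never more than the cap.
    tasks.clamp(1, max_parallelism)
}
```

Under these assumptions, a dataset of 1,000,000 rows with a target of 250,000 rows per task and a cap of 16 would get 4 readers, and an empty dataset would still get 1.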
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026

5 participants