feat: support write multi fragments or empty fragment in one spark task#3183

Merged
LuQQiu merged 6 commits into lance-format:main from SaintBacchus:EmptyFragment
Dec 2, 2024
@SaintBacchus (Collaborator) commented Nov 28, 2024

Currently, FileFragment::create can only create a single file fragment, which causes two issues in the Spark connector:

  1. If the Spark task is empty, the API throws an exception because there is no data to create the fragment from.
  2. If the task's data stream is very large, it generates one huge Lance file, which is unfriendly to Spark parallelism.

This PR therefore removes the assigned fragment id and adds a new method, FileFragment::create_fragments, which can generate empty or multiple fragments.
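
To illustrate the intended behavior, here is a minimal, self-contained sketch of splitting one task's rows into zero or more fragments. It does not use Lance and is not the signature of FileFragment::create_fragments: `FragmentSketch`, `plan_fragments`, and the `max_rows_per_file` parameter are hypothetical names chosen for this example, and the sketch assumes that an empty task yields no fragments at all.

```rust
/// Hypothetical stand-in for a written fragment; it only records its row count.
#[derive(Debug, PartialEq)]
struct FragmentSketch {
    num_rows: usize,
}

/// Split a task's rows into fragments of at most `max_rows_per_file` rows each.
/// An empty input produces an empty Vec instead of an error (assumption).
fn plan_fragments<I>(rows: I, max_rows_per_file: usize) -> Vec<FragmentSketch>
where
    I: IntoIterator<Item = u64>,
{
    assert!(max_rows_per_file > 0, "max_rows_per_file must be positive");

    let mut fragments = Vec::new();
    let mut current = 0usize;

    for _row in rows {
        current += 1;
        // Close the current fragment as soon as it reaches the row limit.
        if current == max_rows_per_file {
            fragments.push(FragmentSketch { num_rows: current });
            current = 0;
        }
    }

    // Flush the final, partially filled fragment if any rows remain.
    if current > 0 {
        fragments.push(FragmentSketch { num_rows: current });
    }

    fragments
}
```

With this shape, a caller such as a Spark task can treat the empty case and the large-stream case uniformly: it simply commits however many fragments come back.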

github-actions bot added the enhancement and java labels Nov 28, 2024

@github-actions (Contributor)

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

SaintBacchus changed the title from "feat: Support write multi fragments or empty fragment in one spark task" to "feat: support write multi fragments or empty fragment in one spark task" Nov 28, 2024

codecov-commenter commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 85.03937% with 19 lines in your changes missing coverage. Please review.

Project coverage is 78.72%. Comparing base (dc9afbb) to head (0489103).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/fragment.rs | 0.00% | 11 Missing ⚠️ |
| rust/lance/src/dataset/fragment/write.rs | 94.73% | 0 Missing and 6 partials ⚠️ |
| java/core/lance-jni/src/fragment.rs | 0.00% | 1 Missing ⚠️ |
| java/core/lance-jni/src/utils.rs | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3183      +/-   ##
==========================================
+ Coverage   78.06%   78.72%   +0.66%     
==========================================
  Files         243      243              
  Lines       82661    82793     +132     
  Branches    82661    82793     +132     
==========================================
+ Hits        64529    65180     +651     
+ Misses      14932    14833      -99     
+ Partials     3200     2780     -420     

| Flag | Coverage Δ |
|---|---|
| unittests | 78.72% <85.03%> (+0.66%) ⬆️ |

Flags with carried forward coverage won't be shown.


@LuQQiu LuQQiu merged commit 39222ec into lance-format:main Dec 2, 2024