feat(blob_v2): add dedicated blob support #5406
Merged
19 commits

- deb1db4 Save work (Xuanwo)
- f4ac526 save work (Xuanwo)
- 3f7334d Merge remote-tracking branch 'origin/main' into xuanwo/blobv2-dedicated (Xuanwo)
- 24f391d Fix build (Xuanwo)
- b3a6b17 remove not needed checks (Xuanwo)
- 438d3d4 refactor (Xuanwo)
- d949ed9 refactor blob logic (Xuanwo)
- e0d03d8 refactor (Xuanwo)
- b73d3b3 Remove not needed comment (Xuanwo)
- 151e4de Fix build (Xuanwo)
- b8a07d3 Fix build (Xuanwo)
- 51816ed Fix existsing tests (Xuanwo)
- 6f522ba Add tests (Xuanwo)
- 55d645e Polish blob id (Xuanwo)
- f24739b polish (Xuanwo)
- 0289d8d use better logic for blob path (Xuanwo)
- 026a12c Fix uri should use utf-8 instead of large-utf8 (Xuanwo)
- 682fee5 Address comments (Xuanwo)
- 93f1977 Merge branch 'main' into xuanwo/blobv2-dedicated (Xuanwo)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | | |
| // SPDX-License-Identifier: Apache-2.0 | | |
| // SPDX-FileCopyrightText: Copyright The Lance Authors | | |
| | | |
| use object_store::path::Path; | | |
| | | |
| /// Format a dedicated blob sidecar path for a data file. | | |
| /// | | |
| /// Layout: `<base>/<data_file_key>/<blob_id>.raw` | | |
| /// - `base` is typically the dataset's data directory. | | |
| /// - `data_file_key` is the stem of the data file (without extension). | | |
| pub fn blob_path(base: &Path, data_file_key: &str, blob_id: u32) -> Path { | | |
|     let file_name = format!("{:08x}.raw", blob_id); | | |
|     base.child(data_file_key).child(file_name.as_str()) | | |
| } | | |
| | | |
| #[cfg(test)] | | |
| mod tests { | | |
|     use super::*; | | |
| | | |
|     #[test] | | |
|     fn test_blob_path_formatting() { | | |
|         let base = Path::from("base"); | | |
|         let path = blob_path(&base, "deadbeef", 2); | | |
|         assert_eq!(path.to_string(), "base/deadbeef/00000002.raw"); | | |
|     } | | |
| } | | |
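
To make the file-name format in `blob_path` easier to check by hand, here is a minimal std-only sketch of the same `<base>/<data_file_key>/<blob_id>.raw` layout. The name `blob_key` and the use of plain strings in place of `object_store::path::Path` are assumptions made for illustration only; this is not the code added by the PR.

```rust
/// Hypothetical std-only stand-in for `blob_path` (not part of the PR):
/// builds the `<base>/<data_file_key>/<blob_id>.raw` layout as a plain string.
fn blob_key(base: &str, data_file_key: &str, blob_id: u32) -> String {
    // `{:08x}` renders the id as lowercase hexadecimal, zero-padded to
    // 8 digits, so every `u32` value produces a file name of the same width.
    format!("{}/{}/{:08x}.raw", base, data_file_key, blob_id)
}
```

Under this sketch, `blob_key("base", "deadbeef", 2)` gives `base/deadbeef/00000002.raw`, matching the assertion in the PR's test. Because the names are fixed-width lowercase hex, sorting them lexicographically also orders them by numeric blob id.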
Interesting. I hadn't expected the path to include `data_file_key`, but I don't see any harm in it, and it could maybe be useful for things like cleanup?

Yes! I picked this design to make GC much easier: we can just remove all blob files under the same `data_file_key` once we decide to remove that data file.
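
The cleanup idea in this reply can be sketched as a prefix match over object keys. This is only an illustration under stated assumptions: the helper names `blob_prefix` and `partition_for_gc` are hypothetical, keys are modelled as plain strings, and the PR's actual cleanup code is not shown here.

```rust
/// Hypothetical helper (not from the PR): the key prefix shared by every
/// dedicated blob that belongs to one data file.
fn blob_prefix(base: &str, data_file_key: &str) -> String {
    // The trailing '/' keeps key "abc" from also matching "abcd/...".
    format!("{}/{}/", base, data_file_key)
}

/// Hypothetical helper (not from the PR): splits `keys` into
/// `(to_delete, to_keep)` for a data file that is being removed.
fn partition_for_gc<'a>(
    keys: &[&'a str],
    base: &str,
    data_file_key: &str,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let prefix = blob_prefix(base, data_file_key);
    keys.iter()
        .copied()
        .partition(|key| key.starts_with(prefix.as_str()))
}
```

With this layout, deciding which blobs go away needs no per-blob bookkeeping: everything under the data file's prefix is selected, and blobs belonging to other data files are left alone.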