feat(sort): auto-tune buffer sizing from available memory #8959
Merged
20 commits
810d0e6 feat(sort): auto-tune buffer sizing from available memory (mattsu2020)
e197e64 Merge branch 'uutils:main' into sort-memory-functions (mattsu2020)
7160f59 docs: add 'sysconf' to jargon wordlist (mattsu2020)
e694ae6 refactor(sort): extract buffer hint logic to separate module (mattsu2020)
0ebf12b refactor(sort): Explicitly cast to u128 in physical_memory_bytes_unix (mattsu2020)
e4d46f4 refactor(sort): improve readability of cfg attribute for physical_mem… (mattsu2020)
280e127 style(buffer_hint): remove unnecessary blank line in physical_memory_… (mattsu2020)
ddb36bc refactor(sort): remove unnecessary return statement in physical_memor… (mattsu2020)
bfa172e fix: correct typo in buffer_hint.rs comment (mattsu2020)
d273a69 Merge branch 'uutils:main' into sort-memory-functions (mattsu2020)
92a4574 docs: add license header to buffer_hint.rs (mattsu2020)
f8de88e Update src/uu/sort/src/buffer_hint.rs (mattsu2020)
5725d06 docs(sort): add comment explaining memory detection limitation (mattsu2020)
f941f1c refactor(sort): enhance physical memory detection for Unix systems (mattsu2020)
5586e8a refactor(uu/sort): remove libc dependency and use named constants for… (mattsu2020)
c7298c9 refactor(sort): reorder imports in buffer_hint.rs for consistency (mattsu2020)
7fd534e fix Cargo.lock linux enviroments (mattsu2020)
08a9548 Merge branch 'uutils:main' into sort-memory-functions (mattsu2020)
19fd282 Merge branch 'uutils:main' into sort-memory-functions (mattsu2020)
72201b2 Merge branch 'uutils:main' into sort-memory-functions (mattsu2020)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -131,6 +131,7 @@ symlink | |
| symlinks | ||
| syscall | ||
| syscalls | ||
| sysconf | ||
| tokenize | ||
| toolchain | ||
| truthy | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,152 @@ | ||
| // This file is part of the uutils coreutils package. | ||
| // | ||
| // For the full copyright and license information, please view the LICENSE | ||
| // file that was distributed with this source code. | ||
|
|
||
| //! Heuristics for determining buffer size for external sorting. | ||
| use std::ffi::OsString; | ||
|
|
||
| use crate::{ | ||
| FALLBACK_AUTOMATIC_BUF_SIZE, MAX_AUTOMATIC_BUF_SIZE, MIN_AUTOMATIC_BUF_SIZE, STDIN_FILE, | ||
| }; | ||
|
|
||
| // Heuristics to size the external sort buffer without overcommitting memory. | ||
| pub(crate) fn automatic_buffer_size(files: &[OsString]) -> usize { | ||
| let file_hint = file_size_hint(files); | ||
| let mem_hint = available_memory_hint(); | ||
|
|
||
| // Prefer the tighter bound when both hints exist, otherwise fall back to whichever hint is available. | ||
| match (file_hint, mem_hint) { | ||
| (Some(file), Some(mem)) => file.min(mem), | ||
| (Some(file), None) => file, | ||
| (None, Some(mem)) => mem, | ||
| (None, None) => FALLBACK_AUTOMATIC_BUF_SIZE, | ||
| } | ||
| } | ||
|
|
||
| fn file_size_hint(files: &[OsString]) -> Option<usize> { | ||
| // Estimate total bytes across real files; non-regular inputs are skipped. | ||
| let mut total_bytes: u128 = 0; | ||
|
|
||
| for file in files { | ||
| if file == STDIN_FILE { | ||
| continue; | ||
| } | ||
|
|
||
| let Ok(metadata) = std::fs::metadata(file) else { | ||
| continue; | ||
| }; | ||
|
|
||
| if !metadata.is_file() { | ||
| continue; | ||
| } | ||
|
|
||
| total_bytes = total_bytes.saturating_add(metadata.len() as u128); | ||
|
|
||
| // Stop summing early once the total is at least 8x the maximum buffer size. | ||
| if total_bytes >= (MAX_AUTOMATIC_BUF_SIZE as u128) * 8 { | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| if total_bytes == 0 { | ||
| return None; | ||
| } | ||
|
|
||
| let desired_bytes = desired_file_buffer_bytes(total_bytes); | ||
| Some(clamp_hint(desired_bytes)) | ||
| } | ||
|
|
||
| fn available_memory_hint() -> Option<usize> { | ||
| #[cfg(target_os = "linux")] | ||
| if let Some(bytes) = uucore::parser::parse_size::available_memory_bytes() { | ||
| return Some(clamp_hint(bytes / 4)); | ||
| } | ||
|
|
||
| physical_memory_bytes().map(|bytes| clamp_hint(bytes / 4)) | ||
| } | ||
|
|
||
| fn clamp_hint(bytes: u128) -> usize { | ||
| let min = MIN_AUTOMATIC_BUF_SIZE as u128; | ||
| let max = MAX_AUTOMATIC_BUF_SIZE as u128; | ||
| let clamped = bytes.clamp(min, max); | ||
| clamped.min(usize::MAX as u128) as usize | ||
| } | ||
|
|
||
| fn desired_file_buffer_bytes(total_bytes: u128) -> u128 { | ||
| if total_bytes == 0 { | ||
| return 0; | ||
| } | ||
|
|
||
| let max = MAX_AUTOMATIC_BUF_SIZE as u128; | ||
|
|
||
| // Inputs up to the cap get 12x headroom, limited to the cap. | ||
| if total_bytes <= max { | ||
| return total_bytes.saturating_mul(12).clamp(total_bytes, max); | ||
| } | ||
|
|
||
| // Larger inputs: ask for a quarter of the total, but never less than the cap. | ||
| let quarter = total_bytes / 4; | ||
| quarter.max(max) | ||
| } | ||
|
|
||
| fn physical_memory_bytes() -> Option<u128> { | ||
| #[cfg(all( | ||
| target_family = "unix", | ||
| not(target_os = "redox"), | ||
| any(target_os = "linux", target_os = "android") | ||
| ))] | ||
| { | ||
| physical_memory_bytes_unix() | ||
| } | ||
|
|
||
| #[cfg(any( | ||
| not(target_family = "unix"), | ||
| target_os = "redox", | ||
| not(any(target_os = "linux", target_os = "android")) | ||
| ))] | ||
| { | ||
| // No portable or safe API is available here to detect total physical memory. | ||
| None | ||
| } | ||
| } | ||
|
|
||
| #[cfg(all( | ||
| target_family = "unix", | ||
| not(target_os = "redox"), | ||
| any(target_os = "linux", target_os = "android") | ||
| ))] | ||
| fn physical_memory_bytes_unix() -> Option<u128> { | ||
| use nix::unistd::{SysconfVar, sysconf}; | ||
|
|
||
| let pages = match sysconf(SysconfVar::_PHYS_PAGES) { | ||
| Ok(Some(pages)) if pages > 0 => u128::try_from(pages).ok()?, | ||
| _ => return None, | ||
| }; | ||
|
|
||
| let page_size = match sysconf(SysconfVar::PAGE_SIZE) { | ||
| Ok(Some(page_size)) if page_size > 0 => u128::try_from(page_size).ok()?, | ||
| _ => return None, | ||
| }; | ||
|
|
||
| Some(pages.saturating_mul(page_size)) | ||
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { | ||
| use super::*; | ||
|
|
||
| #[test] | ||
| fn desired_buffer_matches_total_when_small() { | ||
| let six_mebibytes = 6 * 1024 * 1024; | ||
| let expected = ((six_mebibytes as u128) * 12) | ||
| .clamp(six_mebibytes as u128, crate::MAX_AUTOMATIC_BUF_SIZE as u128); | ||
| assert_eq!(desired_file_buffer_bytes(six_mebibytes as u128), expected); | ||
| } | ||
|
|
||
| #[test] | ||
| fn desired_buffer_caps_at_max_for_large_inputs() { | ||
| let large = 256 * 1024 * 1024; // 256 MiB | ||
| assert_eq!( | ||
| desired_file_buffer_bytes(large as u128), | ||
| crate::MAX_AUTOMATIC_BUF_SIZE as u128 | ||
| ); | ||
| } | ||
| } | ||
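
To make the interaction of the two hints in `automatic_buffer_size` easier to follow, here is a minimal standalone sketch of the same arithmetic. The constants `MIN_BUF`, `MAX_BUF`, and `FALLBACK_BUF` and the function `pick_buffer` are hypothetical stand-ins: the real `MIN_AUTOMATIC_BUF_SIZE`, `MAX_AUTOMATIC_BUF_SIZE`, and `FALLBACK_AUTOMATIC_BUF_SIZE` are defined elsewhere in the crate and their values are not shown in this diff. The sketch also takes the file total and memory size as plain arguments instead of querying the filesystem or the OS.

```rust
// Hypothetical limits chosen for illustration only; the real constants live
// in the sort crate and are not part of this diff.
const MIN_BUF: u128 = 512 * 1024; // assumed floor: 512 KiB
const MAX_BUF: u128 = 1024 * 1024 * 1024; // assumed ceiling: 1 GiB
const FALLBACK_BUF: u128 = 32 * 1024 * 1024; // assumed fallback: 32 MiB

/// Same shape as `desired_file_buffer_bytes` in the diff: inputs up to the
/// ceiling get 12x headroom (limited to the ceiling); larger inputs ask for a
/// quarter of their size, but never less than the ceiling.
fn desired_from_file_total(total: u128) -> u128 {
    if total == 0 {
        return 0;
    }
    if total <= MAX_BUF {
        return total.saturating_mul(12).clamp(total, MAX_BUF);
    }
    (total / 4).max(MAX_BUF)
}

/// Same shape as `clamp_hint`, without the final narrowing to `usize`.
fn clamp_hint(bytes: u128) -> u128 {
    bytes.clamp(MIN_BUF, MAX_BUF)
}

/// Hypothetical helper combining both hints the way `automatic_buffer_size`
/// does. `file_total == 0` stands for "no usable file size"; `memory == None`
/// stands for "memory could not be detected".
fn pick_buffer(file_total: u128, memory: Option<u128>) -> u128 {
    let file_hint = (file_total > 0).then(|| clamp_hint(desired_from_file_total(file_total)));
    // A quarter of the reported memory, as in `available_memory_hint`.
    let mem_hint = memory.map(|bytes| clamp_hint(bytes / 4));

    // Prefer the tighter bound when both hints exist.
    match (file_hint, mem_hint) {
        (Some(file), Some(mem)) => file.min(mem),
        (Some(file), None) => file,
        (None, Some(mem)) => mem,
        (None, None) => FALLBACK_BUF,
    }
}
```

Under these assumed limits, a 6 MiB input on a machine reporting 8 GiB yields 72 MiB (the file hint is the tighter bound), while the same input with only 64 MiB reported yields 16 MiB (the memory hint wins). Note that once `clamp_hint` is applied, any input larger than the ceiling ends up at exactly the ceiling, so the "quarter of the total" branch only matters before clamping.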