feat(cleanup): support rate limiter for cleanup operation #6084
Xuanwo merged 12 commits into lance-format:main
PR Review: feat(cleanup): support rate limiter for cleanup operation

P1: Panic with very small rate values

In

    let duration = Duration::from_secs_f64(1.0 / rate);

the builder validation rejects non-positive and non-finite values, but a very small positive finite rate still passes. For such a rate, 1.0 / rate can exceed the range that Duration can represent, and Duration::from_secs_f64 panics on overflow.

Suggested fix: Add a lower-bound check in the builder validation, or use:

    let duration = Duration::try_from_secs_f64(1.0 / rate)
        .map_err(|e| Error::Cleanup {
            message: format!("delete_rate_limit {} is too small: {}", rate, e),
        })?;

P1: Test timing sensitivity

The Rust, Python, and Java tests all assert wall-clock elapsed time.
This is a minor concern, just flagging for awareness.

Overall the approach is sound.
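The difference between the panicking and the fallible conversion in the review's suggested fix can be sketched in isolation. This is a minimal sketch, not the PR's code: `interval_for_rate` is a hypothetical helper, and it returns a `String` error rather than the project's `Error::Cleanup` so that it stays self-contained.

```rust
use std::time::Duration;

/// Hypothetical helper (not from the PR): converts a rate in operations per
/// second into the pause to leave between two operations.
fn interval_for_rate(rate: f64) -> Result<Duration, String> {
    // Mirror the builder validation described in the review:
    // reject non-positive and non-finite rates up front.
    if !rate.is_finite() || rate <= 0.0 {
        return Err(format!("rate {rate} must be a positive finite number"));
    }
    // try_from_secs_f64 returns Err where from_secs_f64 would panic,
    // for example when 1.0 / rate is too large to fit in a Duration.
    Duration::try_from_secs_f64(1.0 / rate)
        .map_err(|e| format!("rate {rate} is too small: {e}"))
}
```

With this shape, a rate such as `1e-300` passes the positive-and-finite check but is still rejected with an error instead of aborting the process.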
    /// Maximum number of delete operations per second. If None, no rate limiting is applied.
    ///
    /// Use this to avoid hitting S3 (or other object store) request rate limits during cleanup.
    /// For example, `Some(100.0)` limits deletions to 100 files per second.
Our delete uses batch delete, which handles 1k files per op, so a limit of 100 here would be 100k files per second.
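The arithmetic behind this comment can be written out as a tiny sketch. The function name and parameters are hypothetical. The batch size of 1000 is the figure quoted in the comment, not a value read from the code.

```rust
/// Hypothetical helper: upper bound on files deleted per second when the
/// limiter counts batch-delete operations rather than individual files.
fn max_files_per_second(ops_per_second: f64, files_per_batch: u32) -> f64 {
    ops_per_second * f64::from(files_per_batch)
}
```

For a limit of 100 operations per second and batches of 1000 files, `max_files_per_second(100.0, 1000)` gives 100,000 files per second, which is why the doc comment's "100 files per second" wording understates the effective rate.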
        rate
    );
    let duration = Duration::try_from_secs_f64(1.0 / rate).map_err(|e| Error::Cleanup {
        message: format!("delete_rate_limit {} is too small: {}", rate, e),
Setting the rate to a value that is too small does not seem logical to me. We can instead set it to the smallest meaningful value. For example, we could set the rate's minimum to 1, which would mean one operation per second.
    /// # Errors
    ///
    /// Returns an error if `rate` is not a positive finite number.
    pub fn delete_rate_limit(mut self, rate: f64) -> Result<Self> {
Do we really need to make this a float?
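One way to read this question is that an integer rate would remove most of the validation. The sketch below is an assumption about what that alternative could look like, not what the PR merged. `interval_for_ops_per_second` is a hypothetical name, and `NonZeroU32` is used here only to make a zero rate unrepresentable.

```rust
use std::num::NonZeroU32;
use std::time::Duration;

/// Hypothetical alternative: an integer rate of at least one operation per
/// second. The division cannot panic or overflow, so no error path is needed.
fn interval_for_ops_per_second(rate: NonZeroU32) -> Duration {
    // Duration implements Div<u32>; the divisor is guaranteed non-zero.
    Duration::from_secs(1) / rate.get()
}
```

The trade-off is that an integer rate cannot express fewer than one operation per second, which is the same lower bound proposed earlier in the thread.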
    time.sleep(1)
    lance.write_dataset(table, base_dir, mode="overwrite")
    time.sleep(1)
    lance.write_dataset(table, base_dir, mode="overwrite")
    time.sleep(1)
Are these sleeps needed? Each write will get a unique timestamp. You can fetch the timestamps with a get_versions call. This will help avoid slow test times.
Hi @Xuanwo and @westonpace, thanks a lot for your review!
Closes #3291