Background
Currently:
- SpillMetrics (per operator) are updated only at the end of a spill.
- DiskManager tracks used_disk_space (current total) but doesn't expose a structured "progress" view.
Proposed Changes
- Real-time Metric Updates in SpillMetrics: modify InProgressSpillFile so that the spilled_bytes and spill_file_count metrics are updated as soon as data is written to disk.
  - Initial update: In append_batch, when the IPCStreamWriter is first created, immediately call update_disk_usage() on the file and add the size (schema/header) to spilled_bytes.
  - Incremental update: After each writer.write(batch) call, call update_disk_usage() and add the delta size to spilled_bytes.
  - Final update: In finish(), call update_disk_usage() after finishing the writer and add the remaining delta size (footer/metadata) to spilled_bytes.
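The three update points above can be sketched roughly as follows. This is a standalone illustration, not the actual InProgressSpillFile code: `SpillCounters` and `TrackedSpillWriter` are hypothetical names, an in-memory buffer stands in for the on-disk file, and incrementing the file count at creation time is an assumption.

```rust
use std::io::{self, Write};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the per-operator spill metrics.
#[derive(Default)]
struct SpillCounters {
    spilled_bytes: AtomicUsize,
    spill_file_count: AtomicUsize,
}

/// Hypothetical writer that reports the size delta after every write.
struct TrackedSpillWriter {
    /// Stands in for the spill file; its length plays the role of the file size.
    sink: Vec<u8>,
    /// Bytes already added to `spilled_bytes`.
    reported: usize,
    counters: Arc<SpillCounters>,
}

impl TrackedSpillWriter {
    /// Initial update: count the file and the header bytes right away.
    /// (Counting the file at creation is an assumption of this sketch.)
    fn create(header: &[u8], counters: Arc<SpillCounters>) -> io::Result<Self> {
        counters.spill_file_count.fetch_add(1, Ordering::Relaxed);
        let mut writer = Self { sink: Vec::new(), reported: 0, counters };
        writer.sink.write_all(header)?;
        writer.report_delta();
        Ok(writer)
    }

    /// Incremental update: report the growth caused by this batch.
    fn write_batch(&mut self, batch: &[u8]) -> io::Result<()> {
        self.sink.write_all(batch)?;
        self.report_delta();
        Ok(())
    }

    /// Final update: report the footer bytes and return the total reported.
    fn finish(mut self, footer: &[u8]) -> io::Result<usize> {
        self.sink.write_all(footer)?;
        self.report_delta();
        Ok(self.reported)
    }

    /// Add only the bytes written since the last report.
    fn report_delta(&mut self) {
        let current = self.sink.len();
        let delta = current - self.reported;
        self.counters.spilled_bytes.fetch_add(delta, Ordering::Relaxed);
        self.reported = current;
    }
}
```

Tracking the last reported size and adding only the difference keeps spilled_bytes equal to the bytes written so far at every step, so a reader polling the counter mid-spill never sees a value that is later corrected downward.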
- Spilling Progress Interface in DiskManager: expose the current global state of the disk manager.
  - New SpillingProgress struct:

        pub struct SpillingProgress {
            /// Total bytes currently used on disk for spilling
            pub current_bytes: u64,
            /// Total number of active spill files
            pub active_files_count: usize,
        }

  - Implement spilling_progress(&self) -> SpillingProgress
- Delegate Interface in RuntimeEnv: provide a convenient entry point for users.
let progress = ctx.runtime_env().spilling_progress();
Users can then call this API to get real-time spilling progress. For our use case, we want to call it from the SQL UI to give users real-time feedback on their running queries.
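As a rough illustration of the proposed interface, the sketch below keeps two atomic counters and builds a SpillingProgress snapshot from them. `SketchDiskManager`, `SketchRuntimeEnv`, and the `file_created` / `bytes_written` / `file_dropped` hooks are hypothetical names for this example only; how the real DiskManager would track active files is left open by the proposal.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;

/// Snapshot type as proposed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpillingProgress {
    /// Total bytes currently used on disk for spilling
    pub current_bytes: u64,
    /// Total number of active spill files
    pub active_files_count: usize,
}

/// Hypothetical stand-in for the disk manager's global state.
#[derive(Default)]
pub struct SketchDiskManager {
    used_disk_space: AtomicU64,
    active_files: AtomicUsize,
}

impl SketchDiskManager {
    /// Hypothetical hook: a new spill file was created.
    pub fn file_created(&self) {
        self.active_files.fetch_add(1, Ordering::Relaxed);
    }

    /// Hypothetical hook: a spill file grew by `delta` bytes.
    pub fn bytes_written(&self, delta: u64) {
        self.used_disk_space.fetch_add(delta, Ordering::Relaxed);
    }

    /// Hypothetical hook: a spill file of `size` bytes was deleted.
    pub fn file_dropped(&self, size: u64) {
        self.used_disk_space.fetch_sub(size, Ordering::Relaxed);
        self.active_files.fetch_sub(1, Ordering::Relaxed);
    }

    /// Build a snapshot. The two loads are independent, so the pair may be
    /// momentarily out of step with each other under concurrent updates.
    pub fn spilling_progress(&self) -> SpillingProgress {
        SpillingProgress {
            current_bytes: self.used_disk_space.load(Ordering::Relaxed),
            active_files_count: self.active_files.load(Ordering::Relaxed),
        }
    }
}

/// Hypothetical stand-in for the runtime environment's delegate method.
pub struct SketchRuntimeEnv {
    pub disk_manager: Arc<SketchDiskManager>,
}

impl SketchRuntimeEnv {
    pub fn spilling_progress(&self) -> SpillingProgress {
        self.disk_manager.spilling_progress()
    }
}
```

Returning a small Copy struct rather than exposing the counters directly lets a caller such as a UI poller read the state without holding a lock, at the cost of the snapshot being only approximately consistent.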