feat: clearer progress reporting for IVF#6126
Conversation
|
here is an example of the feedback that is possible with the change: |
PR ReviewClean, well-scoped change. A few minor observations: 1. After the split, the 2. In Overall this is a clean improvement to progress reporting — switching from batch-count to row-count for shuffle progress and splitting the build/merge phases gives users much better visibility into index build progress. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
| .stage_start("build_partitions", num_partitions, "partitions") | ||
| .await?; | ||
| let build_idx_stream = self.build_partitions().boxed().await?; | ||
| progress.stage_complete("build_partitions").await?; |
There was a problem hiding this comment.
This part is a bit tricky because build_partitions only returns a stream. This stage could always finish so quickly that it becomes less useful.
What do you think about this?
There was a problem hiding this comment.
yeah, thanks. I agree. I observed after running this that this completed quickly but didn't focus on it being just because it was constructing a stream.
Maybe we should just rename build_partitions stage to merge_partitions here? I'm happy with that too; it'll make it clearer where the time is going.
There was a problem hiding this comment.
Yeah, I think it makes more sense.
d5d09e3 to
78d6b6b
Compare
This breaks the "build_partitions" stage into "build_partitions" and "merge_partitions", and also updates the progress reporting on the shuffle phase to be in terms of rows instead of batches.
78d6b6b to
75ba906
Compare
This breaks the "build_partitions" stage into "build_partitions" and "merge_partitions", and also updates the progress reporting on the shuffle phase to be in terms of rows instead of batches.