@Samyak2
Contributor

@Samyak2 Samyak2 commented Jun 22, 2025

Which issue does this PR close?

Closes #16495

Rationale for this change

What changes are included in this PR?

Add a BaselineMetrics to BuildProbeJoinMetrics and remove the separate output_rows counter from it.

  • The baseline's elapsed_compute is populated in the Drop trait implementation by summing join_time and build_time.
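The Drop-based approach described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code from this PR: `Time`, `JoinMetricsSketch`, and `total_after_drop` are hypothetical stand-ins for the real metric types, and only the idea of folding `build_time` and `join_time` into `elapsed_compute` on drop is taken from the description.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::time::Duration;

/// Hypothetical stand-in for a shared timing metric; stores nanoseconds.
#[derive(Clone, Default)]
struct Time(Arc<AtomicUsize>);

impl Time {
    fn add_duration(&self, d: Duration) {
        self.0.fetch_add(d.as_nanos() as usize, Ordering::Relaxed);
    }

    fn value(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

/// Simplified stand-in for the join metrics struct (assumed field names).
struct JoinMetricsSketch {
    elapsed_compute: Time,
    build_time: Time,
    join_time: Time,
}

impl Drop for JoinMetricsSketch {
    fn drop(&mut self) {
        // Fold both phase timers into elapsed_compute exactly once,
        // at the moment the owner of the metrics is dropped.
        let total = self.build_time.value() + self.join_time.value();
        self.elapsed_compute
            .add_duration(Duration::from_nanos(total as u64));
    }
}

/// Records the two phase times, drops the metrics, and returns the
/// resulting elapsed_compute in nanoseconds.
fn total_after_drop(build: Duration, join: Duration) -> usize {
    let elapsed = Time::default();
    let metrics = JoinMetricsSketch {
        elapsed_compute: elapsed.clone(), // shared handle, readable after drop
        build_time: Time::default(),
        join_time: Time::default(),
    };
    metrics.build_time.add_duration(build);
    metrics.join_time.add_duration(join);
    // Nothing has been folded into elapsed_compute yet.
    assert_eq!(elapsed.value(), 0);
    drop(metrics);
    elapsed.value()
}
```

One consequence of this design, visible in the sketch, is that `elapsed_compute` stays at zero until the metrics object is dropped, so any time span that is not recorded in either phase timer will be missing from the total.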

Are these changes tested?

Here's an example of an explain analyze of a hash join showing these metrics:

[(WatchID@0, WatchID@0)], metrics=[output_rows=100, elapsed_compute=2.313624ms, build_input_batches=1, build_input_rows=100, input_batches=1, input_rows=100, output_batches=1, build_mem_used=3688, build_time=865.832µs, join_time=1.369875ms]

Notice output_rows=100, elapsed_compute=2.313624ms in the above.

Are there any user-facing changes?

No

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Jun 22, 2025
Contributor

@2010YOUY01 2010YOUY01 left a comment

Thank you! I left some suggestions, looking forward to your thoughts.

pub(crate) output_rows: metrics::Count,
}

impl Drop for BuildProbeJoinMetrics {
Contributor

I don't think it's the best way but I guess it's okay to merge 🤔 If we forget to count a specific time period in some join operator, we can fix it in the future.

Could you also add a brief comment to explain this drop?

Contributor Author

I agree. This is not intuitive at all. When I came back to this PR to act on your suggestions, it took me a while to remember how this works :)

I have added a comment for now. I would be open to other ways of handling this!

@Samyak2 Samyak2 force-pushed the issue-16495 branch 2 times, most recently from 6fb44a6 to b5e87fb Compare June 28, 2025 10:18
@Samyak2
Contributor Author

Samyak2 commented Jun 28, 2025

Rebased on latest main

@2010YOUY01
Contributor

Thank you! This implementation looks correct to me.

Since the state transitions in joins are tricky, could you add a test (or confirm that existing tests cover this) to double-check that this refactor won't change the values of the related metrics?

@Samyak2
Contributor Author

Samyak2 commented Jul 5, 2025

I have added asserts for metrics in the existing join tests. The ones in hash_join and cross_join are working. The asserts in nested_loop_join are currently failing due to a mismatch in output_rows. I'm debugging this, but I have put this out here in case anyone has an idea about this (or if this is expected):

failures:

---- joins::nested_loop_join::tests::join_left_anti_with_filter stdout ----

thread 'joins::nested_loop_join::tests::join_left_anti_with_filter' panicked at datafusion/physical-plan/src/joins/nested_loop_join.rs:1405:9:
assertion `left == right` failed
  left: 0
 right: 2
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- joins::nested_loop_join::tests::join_left_semi_with_filter stdout ----

thread 'joins::nested_loop_join::tests::join_left_semi_with_filter' panicked at datafusion/physical-plan/src/joins/nested_loop_join.rs:1375:9:
assertion `left == right` failed
  left: 0
 right: 1

---- joins::nested_loop_join::tests::join_left_with_filter stdout ----

thread 'joins::nested_loop_join::tests::join_left_with_filter' panicked at datafusion/physical-plan/src/joins/nested_loop_join.rs:1282:9:
assertion `left == right` failed
  left: 1
 right: 3

---- joins::nested_loop_join::tests::join_full_with_filter stdout ----

thread 'joins::nested_loop_join::tests::join_full_with_filter' panicked at datafusion/physical-plan/src/joins/nested_loop_join.rs:1346:9:
assertion `left == right` failed
  left: 3
 right: 5

---- joins::nested_loop_join::tests::join_left_mark_with_filter stdout ----

thread 'joins::nested_loop_join::tests::join_left_mark_with_filter' panicked at datafusion/physical-plan/src/joins/nested_loop_join.rs:1495:9:
assertion `left == right` failed
  left: 0
 right: 3

@Samyak2
Contributor Author

Samyak2 commented Jul 5, 2025

Actually, I see the same behavior on latest main. Looks like the output_rows metric in nested loop join is currently wrong?

@2010YOUY01
Contributor

> Actually, I see the same behavior on latest main. Looks like the output_rows metric in nested loop join is currently wrong?

Thank you for the catch! This spot might also need a record_poll() call: https://github.com/apache/datafusion/pull/16500/files#r2187952197

@Samyak2
Contributor Author

Samyak2 commented Jul 6, 2025

> > Actually, I see the same behavior on latest main. Looks like the output_rows metric in nested loop join is currently wrong?
>
> Thank you for the catch! Here might also need record_poll(): https://github.com/apache/datafusion/pull/16500/files#r2187952197

Thanks for the pointer! It makes sense now. I have added a record_poll call there and also updated the output_batches metric inside the function. Please take a look.

Samyak2 added 2 commits July 6, 2025 14:42
This was needed because ExhaustedProbeSide state can also return output
rows - in certain types of joins. Without this, the output_rows metric
for nested loop join was wrong!
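To illustrate why the metric was off, here is a toy state machine, a sketch under stated assumptions rather than the actual nested loop join code. `State`, `SketchStream`, and `drain_and_count` are hypothetical names, and batches are reduced to plain row counts. The point it tries to show is that counting rows at the single place where batches leave the stream also covers rows emitted from the exhausted-probe-side state.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified join stream states.
#[derive(Clone, Copy)]
enum State {
    FetchProbeBatch,
    ExhaustedProbeSide,
    Completed,
}

struct SketchStream {
    state: State,
    /// Output row count produced for each probe batch (assumed input).
    probe_batches: VecDeque<usize>,
    /// Rows emitted only after the probe side is exhausted, for example
    /// unmatched build-side rows in some join types.
    unmatched_build_rows: usize,
    /// The metric under discussion.
    output_rows: usize,
}

impl SketchStream {
    fn new(probe_batches: Vec<usize>, unmatched_build_rows: usize) -> Self {
        Self {
            state: State::FetchProbeBatch,
            probe_batches: probe_batches.into(),
            unmatched_build_rows,
            output_rows: 0,
        }
    }

    /// Inner state machine: returns the row count of the next output
    /// batch, or None once the stream is finished.
    fn poll_inner(&mut self) -> Option<usize> {
        loop {
            match self.state {
                State::FetchProbeBatch => match self.probe_batches.pop_front() {
                    Some(rows) => return Some(rows),
                    None => self.state = State::ExhaustedProbeSide,
                },
                State::ExhaustedProbeSide => {
                    self.state = State::Completed;
                    if self.unmatched_build_rows > 0 {
                        // This state can also produce output rows.
                        return Some(self.unmatched_build_rows);
                    }
                }
                State::Completed => return None,
            }
        }
    }

    /// Single exit point: every batch is counted here, whichever state
    /// produced it.
    fn poll(&mut self) -> Option<usize> {
        let batch = self.poll_inner();
        if let Some(rows) = batch {
            self.output_rows += rows;
        }
        batch
    }
}

/// Drains the stream and returns the final output_rows metric.
fn drain_and_count(probe_batches: Vec<usize>, unmatched_build_rows: usize) -> usize {
    let mut stream = SketchStream::new(probe_batches, unmatched_build_rows);
    while stream.poll().is_some() {}
    stream.output_rows
}
```

If the counting were done only inside the `FetchProbeBatch` arm, the rows returned from `ExhaustedProbeSide` would be silently dropped from the metric, which matches the kind of undercount reported in the failing assertions above.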
@Samyak2
Contributor Author

Samyak2 commented Jul 6, 2025

Fixed the formatting issues

Contributor

@2010YOUY01 2010YOUY01 left a comment

Thanks again.

@alamb alamb merged commit 01698cb into apache:main Jul 7, 2025
27 checks passed
@alamb
Contributor

alamb commented Jul 7, 2025

Thanks @Samyak2 and @2010YOUY01

Development

Successfully merging this pull request may close these issues.

Refactor BuildProbeJoinMetrics in Hash Join to reuse BaselineMetrics

3 participants