@alamb
Contributor

@alamb alamb commented Oct 18, 2024

Draft as I run benchmarks

Which issue does this PR close?

Follow on to #12950

Rationale for this change

While reviewing #12950 from @askalt I noticed a few places where deep copies of Schema were happening when we could have simply been copying Arcs
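
To illustrate the distinction, here is a minimal sketch using only the standard library. The `Schema` struct and the `deep_copy` and `share` helpers are hypothetical stand-ins invented for this example, not the arrow or DataFusion types. It shows that wrapping a cloned value in a new `Arc` allocates a fresh copy, while `Arc::clone` only adds a reference to the existing allocation.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a schema; only the shape matters here.
#[derive(Clone, Debug, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

/// Deep copy: allocates a new `Vec` and clones every field name.
fn deep_copy(schema: &Arc<Schema>) -> Arc<Schema> {
    Arc::new(schema.as_ref().clone())
}

/// Shallow copy: only increments the reference count.
fn share(schema: &Arc<Schema>) -> Arc<Schema> {
    Arc::clone(schema)
}

fn main() {
    let schema = Arc::new(Schema {
        fields: (0..1000).map(|i| format!("c{i}")).collect(),
    });

    let copied = deep_copy(&schema);
    let shared = share(&schema);

    // Both are equal by value, but only `shared` points at the same allocation.
    assert_eq!(*copied, *schema);
    assert!(!Arc::ptr_eq(&copied, &schema));
    assert!(Arc::ptr_eq(&shared, &schema));
    println!("strong count = {}", Arc::strong_count(&schema));
}
```

The cost of `deep_copy` grows with the number of fields, whereas `share` costs the same regardless of schema width.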

What changes are included in this PR?

Add some more Arc to improve performance

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr (Logical plan and expressions) and physical-expr (Changes to the physical-expr crates) labels Oct 18, 2024
@alamb
Contributor Author

alamb commented Oct 21, 2024

Running performance tests...

/// See [`AggregateUDFImpl::with_beneficial_ordering`] for more details.
pub fn with_beneficial_ordering(
- self,
+ self: Arc<Self>,
Member

What's the benefit of using Arc here?

Suggested change
- self: Arc<Self>,
+ &self
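
One way to look at the trade-off behind this question is sketched below. The `Udf` type and both methods are hypothetical and are not the DataFusion API. The sketch assumes a method that sometimes returns the receiver unchanged: with a `self: Arc<Self>` receiver it can hand back the same allocation, at the price of an `Arc::clone` at call sites that want to keep their handle, while a `&self` receiver needs no clone at the call site but has to build a new value to return an owned one.

```rust
use std::sync::Arc;

// Hypothetical stand-in for an aggregate UDF; not the DataFusion type.
#[derive(Debug)]
struct Udf {
    name: String,
}

impl Udf {
    /// Consumes one `Arc` handle, so callers that want to keep theirs
    /// must call `Arc::clone` first.
    fn with_ordering_arc(self: Arc<Self>, beneficial: bool) -> Option<Arc<Udf>> {
        // Returning `self` reuses the existing allocation.
        beneficial.then_some(self)
    }

    /// Borrows the value; callers need no clone, but returning an owned
    /// value means constructing a new one.
    fn with_ordering_ref(&self, beneficial: bool) -> Option<Udf> {
        beneficial.then(|| Udf { name: self.name.clone() })
    }
}

fn main() {
    let udf = Arc::new(Udf { name: "first_value".to_string() });

    // `Arc<Self>` receiver: clone the handle, get the same allocation back.
    let same = Arc::clone(&udf).with_ordering_arc(true).unwrap();
    assert!(Arc::ptr_eq(&same, &udf));

    // `&self` receiver: no clone at the call site.
    let fresh = udf.with_ordering_ref(true).unwrap();
    assert_eq!(fresh.name, udf.name);
}
```

Whether this reasoning applies to `with_beneficial_ordering` depends on what its implementations return, which the thread does not settle.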

- .clone()
- .with_beneficial_ordering(beneficial_ordering)?
+ let Some(updated_fn) =
+ Arc::clone(&self.fun).with_beneficial_ordering(beneficial_ordering)?
Member

Suggested change
- Arc::clone(&self.fun).with_beneficial_ordering(beneficial_ordering)?
+ self.fun.with_beneficial_ordering(beneficial_ordering)?

AggregateExprBuilder::new(reverse_udf, self.args.to_vec())
.order_by(reverse_ordering_req.to_vec())
- .schema(Arc::new(self.schema.clone()))
+ .schema(Arc::clone(&self.schema))
Member

❤️

data_type: DataType,
name: String,
- schema: Schema,
+ schema: Arc<Schema>,
Member

👍

beneficial_ordering: bool,
) -> Result<Option<AggregateUDF>> {
- self.inner
+ Arc::clone(&self.inner)
Member

The code doesn't explain why AggregateUDFImpl::with_beneficial_ordering takes an Arc (it could as well take &self), but that's for later

@alamb
Contributor Author

alamb commented Oct 22, 2024

I ran some benchmarks and didn't see any noticeable difference in performance:


++ critcmp main alamb_more_arc
group                                         alamb_more_arc                         main
-----                                         --------------                         ----
logical_aggregate_with_join                   1.00  1413.0±16.22µs        ? ?/sec    1.00  1411.4±15.49µs        ? ?/sec
logical_plan_tpcds_all                        1.01    194.5±1.17ms        ? ?/sec    1.00    193.4±1.14ms        ? ?/sec
logical_plan_tpch_all                         1.01     24.1±0.27ms        ? ?/sec    1.00     23.9±0.22ms        ? ?/sec
logical_select_all_from_1000                  1.00      5.3±0.03ms        ? ?/sec    1.00      5.3±0.05ms        ? ?/sec
logical_select_one_from_700                   1.01  1144.7±14.16µs        ? ?/sec    1.00  1133.9±21.82µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.01  1095.8±45.65µs        ? ?/sec    1.00  1083.1±16.25µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.02  1086.2±52.96µs        ? ?/sec    1.00  1065.2±20.14µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1219.6±2.84ms        ? ?/sec    1.00   1219.7±2.84ms        ? ?/sec
physical_plan_tpch_all                        1.00     80.4±0.59ms        ? ?/sec    1.00     80.3±0.62ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.9±0.02ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_tpch_q10                        1.00      4.0±0.04ms        ? ?/sec    1.00      4.0±0.04ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.4±0.03ms        ? ?/sec    1.00      3.4±0.03ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.8±0.03ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.2±0.02ms        ? ?/sec    1.00      2.2±0.02ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.5±0.02ms        ? ?/sec    1.00      2.5±0.02ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.5±0.02ms        ? ?/sec    1.00      3.5±0.03ms        ? ?/sec
physical_plan_tpch_q17                        1.00      3.2±0.03ms        ? ?/sec    1.00      3.2±0.03ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.7±0.03ms        ? ?/sec    1.00      3.7±0.04ms        ? ?/sec
physical_plan_tpch_q19                        1.00      5.1±0.04ms        ? ?/sec    1.01      5.2±0.41ms        ? ?/sec
physical_plan_tpch_q2                         1.00      6.7±0.05ms        ? ?/sec    1.00      6.7±0.05ms        ? ?/sec
physical_plan_tpch_q20                        1.00      4.1±0.05ms        ? ?/sec    1.00      4.1±0.05ms        ? ?/sec
physical_plan_tpch_q21                        1.00      5.4±0.05ms        ? ?/sec    1.00      5.4±0.04ms        ? ?/sec
physical_plan_tpch_q22                        1.00      3.1±0.03ms        ? ?/sec    1.00      3.1±0.03ms        ? ?/sec
physical_plan_tpch_q3                         1.00      3.0±0.02ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.3±0.02ms        ? ?/sec    1.00      2.3±0.02ms        ? ?/sec
physical_plan_tpch_q5                         1.01      4.1±0.03ms        ? ?/sec    1.00      4.1±0.02ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1669.9±17.99µs        ? ?/sec    1.00  1663.1±18.41µs        ? ?/sec
physical_plan_tpch_q7                         1.00      5.2±0.04ms        ? ?/sec    1.00      5.2±0.03ms        ? ?/sec
physical_plan_tpch_q8                         1.00      6.2±0.04ms        ? ?/sec    1.00      6.2±0.04ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.9±0.03ms        ? ?/sec    1.00      4.9±0.05ms        ? ?/sec
physical_select_aggregates_from_200           1.00     28.5±0.30ms        ? ?/sec    1.00     28.5±0.28ms        ? ?/sec
physical_select_all_from_1000                 1.00     40.4±0.20ms        ? ?/sec    1.00     40.5±0.30ms        ? ?/sec
physical_select_one_from_700                  1.00      3.6±0.02ms        ? ?/sec    1.00      3.6±0.02ms        ? ?/sec

So I am inclined to not pursue this PR
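
As a rough guide to reading the table, the sketch below assumes that the leading number in each critcmp column is the mean divided by the faster of the two means, rounded to two decimals. The `Estimate` type and the `within_noise` check (do the mean ± spread intervals overlap?) are hypothetical helpers introduced for this example, not part of critcmp, and the overlap test is a heuristic rather than a statistical test.

```rust
/// One benchmark estimate: mean and spread, in the same time unit.
#[derive(Clone, Copy)]
struct Estimate {
    mean: f64,
    spread: f64,
}

/// Ratio of each mean to the faster of the two, rounded to two decimals.
fn ratios(a: Estimate, b: Estimate) -> (f64, f64) {
    let fastest = a.mean.min(b.mean);
    let round2 = |x: f64| (x * 100.0).round() / 100.0;
    (round2(a.mean / fastest), round2(b.mean / fastest))
}

/// Heuristic: treat the difference as noise if the two
/// `mean ± spread` intervals overlap.
fn within_noise(a: Estimate, b: Estimate) -> bool {
    (a.mean - b.mean).abs() <= a.spread + b.spread
}

fn main() {
    // Values (in µs) from the logical_trivial_join_low_numbered_columns row.
    let branch = Estimate { mean: 1086.2, spread: 52.96 };
    let main_branch = Estimate { mean: 1065.2, spread: 20.14 };

    let (r_branch, r_main) = ratios(branch, main_branch);
    println!("ratios: {r_branch:.2} vs {r_main:.2}");
    println!("within noise: {}", within_noise(branch, main_branch));
}
```

Under that heuristic, even the largest ratio in the table (1.02) falls inside the reported spread of the two runs.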

@alamb alamb closed this Oct 22, 2024
@findepi
Member

findepi commented Oct 23, 2024

tpch queries operate on rather small (narrow) schemas.

@alamb
Contributor Author

alamb commented Oct 23, 2024

> tpch queries operate on rather small (narrow) schemas.

That is true -- I do think logical_select_all_from_1000, logical_select_one_from_700, and physical_select_aggregates_from_200 are on wider schemas

I am sure this makes things a tiny bit faster but my benchmarks suggest it isn't very measurable

@berkaysynnada
Contributor

Wouldn't it be better to maintain consistency throughout the entire codebase?

@alamb
Contributor Author

alamb commented Oct 24, 2024

> Wouldn't it be better to maintain consistency throughout the entire codebase?

I agree it would be good to maintain consistency when possible. Are you thinking that this PR is more consistent? It is not entirely clear to me how much we like / don't like Arc (I don't think it is totally consistent yet)

@findepi
Member

findepi commented Oct 25, 2024

> That is true -- I do think logical_select_all_from_1000, logical_select_one_from_700, and physical_select_aggregates_from_200 are on wider schemas

I figured, but their names suggest these are very simple queries.

And real-life workloads will have a combination of both: wide schemas and complex query plans, where optimizers will make more changes along the way.
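
A synthetic way to probe this combination is sketched below. It is not a DataFusion benchmark: `Schema`, `wide_schema`, `rewrite_deep`, and `rewrite_shared` are hypothetical names, and the loop only imitates a plan that is rewritten many times over a wide schema. Timings depend on the machine and build profile, so the sketch prints them rather than asserting on them.

```rust
use std::hint::black_box;
use std::sync::Arc;
use std::time::{Duration, Instant};

// Hypothetical stand-in for a schema; not the arrow or DataFusion type.
#[derive(Clone)]
struct Schema {
    fields: Vec<String>,
}

/// Builds a schema with `width` string "fields".
fn wide_schema(width: usize) -> Arc<Schema> {
    let fields = (0..width).map(|i| format!("col_{i}")).collect();
    Arc::new(Schema { fields })
}

/// Simulates `rewrites` plan rewrites that each deep-copy the schema.
fn rewrite_deep(schema: &Arc<Schema>, rewrites: usize) -> (usize, Duration) {
    let start = Instant::now();
    let mut fields_seen = 0;
    for _ in 0..rewrites {
        let copy = black_box(Arc::new(schema.as_ref().clone()));
        fields_seen += copy.fields.len();
    }
    (fields_seen, start.elapsed())
}

/// Same loop, but each rewrite only increments the reference count.
fn rewrite_shared(schema: &Arc<Schema>, rewrites: usize) -> (usize, Duration) {
    let start = Instant::now();
    let mut fields_seen = 0;
    for _ in 0..rewrites {
        let copy = black_box(Arc::clone(schema));
        fields_seen += copy.fields.len();
    }
    (fields_seen, start.elapsed())
}

fn main() {
    let schema = wide_schema(1_000);
    let (n_deep, t_deep) = rewrite_deep(&schema, 1_000);
    let (n_shared, t_shared) = rewrite_shared(&schema, 1_000);
    assert_eq!(n_deep, n_shared);
    println!("deep copy: {t_deep:?}, shared: {t_shared:?}");
}
```

Varying `width` and `rewrites` together gives a rough feel for how the gap scales, though only a benchmark on real plans would show whether it matters in practice.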
