@alamb
Contributor

@alamb alamb commented Aug 16, 2021

Which issue does this PR close?

Closes #679

Note: If people generally like this API, I will go ahead and add unit tests for metrics.rs (e.g. for aggregate_by_partition).

Rationale for this change

See the description on #679 (comment) for the full rationale, but the TL;DR version is:

  1. Better align the SQLMetric data model to ease integration with other metric systems (e.g. Prometheus, InfluxDB, etc.)
  2. Ability to get per-partition metrics
  3. Ability to get current metric values during execution

What changes are included in this PR?

  1. Update the SQLMetric API to live in its own module, have labels, know about partitions, and allow for real-time inspection
  2. Update uses of SQLMetric in DataFusion and Ballista to the new API
  3. Functionality to aggregate (sum) metrics via predicate and via partition
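A minimal sketch of the data model described in item 1, assuming a metric is a name plus a list of labels, an optional partition index, and a shared atomic value. All type and method names here (`Metric`, `Label`, `with_label`, `handle`) are hypothetical and not necessarily the API in this PR:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// A name/value pair attached to a metric, e.g. ("filename", "a.parquet").
#[derive(Debug, Clone, PartialEq)]
pub struct Label {
    pub name: String,
    pub value: String,
}

/// Hypothetical metric record: a name, its labels, the partition it was
/// recorded for (None = not tied to one partition), and a shared value cell.
#[derive(Debug)]
pub struct Metric {
    pub name: String,
    pub labels: Vec<Label>,
    pub partition: Option<usize>,
    value: Arc<AtomicUsize>,
}

impl Metric {
    pub fn new(name: &str, partition: Option<usize>) -> Self {
        Self {
            name: name.to_string(),
            labels: Vec::new(),
            partition,
            value: Arc::new(AtomicUsize::new(0)),
        }
    }

    /// Builder-style helper that attaches one label.
    pub fn with_label(mut self, name: &str, value: &str) -> Self {
        self.labels.push(Label { name: name.to_string(), value: value.to_string() });
        self
    }

    /// Handle an operator can keep and update while it runs.
    pub fn handle(&self) -> Arc<AtomicUsize> {
        Arc::clone(&self.value)
    }

    /// Current value; can be read at any time, including mid-execution.
    pub fn value(&self) -> usize {
        self.value.load(Ordering::Relaxed)
    }
}
```

Because the value lives behind an `Arc<AtomicUsize>`, an operator can keep a handle and update it while another component reads `value()` during execution, which is what enables real-time inspection in this sketch.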

Are there any user-facing changes?

Yes.

The SQLMetric API is now essentially completely different, so any code that creates or uses SQLMetrics will have to be updated.

Notes

In keeping with Rust's tradition of static typing, I also switched to more strongly typed metrics. This prevents mistakes such as adding a "time" to a counter value, and it allows counter-specific operations.
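To illustrate the strong typing idea, here is a hypothetical sketch with separate `Count` and `Time` types (the names and methods are assumed for illustration). `Time` only accepts a `Duration`, so adding a row count to it is a compile error rather than a silent bug:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::time::{Duration, Instant};

/// A monotonically increasing count, e.g. rows produced.
#[derive(Debug, Clone, Default)]
pub struct Count {
    value: Arc<AtomicUsize>,
}

impl Count {
    pub fn new() -> Self {
        Self::default()
    }
    pub fn add(&self, n: usize) {
        self.value.fetch_add(n, Ordering::Relaxed);
    }
    pub fn value(&self) -> usize {
        self.value.load(Ordering::Relaxed)
    }
}

/// Accumulated elapsed time, stored internally as nanoseconds.
#[derive(Debug, Clone, Default)]
pub struct Time {
    nanos: Arc<AtomicUsize>,
}

impl Time {
    pub fn new() -> Self {
        Self::default()
    }
    /// Only accepts a Duration: `time.add_duration(5)` does not compile.
    pub fn add_duration(&self, d: Duration) {
        self.nanos.fetch_add(d.as_nanos() as usize, Ordering::Relaxed);
    }
    /// Records the time elapsed since `start`.
    pub fn add_elapsed(&self, start: Instant) {
        self.add_duration(start.elapsed());
    }
    pub fn value(&self) -> Duration {
        Duration::from_nanos(self.nanos.load(Ordering::Relaxed) as u64)
    }
}
```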

Open Questions:

The current SQL counters use camelCase names (e.g. numRows) rather than the Rust-standard snake_case (e.g. num_rows). I kept the same naming convention in this PR, but I wonder if we should switch to snake_case, given that we are changing them all anyway.

Not included in this PR:

  1. Ensure that all operators have reasonable metrics: Add "baseline" metrics to all built in operators #866
  2. Support for a global "operator id" as described by @andygrove in Improved features and interoperability for SQLMetrics #679 (comment)

@alamb alamb added the "api change" label (Changes the API exposed to users of the crate) Aug 16, 2021
@alamb
Contributor Author

alamb commented Aug 16, 2021

cc @tustvold: I would like your opinion on whether the metrics.rs API in this PR is suitable for, and consistent with, other metric APIs

/// Sums the values for metrics for which `f(metric)` returns
/// true, and returns the value. Returns None if no metrics match
/// the predicate.
pub fn sum<F>(&self, mut f: F) -> Option<usize>
Contributor Author

Here are some of the aggregation primitives (sum and group by partition). I expect this API may grow as we understand the use cases better.
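A sketch of the two aggregation primitives, assuming metrics are flattened into simple (name, partition, value) snapshots. `sum` follows the signature and doc comment above; `MetricsSet`, `MetricSnapshot`, and `sum_by_partition` are hypothetical names, and the PR's actual `aggregate_by_partition` may have different semantics:

```rust
use std::collections::BTreeMap;

/// A point-in-time value of one metric (hypothetical simplified shape).
#[derive(Debug, Clone, PartialEq)]
pub struct MetricSnapshot {
    pub name: String,
    pub partition: Option<usize>,
    pub value: usize,
}

#[derive(Debug, Default)]
pub struct MetricsSet {
    metrics: Vec<MetricSnapshot>,
}

impl MetricsSet {
    pub fn push(&mut self, name: &str, partition: Option<usize>, value: usize) {
        self.metrics.push(MetricSnapshot { name: name.to_string(), partition, value });
    }

    /// Sums the values for metrics for which `f(metric)` returns
    /// true, and returns the value. Returns None if no metrics match
    /// the predicate.
    pub fn sum<F>(&self, mut f: F) -> Option<usize>
    where
        F: FnMut(&MetricSnapshot) -> bool,
    {
        let mut total: Option<usize> = None;
        for m in &self.metrics {
            if f(m) {
                // First match turns None into Some(0) before adding.
                *total.get_or_insert(0) += m.value;
            }
        }
        total
    }

    /// Hypothetical "group by partition": for one metric name, the total
    /// value per partition (key None = not tied to a partition).
    pub fn sum_by_partition(&self, name: &str) -> BTreeMap<Option<usize>, usize> {
        let mut out = BTreeMap::new();
        for m in self.metrics.iter().filter(|m| m.name == name) {
            *out.entry(m.partition).or_insert(0) += m.value;
        }
        out
    }
}
```

Returning `Option<usize>` from `sum` lets callers distinguish "no such metric was recorded" from "recorded, with a total of zero".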

Contributor

@tustvold tustvold left a comment

I've not looked in great detail at the instrumentation of the operators themselves, nor am I particularly familiar with what came before, but this makes a lot of sense to me. Very nice 👍

Edit: re snake case vs camel case, FWIW most metrics systems I've interacted with support neither upper-case letters nor hyphens, so snake case is pretty typical

}

/// Add `n` to the metric's value
pub fn add(&self, n: usize) {
Contributor

I like that metric recording doesn't involve any string manipulation 👍
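To show why the recording path stays cheap, here is a hypothetical sketch where the metric name is fixed once at construction and each `add` is a single relaxed atomic `fetch_add` on a shared cell. The `Count` handle and the `run_partitions` driver are illustrative only:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Cheap-to-clone handle; all clones share one atomic cell.
#[derive(Debug, Clone, Default)]
struct Count(Arc<AtomicUsize>);

impl Count {
    /// Hot path: one atomic add. No name lookup, no string formatting.
    fn add(&self, n: usize) {
        self.0.fetch_add(n, Ordering::Relaxed);
    }
    fn value(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

/// Simulates `partitions` workers, each recording `batches` batches of
/// `rows` rows into the same metric, and returns the final total.
fn run_partitions(partitions: usize, batches: usize, rows: usize) -> usize {
    let output_rows = Count::default();
    thread::scope(|s| {
        for _ in 0..partitions {
            let c = output_rows.clone();
            s.spawn(move || {
                for _ in 0..batches {
                    c.add(rows);
                }
            });
        }
        // A progress reporter could call output_rows.value() here
        // while the workers are still running.
    });
    output_rows.value()
}
```

Relaxed ordering is sufficient in this sketch because the counter is only summed, never used to synchronize other memory; the final read happens after `thread::scope` has joined every worker.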

impl Time {
/// Create a new [`Time`] wrapper suitable for recording elapsed
/// times for operations.
pub fn new(inner: Arc<SQLMetric>) -> Self {
Contributor

I wonder if these constructors should verify the MetricKind of the SQLMetric? Or, alternatively, they could be private, with a member function on SQLMetric that does the verification.

Contributor Author

This is a good point 🤔. Though now that I think about it, the more Rust-y way of doing this would be in the type system... so perhaps I will collapse MetricKind 🤔
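One way to move the kind check into the type system, sketched under assumptions: the metric kind is the enum variant itself, and each variant carries a value of the matching type, so there is no separate `MetricKind` field that can disagree with the value. `MetricValue`, its variants, and `merge` are hypothetical:

```rust
use std::time::Duration;

/// The variant *is* the kind; each variant holds a value of the right type.
#[derive(Debug, Clone, PartialEq)]
pub enum MetricValue {
    OutputRows(usize),
    ElapsedCompute(Duration),
}

impl MetricValue {
    /// Name used when exporting (snake_case).
    pub fn name(&self) -> &'static str {
        match self {
            MetricValue::OutputRows(_) => "output_rows",
            MetricValue::ElapsedCompute(_) => "elapsed_compute",
        }
    }

    /// Combines two values of the same kind. Mismatched kinds return
    /// None instead of silently adding a time to a row count.
    pub fn merge(&self, other: &MetricValue) -> Option<MetricValue> {
        match (self, other) {
            (MetricValue::OutputRows(a), MetricValue::OutputRows(b)) => {
                Some(MetricValue::OutputRows(a + b))
            }
            (MetricValue::ElapsedCompute(a), MetricValue::ElapsedCompute(b)) => {
                Some(MetricValue::ElapsedCompute(*a + *b))
            }
            _ => None,
        }
    }
}
```

With this shape, a constructor cannot pair the wrong kind with a value, so no runtime verification is needed at creation time; only cross-kind combination still has to be rejected.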

Contributor Author

@alamb alamb left a comment

Thanks @tustvold -- I am going to implement the Cow approach and also try to consolidate SQLMetric and MetricKind to see how that looks
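The thread does not spell out the Cow approach, so this sketch assumes it means storing label names and values as `Cow<'static, str>`: compile-time names like `"filename"` are borrowed with no allocation, while values only known at runtime, such as a file path, are owned. The `Label` type below is hypothetical:

```rust
use std::borrow::Cow;

/// Hypothetical label: static strings are borrowed, runtime strings owned.
#[derive(Debug, Clone, PartialEq)]
pub struct Label {
    pub name: Cow<'static, str>,
    pub value: Cow<'static, str>,
}

impl Label {
    /// Accepts either `&'static str` (no allocation) or `String`.
    pub fn new(
        name: impl Into<Cow<'static, str>>,
        value: impl Into<Cow<'static, str>>,
    ) -> Self {
        Self { name: name.into(), value: value.into() }
    }
}
```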

@alamb
Contributor Author

alamb commented Aug 18, 2021

@andygrove / @returnString as you implemented the SQLMetrics initially and have worked with them, do you have any context or opinion on the use of snake_case vs camelCase names for the metrics? For example numRows vs num_rows?

From this PR's original description:

The current SQL counters use "camel case" for the counter names (e.g. numRows) rather than the Rust standard "snake case" (e.g. num_rows). I kept the same naming convention in this PR, but I wonder if we want to make them more Rust standard snake case given we are messing with them all anyways.

@tustvold notes that

Edit: re snake case vs camel case, FWIW most metrics systems I've interacted with support neither upper-case letters nor hyphens, so snake case is pretty typical

Given that snake case is the standard in Rust as well, I would probably be inclined to update the metric names to use snake case as well

@andygrove
Member

Given that snake case is the standard in Rust as well, I would probably be inclined to update the metric names to use snake case as well

This makes sense. I didn't even think about the casing. I just spend too much time looking at Spark query plans so that influenced my initial work.

@alamb
Contributor Author

alamb commented Aug 18, 2021

This makes sense. I didn't even think about the casing. I just spend too much time looking at Spark query plans so that influenced my initial work.

Thanks @andygrove -- I will update this proposal to switch the names to snake_case then
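A small helper for the rename, assuming the existing names are simple camelCase like `numRows` (the function name `camel_to_snake` is hypothetical). It treats every upper-case ASCII letter after the first character as a word boundary, so acronyms such as `IO` would need special handling:

```rust
/// Converts a camelCase metric name such as "numRows" to snake_case
/// ("num_rows"). Simplified: each upper-case ASCII letter after the
/// first character starts a new word, so "readIO" becomes "read_i_o".
pub fn camel_to_snake(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for (i, c) in name.chars().enumerate() {
        if c.is_ascii_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.push(c.to_ascii_lowercase());
        } else {
            out.push(c);
        }
    }
    out
}
```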

@alamb alamb marked this pull request as draft August 18, 2021 19:22
@alamb
Contributor Author

alamb commented Aug 18, 2021

I have some non-trivial feedback to incorporate, so I am marking this as a draft (and may open a new PR after updating the proposal)

Member

@andygrove andygrove left a comment

I don't have time to review in detail at the moment but I took a quick look through and LGTM

@alamb alamb changed the title from "Implement new metrics API" to "Implement new metrics API / RFC" Aug 19, 2021
@alamb
Contributor Author

alamb commented Aug 19, 2021

This PR has changed enough since initial feedback that I have opened a second one #908 with the updates. Closing this one.

@alamb alamb closed this Aug 19, 2021
@alamb alamb deleted the alamb/metrics branch August 8, 2023 20:12
H0TB0X420 pushed a commit to H0TB0X420/datafusion that referenced this pull request Oct 7, 2025

Labels

api change Changes the API exposed to users of the crate

Development

Successfully merging this pull request may close these issues.

Improved features and interoperability for SQLMetrics

3 participants