Skip to content

distsql: collect tableReader stats and output to trace#24529

Merged
craig[bot] merged 2 commits into
cockroachdb:masterfrom
asubiotto:pstats
Apr 12, 2018
Merged

distsql: collect tableReader stats and output to trace#24529
craig[bot] merged 2 commits into
cockroachdb:masterfrom
asubiotto:pstats

Conversation

@asubiotto
Copy link
Copy Markdown
Contributor

This PR is split into two commits. The first commit lays the groundwork for stat collection code. The stats are modeled as a proto to be sent over the wire in the future (we will be sending these stats as metadata). The second commit modifies the tableReader to use this InputStatCollector to count the number of rows it reads. Example usage:

root@:26257/> SELECT COUNT(*) FROM test.xyz;
+-------+
| count |
+-------+
|     7 |
+-------+
(1 row)

Time: 257.249037ms

root@:26257/> SET tracing = ON;
SET

Time: 362.575µs

root@:26257/> SELECT x,y,z FROM test.xyz ORDER BY x;
+---+---+------+
| x | y |  z   |
+---+---+------+
| 1 | 1 | NULL |
| 1 | 1 |    2 |
| 1 | 1 |    2 |
| 1 | 2 |    1 |
| 2 | 2 |    3 |
| 4 | 5 |    6 |
| 4 | 1 |    6 |
+---+---+------+
(7 rows)

Time: 207.334037ms

root@:26257/> SHOW TRACE FOR SESSION;

... rest of trace elided ...

| 2018-04-05 20:53:01.571467+00:00 | 21s794ms540µs951ns | waiting for read lock                                                    | [n1,s1,r23/1:/Table/5{2-3}]                                |     | /cockroach.roachpb.Internal/Batch |   13 |
| 2018-04-05 20:53:01.777272+00:00 | 22s345µs429ns      | read completed                                                           | [n1,s1,r23/1:/Table/5{2-3}]                                |     | /cockroach.roachpb.Internal/Batch |   13 |
| 2018-04-05 20:53:01.777435+00:00 | 22s508µs797ns      | stat summary for tableReader input                                       | [client=[::1]:64047,user=root,n1,TableReader]              |     | table reader                      |   11 |
|                                  |                    |                                                                          |                                                            |     |                                   |      |
|                                  |                    |  rows read         7                                                     |                                                            |     |                                   |      |

... rest of trace elided ...

And then through jaeger:
screen shot 2018-04-05 at 4 53 55 pm

Note that we can't currently run SHOW TRACE FOR <QUERY> in distsql mode, which I hope to fix soon (#16562)

Any suggestions to improve the output welcome.

cc @knz @jordanlewis

@asubiotto asubiotto requested review from a team and solongordon April 5, 2018 21:18
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@tbg
Copy link
Copy Markdown
Member

tbg commented Apr 5, 2018

Glad to see work in this direction! I was just wondering the other day how we could get better insights here. From the PR description, it looks like you're printing the stats into an event message. Have you considered making them tags instead? Instead of a message you could have tags of the type "tablereader.spans.read" -> 7 (and I'm anticipating that we'll have the RocksDB performance counters similarly). I'm not sure this is a better idea, btw, and even if it is it's straightforward to change it later.

@asubiotto
Copy link
Copy Markdown
Contributor Author

I haven't considered any alternatives to the trace output since I'm not too familiar with logging. Thanks for suggesting the tags, I'll try those out.

@tbg
Copy link
Copy Markdown
Member

tbg commented Apr 5, 2018

Sounds good! I just want to reiterate that maybe this just won't work very well. But it's worth a quick experiment. (Even if you add tags, you probably still want to print it into the text so that it's properly exposed during SHOW TRACE FOR).

@petermattis
Copy link
Copy Markdown
Contributor

Review status: 0 of 7 files reviewed at latest revision, all discussions resolved, all commit checks successful.


pkg/sql/distsqlrun/data.proto, line 265 at r1 (raw file):

message InputStats {
  // name describes the input whose stats were collected.
  optional string name = 1 [(gogoproto.nullable) = false];

Streams that connect distsql processors have a stream_id. Perhaps that should be used to identify the stats instead of a name. @andreimatei Might have opinions about this.


Comments from Reviewable

@asubiotto
Copy link
Copy Markdown
Contributor Author

asubiotto commented Apr 9, 2018

I think we need to resolve the question of how to send structured stats data back (e.g. to display in EXPLAIN). Initially, I was thinking of having these stats sent back as ProducerMetadata. The alternative is to, as @tschottdorf suggests, add tags to the processor's span. We could prefix the stats tags with stats. and print those out, to not have to care about any other changes in the stats output. This sounds a bit brittle though. @andreimatei, what do you think?

@asubiotto
Copy link
Copy Markdown
Contributor Author

Review status: 0 of 7 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/sql/distsqlrun/data.proto, line 265 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Streams that connect distsql processors have a stream_id. Perhaps that should be used to identify the stats instead of a name. @andreimatei Might have opinions about this.

I was thinking that this name would describe what the stats are about, rather than identify the stats (e.g. "left input of hashJoiner"). The identification will be done at a later stage, and I'm not sure stream_ids are enough to do that. The goal is to group a bunch of these stats (InputStats, MemoryStats, etc...) into a StatsGroup identified by a unique processor id and a stage in the logical plan.


Comments from Reviewable

@andreimatei
Copy link
Copy Markdown
Contributor

The idea of using span tags for this information seems weird to me. If a processor collects 5 stats, would that be on tag or 5? Would they really look nice in the Jaegar UI? One argument I'd maybe buy was if putting them as tags would let any of the tracing services provide structured searching over them. But I doubt it...
If anything, I think the concept of span baggage items is closer in intent to this use case. But I don't know if tracers render baggage at all.
Another thing to consider is that I'm not convinced that the current mode of running with one span per processor works well for fused processors in the long run. The alternative would be to have all fused processors use a single span - which seems more like the canonical use of spans, but also has disadvantages. If multiple processors were to share a span, we'd need to consider what we want that span's tags to be.

The way we've returned structured info from processors so far is through ProducerMetadata. If the stats introduced here need to be "structured", then I think my preference would be for processors to do both - return some metadata and also render that metadata as a trace message.
We could teach SHOW TRACE FOR to recognize these messages and bump them to the top of their spans.


Review status: 0 of 7 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/sql/distsqlrun/data.proto, line 265 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I was thinking that this name would describe what the stats are about, rather than identify the stats (e.g. "left input of hashJoiner"). The identification will be done at a later stage, and I'm not sure stream_ids are enough to do that. The goal is to group a bunch of these stats (InputStats, MemoryStats, etc...) into a StatsGroup identified by a unique processor id and a stage in the logical plan.

An "input" is a stream, right? So I think it makes sense to rename this to StreamStats and put a stream_id in it...
I think the concept of a StatsGroup probably makes sense, but I don't think that's in conflict with the StreamStats?


Comments from Reviewable

@tbg
Copy link
Copy Markdown
Member

tbg commented Apr 9, 2018

The idea of using span tags for this information seems weird to me. If a processor collects 5 stats, would that be on tag or 5? Would they really look nice in the Jaegar UI? One argument I'd maybe buy was if putting them as tags would let any of the tracing services provide structured searching over them. But I doubt it...
If anything, I think the concept of span baggage items is closer in intent to this use case. But I don't know if tracers render baggage at all.

Could you elaborate on why that seems weird to you? Think of a span like a struct { messages []string, tags map[string]string }. You would use a tag for structured information, which this is. And I think you would have a tag for each metric you're collecting. Also, this isn't limited to DistSQL. For example, you may want your SHOW TRACE FOR COUNT(*) to tell you how many RocksDB tombstones were skipped over. You get a snowball trace bag, and simply add up the corresponding values. How to aggregate could be obvious from the tag name: count.<anything>.

For example, if you want to know how many rows you've read, you let every tablereader's span emit a count count.read (or whatever). Building something DistSQL specific would be unfortunate.

Baggage items are not the right choice here. The storage layer does not need to know how many rows the table reader on top of it has already read.


Review status: 0 of 7 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@andreimatei
Copy link
Copy Markdown
Contributor

Could you elaborate on why that seems weird to you?

My mind is not set on this, so let me ask you a question in return: any log message provides some information (at the very lease its presence/absence is a bit). What nuggets of this information should be tags?


Review status: 0 of 7 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@asubiotto
Copy link
Copy Markdown
Contributor Author

Review status: 0 of 7 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/sql/distsqlrun/data.proto, line 265 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

An "input" is a stream, right? So I think it makes sense to rename this to StreamStats and put a stream_id in it...
I think the concept of a StatsGroup probably makes sense, but I don't think that's in conflict with the StreamStats?

TIL, I thought streams were only remote. It's not in conflict, but I'm not sure what we get out of having a stream_id here when I'm going to add identifiers for a group of stats later. Also, taking this PR as a specific example, not all InputStats will correspond to a stream. Here, the tableReader doesn't have an input from a plan perspective but we still want to collect these stats from its interaction with the rowFetcher. I think it's better to leave this as it is and there's no harm in keeping it general.


Comments from Reviewable

@tbg
Copy link
Copy Markdown
Member

tbg commented Apr 9, 2018

Log messages are meant for human consumption and so their form can change from one day to the next (you don't go around wondering whether changing a log.Event will mess up some parsing somewhere else). For span tag keys the semantics are that they are a label for which there is a registry (i.e. you don't just make them up free-form). Span tags are represented by a map so they are used for key lookup while events are not supposed to be searched. I think you'll find that represented in how jaeger and lightstep interpret tags; they are sensitive to certain standard tags.

I don't think that you're seriously suggesting that we don't use tags and instead parse the information out of log messages because well, not having to do that is exactly why tags exist.

See https://github.com/opentracing/specification/blob/master/semantic_conventions.md for some intuition about span tags. Tags are usually something like component, db.statement, user, http_code, etc. Using it to propagate information that can be accumulated is not something mentioned there, but it seems a reasonable fit.

Take a look at the tentative API below. We should show the tags inSHOW TRACE just before the span closes (because in the ordinary SHOW TRACE FOR that's kind of what you want, you don't want it aggregated -- it's the raw view of the timeline).

const keysRead = 12515 // example
log.CountEvent(ctx, "tablereader.keys", keysRead)
// ^- calls
// (opentracing.Span).SetTag("count.tablereader.keys", fmt.Sprintf("%v", keysRead))

But the upside is that you can imagine running (mod syntax bikeshed) SHOW TRACE (STATS) FOR SELECT COUNT(*) ... and get something like

metric                    | value
--------------------------+---------
count.tablereader.keys    | 12345
count.replicas.contacted  | 591
count.requests.           | 25
count.writes              | 3
...

I'm not sure what everyone else's experience is, but I've often wanted to isolate specific information from a trace and that has always been a problem since everything is currently unstructured. Would be good to change that.

This proposal isn't perfect because what if CountEvent is called multiple times for a span? We don't necessarily want to have to parse the string, add, and re-encode it. Maybe that's just not something that should be done? But I think some version of that proposal should be feasible, and we would make a central registry of tags and tie it all together that way. I'd prefer that over cooking up some DistSQL-specific mechanism.

Happy to chat about this more and help out if what I'm proposing is determined worth a shot but hits a snag.


Review status: 0 of 7 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@andreimatei
Copy link
Copy Markdown
Contributor

you don't go around wondering whether changing a log.Event will mess up some parsing somewhere else

Well, actually, we already have logs that we can't change willy-nilly because of the damn "KV tracing" - those are logs selected by some regex.
I always thought the implementation of KV tracing in particular was a bad idea, but I've also thought that as tracing starts being used for testing more and more (which it already is), we should have a set of messages for which the format string is a constant shared by the code doing the logging and by a "trace comprehension" library.

Anyway, as I was telling you, I am inspired by the idea of doing something for these stats which is not specific to DistSQL and also about a metrics-like API, and I guess that reflecting these things in tags rather than messages makes sense. In fact, I think we should access all the metrics that we already have through such an API so that the metrics that a request contributed to are reflected in that request's trace. "What if CountEvent is called multiple times for a span" is something that we really need to decide on though, as is what we want to aggregate per span and what we want to aggregate across spans too. For example, when I run a SELECT, I want to see how many ranges were touched as a top-level summed up value (at least summed up across all KV requests done by a KV fetcher).


Review status: 0 of 7 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@knz
Copy link
Copy Markdown
Contributor

knz commented Apr 10, 2018

@tschottdorf while I am agnostic on whether tags are the best way to do things conceptually, I'd like to point out that our current implementation of tags (using a linked list of tag structs implementing the Context interface, which need to be traversed to arrive to the actual context) is not performance-friendly with larger tag sets.

Until we solve that implementation problem I'd say we shouldn't use tags unless a specific benchmark measures the corresponding overhead.

@tbg
Copy link
Copy Markdown
Member

tbg commented Apr 10, 2018

@knz you're talking about the wrong tags 😄 My "tags" are opentracing span tags. They are internally backed by a map[string]string.

On a related note though, I was thinking what we are really winning by the use of opentracing tags and it's not much. Sure, the various opentracing-compatible tools will allow us to search spans by tag which is nice. But for a metrics-y use case, that seems mostly not very interesting, and being forced into all-strings is pretty terrible for things that are mostly integers.

So what if instead we put a struct with performance counters in (our impl of) span which is then recorded in RecordedSpan? I think this means that we need to widen the opentracing.Span interface, but that seems acceptable. We won't get the "benefit" of having these counters show up on lightstep or wherever, but that seems OK (and we could emulate it by also writing a tag whenever a counter is used).

@andreimatei, what do you think? This seems relatively straightforward to prototype.

@asubiotto
Copy link
Copy Markdown
Contributor Author

@tschottdorf I think this RecordedSpan idea is the way to go because of the ability to have any layer add its own stats (not just distsql) and keep the benefits of structuring these stats as protos. This PR is in line with the idea of adding these stats to a proto so I don't think there is any extra work to do at this stage. I appreciate the input.

@solongordon, this PR is ready for a review after I resolve the merge conflicts.

@solongordon
Copy link
Copy Markdown
Contributor

:lgtm: Couple quibbles but looks like a good start in general.


Review status: 0 of 7 files reviewed at latest revision, 1 unresolved discussion.


pkg/sql/distsqlrun/stats.go, line 57 at r3 (raw file):

func (is InputStats) SummarizeStats() string {
	var buffer bytes.Buffer
	buffer.WriteString(fmt.Sprintf("stat summary for %s\n", is.Name))

Nit: Do you know what the deal is with the blank line in the trace output?


pkg/sql/distsqlrun/stats.go, line 58 at r3 (raw file):

	var buffer bytes.Buffer
	buffer.WriteString(fmt.Sprintf("stat summary for %s\n", is.Name))
	buffer.WriteString(fmt.Sprintf("%10s%10d", "rows read", is.NumRows))

I would personally find a simple "rows read: 7" format to be more readable, but I'm also happy to wait on bikeshedding this.


Comments from Reviewable

Will be used by processors in future commits. Currently only counts the
number of rows read but will be extended.

Release note: None
Just number of rows read for now. This information is only collected and
output if the tracing span is recording.

Part of work for #19476

Release note: None
@asubiotto
Copy link
Copy Markdown
Contributor Author

Review status: 0 of 7 files reviewed at latest revision, 3 unresolved discussions.


pkg/sql/distsqlrun/stats.go, line 57 at r3 (raw file):

Previously, solongordon (Solon) wrote…

Nit: Do you know what the deal is with the blank line in the trace output?

No idea. I changed to follow the same format as runtime stats:

320 I180412 20:25:00.604783 230 server/status/runtime.go:219  [n1] runtime stats: 238 MiB RSS, 163 goroutines, 94 MiB/45 MiB/164 MiB GO alloc/idle/total, 96 MiB/134 MiB CGO alloc/total, 89.70cgo/sec, 0.01/0.00 %(u/s)time, 0.00 %gc (0x)

pkg/sql/distsqlrun/stats.go, line 58 at r3 (raw file):

Previously, solongordon (Solon) wrote…

I would personally find a simple "rows read: 7" format to be more readable, but I'm also happy to wait on bikeshedding this.

Changed the format, I agree with you. Had the width to make more stats easier to read.


Comments from Reviewable

@asubiotto
Copy link
Copy Markdown
Contributor Author

bors r+

craig Bot pushed a commit that referenced this pull request Apr 12, 2018
24529: distsql: collect tableReader stats and output to trace r=asubiotto a=asubiotto

This PR is split into two commits. The first commit lays the groundwork for stat collection code. The stats are modeled as a proto to be sent over the wire in the future (we will be sending these stats as metadata). The second commit modifies the tableReader to use this InputStatCollector to count the number of rows it reads. Example usage:

```
root@:26257/> SELECT COUNT(*) FROM test.xyz;
+-------+
| count |
+-------+
|     7 |
+-------+
(1 row)

Time: 257.249037ms

root@:26257/> SET tracing = ON;
SET

Time: 362.575µs

root@:26257/> SELECT x,y,z FROM test.xyz ORDER BY x;
+---+---+------+
| x | y |  z   |
+---+---+------+
| 1 | 1 | NULL |
| 1 | 1 |    2 |
| 1 | 1 |    2 |
| 1 | 2 |    1 |
| 2 | 2 |    3 |
| 4 | 5 |    6 |
| 4 | 1 |    6 |
+---+---+------+
(7 rows)

Time: 207.334037ms

root@:26257/> SHOW TRACE FOR SESSION;

... rest of trace elided ...

| 2018-04-05 20:53:01.571467+00:00 | 21s794ms540µs951ns | waiting for read lock                                                    | [n1,s1,r23/1:/Table/5{2-3}]                                |     | /cockroach.roachpb.Internal/Batch |   13 |
| 2018-04-05 20:53:01.777272+00:00 | 22s345µs429ns      | read completed                                                           | [n1,s1,r23/1:/Table/5{2-3}]                                |     | /cockroach.roachpb.Internal/Batch |   13 |
| 2018-04-05 20:53:01.777435+00:00 | 22s508µs797ns      | stat summary for tableReader input                                       | [client=[::1]:64047,user=root,n1,TableReader]              |     | table reader                      |   11 |
|                                  |                    |                                                                          |                                                            |     |                                   |      |
|                                  |                    |  rows read         7                                                     |                                                            |     |                                   |      |

... rest of trace elided ...
```
And then through jaeger:
<img width="1427" alt="screen shot 2018-04-05 at 4 53 55 pm" src="https://user-images.githubusercontent.com/10560359/38392141-740ae6d8-38f4-11e8-9406-2c167142eba6.png">

Note that we can't currently run `SHOW TRACE FOR <QUERY>` in distsql mode, which I hope to fix soon (#16562)

Any suggestions to improve the output welcome.

cc @knz @jordanlewis 

24715: opt: Fix flaky test r=andy-kimball a=andy-kimball

Update the TestNoMallocs to use the SucceedsSoon method in order
to repeatedly retry until zero memory allocations are detected.

Release note: None

24752: cli: Fix number of args expected by `debug gossip-values` r=a-robinson a=a-robinson

Fixes #24751

Release note: None

It looks like the problem is that my original usage string (`gossip-values <directory>`) was badly copied and pasted from other debug commands, and then assumed necessary by #24399 when in reality no positional argument is used by the command.

Co-authored-by: Alfonso Subiotto Marqués <alfonso@cockroachlabs.com>
Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
@craig
Copy link
Copy Markdown
Contributor

craig Bot commented Apr 12, 2018

Build succeeded

@craig craig Bot merged commit 6015b79 into cockroachdb:master Apr 12, 2018
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Apr 13, 2018
The recent cockroachdb#24529 made a change: it switched to starting the tr's input
using the tr's ctx, instead of the ctx passed to tr.Start(). I think the
change was inadvertent. It diverges from the pattern used by every other
processor that implements RowSource: a proc's ctx is not supposed to be
used by other procs - e.g. it would lead to contradictory logging tags.
In this particular case of the tableReader, perhaps this is not the
worst thing in the world: the tr's input is not a generic rowSource, it
is precisely a RowFetcher (wrapped in layers and layers). So maybe it's
not so bad for the RowFetcher in particular to use a tr's ctx. But now
we've also introduced the InputStatCollector as a possible input, so
we're really pretending that the tr is just like any other proc with an
input. So, I went for uniformity.

Release note: None
craig Bot pushed a commit that referenced this pull request Apr 16, 2018
24793: distsqlrun: fix tableReader ctx r=andreimatei a=andreimatei

The recent #24529 made a change: it switched to starting the tr's input
using the tr's ctx, instead of the ctx passed to tr.Start(). I think the
change was inadvertent. It diverges from the pattern used by every other
processor that implements RowSource: a proc's ctx is not supposed to be
used by other procs - e.g. it would lead to contradictory logging tags.
In this particular case of the tableReader, perhaps this is not the
worst thing in the world: the tr's input is not a generic rowSource, it
is precisely a RowFetcher (wrapped in layers and layers). So maybe it's
not so bad for the RowFetcher in particular to use a tr's ctx. But now
we've also introduced the InputStatCollector as a possible input, so
we're really pretending that the tr is just like any other proc with an
input. So, I went for uniformity.

Release note: None

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@asubiotto asubiotto deleted the pstats branch May 22, 2018 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants