Tidy up lifecycle, query, and ingestion logging. by gianm · Pull Request #8889 · apache/druid

gianm · 2019-11-17T17:54:45Z

The goal of this patch is to improve the clarity and usefulness of
Druid's logging for cluster operators. For more information, see
https://twitter.com/cowtowncoder/status/1195469299814555648.

Concretely, this patch does the following:

- Changes a lot of INFO logs to DEBUG, and DEBUG to TRACE, with the
  goal of reducing redundancy and improving clarity by not showing
  rarely-useful log messages. This includes most "starting"
  and "stopping" messages, and most messages related to individual
  columns.
- Adds new log4j2 templates that show operators how to enable DEBUG
  logging for certain important packages. These templates can be found in
  the _common configuration folder in the example clusters under conf/druid.
- Eliminates stack traces for query errors unless the log level is DEBUG
  or more verbose. This is useful because query errors often indicate user
  error rather than system error, but dumping stack traces often gave
  operators the impression that there was a system failure.
- Adds the task id to Appenderator and AppenderatorDriver thread names. In
  the default log4j2 configuration, this will put the task id in log lines
  as well. This is very useful when running the Indexer, where
  multiple tasks run in the same JVM.
- Uses more consistent terminology for "sequences" (sets of
  segments that are handed off together by Kafka ingestion) and
  "offsets" (cursors in partitions). These terms had been confused in
  some log messages because Kinesis calls offsets
  "sequence numbers".
- Replaces some ugly toString calls with either the JSONification or
  something more operator-accessible (like a URL or segment identifier
  instead of a JSON object representing the same thing).
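Two of the behaviors above, stack traces only at DEBUG and task ids in thread names, can be illustrated briefly. The sketch below is hypothetical: it uses the JDK's `java.util.logging` rather than the log4j2 setup Druid actually uses, with `FINE` standing in for DEBUG, and the class name, method names, thread-name format, and task id are all assumptions for illustration. `logQueryError` attaches the stack trace only when `FINE` is enabled, and `taskThreadFactory` builds thread names that begin with a task id.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch using java.util.logging (FINE stands in for log4j2's DEBUG).
// None of these names come from the Druid codebase.
public class OperatorLoggingSketch {

  /**
   * Logs a failed query at WARNING. The stack trace is attached only when FINE is
   * enabled, because query errors are usually user errors, not system failures.
   * Returns true if the stack trace was attached.
   */
  public static boolean logQueryError(Logger log, String queryId, Throwable e) {
    final String message = "Query [" + queryId + "] failed: " + e.getMessage();
    if (log.isLoggable(Level.FINE)) {
      log.log(Level.WARNING, message, e); // full stack trace, for developers
      return true;
    }
    log.warning(message); // one concise line, for operators
    return false;
  }

  /**
   * Creates daemon threads whose names start with the task id, so a log pattern that
   * prints the thread name (e.g. %t in a log4j2 PatternLayout) tags each line with its task.
   */
  public static ThreadFactory taskThreadFactory(String taskId, String purpose) {
    final AtomicInteger counter = new AtomicInteger();
    return runnable -> {
      Thread thread = new Thread(runnable, "[" + taskId + "]-" + purpose + "-" + counter.getAndIncrement());
      thread.setDaemon(true);
      return thread;
    };
  }

  public static void main(String[] args) throws InterruptedException {
    Logger log = Logger.getLogger("sketch.query"); // inherits the root level (INFO by default)
    logQueryError(log, "q-123", new IllegalArgumentException("Unknown column [foo]"));

    ExecutorService exec = Executors.newSingleThreadExecutor(
        taskThreadFactory("index_kafka_wikipedia_0", "appenderator-persist"));
    exec.submit(() -> System.out.println("running on " + Thread.currentThread().getName()));
    exec.shutdown();
    exec.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

With the root logger at its default INFO level, `main` logs a single WARNING message without a stack trace, and the worker thread reports a name beginning with `[index_kafka_wikipedia_0]`. Keeping the decision inside one helper means every query error path gets the same operator-friendly treatment.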


gianm · 2019-11-17T18:06:30Z

The principles I'm trying to follow in this patch are:

- Logs at INFO / WARN / ERROR level should be designed for cluster operators.
- Logs at TRACE / DEBUG level should be designed for Druid developers.
- Less is more. Having too many logs means the useful messages will get lost in a sea of irrelevant messages, because there will be too many to read and operators won't know what to search for.
- Avoid logging "business as usual" things (starting, stopping, initializing, refreshing, found nothing to do, etc.).
- Ideally, log one message per important lifecycle event (e.g. one log message per segment push to deep storage, one per publish, etc.). Not zero and not more than one.
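The last two principles can be sketched in code. This is a hypothetical example built on `java.util.logging` (with `FINE` playing the role of DEBUG), not Druid's own logging code; the class, segment id, and deep storage path are made up. Start and stop messages go to `FINE`, and each simulated segment push produces exactly one INFO message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch of the logging principles, using java.util.logging (FINE ~ DEBUG).
// The class, method, segment id, and path are illustrative; they are not Druid code.
public class LifecycleLoggingSketch {

  /** Collects published records so the INFO-level output can be inspected. */
  public static class CollectingHandler extends Handler {
    public final List<LogRecord> records = Collections.synchronizedList(new ArrayList<>());

    @Override public void publish(LogRecord record) { records.add(record); }
    @Override public void flush() {}
    @Override public void close() {}
  }

  private final Logger log;

  public LifecycleLoggingSketch(Logger log) {
    this.log = log;
  }

  // "Business as usual" events go to FINE, so operators running at INFO never see them.
  public void start() { log.fine("Starting segment pusher."); }
  public void stop() { log.fine("Stopping segment pusher."); }

  /** Simulates one segment push and emits exactly one INFO line describing it. */
  public void push(String segmentId, String deepStoragePath, long sizeBytes) {
    log.fine(() -> "Uploading segment[" + segmentId + "]..."); // developer detail
    // (The actual upload to deep storage would happen here.)
    log.info(String.format("Pushed segment[%s] (%d bytes) to [%s].", segmentId, sizeBytes, deepStoragePath));
  }

  public static void main(String[] args) {
    Logger logger = Logger.getLogger("sketch.lifecycle");
    logger.setUseParentHandlers(false); // keep the console quiet; inspect records instead
    logger.setLevel(Level.INFO);
    CollectingHandler handler = new CollectingHandler();
    logger.addHandler(handler);

    LifecycleLoggingSketch pusher = new LifecycleLoggingSketch(logger);
    pusher.start();
    pusher.push("wikipedia_2019-11-17", "/tmp/druid/deep-storage/wikipedia/index.zip", 1024L);
    pusher.stop();

    for (LogRecord r : handler.records) {
      System.out.println(r.getLevel() + " " + r.getMessage());
    }
  }
}
```

Running `main` with the logger at INFO collects a single record for the one push; lowering the level to `FINE` would also surface the start, upload, and stop messages intended for developers.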

fjy · 2019-11-17T18:37:13Z

+3000

This is super useful

lgtm-com · 2019-11-19T01:02:51Z

This pull request introduces 2 alerts when merging b2d3d41 into f139903.

new alerts:

2 for Unused format argument
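An "Unused format argument" alert refers to a formatting call that receives more arguments than its format string consumes. Java's `String.format` silently ignores such surplus arguments, so the value never appears in the log. The checker below is a rough, hypothetical sketch of that kind of analysis, not LGTM's implementation: it counts ordinary conversions such as `%s` or `%d`, skips `%%` and `%n` (which consume no argument), and does not handle explicit indices like `%1$s` or `%<s`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified check for surplus format arguments.
// Limitation: explicit argument indices (%1$s) and relative indices (%<s) are not recognized.
public class FormatArgCheck {
  // Group 1 is either "%" (from %%), "n" (from %n), or an ordinary conversion like "s", "d", "5.2f".
  private static final Pattern SPECIFIER =
      Pattern.compile("%(%|n|[-#+ 0,(]*\\d*(?:\\.\\d+)?[a-zA-Z])");

  /** Counts the format specifiers that consume an argument. */
  public static int countConsumedArgs(String format) {
    Matcher m = SPECIFIER.matcher(format);
    int count = 0;
    while (m.find()) {
      String conversion = m.group(1);
      if (!conversion.equals("%") && !conversion.equals("n")) {
        count++;
      }
    }
    return count;
  }

  /** Returns how many trailing arguments the format string would never use. */
  public static int unusedArgs(String format, Object... args) {
    return Math.max(0, args.length - countConsumedArgs(format));
  }

  public static void main(String[] args) {
    String fmt = "Pushed segment[%s] to deep storage.";
    // String.format ignores the surplus path argument, so it never reaches the log line.
    System.out.println(String.format(fmt, "wikipedia_2019", "/tmp/deep-storage/index.zip"));
    System.out.println("unused arguments: " + unusedArgs(fmt, "wikipedia_2019", "/tmp/deep-storage/index.zip"));
  }
}
```

Here `main` prints the message without the path and reports 1 unused argument, which is exactly the kind of silent information loss such an alert is meant to catch.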

gianm · 2019-11-19T03:20:15Z

Pushed updates to address some issues found by CI.

* Tidy up lifecycle, query, and ingestion logging.
* Adjustments.
* Adjust integration test.

gianm added the Ease of Use label Nov 17, 2019

Merge branch 'master' into operator-logging

b2d3d41

gianm added 2 commits November 18, 2019 18:27

Merge branch 'master' into operator-logging

d762b5c

Adjustments.

e172141

Adjust integration test.

cfbd598

fjy merged commit c44452f into apache:master Nov 19, 2019

gianm deleted the operator-logging branch November 19, 2019 21:58

gianm added this to the 0.17.0 milestone Nov 19, 2019

jon-wei added the Release Notes label Dec 18, 2019

jon-wei mentioned this pull request Dec 28, 2019

0.17.0 release notes #9066

Closed

fjy merged 5 commits into apache:master from gianm:operator-logging

gianm commented Nov 17, 2019 (edited by jon-wei)

3 participants
