prometheus-emitter by michaelschiff · Pull Request #8621 · apache/druid

michaelschiff · 2019-10-02T17:26:15Z

Adds a new extension prometheus-emitter to expose Druid metrics for collection directly by a Prometheus server.

This PR has:

been self-reviewed.
added documentation for new or modified features or behaviors.
added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
added or updated version, license, or notice information in licenses.yaml
added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
added unit tests or modified existing tests to cover new code paths.
added integration tests.
been tested in a test Druid cluster.

Key changed/added classes in this PR

org.apache.druid.emitter.prometheus.*

michaelschiff · 2019-10-02T17:26:33Z

Opened now to get any comments on approach before finishing enumerating the different metrics

michaelschiff · 2019-10-07T17:51:27Z

CC @xvrl @gianm for review

michaelschiff · 2019-11-01T16:43:06Z

Things are working well in our development clusters. Please let me know if there is anything else you'd like to see before going forward with review.

stale · 2020-01-26T18:26:18Z

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

jihoonson · 2020-01-26T19:23:47Z

@michaelschiff apologize for the delayed review. I'll take a look the PR soon. I'm closing and reopening the PR to trigger the CI (Druid repository name has changed due to the deincubation and the CI failure logs have gone).

michaelschiff · 2020-01-29T01:14:57Z

@jihoonson Sure - let me know if you need anything from me

jihoonson

Thanks. I quickly looked through the Travis failure (Sadly, reopening this PR didn't trigger the TC 😢) and the below failure looks legit.

    ../docs/development/extensions-contrib/prometheus.md
       39 | e port on which to expose the prometheus HTTPServer.|yes|none| 
       39 | hich to expose the prometheus HTTPServer.|yes|none| 
       41 | ed dimensions, help text, and conversionFactor for every Druid metric.|no|De 
    ../docs/operations/metrics.md
      240 | pool/count`|Bufferpool count.|bufferpoolName.|Varies.| 
      241 | erpool/used`|Bufferpool used.|bufferpoolName.|close to capacity| 
>> 5 spelling errors found in 162 files

The CI does the spell check and this error means there are some unknown words which are not registered on the dictionary. You may want to skip them by adding some of them to the .spelling file.

jihoonson · 2020-03-25T22:00:37Z

Oops, @michaelschiff sorry, I forgot about this PR. Would you please resolve the conflicts? Also, Git says this PR changes 808 files, but are they all yours? Please double check this PR contains only your changes.

cxlRay · 2020-03-26T05:15:51Z

clone this emitter and used in my test cluster
I cant find ingest metric in prometheus, is that a bug?

michaelschiff · 2020-03-26T20:00:41Z

@cxlRay do you see the appropriate port open by each of the druid processes? If so can you try to curl the metrics endpoint there?

michaelschiff · 2020-03-28T00:40:57Z

@jihoonson FYI @Tiaaa will be taking over this PR (I've changed jobs and can't commit to the source fork). I'll help however I can to see this over the line though.

jihoonson · 2020-03-30T16:33:08Z

@michaelschiff thanks for the update. @Tiaaa would you please double check that this PR contains only your changes?

… updated statsd json)

* API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * fix checksyle * API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * fix spelling * address comments * fix checkstyle * update docs * fix tests * fix doc * address comments * fix typo * fix spelling * address comments * address comments * fix typo in docs

* All aggregators should use vectorization-aware column processor * All aggregators should use vectorization-aware column processor * fix canVectorize * fix canVectorize * add tests * revert back default * address comment * address comments * address comment * address comment

…e#10040) Fix Avatica based metadata queries by appending ESCAPE '\' clause to the LIKE expressions

… accesses replaceable with Collections.empty*()" (apache#9690) * IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()" * Reverted checkstyle rule * Added tests to pass CI * Codestyle

Co-authored-by: tomscut <tomscut@gmail.com>

* global table if only joinable * oops * fix style, add more tests * Update sql/src/test/java/org/apache/druid/sql/calcite/schema/DruidSchemaTest.java * better information schema columns, distinguish broadcast from joinable * fix javadoc * fix mistake Co-authored-by: Jihoon Son <jihoonson@apache.org>

…les (apache#10048) * Coordinator loadstatus API full format does not consider Broadcast rules * address comments * fix checkstyle * minor optimization * address comments

…native batch ingestion (apache#10025) * Fill in the core partition set size properly for batch ingestion with dynamic partitioning * incomplete javadoc * Address comments * fix tests * fix json serde, add tests * checkstyle * Set core partition set size for hash-partitioned segments properly in batch ingestion * test for both parallel and single-threaded task * unused variables * fix test * unused imports * add hash/range buckets * some test adjustment and missing json serde * centralized partition id allocation in parallel and simple tasks * remove string partition chunk * revive string partition chunk * fill numCorePartitions for hadoop * clean up hash stuffs * resolved todos * javadocs * Fix tests * add more tests * doc * unused imports

* Fix join filter rewrites with nested queries * Fix test, inspection, coverage * Remove clauses from group key * Fix import order Co-authored-by: Gian Merlino <gianmerlino@gmail.com>

…apache#10053) * fix topn on string columns with non-sorted or non-unique dictionaries * fix metadata tests * refactor, clarify comments and code, fix ci failures

… loadstatus API (apache#10054) * Add safeguard to make sure new Rules added are aware of Rule usuage in loadstatus API * address comments * address comments * add tests

…ject (apache#10055)

* fix docs error: google to azure and hdfs to http * fix docs error: indexSpecForIntermediatePersists of tuningConfig in hadoop-based batch part * fix docs error: logParseExceptions of tuningConfig in hadoop-based batch part * fix docs error: maxParseExceptions of tuningConfig in hadoop-based batch part

apache#10058) * minor refactor of topn engine algorithm selection for clarity * adjust * more javadoc

* change default number of segment loading threads * fix docs * missed file * min -> max for segment loading threads Co-authored-by: Dylan <dwylie@spotx.tv>

* retry 500 and 503 errors against kinesis * add test that exercises retry logic * more branch coverage * retry 500 and 503 on getRecords request when fetching sequence numberu Co-authored-by: Harshpreet Singh <hrshpr@twitch.tv>

* Druid user permissions apply in the console * Update index.md * noting user warning in console page; some minor shuffling * noting user warning in console page; some minor shuffling 1 * touchups * link checking fixes * Updated per suggestions

…o respect output type (apache#10063) * fix return type from HyperUniquesAggregator/HyperUniquesVectorAggregator * address comments * address comments

Tiaaa · 2020-06-24T04:18:31Z

@exherb
Removed 'query/intervalChunk/time' since it's being deprecated. Any other particular metric you find deprecated?

stale · 2020-08-23T05:39:35Z

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

stale · 2020-09-20T06:31:46Z

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

michaelschiff · 2020-09-21T04:49:33Z

@jihoonson @Tiaaa closed again due to inactivity :(

I have been using this extension in my clusters for the last 6 months without issue (Without a push-gateway, so not collecting indexing task metrics yet). I can set this up and start validating this collection path too if that's blocking getting this merged. @Tiaaa can you bring this PR up to date? @jihoonson What do you think about getting this merged after that?

Tiaaa · 2020-09-21T06:06:41Z

@michaelschiff re-opened it here #10412 cherry-picking all relevant commits.

TheRum · 2020-10-06T13:09:23Z

Do we have any Milestone/Timeline to get this emitter released ?

Tiaaa · 2020-10-06T14:35:48Z

@TheRum currently is in pending review state, hopefully get it in around end of October/mid-November.

jinfwhuang reviewed Oct 2, 2019

View reviewed changes

Comment thread ...theus-emitter/src/main/java/org/apache/druid/emitter/prometheus/PrometheusEmitterConfig.java

clintropolis added Area - Extension Area - Metrics/Event Emitting labels Oct 3, 2019

stale Bot added the stale label Jan 26, 2020

jihoonson removed the stale label Jan 26, 2020

jihoonson closed this Jan 26, 2020

jihoonson reopened this Jan 26, 2020

jihoonson reviewed Jan 29, 2020

View reviewed changes

Comment thread ...metheus-emitter/src/test/java/org/apache/druid/emitter/prometheus/PrometheusEmitterTest.java Outdated

michaelschiff force-pushed the feature/prometheus-exporter branch from e36a1ca to 88e6f24 Compare January 29, 2020 03:55

Michael Schiff added 10 commits March 30, 2020 10:13

prometheus-emitter

7abe1cf

use existing jetty server to expose prometheus collection endpoint

580e846

unused variables

289b8c8

better variable names

4318176

removed unused dependencies

2b365fc

more metric definitions

3ec989e

reorganize

75095e7

use prometheus HTTPServer instead of hooking into Jetty server

8787cd6

temporary empty help string

31a8d3c

temporary non-empty help. fix incorrect dimension value in JSON (also…

94b37ee

… updated statsd json)

maytasm and others added 20 commits June 23, 2020 21:12

Druid Avatica - Handle escaping of search characters correctly (apach…

1d0571d

…e#10040) Fix Avatica based metadata queries by appending ESCAPE '\' clause to the LIKE expressions

fix docs (apache#9114)

1b9670a

Co-authored-by: tomscut <tomscut@gmail.com>

Coordinator loadstatus API full format does not consider Broadcast ru…

8ccdf71

…les (apache#10048) * Coordinator loadstatus API full format does not consider Broadcast rules * address comments * fix checkstyle * minor optimization * address comments

Remove changes from apache#9114 (apache#10050)

87fb048

Fix join filter rewrites with nested queries (apache#10015)

7510d0d

* Fix join filter rewrites with nested queries * Fix test, inspection, coverage * Remove clauses from group key * Fix import order Co-authored-by: Gian Merlino <gianmerlino@gmail.com>

fix topn on string columns with non-sorted or non-unique dictionaries (…

7496d37

…apache#10053) * fix topn on string columns with non-sorted or non-unique dictionaries * fix metadata tests * refactor, clarify comments and code, fix ci failures

Add safeguard to make sure new Rules added are aware of Rule usage in…

05c2278

… loadstatus API (apache#10054) * Add safeguard to make sure new Rules added are aware of Rule usuage in loadstatus API * address comments * address comments * add tests

SketchAggregator.updateUnion should handle null inside List update ob…

64062dc

…ject (apache#10055)

minor rework of topn algorithm selection for clarity and more javadocs (

db9ab64

apache#10058) * minor refactor of topn engine algorithm selection for clarity * adjust * more javadoc

change default number of segment loading threads (apache#9856)

fc264b5

* change default number of segment loading threads * fix docs * missed file * min -> max for segment loading threads Co-authored-by: Dylan <dwylie@spotx.tv>

retry 500 and 503 errors against kinesis (apache#10059)

e6c13be

* retry 500 and 503 errors against kinesis * add test that exercises retry logic * more branch coverage * retry 500 and 503 on getRecords request when fetching sequence numberu Co-authored-by: Harshpreet Singh <hrshpr@twitch.tv>

Druid user permissions (apache#10047)

94e2a0c

* Druid user permissions apply in the console * Update index.md * noting user warning in console page; some minor shuffling * noting user warning in console page; some minor shuffling 1 * touchups * link checking fixes * Updated per suggestions

Fix HyperUniquesAggregatorFactory.estimateCardinality null handling t…

90735f1

…o respect output type (apache#10063) * fix return type from HyperUniquesAggregator/HyperUniquesVectorAggregator * address comments * address comments

Remove deprecated metric

5127f81

stale Bot added the stale label Aug 23, 2020

stale Bot closed this Sep 20, 2020

Tiaaa mentioned this pull request Sep 21, 2020

prometheus metric exporter #10412

Merged

8 tasks

Conversation

michaelschiff commented Oct 2, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Key changed/added classes in this PR

Uh oh!

michaelschiff commented Oct 2, 2019

Uh oh!

Uh oh!

michaelschiff commented Oct 7, 2019

Uh oh!

michaelschiff commented Nov 1, 2019

Uh oh!

stale Bot commented Jan 26, 2020

Uh oh!

jihoonson commented Jan 26, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

michaelschiff commented Jan 29, 2020

Uh oh!

jihoonson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jihoonson commented Mar 25, 2020

Uh oh!

cxlRay commented Mar 26, 2020

Uh oh!

michaelschiff commented Mar 26, 2020

Uh oh!

michaelschiff commented Mar 28, 2020

Uh oh!

jihoonson commented Mar 30, 2020

Uh oh!

Tiaaa commented Jun 24, 2020

Uh oh!

stale Bot commented Aug 23, 2020

Uh oh!

stale Bot commented Sep 20, 2020

Uh oh!

michaelschiff commented Sep 21, 2020

Uh oh!

Tiaaa commented Sep 21, 2020

Uh oh!

TheRum commented Oct 6, 2020

Uh oh!

Tiaaa commented Oct 6, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

michaelschiff commented Oct 2, 2019 •

edited

Loading

jihoonson commented Jan 26, 2020 •

edited

Loading