Ensure Krb auth before killing YARN apps in graceful shutdown of hadoop batch index tasks #9785

Merged
gianm merged 1 commit into apache:master from
capistrant:fixes-graceful-shutdown-mr-cleanup
Nov 16, 2020

Conversation

@capistrant
Contributor

@capistrant capistrant commented Apr 28, 2020

Description

My deployments had struggled with graceful shutdown of our Hadoop Batch Indexing tasks up until now. In the past, we simply patched out graceful shutdown since we didn't really mind the YARN apps not being killed. However, for our Druid 18 upgrade we wanted to get this fixed.

After troubleshooting, we found that the code was hanging in ToolRunner#run when trying to kill the YARN app for the indexing job. After some trial and error in our fork, we discovered that it was an authentication issue and that calling JobHelper#authenticate kept the graceful shutdown code from hanging. Our analysis of JobHelper#authenticate led us to believe that it is harmless to call multiple times and also harmless to call in a cluster that is not using Kerberized Hadoop: the call is transparent if authentication has already been done or if the Hadoop cluster in use is not using Kerberos. We also slightly refactored the signature of JobHelper#authenticate after IntelliJ flagged the config parameter as unused.
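
To illustrate the ordering described above, here is a minimal, self-contained sketch. It does not use the real Druid or Hadoop classes: `FakeCluster`, `authenticate`, `killApplication`, and `gracefulShutdown` are hypothetical stand-ins, and the model throws where the real kill call would hang. It only shows the idea of an idempotent authenticate step that runs before every kill and is a no-op on a non-Kerberized cluster.

```java
import java.util.ArrayList;
import java.util.List;

public class GracefulShutdownSketch {

  /** Hypothetical stand-in for a Hadoop cluster; not a Druid or Hadoop class. */
  static final class FakeCluster {
    private final boolean kerberized;
    private boolean loggedIn = false;
    int loginCount = 0;
    final List<String> killedApps = new ArrayList<>();

    FakeCluster(boolean kerberized) {
      this.kerberized = kerberized;
    }

    /** Idempotent: does nothing when Kerberos is off or a login already happened. */
    void authenticate() {
      if (!kerberized || loggedIn) {
        return;
      }
      loggedIn = true;
      loginCount++;
    }

    /** The real kill call hung without credentials; this model fails fast instead. */
    void killApplication(String appId) {
      if (kerberized && !loggedIn) {
        throw new IllegalStateException("not authenticated, cannot kill " + appId);
      }
      killedApps.add(appId);
    }
  }

  /** Assumed shutdown order: authenticate first, then kill the YARN app. */
  static void gracefulShutdown(FakeCluster cluster, String appId) {
    cluster.authenticate();
    cluster.killApplication(appId);
  }

  public static void main(String[] args) {
    // Kerberized cluster: two shutdowns should trigger only one login.
    FakeCluster secure = new FakeCluster(true);
    gracefulShutdown(secure, "application_0001");
    gracefulShutdown(secure, "application_0002");
    System.out.println("secure logins=" + secure.loginCount + " killed=" + secure.killedApps);

    // Non-Kerberized cluster: authenticate is transparent, the kill still works.
    FakeCluster plain = new FakeCluster(false);
    gracefulShutdown(plain, "application_0003");
    System.out.println("plain logins=" + plain.loginCount + " killed=" + plain.killedApps);
  }
}
```

In this model, skipping the `authenticate()` call on the Kerberized cluster makes `killApplication` fail, which mirrors (in simplified form) the hang we saw before the fix.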

With all this being said, I'm not sure if calling into the authenticate method from a different module is frowned upon as far as design goes. I decided to reuse that method and see what the community had to say about that.


This PR has:

  • been self-reviewed.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • JobHelper
  • HadoopIndexTask

@stale

stale Bot commented Jun 28, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale Bot added the stale label Jun 28, 2020
@stale

stale Bot commented Jul 26, 2020

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale Bot closed this Jul 26, 2020
@capistrant
Contributor Author

revive

@capistrant capistrant reopened this Sep 3, 2020
@stale

stale Bot commented Sep 3, 2020

This pull request/issue is no longer marked as stale.

@stale

stale Bot commented Nov 5, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale Bot added the stale label Nov 5, 2020
@capistrant
Contributor Author

prevent stale mark.

@stale

stale Bot commented Nov 11, 2020

This issue is no longer marked as stale.

@stale stale Bot removed the stale label Nov 11, 2020
@abhishekagarwal87
Contributor

LGTM

Contributor

@gianm gianm left a comment

LGTM @capistrant — thanks!

@gianm gianm merged commit 3447934 into apache:master Nov 16, 2020
@jihoonson jihoonson added the Bug label Dec 9, 2020
abhishekagarwal87 pushed a commit to abhishekagarwal87/druid that referenced this pull request Dec 14, 2020
* scaffolding

* readme

* adjust

* more better, janky heap metadata store, primitive job queue that can submit to overlord, it works - sort of

* test scaffolding

* move InputFormat into IngestSchema

* imply-5135 Create & list ingest tables

* Addressed PR comments

* Removed bean IngestTable

* job processing + sql metadata job table (#78)

* Add indexed-table-loader (#65)

* Add indexed-table-loader

* Fix checkstyle

* Fix intelliJ inspections

* Fix analyze dependencies

* fix license check job

* Add imply-druid-security (#66)

* Add imply-druid-security

* fix checkstyle

* Fix analyze dependencies

* Fix license check job

* Update license header for all imply extensions

* fix intelliJ inspections

* code review

* modify access to protected SQLMetadataConnector methods to allow extensions to create SQL metadata tables using implementation specific constructs (payload type, serial type, etc) (apache#10573)

* Correct getRandomBalancerSegmentHolderTest (apache#10569)

* Add missing docs for timeout exceptions (apache#10554)

* Add missing docs for timeout exceptions

* Add info on auth failures

* Fix ingestion failure of pretty-formatted JSON message (apache#10383)

* support multi-line text

* add test cases

* split json text into lines case by case

* improve exception handle

* fix CI

* use IntermediateRowParsingReader as base of JsonReader

* update doc

* ignore the non-immutable field in test case

* add more test cases

* mark `lineSplittable` as final

* fix testcases

* fix doc

* add a test case for SqlReader

* return all raw columns when exception occurs

* fix CI

* fix test cases

* resolve review comments

* handle ParseException returned by index.add

* apply Iterables.getOnlyElement

* fix CI

* fix test cases

* improve code in more graceful way

* fix test cases

* fix test cases

* add a test case to check multiple json string in one text block

* fix inspection check

* Add TravisCI job that builds and tests on ARM64 CPU architecture (apache#10562)

* Ensure Krb auth before killing YARN apps in graceful shutdown (apache#9785)

* job processing + sql metadata

* Web console: fix data loader schema table column ordering bug and other polish (apache#10588)

* remove unused fields

* keep tables live

* advanced

* fix schema view

* better indication

* tests pass

* Show more instead of show advanced

* fix tests

* extract dynamic configs

* update snapshots

* fix issues

* update snapshot

* reword without >

* some javadoc

* modify druid.historical.cache.maxEntrySize property in Unified format (apache#10590)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>

* Fix license header for imply extensions (#76)

* Fix license header for imply extensions

* arm64 packaging should use jdk8

* maybe this time

* jobs and states and status and whatever

* use indexing client and coordinator client instead of leader client

* always running

* simplify

* fix readme

* Add zero period support to TIMESTAMPADD (apache#10550)

* Allow zero period for TIMESTAMPADD

* update test cases

* add empty zone test case

* add unit test cases for TimestampShiftMacro

* add -Pimply-saas distribution profile, table exists check

* update readme

Co-authored-by: Suneet Saldanha <suneet.saldanha@imply.io>
Co-authored-by: Lucas Capistrant <capistrant@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Atul Mohan <atulmohan.mec@gmail.com>
Co-authored-by: frank chen <frank.chen021@outlook.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
Co-authored-by: zhangyue19921010 <69956021+zhangyue19921010@users.noreply.github.com>
Co-authored-by: yuezhang <yuezhang@freewheel.tv>

* fix style and headers

* fix fails

* fix auth

Co-authored-by: Agustin Gonzalez <agustin.gonzalez@imply.io>
Co-authored-by: Suneet Saldanha <suneet.saldanha@imply.io>
Co-authored-by: Lucas Capistrant <capistrant@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Atul Mohan <atulmohan.mec@gmail.com>
Co-authored-by: frank chen <frank.chen021@outlook.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
Co-authored-by: zhangyue19921010 <69956021+zhangyue19921010@users.noreply.github.com>
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
@jihoonson jihoonson added this to the 0.21.0 milestone Jan 4, 2021
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021

4 participants