Ensure Krb auth before killing YARN apps in graceful shutdown of hadoop batch index tasks#9785
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
revive
This pull request/issue is no longer marked as stale.
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
prevent stale mark.
This issue is no longer marked as stale.
LGTM
gianm
left a comment
LGTM @capistrant — thanks!
Description
My deployments had struggled with graceful shutdown of our Hadoop Batch Indexing tasks up until now. In the past, we simply patched out graceful shutdown since we didn't really mind the YARN apps not being killed. However, for our Druid 18 upgrade we wanted to get this fixed.
After troubleshooting, we found that the code was hanging in ToolRunner#run when trying to kill the YARN app for the indexing job. After some trial and error in our fork, we discovered it was an authentication issue and that calling into JobHelper#authenticate fixed the hang in the graceful shutdown code. Our analysis of JobHelper#authenticate led us to believe it is harmless to call multiple times and also harmless to call in a cluster that is not using Kerberized Hadoop. The call is transparent if authentication is already done or the Hadoop cluster in use is not using Kerberos. We also slightly refactored the signature of JobHelper#authenticate after IntelliJ flagged the config parameter as unused.
With all this being said, I'm not sure if calling into the authenticate method from a different module is frowned upon as far as design goes. I decided to reuse that method and see what the community had to say about that.
This PR has:
Key changed/added classes in this PR
JobHelper
HadoopIndexTask
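For readers who want the shape of the fix without opening the diff, here is a minimal, self-contained sketch of the pattern described in the Description: authenticate first, then kill. It does not use Druid or Hadoop classes. GracefulShutdownSketch, SecurityContext, submitKill and gracefulShutdown are hypothetical names invented for illustration, and the exception thrown by submitKill is only a stand-in for the hang we saw in ToolRunner#run. The authenticate method here only models the two properties we relied on (safe to call repeatedly, no-op without Kerberos); it is not the real JobHelper#authenticate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "authenticate before kill"; not Druid's or Hadoop's API.
final class GracefulShutdownSketch {

    // Minimal stand-in for the security state an authenticate call would consult.
    static final class SecurityContext {
        final boolean kerberosEnabled;
        boolean loggedIn = false;
        int loginAttempts = 0;

        SecurityContext(boolean kerberosEnabled) {
            this.kerberosEnabled = kerberosEnabled;
        }
    }

    // No-op when Kerberos is off or a login already happened, so repeated calls are harmless.
    static void authenticate(SecurityContext ctx) {
        if (!ctx.kerberosEnabled || ctx.loggedIn) {
            return;
        }
        ctx.loginAttempts++;
        ctx.loggedIn = true;
    }

    // Stand-in for the YARN kill request. The exception models the failure mode;
    // in our deployment the symptom was a hang rather than an error.
    static void submitKill(SecurityContext ctx, String appId, List<String> killed) {
        if (ctx.kerberosEnabled && !ctx.loggedIn) {
            throw new IllegalStateException("not authenticated; cannot kill " + appId);
        }
        killed.add(appId);
    }

    // The pattern from this PR: always authenticate, then issue the kill.
    static void gracefulShutdown(SecurityContext ctx, String appId, List<String> killed) {
        authenticate(ctx);
        submitKill(ctx, appId, killed);
    }

    public static void main(String[] args) {
        SecurityContext secure = new SecurityContext(true);
        List<String> killed = new ArrayList<>();
        gracefulShutdown(secure, "app-1", killed);
        gracefulShutdown(secure, "app-2", killed);
        System.out.println("killed=" + killed + " logins=" + secure.loginAttempts);
    }
}
```

Calling gracefulShutdown twice on the same context logs in only once, and on a context created with kerberosEnabled set to false it never logs in at all, which mirrors why we considered the extra call safe.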