Push Spark appId to XCOM for LivyOperator with deferrable mode #31201
Conversation
phanikumv left a comment
rebase your PR please
Can we just move L151 context["ti"].xcom_push(key="app_id", value=self.get_hook().get_batch(self._batch_id)["appId"]) to just before L148, i.e. if not self.deferrable:? Or do we want to push only once it reaches a terminal state?
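For context, here is a minimal sketch of the placement question. It is simplified and not the actual provider source: the method names follow the snippet quoted above, the deferral branch is elided, and the line references mirror the comment, so treat everything else as illustrative.

```python
# Simplified sketch of LivyOperator.execute(); names follow the snippet quoted
# in this thread, everything else is illustrative.
def execute(self, context):
    self._batch_id = self.get_hook().post_batch(**self.spark_params)

    # Suggested placement (~L148): pushing here, before branching on
    # self.deferrable, would record the appId irrespective of the batch
    # job's final status.
    # context["ti"].xcom_push(
    #     key="app_id",
    #     value=self.get_hook().get_batch(self._batch_id)["appId"],
    # )

    if not self.deferrable:
        self.poll_for_termination(self._batch_id)  # raises if the batch fails
        # Current placement (~L151): runs only after a successful terminal state.
        context["ti"].xcom_push(
            key="app_id",
            value=self.get_hook().get_batch(self._batch_id)["appId"],
        )
    return self._batch_id
```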
Also, I think we should add a test for this to avoid regression
Yes, currently we push the Spark appId not for all terminal states but only in case of a successful run of the batch job.
If we implement your suggestion @pankajastro, it would be an early push to XCOM irrespective of the batch job's final status. It depends on whether the downstream tasks need this XCOM value even if the task has failed. I would avoid pushing an additional XCOM record into the metadata database if it is not really needed.
@bdsoha do you think, from your experience, that pushing the appId to XCOM would benefit or be needed by downstream tasks even if the task has failed? (Tagging you as you created the original PR and I am hoping you have some experience with the usage & need 🙂)
It was very useful, as I used it to retrieve application logs from YARN following a task's execution.
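As an illustration of that use case, a downstream task could pull the pushed appId and feed it to the YARN CLI. This is only a sketch: the Livy task_id and the surrounding DAG are assumptions, while the app_id key matches the one discussed in this thread.

```python
# Hypothetical downstream task that fetches YARN logs for the batch, using the
# appId the Livy task pushed to XCOM. "submit_livy_batch" is an assumed task_id.
from airflow.operators.bash import BashOperator

fetch_yarn_logs = BashOperator(
    task_id="fetch_yarn_logs",
    bash_command=(
        "yarn logs -applicationId "
        "{{ ti.xcom_pull(task_ids='submit_livy_batch', key='app_id') }}"
    ),
)
```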
@bdsoha thanks. But do we need it for failed tasks too, or just successful ones? Because, at the moment, the place where you added it is reached only after successful completion.
@potiuk what do you think? Is it an overhead if we push to the XCOM backend for failed tasks too? Or would it be fine to push the XCOM early, irrespective of the task status?
But if you have the information in hand, what's wrong with storing it? We should not use the presence of such information as a proxy for success or failure anyway (since we have actual status available).
Yes, I am curious to understand the general consensus. If no downstream tasks are going to pull the XCOM information, would the record get cleaned up in the database, or would it stay there forever? My thought process is that if it's not going to get used (maybe users need it? not sure), then why populate the DB with one more record?
Also, I tried to follow the previous commit and push only in case of success, and apparently we have not received feedback yet that it might be needed for failed cases too, no? If the general consensus across the project is to push only on success, then perhaps we could wait until users of this operator come up with a need to also push it for failed tasks. What do you think?
I think it's fine to wait until needed.
But I thought that's what @bdsoha was saying here:
It was very useful, as I used it to retrieve application logs from YARN following a task's execution.
Re the "wasted record" concern, I am not concerned with that. We're talking about "exceptional" circumstances, when the creation fails, right? Well, that isn't what happens ordinarily, so you're talking about a small number of cases. And if it was successful the record would be created anyway, so it's certainly no worse. In the long run, all Airflow databases need cleanup / purging of old records, and that's why we added a helper command for that.
I think @potiuk's concern is about consistency. Consistency is good, but we can also make steps in a direction without necessarily having to update all operators to fit a certain convention. There is inevitably some variation in behavior between operators, each of which may be tailored to different services and different use cases, so one always has to know what the behavior is in order to use it properly. Example DAGs and docs are there to help with that.
Related note: XCOMs are cleared when tasks are cleared, so it doesn't have an impact there.
So, I'm not saying you need to push the XCOM immediately, just that I don't see a problem with doing so if there's a good use case for it.
Understood.
W.r.t. @bdsoha's comment: they added the XCOM push in their PR https://github.com/apache/airflow/pull/27376/files#diff-b8586f3007fabbc894632e59111527536b03a2f576bda13bbcec5df9d3c1338bR144 after the call to the self.poll_for_termination method, which raises an exception if the batch job does not succeed. So I am guessing they are using the XCOM value in downstream tasks only for success scenarios, but yes, I would definitely like to hear if they are using/needing it for failure scenarios too.
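For readers following along, here is a simplified stand-in for that polling flow. It is not the provider source: the terminal-state set and attribute names are assumptions built on the Livy hook's public API, and the point is only that the method returns normally on success and raises otherwise.

```python
import time

from airflow.exceptions import AirflowException
from airflow.providers.apache.livy.hooks.livy import BatchState

# Terminal states assumed for this sketch.
TERMINAL_STATES = {BatchState.SUCCESS, BatchState.DEAD, BatchState.KILLED, BatchState.ERROR}


def poll_for_termination(self, batch_id):
    """Stand-in for LivyOperator.poll_for_termination: it only returns on success,
    so any xcom_push placed after it in execute() runs for successful batches only."""
    state = self.get_hook().get_batch_state(batch_id)
    while state not in TERMINAL_STATES:
        time.sleep(self._polling_interval)
        state = self.get_hook().get_batch_state(batch_id)
    if state != BatchState.SUCCESS:
        # Failure path: execution stops here, before the xcom_push in execute().
        raise AirflowException(f"Batch {batch_id} did not succeed: {state}")
```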
I say for now we stay consistent with the non-deferrable mode in this operator. If we/someone wants to start pushing it always, it can be done in a separate PR that handles both execution paths.
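For reference, the consistent deferrable-path behaviour could look roughly like the sketch below. This is not the provider source verbatim: the event keys and the use of get_hook() are assumptions, and only the success branch pushes the appId, mirroring the non-deferrable mode.

```python
from airflow.exceptions import AirflowException


class LivyOperatorSketch:  # illustrative stand-in for the real LivyOperator
    def execute_complete(self, context, event):
        """Callback the trigger invokes when the deferred batch reaches a terminal state."""
        if event.get("status") != "success":
            raise AirflowException(f"Livy batch failed: {event}")
        # Push under the same key as the non-deferrable path, so downstream
        # tasks can pull "app_id" regardless of the execution mode.
        context["ti"].xcom_push(
            key="app_id",
            value=self.get_hook().get_batch(event["batch_id"])["appId"],
        )
        return event["batch_id"]
```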
Force-pushed from 8b0d4ec to cd2eb45.
jedcunningham left a comment
Change looks good but we need a test.
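One possible shape for such a test, sketched under assumptions: the constructor arguments, the execute_complete signature, and the event payload are guesses for illustration, not the actual test added to the provider.

```python
from unittest import mock

from airflow.providers.apache.livy.operators.livy import LivyOperator


def test_execute_complete_pushes_app_id_to_xcom():
    # Assumed constructor arguments; "file" points at a dummy job file.
    op = LivyOperator(task_id="livy_task", file="dummy.py", deferrable=True)
    ti = mock.MagicMock()

    with mock.patch.object(op, "get_hook") as get_hook:
        get_hook.return_value.get_batch.return_value = {"appId": "application_123"}
        op.execute_complete(
            context={"ti": ti},
            event={"status": "success", "batch_id": 1},
        )

    # The appId should land in XCOM under the same key as the non-deferrable path.
    ti.xcom_push.assert_called_once_with(key="app_id", value="application_123")
```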
Force-pushed from 133305c to 02ef817.
The static check failure is for I also tried to validate the file https://github.com/apache/airflow/blob/main/airflow/api_connexion/openapi/v1.yaml and it was reported as a valid specification. Any idea what could be going wrong here?
Force-pushed from 3901a65 to efa3ceb.
Force-pushed from efa3ceb to 683d91e.
With the change in PR apache#27376, we started pushing the Spark appId to XCOM when executing the LivyOperator in standard (non-deferrable) mode. This commit now pushes the appId to XCOM in deferrable mode too, to keep the expected outcome consistent for the operator, so that subsequent tasks can fetch and use the appId from XCOM.
Force-pushed from 683d91e to f1f1d95.



With the change in PR #27376, we started pushing the Spark appId to XCOM when executing the LivyOperator in standard (non-deferrable) mode. This commit now pushes the appId to XCOM in deferrable mode too, to keep the expected outcome consistent for the operator, so that subsequent tasks can fetch and use the appId from XCOM.