Contributor

@EnricoMi EnricoMi commented Mar 11, 2024

What changes were proposed in this pull request?

Make the Kubernetes resource manager support the existing config spark.ui.custom.executor.log.url.

Allow for

spark.ui.custom.executor.log.url="https://my.custom.url/logs?app={{APP_ID}}&executor={{EXECUTOR_ID}}"

Supports these variables:

  • APP_ID: The unique application id
  • EXECUTOR_ID: The executor id (a positive integer)
  • HOSTNAME: The name of the host where the executor runs
  • KUBERNETES_NAMESPACE: The namespace where the executor pods run
  • KUBERNETES_POD_NAME: The name of the pod that contains the executor
  • FILE_NAME: The name of the log, which is always "log"
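To illustrate how a pattern with the variables above could turn into a concrete link, here is a minimal Python sketch of `{{NAME}}` substitution. It is not Spark's implementation: the function name `render_log_url` and the sample attribute values are hypothetical, and leaving unknown placeholders untouched is a choice made for this sketch only, which may differ from what Spark does.

```python
import re

# Matches placeholders such as {{APP_ID}} or {{KUBERNETES_POD_NAME}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def render_log_url(pattern, attributes):
    """Replace each {{NAME}} in pattern with attributes[NAME].

    Hypothetical helper for illustration only. Placeholders without a
    matching attribute are left as-is (an assumption of this sketch).
    """
    def substitute(match):
        name = match.group(1)
        return attributes.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, pattern)


# Sample values are made up for the example.
url = render_log_url(
    "https://my.custom.url/logs?app={{APP_ID}}&executor={{EXECUTOR_ID}}",
    {"APP_ID": "spark-app-123", "EXECUTOR_ID": "1"},
)
print(url)
```

A real log service would likely also need the values URL-encoded; that step is omitted here to keep the substitution itself visible.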

Why are the changes needed?

Running Spark on Kubernetes requires persisting the logs elsewhere. Having the Spark UI link to those logs is very useful. This is currently only supported by YARN.

Does this PR introduce any user-facing change?

Spark UI provides links to logs when run on Kubernetes.

How was this patch tested?

Unit tests, plus manual testing on a minikube K8s cluster.

Was this patch authored or co-authored using generative AI tooling?

No

@mridulm
Contributor

mridulm commented Mar 11, 2024

+CC @thejdeep

Member

The external log service for K8s is likely to use the namespace and pod name to query the logs. Could you please expose NAMESPACE too?

Contributor Author

I have added the namespace.

@pan3793
Member

pan3793 commented Mar 11, 2024

@EnricoMi this looks much simpler than my previous attempt #38357

Member

standalone?

Contributor Author

I think the sentence "Please check the documentation for your cluster manager to see which patterns are supported, if any." is sufficient. There is no need to list which managers support this conf and which don't; that list easily gets outdated.

@EnricoMi
Contributor Author

EnricoMi commented Mar 11, 2024

> @EnricoMi this looks much simpler than my previous attempt #38357

@pan3793 Thanks for the pointer! Here is also a PR for driver log support (#45728) which borrows some code from your attempt (#38357).

@EnricoMi
Contributor Author

CC @dongjoon-hyun

@EnricoMi EnricoMi force-pushed the k8s-custom-executor-log-url branch from 80070ef to 2f896c8 Compare April 20, 2024 16:23
@EnricoMi
Contributor Author

@dongjoon-hyun What do you think?

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jul 30, 2024
@EnricoMi EnricoMi force-pushed the k8s-custom-executor-log-url branch from 2f896c8 to 315a0bb Compare July 30, 2024 08:06
@github-actions github-actions bot closed this Jul 31, 2024
Member

@dongjoon-hyun dongjoon-hyun left a comment

Sorry for being late, @EnricoMi .

Apache Spark already supports this feature, so the configuration documentation has been fixed in master/3.5/3.4. Could you try to follow the updated documentation?

spark.executorEnv.SPARK_EXECUTOR_ATTRIBUTE_APP_ID='$(SPARK_APPLICATION_ID)'
spark.executorEnv.SPARK_EXECUTOR_ATTRIBUTE_EXECUTOR_ID='$(SPARK_EXECUTOR_ID)'
spark.ui.custom.executor.log.url='https://log-server/log?appId={{APP_ID}}&execId={{
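As a rough illustration of why these settings can be enough, the Python sketch below simulates two steps under stated assumptions: that Kubernetes expands `$(VAR)` references in a container's environment values using variables already defined for that container, and that the executor derives its log URL attributes from environment variables carrying the `SPARK_EXECUTOR_ATTRIBUTE_` prefix. The helper names and the sample values are hypothetical, and this is not Spark's or Kubernetes' actual code.

```python
import re

ATTRIBUTE_PREFIX = "SPARK_EXECUTOR_ATTRIBUTE_"
# Matches references of the form $(NAME).
REFERENCE = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_references(value, env):
    """Replace $(NAME) with env[NAME]; unknown references stay unchanged."""
    return REFERENCE.sub(lambda m: env.get(m.group(1), m.group(0)), value)


def executor_attributes(env):
    """Collect prefixed variables, keyed by the name without the prefix."""
    return {
        name[len(ATTRIBUTE_PREFIX):]: value
        for name, value in env.items()
        if name.startswith(ATTRIBUTE_PREFIX)
    }


# Variables assumed to exist in the executor pod (sample values are made up).
pod_env = {"SPARK_APPLICATION_ID": "spark-app-123", "SPARK_EXECUTOR_ID": "1"}

# The two spark.executorEnv.* settings from the comment above.
configured = {
    "SPARK_EXECUTOR_ATTRIBUTE_APP_ID": "$(SPARK_APPLICATION_ID)",
    "SPARK_EXECUTOR_ATTRIBUTE_EXECUTOR_ID": "$(SPARK_EXECUTOR_ID)",
}
for name, value in configured.items():
    pod_env[name] = expand_references(value, pod_env)

attributes = executor_attributes(pod_env)
print(attributes)
```

Under these assumptions, the resulting `APP_ID` and `EXECUTOR_ID` attributes are what the `{{APP_ID}}` and `{{EXECUTOR_ID}}` placeholders in the URL pattern would resolve against.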

@EnricoMi
Contributor Author

Looks like this works in master. Which versions before 4.0.0 support this?

@dongjoon-hyun
Member

dongjoon-hyun commented Aug 14, 2024

> Looks like this works in master. Which versions before 4.0.0 support this?

All Apache Spark versions with GA K8s support have supported it. So SPARK-49176 is a documentation fix.
