[SPARK-41667][K8S] Expose env var SPARK_DRIVER_POD_NAME in Driver Pod #39160
Conversation
How many subtasks will SPARK-40887 be divided into? If there are more, do you mind making SPARK-40887 an …
Two or three, maybe. I plan to split independent parts from SPARK-40887 and rebase it once the split PR gets merged; not sure if it's valuable to create an …
If there are only two or three, it's OK.
Can one of the admins verify this patch?
While reviewing this design again in detail, I have a concern. Currently, Apache Spark uses a K8s Service entity via DriverServiceFeatureStep to access the Spark driver pod in a K8s environment. The proposed design is a kind of exception. Do you think you can revise this log service design to use the Driver Service instead?
@dongjoon-hyun The design does not force using POD_NAME nor SVC_NAME as the criterion to access driver/executor logs; it totally depends on how the external log service aggregates logs. To allow identifying Spark driver logs by service name, we just need to expose it as an attribute in … So, here we need to expose the common attributes so that users can use them as criteria to fetch logs from the external log service. As I said in the proposal, I think the following attributes are generic; we can expose …
Cross-referencing comments from #40392 (comment):
Is the service name a kind of official API that allows 3rd-party components to access the Spark Driver in K8s? If yes, what about the executor? My vision is exposing both driver and executor in a unified way to the log service, and aggregating logs by Pod is much more straightforward, just like Yarn does by container. So my first candidate is Pod Name; the second one is Pod IP. @dongjoon-hyun I do understand we should be careful about adding each ENV variable, configuration, etc. If you think the Pod IP is acceptable, then it's sufficient for now, and we can get the Driver Pod IP by env …
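To make the fallback order discussed above concrete, here is a minimal Scala sketch of how a log-service client might pick an identifier for the driver. This is illustrative only: neither the function nor the fallback order is defined by this PR, and SPARK_DRIVER_POD_IP is a hypothetical variable name used for the second candidate.

```scala
// Sketch: choosing a pod identifier as the log-aggregation key.
// SPARK_DRIVER_POD_NAME is the env var this PR proposes;
// SPARK_DRIVER_POD_IP is a hypothetical fallback name.
object DriverLogKey {
  // Prefer the pod name; fall back to a pod IP variable if the name is absent.
  def resolve(env: Map[String, String]): Option[String] =
    env.get("SPARK_DRIVER_POD_NAME").orElse(env.get("SPARK_DRIVER_POD_IP"))
}
```

Taking the environment as a Map (rather than reading sys.env directly) keeps the lookup order easy to unit-test.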
I found that apple/batch-processing-gateway uses Pod Name to fetch logs as well.
Another case: GoogleCloudPlatform/spark-on-k8s-operator also uses Pod Name to fetch driver and executor logs.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
This PR proposes to expose SPARK_DRIVER_POD_NAME as an environment variable in the Driver Pod.
Why are the changes needed?
This is the first part of SPARK-40887; the pod name can serve as a criterion for fetching logs from an external log service.
Does this PR introduce any user-facing change?
Yes, a new env variable SPARK_DRIVER_POD_NAME is available in the Driver Pod.
How was this patch tested?
UT added.
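For context on the user-facing change above, a sketch of how driver-side user code could consume the new variable once this PR is applied. The helper name and the "unknown" default are illustrative, not part of the PR; passing the environment as a Map makes the fallback testable.

```scala
// Sketch: reading the proposed env var from inside the driver.
// "unknown" is an illustrative default for when the variable is not set.
object DriverEnv {
  def driverPodName(env: Map[String, String] = sys.env): String =
    env.getOrElse("SPARK_DRIVER_POD_NAME", "unknown")
}
```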