
[Spark Load] Use the yarn command to get status and kill the application #4346


Motivation
Spark load currently gets the status of, and kills, applications running on the YARN cluster through the hadoop-yarn-client API. However, this approach is not suitable for all environments.

For example, the KILL operation requires authentication. If users have their own security authentication system that differs from Hadoop's official authentication mechanisms (simple, Kerberos), the KILL operation will fail.

Therefore, I suggest adding a configurable YARN environment and using the yarn command to get the status of, and kill, the application. By default, the official YARN environment is used, but users can configure their own.

Description
The format of the yarn command is generally as follows:

yarn --config confdir application <-kill | -status> <Application ID>
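
For example, assuming a hypothetical configuration directory and application ID (both are placeholders, not actual values), the two operations would be invoked as:

yarn --config /path/to/spark_load_yarn_conf application -status application_1580000000000_0001

yarn --config /path/to/spark_load_yarn_conf application -kill application_1580000000000_0001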

We can manage the directory of the yarn configuration files with the --config option, and generate the configuration files (e.g. core-site.xml) into the specified directory.

Furthermore, I plan to use a script to generate the configuration files; a sketch of such a script is shown after the examples below.
The generated files will look like:

core-site.xml

<configuration>
    <property>
        <name>hadoop.job.ugi</name>
        <value>user,password</value>
    </property>
    <property>
        <name>hadoop.security.authentication</name>
        <value>simple</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>host:port</value>
    </property>
</configuration>
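
A minimal sketch of such a generation script (assuming bash) might look like the following; the output directory, the hadoop.job.ugi value, and the ResourceManager address are placeholders only, not the actual implementation:

#!/usr/bin/env bash
# Sketch only: generate the yarn configuration files into a given directory.
# CONF_DIR, the ugi value, and the ResourceManager address are placeholders.

CONF_DIR="$1"                      # directory later passed to `yarn --config`
UGI="user,password"                # hadoop.job.ugi
RM_ADDRESS="host:port"             # yarn.resourcemanager.address

mkdir -p "$CONF_DIR"

# core-site.xml: authentication-related settings
cat > "$CONF_DIR/core-site.xml" <<EOF
<configuration>
    <property>
        <name>hadoop.job.ugi</name>
        <value>$UGI</value>
    </property>
    <property>
        <name>hadoop.security.authentication</name>
        <value>simple</value>
    </property>
</configuration>
EOF

# yarn-site.xml: ResourceManager address
cat > "$CONF_DIR/yarn-site.xml" <<EOF
<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>$RM_ADDRESS</value>
    </property>
</configuration>
EOF

The directory produced this way is then what gets passed to the yarn command through the --config option.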
