Skip to content

Conversation

@sseung00921
Copy link
Contributor

@sseung00921 sseung00921 commented Nov 15, 2023

closes: #35341

  1. Added name field to template_fields in EmrServerlessStartJobOperator as requested in issue Would it be possible to add 'name' to the list of template fields for EmrServerlessStartJobOperator? #35341.
  2. Moved default name setting operation from constructor to execute method
  3. Updated EmrServerlessStartJobOperator test class

Since self.name = name or self.config.pop("name", f"emr_serverless_job_airflow_{uuid4()}") code is in operator constructor, error occurs in specific situation .
As Taragolis showed well in below comment, if name param is not given and config is given no dict type, error occurs. For example, if config is given by upstream operator (like the example in below comment), error occurs because the type of config is PlainXComArg in this case . And it is a kind of bug which should be fixed.

If the problematic section of code is moved from constructor to execute method, the returned_value of upstream operator is converted to dict type (I do not know the exact principle but checked the fact). So the error above does not occur. So I moved the pop operation to execute method. And I think It might be what @Taragolis intended in his first guide.

I'm a beginner. If I am missing something, comments will make me better :)


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Nov 15, 2023
@Taragolis
Copy link
Contributor

My initial point was, that this operator won't work well with all cases of templated fields around of name, so better fix it now and test, rather then keep of "one day in the future"

Example when this operator would fail during initialisation

import pendulum

from airflow.decorators import task
from airflow.models.dag import DAG
from airflow.providers.amazon.aws.operators.emr import EmrServerlessStartJobOperator

SPARK_JOB_DRIVER = {
    "sparkSubmit": {
        "entryPoint": "spam",
        "entryPointArguments": ["s3://spam/egg/output"],
        "sparkSubmitParameters": "--conf spark.executor.cores=1 --conf spark.executor.memory=4g\
            --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1",
    }
}

SPARK_CONFIGURATION_OVERRIDES = {
    "monitoringConfiguration": {"s3MonitoringConfiguration": {"logUri": "s3://spam/egg/logs"}}
}

with DAG(
    "pr_35648",
    start_date=pendulum.datetime(2023, 6, 1, tz="UTC"),
    schedule=None,
    catchup=False,
    tags=["pr", "35648"]
):

    @task
    def emr_config():
        return {
            "name": "new_application"
        }

    config = emr_config()

    EmrServerlessStartJobOperator(
        task_id="create_emr_serverless_task",
        application_id="foo",
        aws_conn_id=None,
        release_label="emr-6.6.0",
        job_type="SPARK",
        config=config,
        configuration_overrides={},
        execution_role_arn="foo-bar",
        job_driver=SPARK_JOB_DRIVER
    )
...
    self.name = name or self.config.pop("name", f"emr_serverless_job_airflow_{uuid4()}")
AttributeError: 'PlainXComArg' object has no attribute 'pop'

I have no objection to add only name as templated field and keep it for "better time" in the future. But I think it is a good time to fix this behaviour.

@sseung00921
Copy link
Contributor Author

@Taragolis I will look into it. Thanks!

@sseung00921 sseung00921 force-pushed the add-name-template-field branch 8 times, most recently from eedf498 to da75a0b Compare November 16, 2023 13:47
@potiuk potiuk force-pushed the add-name-template-field branch from da75a0b to 77b1e06 Compare November 16, 2023 17:35
@sseung00921 sseung00921 force-pushed the add-name-template-field branch from 77b1e06 to 20be958 Compare November 16, 2023 21:36
@potiuk potiuk merged commit 03a0b72 into apache:main Nov 17, 2023
@sseung00921 sseung00921 deleted the add-name-template-field branch November 18, 2023 01:21
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Nov 20, 2023
@ephraimbuddy ephraimbuddy added this to the Airflow 2.8.0 milestone Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) provider:amazon AWS/Amazon - related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Would it be possible to add 'name' to the list of template fields for EmrServerlessStartJobOperator?

4 participants