Skip to content

Conversation

@uranusjr
Copy link
Member

Finishing and close #19249. Fix #19241.

This coerce template variables to strings to work around a limitation in lazy_object_proxy when used against built-in types.

Some tests against macro functions are added to ensure macro functions all work against lazy object proxy objects and avoid regression in the future.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Nov 17, 2021
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk
Copy link
Member

potiuk commented Nov 17, 2021

I think test_airflow_context needs fixing @uranusjr

@uranusjr uranusjr force-pushed the macro-cast-to-string branch from 10ab701 to ba5cd22 Compare November 18, 2021 06:26
@uranusjr
Copy link
Member Author

Weird, this PR does not change anything related to that. I’ll rebase to latest and see if that fixes things...

This coerce template variables to strings to work around a limitation in
lazy_object_proxy when used against built-in types.

Some tests against macro functions are added to ensure macro functions
all work against lazy object proxy objects and avoid regression in the
future.

Co-Authored-By: ShaharPotash1 <67375630+ShaharPotash1@users.noreply.github.com>
@uranusjr uranusjr force-pushed the macro-cast-to-string branch from ba5cd22 to 1d6d179 Compare November 18, 2021 06:27
@potiuk
Copy link
Member

potiuk commented Nov 18, 2021

I'd feel safer if it is rebased again after the fix from Ash :)

@potiuk potiuk merged commit fca2b19 into apache:main Nov 24, 2021
@uranusjr uranusjr deleted the macro-cast-to-string branch November 25, 2021 06:41
dillonjohnson pushed a commit to dillonjohnson/airflow that referenced this pull request Dec 1, 2021
This coerce template variables to strings to work around a limitation in
lazy_object_proxy when used against built-in types.

Some tests against macro functions are added to ensure macro functions
all work against lazy object proxy objects and avoid regression in the
future.
@jedcunningham jedcunningham added this to the Airflow 2.2.3 milestone Dec 7, 2021
jedcunningham pushed a commit that referenced this pull request Dec 7, 2021
This coerce template variables to strings to work around a limitation in
lazy_object_proxy when used against built-in types.

Some tests against macro functions are added to ensure macro functions
all work against lazy object proxy objects and avoid regression in the
future.

(cherry picked from commit fca2b19)
@jedcunningham jedcunningham added the type:bug-fix Changelog: Bug Fixes label Dec 8, 2021
@abhishekshenoy
Copy link

@potiuk We recently upgraded to 2.2.2 and are facing this issue in our dags which have some macros as below :
SOURCE_OBJECT = "{{ var.json.bucket_path }}/recvd_dt={{ macros.ds_add(next_ds, -1) }}/"

Is upgrading to 2.2.3 the only solution ?

@potiuk
Copy link
Member

potiuk commented Jan 24, 2022

Is upgrading to 2.2.3 the only solution ?

It's certainly the best.

@abhishekshenoy
Copy link

@potiuk Upgrading will be an issue as it involves a long wait time. Have come up with the below 2 approaches as a fix for the time being.

  • Using Xcom
    def getRecvdDate(**kwargs):
        next_exec_date = kwargs.get('templates_dict').get('next_exec_date')
        days_delta = kwargs.get('templates_dict').get('time_delta')
        dt=datetime.strptime(next_exec_date, '%Y-%m-%d') + timedelta(days=float(days_delta))
        kwargs['ti'].xcom_push(key='recvd_date', value=dt.strftime("%Y-%m-%d"))
        return


    generate_next_exec_date_minus_one = PythonOperator(task_id='generate_next_exec_date_minus_one',
                        provide_context=True,
                        python_callable=getRecvdDate,
                        templates_dict={'next_exec_date': '{{ next_ds }}',
                                        'time_delta': '{{ var.json.macros_test.time_delta }}'},
                        dag=dag)


    xcom_based_recvd_date = GCSUploadSessionCompleteSensor(
        task_id='xcom_based_recvd_date',
        bucket=BUCKET_NAME,
        prefix="macros-test/recvd_dt={{ task_instance.xcom_pull(task_ids='generate_next_exec_date_minus_one', key='recvd_date') }}/",
        impersonation_chain=IMPERSONATE_SERVICE_ACCOUNT,
        timeout=1,
        soft_fail=True,
        dag=dag
    )
  • Using user-defined-macro
import croniter
 
sched = '0 6 * * sat'    

Define below under dag_args :
'user_defined_macros': {
    'custom_next_exec_date': next_exec_date,
}
 
def next_exec_date(dt,days_delta):
    cron = croniter.croniter(sched, dt)
    return cron.get_next(datetime) + timedelta(days=days_delta) 
 
SOURCE_OBJECT = "{{ var.json.macros-test.sourcedir }}/recvd_dt={{ ( custom_next_exec_date(ds,-1) ) }}/"
 
# Task 1: Checking the directory for file availability
file_check = GCSObjectsWtihPrefixExistenceSensor(
    task_id="check_current_date_directory_existence",
    bucket=INBOX_BUCKET,
    prefix=SOURCE_OBJECT,
    impersonation_chain=GCS_IMPERSONATE_SERVICE_ACCOUNT,
    timeout=5,
    soft_fail=False,
    dag=dag
)

@potiuk
Copy link
Member

potiuk commented Jan 25, 2022

@potiuk Upgrading will be an issue as it involves a long wait time. Have come up with the below 2 approaches as a fix for the time being.

Sure you can come up with your own solutions if "not upgrading to a released patchlevel that contains a fix" is your own choice. But please don't ask maintainers to come up with workarounds if they already provided a fix that you can easily install.

There is no way maintainers can give you another answer here - asking for it is quite unreasonable, because - the fix is there.

What happens if (say) tomorrow we release 2.2.4 with a critical security fix (like Log4j) ? Would you take the risk of scrambling and trying to upgrade everything under the pressure of not only your security team but also your boss who will insist on fixing it "now".

By not upgrading to latest released patchlevel, especially when it contains a fix to a known problem you have and opting to choose a workaround instead, you make yourself vulnerable to this scenario. But do not ask maintainers to either help you with it or endorse it, because we would be shooting ourselves in a foot (and your foot) by suggesting it.

Of course, you are free to do what you want - we cannot stop you. But you've been warned.

BTW. No 2.2.3 does not have any long migrations comparing to 2.2.2: https://airflow.apache.org/docs/apache-airflow/stable/migrations-ref.html?highlight=migrations it just adds one column to one table which should be almost instantenous. I'd really, really recommend you to upgrade.

@abhishekshenoy
Copy link

I understand that airflow version upgrade is the optimal solution for this and we will be doing the same in dev . The long wait time is not w.r.t to Airflow migration in itself but certain process that we need to adhere to , any version upgrade needs to go under sanity test in dev for a period of time before it is pushed to prod.

@potiuk
Copy link
Member

potiuk commented Jan 25, 2022

I can only sympathise with your pain.

But it does not change my answer.

@potiuk
Copy link
Member

potiuk commented Jan 25, 2022

Suggestion (this is the advice I have from one of the best Project Managers I worked with)

Upgrade often, upgrade as early as you get new version released. This way your processes to upgrade get perfected, and you will figure out how to do them faster. And when the time comes you need to upgrade in 2 hours when there is an urgent security fix - you will.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full tests needed We need to run full set of tests for this PR to merge type:bug-fix Changelog: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

next_ds changed to proxy and it cannot be used in ds_add macro function

4 participants