Conversation

@molcay
Contributor

@molcay molcay commented Sep 23, 2024

This PR changes the type of the priority_weight property of the TaskInstance to float. For different databases, we have different maximum values for integer columns.

There have been several attempts to fix this problem:

This PR contains the changes from #38160 (created by Taragolis) on top of #38222, as mentioned here.



@boring-cyborg bot added the labels area:API (Airflow's REST/HTTP API), area:db-migrations (PRs with DB migration), area:UI (Related to UI/UX), area:webserver (Webserver related issues), and kind:documentation on Sep 23, 2024
@molcay force-pushed the fix/out-of-range-for-priority-weight branch from a82c385 to 7385aad on September 23, 2024 13:06
@jscheffl
Contributor

I don't understand the reason for and the need to change the priority weight from int to float. To me this just adds extra complexity, and I see no benefit.

@molcay
Contributor Author

molcay commented Sep 27, 2024

Hi @jscheffl,
There was this issue/discussion: #22781, which is about the integer out-of-range error for the priority weight field. This error might happen on PostgreSQL. There were a lot of discussions around it in different PRs (which I mentioned in this PR's description). As far as I understand, the next planned move was to make this change after #38222, as mentioned in #38160 (comment).

@pierrejeambrun
Member

pierrejeambrun commented Sep 27, 2024

This seems a little odd to me. Just to be sure I understand what we are trying to solve: are we trying to address integer overflow for weights and operations on weights?

I don't see the use case for using such big numbers for priority weights.

@molcay
Contributor Author

molcay commented Oct 10, 2024

Hi @pierrejeambrun,

The primary target is to address integer overflow, because the limit is different for different database engines.
Since we don't give users any documented range, I think we just need to be sure that it does not fail, or do this to avoid failing. I read all the discussions that were held before; this was one of the previous implementations, but it was closed and put on hold until after #38222.

@jscheffl
Contributor

Do we know what the smallest supported (positive and negative) integer range is across the databases? Then we could build a proper validation to prevent roll-over and document it.

Making this a float will, I assume, create other problems... like if you have a very large number, adding one will still give the same number :-D
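For illustration, that precision effect in plain Python (IEEE-754 doubles stop representing every integer once values exceed 2**53):

# Float precision sketch: above 2**53 a double cannot represent every integer.
big = float(2**53)          # 9007199254740992.0
print(big + 1 == big)       # True -- adding one is lost to rounding
print(big + 2 == big)       # False -- steps of 2 are still representable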

@VladaZakharova
Contributor

Hi @potiuk!
As I understand, there was a discussion regarding the way we want to implement this logic (the whole list is in the description of the PR). It seems we now have some new ideas about what we want here, so maybe we can agree with @jscheffl on the best way to implement it. WDYT?

@potiuk
Member

potiuk commented Oct 31, 2024

Yeah. Actually I like the idea of changing it to float - this has far fewer problems than int (rollover etc.), and the problem with +1/-1 is virtually non-existent IMHO - because by the time you get to a value where +1/-1 gives the same number, it is already such a big number that it does not matter any more.

I'd be for rebasing and merging this change (@jscheffl ?)

@jscheffl
Contributor

jscheffl commented Nov 1, 2024

I am not a big fan of floats - maybe because in my job history I have always had long, long discussions about FMA and reproducibility of execution... but whatever.

I would not block this, but I still feel this is somewhat strange. Integers have a very wide range by default, and we are talking about potentially a couple of thousand tasks for which we want to set priorities. I don't see that there is a real "functional need" to set priorities exponentially, or in another manner where, with normal modelling, you need such large ranges. Anyway, that would be hard to manage from a functional standpoint. Also, I don't see a realistic setup where the priority weight strategy would reach such boundaries because of a very huge, long DAG.
As for the argument that at a certain level a big, big float with +1 is the same big, big float and no harm... then we could also figure out the smallest bound of INT supported by Postgres/MySQL/whatever DB and just cap the upper/lower boundaries to prevent a roll-over.

It would be more convincing if there were a real problem/use case for such big priorities, and not one caused because DAG developers just add t00000000 many zeroes to the priority.

@potiuk
Member

potiuk commented Nov 1, 2024

I don't see that there is a real "functional need" to set priorities exponentially, or in another manner where, with normal modelling, you need such large ranges

Let's not forget that we have priority-weight accumulation: upstream, downstream, and eventually custom: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/priority-weight.html#custom-weight-rule

I can quite easily imagine some big numbers when we have huge DAGs with 1000s of tasks and multiple upstream/downstream tasks (and layers), and especially with custom weights I can easily imagine those numbers adding up (or maybe multiplying, in custom rules, if one wants more aggressive priority propagation).
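As a back-of-the-envelope sketch (illustrative numbers only, not from any real deployment): with the default downstream rule, a root task's effective weight is its own weight plus the sum over every task below it, so a wide DAG with user-chosen "high" values overflows quickly.

# Illustrative arithmetic only -- how downstream accumulation can exceed a 32-bit INT.
layers, width = 10, 500            # hypothetical DAG: 10 layers of 500 tasks each
per_task_weight = 1_000_000        # a user-chosen "high priority" value
root_effective = per_task_weight + layers * width * per_task_weight
print(root_effective)              # 5001000000
print(root_effective > 2**31 - 1)  # True -- past the signed 32-bit limit of 2147483647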

@jscheffl
Contributor

jscheffl commented Nov 2, 2024

I don't see that there is a real "functional need" to set priorities exponentially, or in another manner where, with normal modelling, you need such large ranges

Let's not forget that we have priority-weight accumulation: upstream, downstream, and eventually custom: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/priority-weight.html#custom-weight-rule

I can quite easily imagine some big numbers when we have huge DAGs with 1000s of tasks and multiple upstream/downstream tasks (and layers), and especially with custom weights I can easily imagine those numbers adding up (or maybe multiplying, in custom rules, if one wants more aggressive priority propagation).

I still don't see a real need for, and use of, such high numbers. Yes, we accumulate priority weights by making sums. Assume we have a DAG with 1000 tasks chained (I hope nobody is modelling this; it would really run a long time) and we use a priority of 10k (=10000). Then the accumulated priority is 10 million.

Looking at the INT type we use today, the supported databases have integer ranges of:

PostgreSQL INTEGER: -2,147,483,648 to 2,147,483,647
MySQL INT: -2,147,483,648 to 2,147,483,647

This means I can still have 1000 tasks with a priority of 1 million each in my DAG, which is also something you can model to fit into the INT range.

Instead of switching to float, I think we should rather cap the values and ensure they cannot roll over, and add documentation about the limits. The limits of Postgres and MySQL are the same and sound reasonable (otherwise there is also the option to switch to BIGINT, of course, if you want to support incredibly large non-float numbers).
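A minimal sketch of that capping idea (a hypothetical helper, not the change that was eventually merged):

# Hypothetical clamping helper -- sketch of "cap instead of float".
DB_INT_MIN = -(2**31)       # -2147483648, signed 32-bit lower bound (Postgres/MySQL INT)
DB_INT_MAX = 2**31 - 1      #  2147483647, signed 32-bit upper bound

def clamp_priority_weight(weight: int) -> int:
    """Keep an accumulated priority weight inside the database INT range."""
    return max(DB_INT_MIN, min(DB_INT_MAX, weight))

print(clamp_priority_weight(2 * (2**31 - 1)))   # 2147483647 instead of overflowing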

@pierrejeambrun
Member

pierrejeambrun commented Nov 4, 2024

I am a little undecided on this one. On one hand, if that is a use case, then why not; on the other hand, I tend to agree with Jens. I am having a hard time imagining a legitimate scenario for such big priority weights where downscaling to more reasonable values is not possible.

@potiuk
Member

potiuk commented Nov 4, 2024

@molcay -> is there a case that you might give as an example where it would be needed?

@molcay
Contributor Author

molcay commented Nov 7, 2024

Hi @potiuk, I am sorry but I don't have any specific case that can be used as an example. I only know that #22781 was the start. Maybe @kosteev has an idea?

@kosteev
Contributor

kosteev commented Nov 7, 2024

I am not pretending that this is a very practical case; however, in Cloud Composer we have seen this (and not only once):

Imagine that you give a customer an example DAG that looks like this:

...

dag = DAG(
    'dag_id',
    default_args=default_args,
    schedule_interval='*/10 * * * *',
)

# task with highest priority
t1 = BashOperator(
    task_id='task_id',
    bash_command='echo test',
    dag=dag,
    priority_weight=2**31 - 1)

The customer modifies the DAG and adds an extra task t2 by copying t1 and setting a dependency between them:

...

dag = DAG(
    'dag_id',
    default_args=default_args,
    schedule_interval='*/10 * * * *',
)

# task with highest priority
t1 = BashOperator(
    task_id='task_id',
    bash_command='echo test',
    dag=dag,
    priority_weight=2**31 - 1)

t2 = BashOperator(
    task_id='task_id2',
    bash_command='echo test2',
    dag=dag,
    priority_weight=2**31 - 1)

t1 >> t2

Then this DAG will cause an issue and break the scheduler (because the accumulated priority_weight will overflow).

Btw, I found an example DAG like this on Stack Overflow: https://stackoverflow.com/questions/66098050/airflow-dag-not-triggered-at-schedule-time.

I am not saying that this is at all common, but it is very unexpected for a user to have the scheduler broken after a slight modification of the DAG like this.
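To spell out the failure (my reading of the example, assuming the default downstream weight rule): the upstream task's accumulated weight becomes the sum of both values, which no longer fits a signed 32-bit column.

# Why the modified DAG overflows (assuming the default "downstream" accumulation).
t1_weight = 2**31 - 1              # 2147483647
t2_weight = 2**31 - 1
t1_accumulated = t1_weight + t2_weight
print(t1_accumulated)              # 4294967294
print(t1_accumulated > 2**31 - 1)  # True -- out of range for a Postgres INTEGER column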

@pierrejeambrun
Member

pierrejeambrun commented Nov 7, 2024

Those are just unreasonable values, and it is not a surprise to me that things break down the line.

If I code a mapped task that expands to (2**32 - 1) instances, most likely things will also break.

We can always add a safeguard, or better error handling on the scheduler side to make that more explicit, but changing the underlying datatype to allow usage of extremely high values does not achieve much (IMO).

Also, I believe that this can already be solved by a custom cluster policy if a company encounters this edge case (preventing people from using excessively high values).
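A rough sketch of such a cluster policy (hypothetical limit value; the hook itself is the standard task_policy in airflow_local_settings.py):

# airflow_local_settings.py -- sketch of a cluster policy rejecting extreme weights.
from airflow.exceptions import AirflowClusterPolicyViolation
from airflow.models.baseoperator import BaseOperator

MAX_ALLOWED_PRIORITY = 1_000_000   # hypothetical company-wide limit

def task_policy(task: BaseOperator) -> None:
    # Called for every task at DAG parse time; raising here surfaces as a DAG import error.
    if abs(task.priority_weight) > MAX_ALLOWED_PRIORITY:
        raise AirflowClusterPolicyViolation(
            f"priority_weight={task.priority_weight} on task {task.task_id} exceeds "
            f"the allowed limit of {MAX_ALLOWED_PRIORITY}"
        )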

@kosteev
Contributor

kosteev commented Nov 7, 2024

I agree that changing the datatype doesn't solve the issue radically.

I haven't read all the comments here, but my vote is to have a validation/safeguard for this (e.g. validation during DAG parsing that throws an import error for such a DAG, so that the customer can see it).

In general, the fact that the Airflow scheduler (or DAG processor) can break because of user code doesn't look right, IMHO.

@potiuk
Member

potiuk commented Nov 11, 2024

Yeah. I am convinced now that we should rather cap the values, not change the type.

@molcay
Contributor Author

molcay commented Nov 13, 2024

Since #43611 is already approved and it looks like it is about to be merged, I am closing this PR.

@molcay molcay closed this Nov 13, 2024