Skip to content

Conversation

@kosteev
Copy link
Contributor

@kosteev kosteev commented Apr 6, 2022

closes: #22781

@kosteev kosteev requested review from XD-DENG, ashb and kaxil as code owners April 6, 2022 14:40
@eladkal
Copy link
Contributor

eladkal commented Apr 6, 2022

The DAG shared on the issue runs just fine with Sqlite:
Screen Shot 2022-04-06 at 17 55 52

Screen Shot 2022-04-06 at 17 55 41

I'm not sure if this is the right fix. Why should the model be concerned with issues originated from a specific DB?
I suspect that this could be also reported on retries, pool_slots etc.
I think the fix should be in the scheduler level (at least by not crashing) / raising broken DAG?

@uranusjr
Copy link
Member

uranusjr commented Apr 6, 2022

The fix is specific to Postgres I believe. Every database has different integer ranges so this will need to contain some if-else branches to cover each of them. Although honestly maybe it’s even better to just document this is a database limitation and you just can’t go over the limit.

@eladkal
Copy link
Contributor

eladkal commented Apr 6, 2022

The fix is specific to Postgres I believe. Every database has different integer ranges so this will need to contain some if-else branches to cover each of them. Although honestly maybe it’s even better to just document this is a database limitation and you just can’t go over the limit.

I agree but the point of concern is that this cause the scheduler to crash which is not good.
It means that one dag can cause cluster wide problems. I think we need to find a way to locallize the effect so only the problematic dag will be effected.

@uranusjr
Copy link
Member

uranusjr commented Apr 6, 2022

Yeah the scheduler should not crash, but perhaps we could catch the exception or something like that. I seem to recall there’s a similar situation for something else a while ago (datetime out of range?)

@eladkal
Copy link
Contributor

eladkal commented Apr 6, 2022

I recall only #17003 which also caused scheduler crash with divide by zero

@ashb
Copy link
Member

ashb commented Apr 6, 2022

Is silently clamping it right, or should this be a parse time DAG import error?

@uranusjr
Copy link
Member

uranusjr commented Apr 6, 2022

Ah found it, 2213635

So I guess the better fix here is to do something similar, check for Postgres, and prevent an out-of-bounds value to be assigned for Postgres specifically.

+1 to an import error.

@kosteev
Copy link
Contributor Author

kosteev commented Apr 7, 2022

All right, validating it during DAG import and throwing error makes sense.
What about different databases? Should we handle all the databases taking into account different constraints of each?

@uranusjr
Copy link
Member

uranusjr commented Apr 7, 2022

Let’s start with adding just a check for Postgres now, we can add more later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error "NumericValueOutOfRange integer out of range" in scheduler, task_insance.priority_weight column overflow

4 participants