[AIRFLOW-1467] Dynamic pooling via allowing tasks to use more than one pool slot (depending upon the need) #6975
Conversation
|
https://issues.apache.org/jira/browse/AIRFLOW-6227 could be another approach |
Codecov Report
@@            Coverage Diff            @@
##           master    #6975      +/-  ##
=========================================
+ Coverage   84.81%    84.9%    +0.08%
=========================================
  Files         679      680        +1
  Lines       38491    38903      +412
=========================================
+ Hits        32648    33032      +384
- Misses       5843     5871       +28
Continue to review full report at Codecov.
|
|
@tooptoop4 Yes, that approach looks good when multiple pools are required, as described in the Jira ticket. The problem statement in https://issues.apache.org/jira/browse/AIRFLOW-6227 can also be handled by locking a file for write. That is, if the goal is to keep a single writer per table, then before triggering the Spark job, add another task that takes a write lock on a file (named after the table) in the file system (libraries such as fasteners or lockfile can be used inside a PythonOperator). This ensures that only one job is triggered for a given table at a time and keeps the code dynamic, rather than requiring a new pool every time a new table is introduced. Once the Spark job finishes (whether it fails or passes), release the lock on the file. |
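For illustration, a minimal sketch of this file-lock approach (not code from this PR), assuming the fasteners package is installed; the lock directory, table name, and submit_spark_job callable are hypothetical placeholders:

```python
# Hypothetical sketch of the per-table write-lock idea described above.
# Requires the `fasteners` package; paths and names are placeholders.
import fasteners


def run_with_table_lock(table_name, submit_spark_job):
    """Hold an inter-process lock named after the table while the Spark job runs."""
    lock = fasteners.InterProcessLock("/tmp/locks/{}.lock".format(table_name))
    with lock:                         # blocks until no other task holds the lock
        submit_spark_job(table_name)   # lock is released even if the job raises


# Wired into a DAG via a PythonOperator (Airflow 1.10.x import path), e.g.:
# PythonOperator(task_id="write_orders",
#                python_callable=run_with_table_lock,
#                op_kwargs={"table_name": "orders",
#                           "submit_spark_job": my_submit_fn})
```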
|
@lokeshlal +1 this is a great idea! Could you please make the unit tests a bit more comprehensive? (e.g. multiple pools) |
|
@dimberman Sorry for the confusion my previous comment created. I have not implemented multiple pools, only a way for a task to use more than one pool slot, specifically to resolve https://issues.apache.org/jira/browse/AIRFLOW-1467. For example, the pool_capacity property specifies the number of pool slots a task will use. |
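To make this concrete, a hypothetical usage sketch assuming pool_capacity is exposed as a task argument as this PR describes; the DAG name, pool name, and command are illustrative only:

```python
# Hypothetical usage of the pool_capacity property added by this PR (which the
# PR requires to be >= 1). DAG, pool, and command are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG("pool_capacity_demo",
         start_date=datetime(2020, 1, 1),
         schedule_interval=None) as dag:
    heavy_task = BashOperator(
        task_id="heavy_task",
        bash_command="echo running heavy job",
        pool="resource_pool",  # the task still runs in a single pool
        pool_capacity=2,       # but now occupies 2 slots of that pool
    )
```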
|
@dimberman I have updated the pool test cases for task instances. Could you please review the code again? Thank you. |
|
@lokeshlal why do all of the test cases have priority_weight of 1? Could you do a test of weight=2 and of weight=0? |
|
@dimberman Yes, I have modified the test cases to cover pool_capacity values of 0, 1, and 2, and have also added pool_capacity to the other test cases. The build is in progress; hopefully I have not missed any other test case. |
|
Thank you @lokeshlal. I'll merge this once tests pass. |
|
Thank you @dimberman. |
|
Hi @dimberman, I have changed a few more test cases to pass the pool_capacity property in the Mock object, and the test cases executed successfully. Could you please review the latest commit as well (the commit after your approval)? |
|
Hi @dimberman, could you please review the remaining changes and, if everything looks good, merge them as well? Thanks. |
|
@lokeshlal LGTM thanks for setting this up! |
|
Awesome work, congrats on your first merged pull request! |
|
Hello @lokeshlal @dimberman - I had to revert this change from master because it created two heads in the SQLAlchemy migrations, and we have a test that detects that. The root cause was that this PR was not rebased after the second head was added; otherwise a failing Travis CI run would have caught it (Travis last ran for this PR 6 days ago, and the other head (#6489) was merged 3 days ago). It's no one's fault - it can happen in our process - but we will have to add better protection against such problems in the future. In the meantime, could you please create another PR with this change, fix the duplicated heads (by basing your migration on the new head migration), and re-submit/merge it again? Sorry for the inconvenience @lokeshlal - I understand this was your first merged PR and you hit this very rare problem :(. We will do better in the future to avoid it. |
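For context, an illustrative sketch (not the actual migration) of what "basing your migration on the new head migration" means in Alembic terms: point this PR's down_revision at the migration that became the new head, so `alembic heads` reports a single head again. The revision identifiers below are placeholders.

```python
# Hypothetical Alembic revision header for the re-submitted migration.
# Revision identifiers are placeholders, not the real ones from this PR or #6489.
revision = "aaaa_add_pool_capacity_to_task_instance"  # this PR's migration
down_revision = "bbbb_head_added_by_pr_6489"          # base it on the new head
branch_labels = None
depends_on = None
```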
|
I prefer the "hacky" way. |
|
Thank you @dimberman and @potiuk - I have created another PR with a single migration head. #7160 |
|
We still have "kerberos" related instability. I restarted the build and hopefully it will pass. |
|
(working to fix the instability as well) |
…e pool slot (depending upon the need) (apache#6975)

* adding pool capacity required for each task for dynamic pooling
* Added pool_capacity column migration script
* removed test checkedin file
* removed extra space
* correct test_database_schema_and_sqlalchemy_model_are_in_sync test case
* Added description for pool_capacity property for task instance
* Modified test cases to include pool_capacity along with pool in task instances
* Modified test cases to include pool_capacity along with pool in task instances
* Removed Column.name property, since property value is same as actual variable
* check for pool_capacity property to be always >= 1
* removed unused variable ti
* modified test cases for pool_capacity
* modified test cases for pool_capacity
… than one pool slot (depending upon the need) (apache#6975)" This reverts commit 277d01d.
This PR contains changes to Pool and TaskInstance that allow a task to use more than one pool slot.
[AIRFLOW-XXXX] for document-only changes. In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.