[EDGE]Enable edge worker maintenance mode #45958
jscheffl
left a comment
Hope CI is turning green now, then LGTM!
* Initial implementation * fix import * fix brackets * try harder * fix name * change to get request * debug print * try-except * new return type * fix expose * use str in db * link fix * fix variable length * MAINTENANCE_REQUEST * add valid return * add valid return statement * try different naming * more debug print * extend enum * new button to enter maintenance mode * Implement maintenance off * try new redirect * fix redirect url * commit sesion * try another endpoint * remove debug print * exit print * static checks * Initial implementation * fix import * fix brackets * try harder * fix name * change to get request * debug print * try-except * new return type * fix expose * use str in db * link fix * fix variable length * MAINTENANCE_REQUEST * add valid return * add valid return statement * try different naming * more debug print * extend enum * new button to enter maintenance mode * Implement maintenance off * try new redirect * fix redirect url * commit sesion * try another endpoint * remove debug print * exit print * static checks * add new colors for maintenance modes * modfiy host html * fix quote * remove more debug prints * fix mypy * another mypy fix * final mypy fix * add pytests * Update edge worker versions * fix spelling * update versions * create pydantic class * fix pytest * update docs * apply review findings * moved logic from plugins * immidiate hertbeat if state cahgnes * openapi fix * fix condition * exclude csfr checks * return WorkerSetStateReturn by worker_set_state * add debug print * try new return * fix heartbeat state * fix expresiion * fix variable isue * fix logic * fix pytest * fix pytest * minor fix * fix airflow 3 compatibility --------- Co-authored-by: Majoros Donat (XC-DX/EET2-Bp) <donat.majoros2@hu.bosch.com>
providers/edge/src/airflow/providers/edge/plugins/edge_executor_plugin.py
The diagram suggests that changing the mode depends only on the number of jobs. Is there a way for a cluster admin to force entry into maintenance mode (force-killing all existing jobs)?
There is no direct option for this, although it is visible which jobs are being executed by the worker, so they can be killed individually.
I think a forced maintenance mode could still be implemented if somebody wants or needs it. The intent of the current implementation is a graceful drain of running jobs. The "pending" state is the transition, the same as when you send a SIGINT to a Celery worker: the worker stops pulling new jobs (stops consuming from the queue) but attempts to complete all running jobs and then terminates. If there is an urgent demand (this is what I do during testing to be faster), I check the jobs list page, find the task in execution, and mark it as failed/success. Not a single-click solution, but basically a manual workaround. Not often used by me, just in testing :-D
This PR enables a maintenance mode for the edge worker. In maintenance mode, the worker is alive but does not consume any jobs.
Maintenance mode can be triggered with a button on the edge worker status page. The button writes the state "maintenance request" directly to the database as the worker state. The worker then goes to "maintenance pending" if it still has running jobs, and to "maintenance mode" once all jobs have finished.
When exiting maintenance mode, "maintenance exit" is written to the database. The worker then switches to "running" if it was in "maintenance pending", and to "idle" if it was in "maintenance mode".
Why do we need the state "maintenance exit"?
If the user requested maintenance, so that "maintenance request" is in the database, and then wants to exit maintenance immediately (e.g. after a misclick), we would not know whether to write "running" or "idle" to the database.
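The transitions described above can be sketched as a small pure function. This is only an illustration, not the code from this PR: the `WorkerState` enum and `resolve_state` helper are hypothetical names, and deciding between "running" and "idle" from the worker's own running-job count is an assumption made here to show why the UI can write "maintenance exit" without knowing what the worker is currently doing.

```python
from enum import Enum


class WorkerState(str, Enum):
    """Illustrative states, named after the ones in the description (hypothetical enum)."""

    IDLE = "idle"
    RUNNING = "running"
    MAINTENANCE_REQUEST = "maintenance request"
    MAINTENANCE_PENDING = "maintenance pending"
    MAINTENANCE_MODE = "maintenance mode"
    MAINTENANCE_EXIT = "maintenance exit"


def resolve_state(db_state: WorkerState, running_jobs: int) -> WorkerState:
    """Return the state the worker should report next.

    db_state is what the worker finds in the database (possibly written by
    the UI button); running_jobs is the worker's own count of active jobs.
    """
    busy = running_jobs > 0
    if db_state in (WorkerState.MAINTENANCE_REQUEST, WorkerState.MAINTENANCE_PENDING):
        # Drain gracefully: stay "pending" until the last job has finished.
        return WorkerState.MAINTENANCE_PENDING if busy else WorkerState.MAINTENANCE_MODE
    if db_state is WorkerState.MAINTENANCE_EXIT:
        # Only the worker knows whether jobs are still running, so it picks
        # the final state instead of the UI guessing "running" or "idle".
        return WorkerState.RUNNING if busy else WorkerState.IDLE
    if db_state is WorkerState.MAINTENANCE_MODE:
        # No new jobs are consumed while in maintenance mode.
        return WorkerState.MAINTENANCE_MODE
    return WorkerState.RUNNING if busy else WorkerState.IDLE
```

For example, a misclick followed by an immediate exit while two jobs are still running resolves as `resolve_state(WorkerState.MAINTENANCE_EXIT, 2)`, which yields `WorkerState.RUNNING`; with no jobs running, the same call yields `WorkerState.IDLE`.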