-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Chart: Enhance Celery Worker Sets support for multi-queue configurations #58547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chart: Enhance Celery Worker Sets support for multi-queue configurations #58547
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
|
5676fb7 to
513586f
Compare
jscheffl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me this looks good but would like another pair of eyes for review.
|
I haven't looked at either PR, but #56589 has been open for a month with this same feature. |
07c19f9 to
c9f7a1b
Compare
c9f7a1b to
f1f9831
Compare
|
Was also puzzled that I overlooked that there are two PRs for the same. I actually reviewed both and now after some days coming back and realized the overlap attempted to compare. I like THIS PR a bit more compared to #56589 because (1) is is leaner and better to read as diff and (2) also extends KEDA and HPA in the PR which is explicitly excluded in the other. @jedcunningham Can you check and compare and make a second pair of eyes? I would propose to merge this one. |
5583e84 to
2de9609
Compare
|
@jedcunningham ping? |
Miretpl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice change! Maybe it would be worth to add a newsfragment for this feature? 🤔
e1e7b26 to
6ef74c8
Compare
Miretpl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite the change in the newsfragment mentioned by Jens, everything looks good to me. Just small potential nits to some comments
Miretpl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nits in the newsfragment, despite that looks good.
6c7a56a to
088db1a
Compare
|
Would love to see helm charts 1.19.0 released soonish. |
280e025 to
4290dbd
Compare
|
Oh, unfortunately a conflict (again) from another PR merged in parallel :-( Sorry. |
This commit improves the configuration and scaling logic for Celery Worker Sets,
allowing for more robust multi-queue setups.
Key changes:
1. **Strict KEDA Queue Filtering**: Updated KEDA SQL queries to always include
`AND queue = '...'`. Previously, the default worker's KEDA query could
incorrectly include tasks from other queues (e.g., those assigned to specific
worker sets), leading to incorrect scaling behavior.
2. **Explicit Default Worker Toggle**: Introduced `workers.enableDefault` (default: `true`).
This allows users to easily disable the default worker if they wish to rely
solely on custom worker sets for specific queues, improving configuration clarity.
3. **Independent Resource Generation**: Refactored HPA and KEDA templates to
generate resources for the default worker and worker sets independently.
This resolves issues where the default worker's autoscaling resources were
sometimes suppressed when sets were defined.
4. **Test Updates**: Updated Helm tests to verify the new queue filtering logic
and the independent generation of worker resources.
This change updates the KEDA SQL query generation in the Helm chart to
properly handle multiple queues defined in `workers.queue`.
By using `splitList` and iterating over the queues, the generated SQL
now uses an `IN` clause (e.g., `queue IN ('default', 'high-cpu')`)
instead of a simple equality check. This ensures that the Horizontal
Pod Autoscaler scales correctly when workers are listening to multiple
queues.
This change updates `values.schema.json` to reflect the recent changes
in `values.yaml` for KEDA multi-queue support.
Specific changes:
- Sync default values for `workers.args` and `workers.keda.query`.
- Fix `lint-chart-schema` failure by changing `additionalProperties: true`
to `additionalProperties: {}` in `workers.sets`.
Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>
Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>
This PR enhances the Airflow Helm chart to support advanced Celery worker topologies, enabling flexible resource allocation and precise autoscaling configurations. It introduces new features like workers.enableDefault, multi-queue autoscaling support, and granular configuration overrides for worker sets.
Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>
Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>
Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>
Helm's mustMerge and merge functions do not preserve boolean false values because false is considered a "zero value" in Go templates. This caused workers.celery.persistence.enabled=false to be incorrectly overwritten by the deprecated workers.persistence.enabled default value (true) during context merging. The fix saves the persistence.enabled value before merge operations and restores it afterward. Also corrects the reference path from .Values.workers.celery.persistence.enabled to .Values.workers.persistence.enabled since the worker configuration is placed under .Values.workers in the new context.
4290dbd to
989dba0
Compare
|
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions. |
|
Thanks for your patience as well making this PR! |
| ({"celery": {"replicas": None}}, 1), | ||
| ({"replicas": 2, "celery": {"replicas": 3}}, 3), | ||
| ({"replicas": 2, "celery": {"replicas": None}}, 2), | ||
| ({"replicas": 2, "celery": {"replicas": 2}}, 2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With moving from workers.replicas to workers.celery.replicas, we want to make sure that when users unset the workers.celery.replicas field, the behaviour will be as in previous releases (because we changed how chart behaves in replicas handling). Why was this change made?
I see now locally that this was changed as the default behaviour was changed. workers.replicas, if workers.celery.replicas is unset, does't change the value of replicas. I think it should be fixed in the template logic, not in the test case itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…ons (apache#58547) * Chart: Enhance Celery Worker Sets support for multi-queue configurations This commit improves the configuration and scaling logic for Celery Worker Sets, allowing for more robust multi-queue setups. Key changes: 1. **Strict KEDA Queue Filtering**: Updated KEDA SQL queries to always include `AND queue = '...'`. Previously, the default worker's KEDA query could incorrectly include tasks from other queues (e.g., those assigned to specific worker sets), leading to incorrect scaling behavior. 2. **Explicit Default Worker Toggle**: Introduced `workers.enableDefault` (default: `true`). This allows users to easily disable the default worker if they wish to rely solely on custom worker sets for specific queues, improving configuration clarity. 3. **Independent Resource Generation**: Refactored HPA and KEDA templates to generate resources for the default worker and worker sets independently. This resolves issues where the default worker's autoscaling resources were sometimes suppressed when sets were defined. 4. **Test Updates**: Updated Helm tests to verify the new queue filtering logic and the independent generation of worker resources. * Chart: Support multiple queues in KEDA autoscaling query This change updates the KEDA SQL query generation in the Helm chart to properly handle multiple queues defined in `workers.queue`. By using `splitList` and iterating over the queues, the generated SQL now uses an `IN` clause (e.g., `queue IN ('default', 'high-cpu')`) instead of a simple equality check. This ensures that the Horizontal Pod Autoscaler scales correctly when workers are listening to multiple queues. * Chart: Update schema defaults and fix lint errors This change updates `values.schema.json` to reflect the recent changes in `values.yaml` for KEDA multi-queue support. Specific changes: - Sync default values for `workers.args` and `workers.keda.query`. - Fix `lint-chart-schema` failure by changing `additionalProperties: true` to `additionalProperties: {}` in `workers.sets`. * Chart: Move Celery worker set configuration under workers.celery * Chart: Add significant Helm chart newsfragment for multiple Celery worker sets * Update chart/values.yaml Co-authored-by: Przemysław Mirowski <miretpl@gmail.com> * Update chart/values.yaml Co-authored-by: Przemysław Mirowski <miretpl@gmail.com> * Enhance Helm chart for multiple Celery worker sets This PR enhances the Airflow Helm chart to support advanced Celery worker topologies, enabling flexible resource allocation and precise autoscaling configurations. It introduces new features like workers.enableDefault, multi-queue autoscaling support, and granular configuration overrides for worker sets. * Update chart/newsfragments/58547.significant.rst Co-authored-by: Przemysław Mirowski <miretpl@gmail.com> * Update chart/newsfragments/58547.significant.rst Co-authored-by: Przemysław Mirowski <miretpl@gmail.com> * Update chart/newsfragments/58547.significant.rst Co-authored-by: Przemysław Mirowski <miretpl@gmail.com> * Fix worker set persistence.enabled value not being preserved Helm's mustMerge and merge functions do not preserve boolean false values because false is considered a "zero value" in Go templates. This caused workers.celery.persistence.enabled=false to be incorrectly overwritten by the deprecated workers.persistence.enabled default value (true) during context merging. The fix saves the persistence.enabled value before merge operations and restores it afterward. Also corrects the reference path from .Values.workers.celery.persistence.enabled to .Values.workers.persistence.enabled since the worker configuration is placed under .Values.workers in the new context. --------- Co-authored-by: Glenn Huang 黃瀚陞 <glenn.hs.huang@foxconn.com> Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>
…ons (apache#58547) * Chart: Enhance Celery Worker Sets support for multi-queue configurations This commit improves the configuration and scaling logic for Celery Worker Sets, allowing for more robust multi-queue setups. Key changes: 1. **Strict KEDA Queue Filtering**: Updated KEDA SQL queries to always include `AND queue = '...'`. Previously, the default worker's KEDA query could incorrectly include tasks from other queues (e.g., those assigned to specific worker sets), leading to incorrect scaling behavior. 2. **Explicit Default Worker Toggle**: Introduced `workers.enableDefault` (default: `true`). This allows users to easily disable the default worker if they wish to rely solely on custom worker sets for specific queues, improving configuration clarity. 3. **Independent Resource Generation**: Refactored HPA and KEDA templates to generate resources for the default worker and worker sets independently. This resolves issues where the default worker's autoscaling resources were sometimes suppressed when sets were defined. 4. **Test Updates**: Updated Helm tests to verify the new queue filtering logic and the independent generation of worker resources. * Chart: Support multiple queues in KEDA autoscaling query This change updates the KEDA SQL query generation in the Helm chart to properly handle multiple queues defined in `workers.queue`. By using `splitList` and iterating over the queues, the generated SQL now uses an `IN` clause (e.g., `queue IN ('default', 'high-cpu')`) instead of a simple equality check. This ensures that the Horizontal Pod Autoscaler scales correctly when workers are listening to multiple queues. * Chart: Update schema defaults and fix lint errors This change updates `values.schema.json` to reflect the recent changes in `values.yaml` for KEDA multi-queue support. Specific changes: - Sync default values for `workers.args` and `workers.keda.query`. - Fix `lint-chart-schema` failure by changing `additionalProperties: true` to `additionalProperties: {}` in `workers.sets`. * Chart: Move Celery worker set configuration under workers.celery * Chart: Add significant Helm chart newsfragment for multiple Celery worker sets * Update chart/values.yaml Co-authored-by: Przemysław Mirowski <miretpl@gmail.com> * Update chart/values.yaml Co-authored-by: Przemysław Mirowski <miretpl@gmail.com> * Enhance Helm chart for multiple Celery worker sets This PR enhances the Airflow Helm chart to support advanced Celery worker topologies, enabling flexible resource allocation and precise autoscaling configurations. It introduces new features like workers.enableDefault, multi-queue autoscaling support, and granular configuration overrides for worker sets. * Update chart/newsfragments/58547.significant.rst Co-authored-by: Przemysław Mirowski <miretpl@gmail.com> * Update chart/newsfragments/58547.significant.rst Co-authored-by: Przemysław Mirowski <miretpl@gmail.com> * Update chart/newsfragments/58547.significant.rst Co-authored-by: Przemysław Mirowski <miretpl@gmail.com> * Fix worker set persistence.enabled value not being preserved Helm's mustMerge and merge functions do not preserve boolean false values because false is considered a "zero value" in Go templates. This caused workers.celery.persistence.enabled=false to be incorrectly overwritten by the deprecated workers.persistence.enabled default value (true) during context merging. The fix saves the persistence.enabled value before merge operations and restores it afterward. Also corrects the reference path from .Values.workers.celery.persistence.enabled to .Values.workers.persistence.enabled since the worker configuration is placed under .Values.workers in the new context. --------- Co-authored-by: Glenn Huang 黃瀚陞 <glenn.hs.huang@foxconn.com> Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>
Description
This PR enhances the Airflow Helm chart to support advanced Celery worker topologies, enabling more flexible resource allocation and precise autoscaling configurations.
Why is this needed?
1. Flexible Worker Topologies
As Airflow adoption grows, platform teams often need to route tasks exclusively to specialized worker sets (e.g., GPU-optimized, Memory-optimized) without maintaining a generic "default" worker.
workers.enableDefaultflag allows users to configure a deployment consisting only of specialized worker sets defined inworkers.sets. This provides greater flexibility for teams to design their worker architecture exactly as needed.2. Multi-Queue Autoscaling Support
Complex workflows often require a single worker set to handle tasks from multiple specific queues (e.g.,
queue: "high-priority,vip").ScaledObjectgeneration to support comma-separated queue lists. By using the SQLIN (...)clause, we ensure that KEDA scales worker sets based on the precise aggregate workload of all their assigned queues.3. Granular Configuration Overrides
Different worker sets may require different operational strategies within the same cluster.
Changes
workers.enableDefault(default:true) tovalues.yaml.worker-kedaautoscaler.yamlto use SQLINclause for queue filtering, supporting multi-queue configurations (e.g.,queue: "a,b"->AND queue IN ('a','b')).workers.sets.Testing
helm-tests/tests/helm_tests/other/test_keda.pyto verify:INclause.workers.enableDefaultcorrectly controls the rendering of the default worker deployment.closes: #56591
closes: #34219
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.