Chart: Enhance Celery Worker Sets support for multi-queue configurations #58547

glennhsh · 2025-11-21T07:05:03Z

Description

This PR enhances the Airflow Helm chart to support advanced Celery worker topologies, enabling more flexible resource allocation and precise autoscaling configurations.

Why is this needed?

1. Flexible Worker Topologies
As Airflow adoption grows, platform teams often need to route tasks exclusively to specialized worker sets (e.g., GPU-optimized, Memory-optimized) without maintaining a generic "default" worker.

Enhancement: The new workers.enableDefault flag allows users to configure a deployment consisting only of specialized worker sets defined in workers.sets. This provides greater flexibility for teams to design their worker architecture exactly as needed.
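
The effect of the flag can be sketched in Python as follows. This is only an illustration of the intended behaviour, not the chart's template code: the helper `planned_worker_groups` is hypothetical, the `name` key on each set is an assumption, and the keys follow the `workers.enableDefault` / `workers.sets` naming used in this description.

```python
def planned_worker_groups(workers: dict) -> list[str]:
    """Return the worker groups that would be deployed for a `workers` values block.

    Hypothetical helper: the real chart decides this in Helm templates.
    """
    groups = []
    # enableDefault defaults to true, so the generic worker is kept unless disabled.
    if workers.get("enableDefault", True):
        groups.append("default")
    # Each entry in `sets` is assumed to carry a `name` key (an assumption here).
    groups.extend(worker_set["name"] for worker_set in workers.get("sets", []))
    return groups


specialized_only = planned_worker_groups(
    {"enableDefault": False, "sets": [{"name": "gpu"}, {"name": "highmem"}]}
)
```

With `enableDefault` set to false, only the specialized sets remain in the result.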

2. Multi-Queue Autoscaling Support
Complex workflows often require a single worker set to handle tasks from multiple specific queues (e.g., queue: "high-priority,vip").

Enhancement: This PR updates the KEDA ScaledObject generation to support comma-separated queue lists. By using the SQL IN (...) clause, we ensure that KEDA scales worker sets based on the precise aggregate workload of all their assigned queues.
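
As a rough illustration of the queue filter, the Python sketch below turns a comma-separated queue string into an `IN (...)` clause. The chart does this in Go templates, so the function name `queue_in_clause`, the quote escaping, and the exact spacing of the output are assumptions made for this example.

```python
def queue_in_clause(queue_setting: str) -> str:
    """Build an `AND queue IN (...)` filter from a comma-separated queue string."""
    # Split on commas, trim whitespace, and drop empty entries such as "a,,b".
    names = [name.strip() for name in queue_setting.split(",") if name.strip()]
    if not names:
        raise ValueError("at least one queue name is required")
    # Double any single quote so each name is a valid SQL string literal
    # (an extra safeguard added for this sketch).
    quoted = ",".join("'" + name.replace("'", "''") + "'" for name in names)
    return f"AND queue IN ({quoted})"


multi = queue_in_clause(" high-priority , vip ")
single = queue_in_clause("default")
```

A single queue goes through the same path and simply yields a one-element list, so no separate equality branch is needed.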

3. Granular Configuration Overrides
Different worker sets may require different operational strategies within the same cluster.

Enhancement: This change improves the configuration merge logic, allowing individual worker sets to override global settings. For example, a user can now enable KEDA globally but explicitly disable it for a specific worker set that requires a static number of replicas.
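
The override semantics can be modelled with a small recursive merge in Python. This is a sketch under stated assumptions rather than the chart's implementation: `merge_overrides` is a hypothetical helper, and the values shown (`maxReplicaCount: 10`, a set named `gpu`) are illustrative only.

```python
from copy import deepcopy


def merge_overrides(base: dict, override: dict) -> dict:
    """Overlay `override` on `base`; keys present in `override` win, even when falsy."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them wholesale.
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Illustrative values: KEDA enabled globally, disabled for one static set.
global_workers = {"keda": {"enabled": True, "maxReplicaCount": 10}, "replicas": 1}
static_set = {"name": "gpu", "keda": {"enabled": False}, "replicas": 4}
effective = merge_overrides(global_workers, static_set)
```

The important property is that an explicit `False` in the set survives the merge while untouched global keys are inherited.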

Changes

New Feature: Added workers.enableDefault (default: true) to values.yaml.
Enhancement: Updated worker-kedaautoscaler.yaml to use SQL IN clause for queue filtering, supporting multi-queue configurations (e.g., queue: "a,b" -> AND queue IN ('a','b')).
Refactor: Standardized template rendering to ensure consistent behavior between the default worker and workers.sets.

Testing

Added test cases in helm-tests/tests/helm_tests/other/test_keda.py to verify:
- Correct SQL generation for single queues.
- Correct SQL generation for comma-separated queue lists using the IN clause.
- Proper handling of whitespace in queue configurations.
Verified that workers.enableDefault correctly controls the rendering of the default worker deployment.

closes: #56591
closes: #34219


boring-cyborg · 2025-11-21T07:05:06Z

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
Here are some useful points:

Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
Be sure to read the Airflow Coding style.
Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

jscheffl

For me this looks good, but I would like another pair of eyes for review.

jedcunningham · 2025-11-24T03:14:40Z

I haven't looked at either PR, but #56589 has been open for a month with this same feature.

jscheffl · 2025-12-07T22:22:21Z

I was also puzzled that I had overlooked that there are two PRs for the same feature. I actually reviewed both, and coming back after some days I realized the overlap and attempted to compare them.

I like THIS PR a bit more compared to #56589 because (1) it is leaner and easier to read as a diff and (2) it also extends KEDA and HPA, which is explicitly excluded in the other PR.
Even though the other PR was opened first, I'd propose to merge this one.

@jedcunningham Can you check, compare, and act as a second pair of eyes? I would propose to merge this one.

jscheffl · 2025-12-14T14:26:02Z

@jedcunningham ping?

Miretpl

Really nice change! Maybe it would be worth adding a newsfragment for this feature? 🤔

chart/templates/workers/worker-deployment.yaml

chart/newsfragments/58547.significant.rst

Miretpl

Apart from the change in the newsfragment mentioned by Jens, everything looks good to me. Just a few small potential nits on some comments.

chart/values.yaml

Miretpl

Small nits in the newsfragment; apart from that, it looks good.

chart/newsfragments/58547.significant.rst

ronaldorcampos · 2026-01-05T12:22:27Z

Would love to see helm charts 1.19.0 released soonish.

jscheffl · 2026-01-10T15:00:18Z

Oh, unfortunately a conflict (again) from another PR merged in parallel :-( Sorry.

This commit improves the configuration and scaling logic for Celery Worker Sets, allowing for more robust multi-queue setups.

Key changes:

1. **Strict KEDA Queue Filtering**: Updated KEDA SQL queries to always include `AND queue = '...'`. Previously, the default worker's KEDA query could incorrectly include tasks from other queues (e.g., those assigned to specific worker sets), leading to incorrect scaling behavior.
2. **Explicit Default Worker Toggle**: Introduced `workers.enableDefault` (default: `true`). This allows users to easily disable the default worker if they wish to rely solely on custom worker sets for specific queues, improving configuration clarity.
3. **Independent Resource Generation**: Refactored HPA and KEDA templates to generate resources for the default worker and worker sets independently. This resolves issues where the default worker's autoscaling resources were sometimes suppressed when sets were defined.
4. **Test Updates**: Updated Helm tests to verify the new queue filtering logic and the independent generation of worker resources.

This change updates the KEDA SQL query generation in the Helm chart to properly handle multiple queues defined in `workers.queue`. By using `splitList` and iterating over the queues, the generated SQL now uses an `IN` clause (e.g., `queue IN ('default', 'high-cpu')`) instead of a simple equality check. This ensures that the Horizontal Pod Autoscaler scales correctly when workers are listening to multiple queues.

This change updates `values.schema.json` to reflect the recent changes in `values.yaml` for KEDA multi-queue support.

Specific changes:
- Sync default values for `workers.args` and `workers.keda.query`.
- Fix `lint-chart-schema` failure by changing `additionalProperties: true` to `additionalProperties: {}` in `workers.sets`.

…rker sets

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

This PR enhances the Airflow Helm chart to support advanced Celery worker topologies, enabling flexible resource allocation and precise autoscaling configurations. It introduces new features like workers.enableDefault, multi-queue autoscaling support, and granular configuration overrides for worker sets.

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

Helm's mustMerge and merge functions do not preserve boolean false values because false is considered a "zero value" in Go templates. This caused workers.celery.persistence.enabled=false to be incorrectly overwritten by the deprecated workers.persistence.enabled default value (true) during context merging. The fix saves the persistence.enabled value before merge operations and restores it afterward. Also corrects the reference path from .Values.workers.celery.persistence.enabled to .Values.workers.persistence.enabled since the worker configuration is placed under .Values.workers in the new context.
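
The pitfall and the save-and-restore fix described in this commit message can be modelled in Python. This is a simplified model of the described merge behaviour, not Helm's or Sprig's actual code: `zero_skipping_merge` and `merge_preserving_flag` are hypothetical names, and the `size` value is illustrative.

```python
def zero_skipping_merge(dest: dict, src: dict) -> dict:
    """Model of the described merge: `dest` wins unless its value is empty/zero."""
    result = dict(dest)
    for key, src_value in src.items():
        dest_value = result.get(key)
        if isinstance(dest_value, dict) and isinstance(src_value, dict):
            result[key] = zero_skipping_merge(dest_value, src_value)
        elif key not in result or not dest_value:
            # A falsy destination value (including False) is treated as "unset".
            result[key] = src_value
    return result


def merge_preserving_flag(dest: dict, src: dict) -> dict:
    """Save persistence.enabled before merging and restore it afterwards."""
    saved = dest.get("persistence", {}).get("enabled")
    merged = zero_skipping_merge(dest, src)
    if saved is not None:
        merged.setdefault("persistence", {})["enabled"] = saved
    return merged


celery_cfg = {"persistence": {"enabled": False}}
deprecated_defaults = {"persistence": {"enabled": True, "size": "100Gi"}}

naive = zero_skipping_merge(celery_cfg, deprecated_defaults)
fixed = merge_preserving_flag(celery_cfg, deprecated_defaults)
```

In this model the naive merge flips `enabled` back to `True`, while the save-and-restore variant keeps the explicit `False` and still inherits the other defaults.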

boring-cyborg · 2026-01-12T05:40:17Z

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

jscheffl · 2026-01-12T05:40:37Z

Thanks as well for your patience in making this PR!

Miretpl · 2026-01-12T18:19:57Z

helm-tests/tests/helm_tests/airflow_core/test_worker.py

            ({"celery": {"replicas": None}}, 1),
            ({"replicas": 2, "celery": {"replicas": 3}}, 3),
-            ({"replicas": 2, "celery": {"replicas": None}}, 2),
+            ({"replicas": 2, "celery": {"replicas": 2}}, 2),


With the move from workers.replicas to workers.celery.replicas, we want to make sure that when users unset the workers.celery.replicas field, the behaviour stays the same as in previous releases (because we changed how the chart handles replicas). Why was this change made?

I now see locally that the test was changed because the default behaviour changed: when workers.celery.replicas is unset, workers.replicas doesn't change the value of replicas. I think this should be fixed in the template logic, not in the test case itself.

I think I've managed to fix it. @glennhsh, could you review the change in #60420? To make sure that workers.celery.sets work correctly, I've only used the current unit tests.
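
The fallback that the original test cases expect can be sketched in Python as below. `resolve_replicas` is a hypothetical helper (the chart implements this in Helm templates), and the final default of 1 is an assumption taken from the first test case shown above.

```python
def resolve_replicas(workers: dict, default: int = 1) -> int:
    """Pick the replica count: workers.celery.replicas, then workers.replicas, then default."""
    celery_replicas = (workers.get("celery") or {}).get("replicas")
    if celery_replicas is not None:
        # The new key wins whenever it is explicitly set.
        return celery_replicas
    legacy_replicas = workers.get("replicas")
    # Fall back to the deprecated key so unsetting the new one keeps the old behaviour.
    return legacy_replicas if legacy_replicas is not None else default


cases = [
    ({"celery": {"replicas": None}}, 1),
    ({"replicas": 2, "celery": {"replicas": 3}}, 3),
    ({"replicas": 2, "celery": {"replicas": None}}, 2),
]
results = [resolve_replicas(values) for values, _ in cases]
```

The third case is the one the diff modified: with `workers.celery.replicas` unset, the deprecated `workers.replicas` value is expected to apply.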


glennhsh requested review from dstandish, hussein-awala, jedcunningham and jscheffl as code owners November 21, 2025 07:05

boring-cyborg bot added the area:helm-chart Airflow Helm Chart label Nov 21, 2025

glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch 3 times, most recently from 5676fb7 to 513586f Compare November 23, 2025 01:37

jscheffl approved these changes Nov 23, 2025

jscheffl added this to the Airflow Helm Chart 1.19.0 milestone Nov 23, 2025

glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch 2 times, most recently from 07c19f9 to c9f7a1b Compare November 26, 2025 08:03

glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch from c9f7a1b to f1f9831 Compare December 4, 2025 02:00

jscheffl mentioned this pull request Dec 7, 2025

Add support for multiple Celery worker groups with queue-specific configurations in Helm chart #56589

Closed

glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch 2 times, most recently from 5583e84 to 2de9609 Compare December 11, 2025 00:31

Miretpl reviewed Dec 15, 2025

chart/templates/workers/worker-deployment.yaml (outdated, resolved)

glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch 3 times, most recently from e1e7b26 to 6ef74c8 Compare December 16, 2025 15:14

jscheffl reviewed Dec 16, 2025

chart/newsfragments/58547.significant.rst (resolved)

Miretpl reviewed Dec 19, 2025

chart/values.yaml (outdated, resolved)

chart/values.yaml (outdated, resolved)

Miretpl approved these changes Dec 22, 2025

chart/newsfragments/58547.significant.rst (outdated, resolved)

chart/newsfragments/58547.significant.rst (outdated, resolved)

chart/newsfragments/58547.significant.rst (outdated, resolved)

glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch from 6c7a56a to 088db1a Compare December 23, 2025 00:48

glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch from 280e025 to 4290dbd Compare January 10, 2026 06:11

Glenn Huang 黃瀚陞 and others added 12 commits January 12, 2026 08:47

Chart: Move Celery worker set configuration under workers.celery

9498272

Chart: Add significant Helm chart newsfragment for multiple Celery worker sets

e820684

Update chart/values.yaml

30b2062

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

Update chart/values.yaml

cb15387

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

Update chart/newsfragments/58547.significant.rst

a8b4b48

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

Update chart/newsfragments/58547.significant.rst

d03dcc7

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

Update chart/newsfragments/58547.significant.rst

90b089d

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch from 4290dbd to 989dba0 Compare January 12, 2026 01:00

jscheffl merged commit 26a9d3b into apache:main Jan 12, 2026
95 checks passed

Miretpl reviewed Jan 12, 2026

Miretpl mentioned this pull request Jan 12, 2026

Fix Compatibility of Celery Worker Sets with Workers Separation #60420

Merged

1 task

Miretpl mentioned this pull request Jan 17, 2026

Revert "Separate workers service accounts (#52357)" #60721

Merged

1 task

jedcunningham mentioned this pull request Jan 30, 2026

Status of testing of Apache Airflow Helm Chart 1.19.0rc1 #61268

Open

96 tasks
