Conversation

@glennhsh
Contributor

@glennhsh glennhsh commented Nov 21, 2025

Description

This PR enhances the Airflow Helm chart to support advanced Celery worker topologies, enabling more flexible resource allocation and precise autoscaling configurations.

Why is this needed?

1. Flexible Worker Topologies
As Airflow adoption grows, platform teams often need to route tasks exclusively to specialized worker sets (e.g., GPU-optimized, Memory-optimized) without maintaining a generic "default" worker.

  • Enhancement: The new workers.enableDefault flag allows users to configure a deployment consisting only of specialized worker sets defined in workers.sets. This provides greater flexibility for teams to design their worker architecture exactly as needed.

2. Multi-Queue Autoscaling Support
Complex workflows often require a single worker set to handle tasks from multiple specific queues (e.g., queue: "high-priority,vip").

  • Enhancement: This PR updates the KEDA ScaledObject generation to support comma-separated queue lists. By using the SQL IN (...) clause, we ensure that KEDA scales worker sets based on the precise aggregate workload of all their assigned queues.
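As a rough illustration of the queue filter described above, here is a small Python sketch. It is not the chart's template code, which is written with Helm template functions. The function name `queue_filter`, the choice to use `IN` even for a single queue, and the single-quote escaping are assumptions made for this example.

```python
def queue_filter(queue_setting: str) -> str:
    """Turn a comma-separated queue setting into a SQL filter fragment."""
    # Split on commas, trim surrounding whitespace, and drop empty entries.
    queues = [q.strip() for q in queue_setting.split(",") if q.strip()]
    if not queues:
        raise ValueError("at least one queue name is required")
    # Assumption: double any single quote so a name stays inside its SQL literal.
    quoted = ",".join("'" + q.replace("'", "''") + "'" for q in queues)
    return f"AND queue IN ({quoted})"


print(queue_filter("high-priority, vip"))
```

Trimming whitespace before quoting means `"a,b"` and `" a , b "` should yield the same fragment, which is the property the whitespace test cases are meant to cover.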

3. Granular Configuration Overrides
Different worker sets may require different operational strategies within the same cluster.

  • Enhancement: This change improves the configuration merge logic, allowing individual worker sets to override global settings. For example, a user can now enable KEDA globally but explicitly disable it for a specific worker set that requires a static number of replicas.
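The override behaviour can be pictured with a minimal Python sketch. This is only a model of the intended semantics, not the Helm template logic itself. The helper `merge_overrides` and the sample values (including `maxReplicaCount` and the set name) are hypothetical and chosen for illustration.

```python
from copy import deepcopy


def merge_overrides(base: dict, override: dict) -> dict:
    """Return base with override applied on top, recursing into nested dicts."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_overrides(result[key], value)
        else:
            # Any explicitly set value wins, including False and 0.
            result[key] = deepcopy(value)
    return result


# Hypothetical global worker settings: KEDA enabled for everyone.
global_workers = {"keda": {"enabled": True, "maxReplicaCount": 10}, "replicas": 1}
# Hypothetical worker set that wants a static replica count instead.
static_set = {"name": "static", "keda": {"enabled": False}, "replicas": 3}

effective = merge_overrides(global_workers, static_set)
print(effective["keda"])
```

The point of the sketch is that an explicit `False` in the worker set has to survive the merge, while keys the set does not mention keep their global value.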

Changes

  • New Feature: Added workers.enableDefault (default: true) to values.yaml.
  • Enhancement: Updated worker-kedaautoscaler.yaml to use SQL IN clause for queue filtering, supporting multi-queue configurations (e.g., queue: "a,b" -> AND queue IN ('a','b')).
  • Refactor: Standardized template rendering to ensure consistent behavior between the default worker and workers.sets.

Testing

  • Added test cases in helm-tests/tests/helm_tests/other/test_keda.py to verify:
    • Correct SQL generation for single queues.
    • Correct SQL generation for comma-separated queue lists using the IN clause.
    • Proper handling of whitespace in queue configurations.
  • Verified that workers.enableDefault correctly controls the rendering of the default worker deployment.

closes: #56591
closes: #34219

@boring-cyborg

boring-cyborg bot commented Nov 21, 2025

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

@boring-cyborg boring-cyborg bot added the area:helm-chart Airflow Helm Chart label Nov 21, 2025
@glennhsh glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch 3 times, most recently from 5676fb7 to 513586f Compare November 23, 2025 01:37
Contributor

@jscheffl jscheffl left a comment

For me this looks good, but I would like another pair of eyes for review.

@jscheffl jscheffl added this to the Airflow Helm Chart 1.19.0 milestone Nov 23, 2025
@jedcunningham
Member

I haven't looked at either PR, but #56589 has been open for a month with this same feature.

@glennhsh glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch 2 times, most recently from 07c19f9 to c9f7a1b Compare November 26, 2025 08:03
@glennhsh glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch from c9f7a1b to f1f9831 Compare December 4, 2025 02:00
@jscheffl
Contributor

jscheffl commented Dec 7, 2025

I was also puzzled that I overlooked there being two PRs for the same feature. I actually reviewed both, and only when coming back after some days did I notice the overlap and try to compare them.

I like THIS PR a bit more than #56589 because (1) it is leaner and easier to read as a diff and (2) it also extends KEDA and HPA, which the other PR explicitly excludes.
Even though the other PR was opened earlier, I'd propose to merge this one.

@jedcunningham Can you check and compare as a second pair of eyes? I would propose to merge this one.

@glennhsh glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch 2 times, most recently from 5583e84 to 2de9609 Compare December 11, 2025 00:31
@jscheffl
Contributor

@jedcunningham ping?

Contributor

@Miretpl Miretpl left a comment

Really nice change! Maybe it would be worth adding a newsfragment for this feature? 🤔

@glennhsh glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch 3 times, most recently from e1e7b26 to 6ef74c8 Compare December 16, 2025 15:14
Contributor

@Miretpl Miretpl left a comment

Apart from the change in the newsfragment mentioned by Jens, everything looks good to me. Just a few small potential nits on some comments.

Contributor

@Miretpl Miretpl left a comment

Small nits in the newsfragment; apart from that, it looks good.

@glennhsh glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch from 6c7a56a to 088db1a Compare December 23, 2025 00:48
@ronaldorcampos
Contributor

Would love to see helm charts 1.19.0 released soonish.

@glennhsh glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch from 280e025 to 4290dbd Compare January 10, 2026 06:11
@jscheffl
Contributor

Oh, unfortunately a conflict (again) from another PR merged in parallel :-( Sorry.

Glenn Huang 黃瀚陞 and others added 12 commits January 12, 2026 08:47

* Chart: Enhance Celery Worker Sets support for multi-queue configurations

This commit improves the configuration and scaling logic for Celery Worker Sets,
allowing for more robust multi-queue setups.

Key changes:
1.  **Strict KEDA Queue Filtering**: Updated KEDA SQL queries to always include
    `AND queue = '...'`. Previously, the default worker's KEDA query could
    incorrectly include tasks from other queues (e.g., those assigned to specific
    worker sets), leading to incorrect scaling behavior.

2.  **Explicit Default Worker Toggle**: Introduced `workers.enableDefault` (default: `true`).
    This allows users to easily disable the default worker if they wish to rely
    solely on custom worker sets for specific queues, improving configuration clarity.

3.  **Independent Resource Generation**: Refactored HPA and KEDA templates to
    generate resources for the default worker and worker sets independently.
    This resolves issues where the default worker's autoscaling resources were
    sometimes suppressed when sets were defined.

4.  **Test Updates**: Updated Helm tests to verify the new queue filtering logic
    and the independent generation of worker resources.

* Chart: Support multiple queues in KEDA autoscaling query

This change updates the KEDA SQL query generation in the Helm chart to
properly handle multiple queues defined in `workers.queue`.

By using `splitList` and iterating over the queues, the generated SQL
now uses an `IN` clause (e.g., `queue IN ('default', 'high-cpu')`)
instead of a simple equality check. This ensures that the Horizontal
Pod Autoscaler scales correctly when workers are listening to multiple
queues.

* Chart: Update schema defaults and fix lint errors

This change updates `values.schema.json` to reflect the recent changes
in `values.yaml` for KEDA multi-queue support.

Specific changes:
- Sync default values for `workers.args` and `workers.keda.query`.
- Fix `lint-chart-schema` failure by changing `additionalProperties: true`
  to `additionalProperties: {}` in `workers.sets`.

* Chart: Move Celery worker set configuration under workers.celery

* Chart: Add significant Helm chart newsfragment for multiple Celery worker sets

* Update chart/values.yaml

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

* Update chart/values.yaml

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

* Enhance Helm chart for multiple Celery worker sets

This PR enhances the Airflow Helm chart to support advanced Celery worker topologies, enabling flexible resource allocation and precise autoscaling configurations. It introduces new features like workers.enableDefault, multi-queue autoscaling support, and granular configuration overrides for worker sets.

* Update chart/newsfragments/58547.significant.rst

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

* Update chart/newsfragments/58547.significant.rst

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

* Update chart/newsfragments/58547.significant.rst

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

* Fix worker set persistence.enabled value not being preserved

Helm's mustMerge and merge functions do not preserve boolean false
values because false is considered a "zero value" in Go templates.
This caused workers.celery.persistence.enabled=false to be
incorrectly overwritten by the deprecated workers.persistence.enabled
default value (true) during context merging.

The fix saves the persistence.enabled value before merge operations
and restores it afterward. Also corrects the reference path from
.Values.workers.celery.persistence.enabled to
.Values.workers.persistence.enabled since the worker configuration
is placed under .Values.workers in the new context.

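
The persistence fix in the last commit can be modelled roughly in Python. This sketch only emulates the behaviour the commit message describes (a merge in which an "empty" destination value such as false is overwritten by the source); it is not Helm's implementation. The helper `merge_skipping_empty` and the sample values, including the `size` entry, are assumptions for illustration.

```python
def merge_skipping_empty(dest: dict, src: dict) -> dict:
    """Fill dest from src; dest wins unless its value is 'empty' (falsy)."""
    merged = dict(dest)
    for key, src_value in src.items():
        dest_value = merged.get(key)
        if isinstance(dest_value, dict) and isinstance(src_value, dict):
            merged[key] = merge_skipping_empty(dest_value, src_value)
        elif not dest_value:
            # False, 0, "" and None all count as empty here, so src overwrites them.
            merged[key] = src_value
    return merged


# Hypothetical values: the user explicitly disables persistence in the new location.
new_style = {"persistence": {"enabled": False}}
# Hypothetical deprecated defaults that still say "enabled".
deprecated = {"persistence": {"enabled": True, "size": "100Gi"}}

# Naive merge: the explicit False is treated as empty and lost.
naive = merge_skipping_empty(new_style, deprecated)

# Workaround in the spirit of the commit: remember the value, merge, restore it.
saved = new_style["persistence"]["enabled"]
fixed = merge_skipping_empty(new_style, deprecated)
fixed["persistence"]["enabled"] = saved

print(naive["persistence"]["enabled"], fixed["persistence"]["enabled"])
```

In this model the naive merge flips `enabled` back to true, while the save-and-restore variant keeps the user's explicit false and still picks up the other defaults.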
@glennhsh glennhsh force-pushed the feature/airflow-multiple-celery-worker-support branch from 4290dbd to 989dba0 Compare January 12, 2026 01:00
@jscheffl jscheffl merged commit 26a9d3b into apache:main Jan 12, 2026
95 checks passed
@boring-cyborg

boring-cyborg bot commented Jan 12, 2026

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@jscheffl
Contributor

Thanks as well for your patience in making this PR!

({"celery": {"replicas": None}}, 1),
({"replicas": 2, "celery": {"replicas": 3}}, 3),
({"replicas": 2, "celery": {"replicas": None}}, 2),
({"replicas": 2, "celery": {"replicas": 2}}, 2),
Contributor

@Miretpl Miretpl Jan 12, 2026

With the move from workers.replicas to workers.celery.replicas, we want to make sure that when users unset the workers.celery.replicas field, the behaviour is the same as in previous releases (because we changed how the chart handles replicas). Why was this change made?

I see now locally that the test was changed because the default behaviour changed: if workers.celery.replicas is unset, workers.replicas doesn't change the value of replicas. I think this should be fixed in the template logic, not in the test case itself.
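
To make the expected fallback concrete, here is a small Python sketch of the resolution order as I read it from the four parametrized cases quoted above. It is a model of the intended behaviour under that reading, not the template code; the function name `resolve_replicas` and the default of 1 are assumptions taken from those cases.

```python
from typing import Optional


def resolve_replicas(workers: dict, default: int = 1) -> int:
    """Prefer workers.celery.replicas, then workers.replicas, then the default."""
    celery_replicas: Optional[int] = (workers.get("celery") or {}).get("replicas")
    if celery_replicas is not None:
        return celery_replicas
    legacy_replicas: Optional[int] = workers.get("replicas")
    if legacy_replicas is not None:
        return legacy_replicas
    return default


# The four cases quoted in the review thread: (workers values, expected replicas).
cases = [
    ({"celery": {"replicas": None}}, 1),
    ({"replicas": 2, "celery": {"replicas": 3}}, 3),
    ({"replicas": 2, "celery": {"replicas": None}}, 2),
    ({"replicas": 2, "celery": {"replicas": 2}}, 2),
]
for values, expected in cases:
    assert resolve_replicas(values) == expected
```

The comparison uses `is not None` rather than truthiness so that an explicit 0 would still be honoured instead of falling through to the legacy value.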

Contributor

I think I've managed to fix it. @glennhsh, could you review the change in #60420? To make sure that workers.celery.sets work correctly, I've only used the current unit tests.

iharsh02 pushed a commit to iharsh02/airflow that referenced this pull request Jan 13, 2026
…ons (apache#58547)

* Chart: Enhance Celery Worker Sets support for multi-queue configurations

This commit improves the configuration and scaling logic for Celery Worker Sets,
allowing for more robust multi-queue setups.

Key changes:
1.  **Strict KEDA Queue Filtering**: Updated KEDA SQL queries to always include
    `AND queue = '...'`. Previously, the default worker's KEDA query could
    incorrectly include tasks from other queues (e.g., those assigned to specific
    worker sets), leading to incorrect scaling behavior.

2.  **Explicit Default Worker Toggle**: Introduced `workers.enableDefault` (default: `true`).
    This allows users to easily disable the default worker if they wish to rely
    solely on custom worker sets for specific queues, improving configuration clarity.

3.  **Independent Resource Generation**: Refactored HPA and KEDA templates to
    generate resources for the default worker and worker sets independently.
    This resolves issues where the default worker's autoscaling resources were
    sometimes suppressed when sets were defined.

4.  **Test Updates**: Updated Helm tests to verify the new queue filtering logic
    and the independent generation of worker resources.

* Chart: Support multiple queues in KEDA autoscaling query

This change updates the KEDA SQL query generation in the Helm chart to
properly handle multiple queues defined in `workers.queue`.

By using `splitList` and iterating over the queues, the generated SQL
now uses an `IN` clause (e.g., `queue IN ('default', 'high-cpu')`)
instead of a simple equality check. This ensures that the Horizontal
Pod Autoscaler scales correctly when workers are listening to multiple
queues.

* Chart: Update schema defaults and fix lint errors

This change updates `values.schema.json` to reflect the recent changes
in `values.yaml` for KEDA multi-queue support.

Specific changes:
- Sync default values for `workers.args` and `workers.keda.query`.
- Fix `lint-chart-schema` failure by changing `additionalProperties: true`
  to `additionalProperties: {}` in `workers.sets`.

* Chart: Move Celery worker set configuration under workers.celery

* Chart: Add significant Helm chart newsfragment for multiple Celery worker sets

* Update chart/values.yaml

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

* Update chart/values.yaml

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

* Enhance Helm chart for multiple Celery worker sets

This PR enhances the Airflow Helm chart to support advanced Celery worker topologies, enabling flexible resource allocation and precise autoscaling configurations. It introduces new features like workers.enableDefault, multi-queue autoscaling support, and granular configuration overrides for worker sets.

* Update chart/newsfragments/58547.significant.rst

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

* Update chart/newsfragments/58547.significant.rst

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

* Update chart/newsfragments/58547.significant.rst

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

* Fix worker set persistence.enabled value not being preserved

Helm's mustMerge and merge functions do not preserve boolean false
values because false is considered a "zero value" in Go templates.
This caused workers.celery.persistence.enabled=false to be
incorrectly overwritten by the deprecated workers.persistence.enabled
default value (true) during context merging.

The fix saves the persistence.enabled value before merge operations
and restores it afterward. Also corrects the reference path from
.Values.workers.celery.persistence.enabled to
.Values.workers.persistence.enabled since the worker configuration
is placed under .Values.workers in the new context.

---------

Co-authored-by: Glenn Huang 黃瀚陞 <glenn.hs.huang@foxconn.com>
Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>
jason810496 pushed a commit to jason810496/airflow that referenced this pull request Jan 22, 2026
…ons (apache#58547)


Labels

area:helm-chart Airflow Helm Chart

Development

Successfully merging this pull request may close these issues.

Add support for multiple Celery worker groups with queue-specific configurations in Helm chart
Deploy multiple (celery) workers with the Helm Chart

6 participants