Add TTL-enabled LRU cache for StatsD metrics aggregation #51792

shubham36deshpande · 2025-06-16T14:12:52Z

This PR introduces a TTL (time-to-live) mechanism in conjunction with an LRU (least-recently-used) cache for StatsD metric aggregation. The change is designed to automatically clean up stale or unused metric entries, preventing uncontrolled memory growth in long-running StatsD daemons, a problem highlighted in issue #50645.
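
To make the idea concrete, here is a minimal Python sketch of an LRU cache whose entries also expire after a TTL. It is an illustration only, not the statsd_exporter implementation: the class name `TTLLRUCache`, its methods, and the convention that a TTL of 0 disables expiry are assumptions made for this example.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Hypothetical LRU cache whose entries also expire `ttl_seconds` after their last update."""

    def __init__(self, max_size, ttl_seconds, clock=time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds  # assumption: 0 means "never expire"
        self._clock = clock
        self._entries = OrderedDict()  # key -> (value, last_update_time)

    def put(self, key, value):
        # Store the value, mark it most recently used, then evict the
        # least recently used entries if the cache is over capacity.
        self._entries[key] = (value, self._clock())
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)

    def get(self, key, default=None):
        item = self._entries.get(key)
        if item is None:
            return default
        value, updated_at = item
        if self._expired(updated_at):
            # Stale entries are dropped lazily on access.
            del self._entries[key]
            return default
        self._entries.move_to_end(key)
        return value

    def purge_expired(self):
        # Remove every stale entry and report how many were removed.
        stale = [k for k, (_, t) in self._entries.items() if self._expired(t)]
        for k in stale:
            del self._entries[k]
        return len(stale)

    def _expired(self, updated_at):
        return self.ttl_seconds > 0 and self._clock() - updated_at >= self.ttl_seconds

    def __len__(self):
        return len(self._entries)
```

The size bound caps memory when many distinct metrics are active at once, while the TTL removes entries that simply stop receiving updates, which is the case that matters for a long-running daemon.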

closes: #50645

boring-cyborg · 2025-06-16T14:12:55Z

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
Here are some useful points:

Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
Be sure to read the Airflow Coding style.
Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

jscheffl · 2025-06-16T20:44:24Z

Some static checks fail.

I am not a StatsD expert, though. @AutomationDev85, do you have an opinion on the configs?

jscheffl

Good from my side. Thanks.

But as I am not a real expert in Helm, @jedcunningham, can you take a second pass on the PR?

eladkal · 2025-07-03T17:56:34Z

tests are failing

jedcunningham

We should also add a newsfragment describing the new values and explaining how to get the old behavior back.

Miretpl

I am dropping the re-review request, as there were no changes related to my previous review (apart from the newsfragment).

@shubham36deshpande, once the changes are introduced, please request a re-review.

shubham36deshpande · 2025-12-20T12:13:27Z

@Miretpl, can you please review the latest changes? I have refactored the options into cache.size, cache.type and cache.ttl as you suggested.

jscheffl · 2025-12-21T16:49:03Z

@shubham36deshpande Can you please also run (at least) the static checks before pushing? There are now 83 commits on this PR, and in most cases either the change was semantically incorrect or simple tests failed in CI. It would save time for you, for CI, and for me reviewing this PR if basic checks were run locally before pushing the same basic errors over and over.

Miretpl

In addition to what Jens mentioned: I guess you did not configure pre-commit for checking the code. There is a nice guide here: https://github.com/apache/airflow/blob/main/contributing-docs/README.rst

Miretpl · 2025-12-22T17:17:02Z

chart/newsfragments/51792.significant.rst

@@ -0,0 +1,11 @@
+StatsD metrics aggregation now supports configurable TTL-enabled LRU cache to prevent memory growth in long-running daemons.


After changing the values, you need to update the information in the newsfragment file.

Yes, thanks for pointing it out. I have fixed the mistake.

Miretpl · 2025-12-22T17:18:01Z

chart/templates/statsd/statsd-deployment.yaml

+          {{- if .Values.statsd.cache.size }}
+          - "--statsd.cache-size={{ .Values.statsd.cache.size }}"
+          {{- end }}
+          {{- if .Values.statsd.cache.type }}
+          - "--statsd.cache-type={{ .Values.statsd.cache.type }}"
+          {{- end }}
+          {{- if .Values.statsd.cache.ttl }}
+          - "--ttl={{ .Values.statsd.cache.ttl }}"
+          {{- end }}
+          {{- if .Values.statsd.args }}
+          {{- range $arg := .Values.statsd.args }}
+          - {{ $arg | quote }}
+          {{- end }}


Maybe put these inside a statsd.cache.enabled flag 🤔, but I don't have a strong opinion about that.

@Miretpl - statsd.enabled is outside of cache as people might want to enable statsd without cache options.

@shubham36deshpande I mentioned statsd.cache.enabled, not statsd.enabled. It does not exist in the current version, but it would replicate the behaviour of the statsd.cache flag from the first version of the PR. As you mentioned, people might want to have statsd enabled without the cache (this also preserves backward compatibility). In the first version of the PR, when we had the statsd.cache flag for this (by the way, it is still mentioned in the newsfragment file), it was easier to do. Now that we have the statsd.cache section, I think it would be beneficial to add a flag for it, e.g. statsd.cache.enabled with a default value of false.

Yes, makes sense, I will implement it in the next commit.
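
As a rough illustration of the gating discussed in this thread, the Python sketch below mirrors the template logic with the proposed flag added. The function name `build_statsd_cache_args` is hypothetical, and `statsd.cache.enabled` is the flag proposed above, not an existing chart value. The flag strings are copied from the template excerpt under review; the sketch does not check whether the exporter accepts them.

```python
def build_statsd_cache_args(statsd_values):
    """Return the exporter args that the template would render for the given `statsd` values.

    Hypothetical helper: cache flags are only emitted when the proposed
    `cache.enabled` flag is true, so the default (false) keeps the old behaviour.
    """
    cache = statsd_values.get("cache") or {}
    args = []
    if cache.get("enabled", False):
        if cache.get("size"):
            args.append(f"--statsd.cache-size={cache['size']}")
        if cache.get("type"):
            args.append(f"--statsd.cache-type={cache['type']}")
        if cache.get("ttl"):
            args.append(f"--ttl={cache['ttl']}")
    # User-defined args are always appended, with or without the cache section.
    args.extend(str(arg) for arg in statsd_values.get("args") or [])
    return args


# Cache values present but the flag is off: nothing cache-related is rendered.
print(build_statsd_cache_args({"cache": {"size": 1000, "type": "lru", "ttl": "5m"}}))
```

With `enabled` defaulting to false, existing deployments render the same arguments as before, which is the backward-compatibility point raised in the review.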

shubham36deshpande · 2025-12-25T09:55:10Z

@jscheffl - Sorry for that. I was running the breeze tests but didn't run the CI static checks, which is why it failed. I have fixed the error and run the CI and helm-tests again.
I will be more careful next time.

kunaljubce · 2026-01-02T06:33:22Z

> @jscheffl - Sorry for that. I was running the breeze tests but didn't run the CI static checks, which is why it failed. I have fixed the error and run the CI and helm-tests again. I will be more careful next time.

Hey @shubham36deshpande, most of the static checks are configured as prek pre-commit hooks, so just install the pre-commit hooks and most of your linting/static check issues will get autofixed when you push your changes.

shubham36deshpande · 2026-01-10T05:18:55Z

@jscheffl, the tests for airflow_core pass locally but fail here. What might be the reason? Is there anything I am doing incorrectly in the PR?

jscheffl · 2026-01-10T14:55:29Z

> @jscheffl, the tests for airflow_core pass locally but fail here. What might be the reason? Is there anything I am doing incorrectly in the PR?

I am also not sure, but I can say that the same error as in CI happens on my machine. Maybe you can run breeze cleanup and rebuild all containers?

jscheffl · 2026-01-14T12:52:35Z

chart/values.yaml

+    # from the exported /metrics output.
+    # Format: Go duration string (e.g. "30s", "5m", "1h")
+    # Default: "0s" (disabled, never expires)
+    ttl: "0s"


I tried to apply this parameter but then statsd fails to start with:
statsd_exporter: error: unknown long flag '--ttl', try --help

--help gives me:

usage: statsd_exporter [<flags>]

Flags:
  -h, --[no-]help  Show context-sensitive help (also try --help-long and --help-man).
  --web.listen-address=":9102"  The address on which to expose the web interface and generated Prometheus metrics.
  --[no-]web.enable-lifecycle  Enable shutdown and reload via HTTP request.
  --web.telemetry-path="/metrics"  Path under which to expose metrics.
  --statsd.listen-udp=":9125"  The UDP address on which to receive statsd metric lines. "" disables it.
  --statsd.listen-tcp=":9125"  The TCP address on which to receive statsd metric lines. "" disables it.
  --statsd.listen-unixgram=""  The Unixgram socket path to receive statsd metric lines in datagram. "" disables it.
  --statsd.unixsocket-mode="755"  The permission mode of the unix socket.
  --statsd.mapping-config=STATSD.MAPPING-CONFIG  Metric mapping configuration file name.
  --statsd.read-buffer=STATSD.READ-BUFFER  Size (in bytes) of the operating system's transmit read buffer associated with the UDP or Unixgram connection. Please make sure the kernel parameters net.core.rmem_max is set to a value greater than the value specified.
  --statsd.cache-size=1000  Maximum size of your metric mapping cache. Relies on least recently used replacement policy if max size is reached.
  --statsd.cache-type=lru  Metric mapping cache type. Valid options are "lru" and "random"
  --statsd.event-queue-size=10000  Size of internal queue for processing events.
  --statsd.event-flush-threshold=1000  Number of events to hold in queue before flushing.
  --statsd.event-flush-interval=200ms  Maximum time between event queue flushes.
  --debug.dump-fsm=""  The path to dump internal FSM generated for glob matching as Dot file.
  --[no-]check-config  Check configuration and exit.
  --[no-]statsd.parse-dogstatsd-tags  Parse DogStatsd style tags. Enabled by default.
  --[no-]statsd.parse-influxdb-tags  Parse InfluxDB style tags. Enabled by default.
  --[no-]statsd.parse-librato-tags  Parse Librato style tags. Enabled by default.
  --[no-]statsd.parse-signalfx-tags  Parse SignalFX style tags. Enabled by default.
  --statsd.relay.address=STATSD.RELAY.ADDRESS  The UDP relay target address (host:port)
  --statsd.relay.packet-length=1400  Maximum relay output packet length to avoid fragmentation
  --statsd.udp-packet-queue-size=10000  Size of internal queue for processing UDP packets.
  --log.level=info  Only log messages with the given severity or above. One of: [debug, info, warn, error]
  --log.format=logfmt  Output format of log messages. One of: [logfmt, json]
  --[no-]version  Show application version.

I tested with statsd_exporter v0.28.0. Is this flag only available in a fork or a manually built image?
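
A check like this could be scripted before wiring a flag into the chart. The Python sketch below extracts the long flags from a `--help` dump and reports any rendered args that the binary does not list. The helper names and the regular expression are assumptions for illustration, and `HELP_EXCERPT` is a shortened copy of the output quoted above.

```python
import re

# Shortened excerpt of the `statsd_exporter --help` output quoted in this comment.
HELP_EXCERPT = """
usage: statsd_exporter [<flags>]
Flags:
  -h, --[no-]help  Show context-sensitive help (also try --help-long and --help-man).
  --statsd.cache-size=1000  Maximum size of your metric mapping cache.
  --statsd.cache-type=lru  Metric mapping cache type. Valid options are "lru" and "random"
  --[no-]check-config  Check configuration and exit.
  --log.level=info  Only log messages with the given severity or above.
  --[no-]version  Show application version.
"""


def supported_long_flags(help_text):
    """Collect every `--flag` (and `--no-flag` for negatable ones) mentioned in the help text."""
    flags = set()
    for match in re.finditer(r"(?<![\w-])--(\[no-\])?([a-z][\w.-]*)", help_text):
        negatable, name = match.groups()
        name = name.rstrip(".-")  # drop sentence punctuation glued to the flag name
        flags.add(f"--{name}")
        if negatable:
            flags.add(f"--no-{name}")
    return flags


def unsupported_args(args, help_text):
    """Return the args whose flag name (the part before `=`) is not listed in the help text."""
    known = supported_long_flags(help_text)
    return [a for a in args if a.startswith("--") and a.split("=", 1)[0] not in known]


rendered = ["--statsd.cache-size=1000", "--statsd.cache-type=lru", "--ttl=5m"]
print(unsupported_args(rendered, HELP_EXCERPT))
```

Against the excerpt, only `--ttl=5m` is reported, which matches the startup error quoted above.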

jscheffl · 2026-01-23T22:22:04Z

The fix has landed on main indirectly via PR #60933, so I am closing this PR. Thanks for the (indirect) contribution! With this, your efforts have also landed on main!

Making statsd-exporter TTL & cache-size/type configurable in Airflow Helm chart

3bb29a0

shubham36deshpande requested review from dstandish, hussein-awala and jedcunningham as code owners June 16, 2025 14:12

boring-cyborg bot added the area:helm-chart Airflow Helm Chart label Jun 16, 2025

shubham36deshpande and others added 6 commits June 28, 2025 12:01

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

cb074aa

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

3a0f48a

Added statsd configs in values.schema.json

a56631e

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

fff9186

Fixing test errors

b9cb2b2

Merge branch 'issue-50645-statsd-ttl-lru-cache' of https://github.com/shubham36deshpande/airflow into issue-50645-statsd-ttl-lru-cache

788353f

jscheffl reviewed Jun 28, 2025

chart/values.schema.json (3 outdated comments)

shubham36deshpande and others added 2 commits June 30, 2025 20:40

updated the default values for statsd in schema.json

b40e914

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

eb098bf

jscheffl reviewed Jun 30, 2025

chart/templates/statsd/statsd-deployment.yaml (outdated)

shubham36deshpande and others added 3 commits July 1, 2025 14:15

Added default argumentd as cache sieze cache type and ttl, accomodated user defined args as well

9acdcb9

Merge branch 'issue-50645-statsd-ttl-lru-cache' of https://github.com/shubham36deshpande/airflow into issue-50645-statsd-ttl-lru-cache

9328a7a

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

39376ea

jscheffl approved these changes Jul 1, 2025

jedcunningham reviewed Jul 3, 2025

chart/values.schema.json (outdated)

chart/values.yaml (outdated)

shubham36deshpande and others added 7 commits July 4, 2025 22:49

Removed EOF error from statsd deployment and improved description in values.schema.json

90bbd98

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

78e69d8

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

aa79fdb

Added default values in deployment.yaml, removed spelling errors formated schema.json properly

67d056b

Merge branch 'issue-50645-statsd-ttl-lru-cache' of https://github.com/shubham36deshpande/airflow into issue-50645-statsd-ttl-lru-cache

bdbfa76

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

9f0c2d2

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

8fe76fb

jscheffl requested a review from Miretpl December 10, 2025 21:36

Miretpl suggested changes Dec 19, 2025

shubham36deshpande and others added 3 commits December 20, 2025 12:10

updated values schema

3111f5d

restructured cache options in statsd

8b303bd

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

25fa4c3

shubham36deshpande requested review from Miretpl and wind0r December 20, 2025 12:12

jscheffl reviewed Dec 20, 2025

chart/templates/statsd/statsd-deployment.yaml (outdated)

shubham36deshpande and others added 2 commits December 21, 2025 10:49

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

5b63792

fixing indentation in deployment.yaml

7777484

Miretpl suggested changes Dec 22, 2025

shubham36deshpande and others added 2 commits December 25, 2025 09:51

fix: updated newsfragment file and fixed CI static check

db09631

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

13f3603

jscheffl reviewed Dec 28, 2025

chart/values.yaml (outdated)

chart/templates/statsd/statsd-deployment.yaml (2 outdated comments)

shubham36deshpande and others added 2 commits January 3, 2026 06:58

updated the indentations in values and statsd deployment

16afd21

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

f81d0f0

jscheffl reviewed Jan 3, 2026

helm-tests/tests/helm_tests/airflow_core/test_api_server.py

shubham36deshpande and others added 3 commits January 3, 2026 16:20

reverted the apiserver changes

4ef7bab

changed the formatting

cc067ad

Merge branch 'main' into issue-50645-statsd-ttl-lru-cache

5b8ad58

jscheffl reviewed Jan 14, 2026

AutomationDev85 mentioned this pull request Jan 22, 2026

Support PR: Add TTL-enabled LRU cache for StatsD metrics aggregation #60933

Merged

jscheffl closed this Jan 23, 2026
