-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Add TTL-enabled LRU cache for StatsD metrics aggregation #51792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TTL-enabled LRU cache for StatsD metrics aggregation #51792
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
|
|
Some static checks fail.... though I am not an statsd expert... @AutomationDev85 Do you have an opinion to the configs? |
…/shubham36deshpande/airflow into issue-50645-statsd-ttl-lru-cache
…d user defined args as well
…/shubham36deshpande/airflow into issue-50645-statsd-ttl-lru-cache
jscheffl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good for my side. Thanks.
But as I am not a real expert in Helm... @jedcunningham can you make a second pass opinion on the PR?
|
tests are failing |
jedcunningham
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also add a newsfragment describing the new values and explaining how to get the old behavior back.
…values.schema.json
…ated schema.json properly
…/shubham36deshpande/airflow into issue-50645-statsd-ttl-lru-cache
Miretpl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to drop the re-review request, as there were no changes related to my previous review (despite newsfragment).
@shubham36deshpande, when the changes will be introduced, please request re-review
|
@Miretpl can you please review he latest changes, I have refactored the options like |
|
@shubham36deshpande Can you please also run (at least) static checks before pushing? There are not 83 commits on this PR but either it was semantically incorrect of in most cases simple tests failed in CI. Would save time for you, CI and me taking a look at this PR if it is basic checked locally before pushing over-and-over basic errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+ what Jens mentioned. I guess you did not configure the pre-commit for checking the code. You have a nice guide here https://github.com/apache/airflow/blob/main/contributing-docs/README.rst
| @@ -0,0 +1,11 @@ | |||
| StatsD metrics aggregation now supports configurable TTL-enabled LRU cache to prevent memory growth in long-running daemons. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After changing in the values, you need to update the information in newsfragment file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thats for pointing it out, I have fixed the mistake.
| {{- if .Values.statsd.cache.size }} | ||
| - "--statsd.cache-size={{ .Values.statsd.cache.size }}" | ||
| {{- end }} | ||
| {{- if .Values.statsd.cache.type }} | ||
| - "--statsd.cache-type={{ .Values.statsd.cache.type }}" | ||
| {{- end }} | ||
| {{- if .Values.statsd.cache.ttl }} | ||
| - "--ttl={{ .Values.statsd.cache.ttl }}" | ||
| {{- end }} | ||
| {{- if .Values.statsd.args }} | ||
| {{- range $arg := .Values.statsd.args }} | ||
| - {{ $arg | quote }} | ||
| {{- end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe inside of statsd.cache.enabled flag 🤔, but I don't have a strong opinion about that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Miretpl - statsd.enabled is outside of cache as people might want to enable statsd without cache options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shubham36deshpande I mentioned statsd.cache.enabled (it does not exist in the current version, but it replicates the behaviour of statsd.cache flag from the first version of the PR), not statsd.enabled. As you mentioned, people might want to have statsd enabled without cache (it is also for preserving backward compatibility). In the first version of the PR, when we had the statsd.cache flag for it (btw. it is still mentioned in the newsfragment file) it was easier to do and I think that when we have the statsd.cache section now, it would be beneficial to add a flag for it e.g. statsd.cache.enabled with default false value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes sense, I will implement it in the next commit.
|
@jscheffl - Sorry for that, I was running |
Hey @shubham36deshpande, most of the static checks are configured as prek pre-commit hooks, so just have the pre-commit hooks installed and most of your linting/static check issues would get autofixed when you're pushing your changes. |
@jscheffl , the test for airflow_core are passing in my local but it is failing here, what might be the reason?
Is there anything I am doing incorrectly in PR?
|
Also not sure but I can say that the same error like in the CI happens on my machine. Maybe can you |
| # from the exported /metrics output. | ||
| # Format: Go duration string (e.g. "30s", "5m", "1h") | ||
| # Default: "0s" (disabled, never expires) | ||
| ttl: "0s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to apply this parameter but then statsd fails to start with:
statsd_exporter: error: unknown long flag '--ttl', try --help
--help gives me:
usage: statsd_exporter [<flags>]
Flags:
-h, --[no-]help Show context-sensitive help (also try
--help-long and --help-man).
--web.listen-address=":9102"
The address on which to expose the web interface
and generated Prometheus metrics.
--[no-]web.enable-lifecycle
Enable shutdown and reload via HTTP request.
--web.telemetry-path="/metrics"
Path under which to expose metrics.
--statsd.listen-udp=":9125"
The UDP address on which to receive statsd
metric lines. "" disables it.
--statsd.listen-tcp=":9125"
The TCP address on which to receive statsd
metric lines. "" disables it.
--statsd.listen-unixgram=""
The Unixgram socket path to receive statsd
metric lines in datagram. "" disables it.
--statsd.unixsocket-mode="755"
The permission mode of the unix socket.
--statsd.mapping-config=STATSD.MAPPING-CONFIG
Metric mapping configuration file name.
--statsd.read-buffer=STATSD.READ-BUFFER
Size (in bytes) of the operating system's
transmit read buffer associated with the UDP or
Unixgram connection. Please make sure the kernel
parameters net.core.rmem_max is set to a value
greater than the value specified.
--statsd.cache-size=1000 Maximum size of your metric mapping cache.
Relies on least recently used replacement policy
if max size is reached.
--statsd.cache-type=lru Metric mapping cache type. Valid options are
"lru" and "random"
--statsd.event-queue-size=10000
Size of internal queue for processing events.
--statsd.event-flush-threshold=1000
Number of events to hold in queue before
flushing.
--statsd.event-flush-interval=200ms
Maximum time between event queue flushes.
--debug.dump-fsm="" The path to dump internal FSM generated for glob
matching as Dot file.
--[no-]check-config Check configuration and exit.
--[no-]statsd.parse-dogstatsd-tags
Parse DogStatsd style tags. Enabled by default.
--[no-]statsd.parse-influxdb-tags
Parse InfluxDB style tags. Enabled by default.
--[no-]statsd.parse-librato-tags
Parse Librato style tags. Enabled by default.
--[no-]statsd.parse-signalfx-tags
Parse SignalFX style tags. Enabled by default.
--statsd.relay.address=STATSD.RELAY.ADDRESS
The UDP relay target address (host:port)
--statsd.relay.packet-length=1400
Maximum relay output packet length to avoid
fragmentation
--statsd.udp-packet-queue-size=10000
Size of internal queue for processing UDP
packets.
--log.level=info Only log messages with the given severity or
above. One of: [debug, info, warn, error]
--log.format=logfmt Output format of log messages. One of: [logfmt,
json]
--[no-]version Show application version.
I tested with statsd_expoerter v0.28.0 - is this only available of a fork or a manually built image?
|
Indirectly via PR #60933 the fix is landed on main, closing this PR - thanks for the (indirect) contribution! With this also your efforts landed on main! |

…Helm chart
This PR introduces a TTL (time-to-live) mechanism in conjunction with an LRU (least-recently-used) cache for StatsD metric aggregation. The change is designed to automatically clean up stale or unused metric entries, preventing uncontrolled memory growth in long-running statsd daemons—a problem highlighted in issue #50645.
closes: #50645
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.