Skip to content

perf and bugfix (dogstatsd): replace regex parser with position-flexible implementation#52

Merged
litianningdatadog merged 1 commit intomainfrom
tianning.li/SVLS-8122
Dec 10, 2025
Merged

perf and bugfix (dogstatsd): replace regex parser with position-flexible implementation#52
litianningdatadog merged 1 commit intomainfrom
tianning.li/SVLS-8122

Conversation

@litianningdatadog
Copy link
Copy Markdown
Contributor

@litianningdatadog litianningdatadog commented Dec 8, 2025

https://datadoghq.atlassian.net/browse/SVLS-8122

What does this PR do?

Replace regex-based DogStatsD metric parsing with a loop-based approach that accepts optional fields (tags, timestamp, sample_rate, container_id) in any order, matching the Datadog Agent's behavior.

Motivation

The previous implementation assumed a fixed ordering in serialized metric data, leading to valid payloads being rejected. This update corrects that behavior and improves overall performance.

Additional Notes

Changes:

  • Remove regex dependency and METRIC_REGEX static
  • Add next_field() helper for pipe-delimited field splitting
  • Rewrite parse() to use iterative parsing with prefix matching
  • Support position-flexible optional fields (e.g., |T|# or |#|T)
  • Optimize by getting current time once instead of twice
  • Ignore unknown fields for forward compatibility

Describe how to test/QA your changes

image - New behavior: no error was found in the log (https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn-fullinstrument-bn-10bst-java11-lambda/log-events/2025$252F12$252F08$252F$255B$2524LATEST$255Dac96c0850bc04488b91473f11caa3262) and the custom metric is visible image

@litianningdatadog litianningdatadog marked this pull request as ready for review December 8, 2025 18:34
@duncanpharvey
Copy link
Copy Markdown
Collaborator

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/dogstatsd/src/metric.rs Outdated
Comment thread crates/dogstatsd/src/metric.rs
Copy link
Copy Markdown
Collaborator

@duncanpharvey duncanpharvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I tested this in Azure Functions and confirmed it works with the Java, .NET, Python, and Node.js DogStatsD clients.

It looks like Saluki already has some dogstatsd metric parsing logic so I wonder if this is something we could use in our dogstatsd crate instead of rewriting it.

https://github.com/DataDog/saluki/blob/f47a7ef588c53aa1da35dcfd93808595ebeb1291/lib/saluki-io/src/deser/codec/dogstatsd/metric.rs#L38

@litianningdatadog
Copy link
Copy Markdown
Contributor Author

Looks good! I tested this in Azure Functions and confirmed it works with the Java, .NET, Python, and Node.js DogStatsD clients.

It looks like Saluki already has some dogstatsd metric parsing logic so I wonder if this is something we could use in our dogstatsd crate instead of rewriting it.

https://github.com/DataDog/saluki/blob/f47a7ef588c53aa1da35dcfd93808595ebeb1291/lib/saluki-io/src/deser/codec/dogstatsd/metric.rs#L38

After testing Saluki’s parsing, I think adopting it now is riskier for two reasons:
• It doesn’t provide meaningful code reduction. (see this draft change)
• It would require significant dependency changes since we switched to the crate-based approach instead of hard-coding SHA values.

Given that, I’d prefer to keep the current implementation and revisit the idea once we’ve fully migrated to crate-based dependencies.

@duncanpharvey
Copy link
Copy Markdown
Collaborator

Looks good! I tested this in Azure Functions and confirmed it works with the Java, .NET, Python, and Node.js DogStatsD clients.
It looks like Saluki already has some dogstatsd metric parsing logic so I wonder if this is something we could use in our dogstatsd crate instead of rewriting it.
https://github.com/DataDog/saluki/blob/f47a7ef588c53aa1da35dcfd93808595ebeb1291/lib/saluki-io/src/deser/codec/dogstatsd/metric.rs#L38

After testing Saluki’s parsing, I think adopting it now is riskier for two reasons: • It doesn’t provide meaningful code reduction. (see this draft change) • It would require significant dependency changes since we switched to the crate-based approach instead of hard-coding SHA values.

Given that, I’d prefer to keep the current implementation and revisit the idea once we’ve fully migrated to crate-based dependencies.

Sounds good! Thanks for exploring this option

@litianningdatadog litianningdatadog merged commit bffa9aa into main Dec 10, 2025
26 checks passed
@litianningdatadog litianningdatadog deleted the tianning.li/SVLS-8122 branch December 10, 2025 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants