-
Notifications
You must be signed in to change notification settings - Fork 2k
enhancement(kubernetes_logs source): Allow specification of a maximum line size to be applied after merging instead of just before #22582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ize and max_merged_line_size
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
… to be partial; fix test
pront
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ganelo, this mostly looks reasonable.
| total_read += used; | ||
|
|
||
| if !discarding && buf.len() > max_size { | ||
| warn!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not very familiar with this code. I wonder why this warn! was moved only to be emitted later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was part of plumbing through the actual contents that were being dropped so they could be included in the error.
| let mut bytes_mut = BytesMut::new(); | ||
| if let Some(bucket) = self.buckets.get_mut(file) { | ||
| // don't bother continuing to process new partial events that match existing ones that are already too big | ||
| if bucket.too_big { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I wonder if we can maintain a separate data structure for these buckets so that we don't mix with them with the emitted ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not totally sure I follow - can you elaborate what the concern is that you're wanting to mitigate?
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
|
This is starting to look great, will do a final review shortly. |
|
Hey @pront - just checking whether there's anything further you need from me on this? |
Apologies for the long delay, I was away for 2+ weeks. My plan is to review this / give prompt feedback and include it in the next release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a new configuration option, max_merged_line_bytes, to the kubernetes_logs source so that users can limit the size of merged log lines even when auto_partial_merge is enabled. Key changes include updates to the CUE reference documentation, modifications to the partial events merger logic in Rust to enforce the new limit (along with corresponding tests), and adjustments to internal events and file source modules for consistent error reporting.
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| website/cue/reference/components/sources/base/kubernetes_logs.cue | Added documentation for the max_merged_line_bytes field. |
| src/sources/kubernetes_logs/partial_events_merger.rs | Integrated checking for the new max_merged_line_bytes and adjusted event merging and filtering behavior. |
| src/sources/kubernetes_logs/mod.rs | Updated configuration parsing and adjusted max_line_bytes value based on the new max_merged_line_bytes setting. |
| src/internal_events/kubernetes_logs.rs | Added a new internal event for merged lines that exceed the configured limit. |
| src/internal_events/file.rs, lib/file-source/* | Made supportive changes in error emitting and tests to properly handle the updated behavior for oversized lines. |
| changelog.d/22581_max_merged_line_bytes.feature.md | Added a changelog entry describing the new feature. |
pront
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks and apologies for the delays. Left a couple of nits.
… the whole buffer and then truncating, use more idiomatic Rust for handling both configured and unconfigured cases of max_merged_line_bytes
@pront - all good! Thanks for the review, pushed up requested changes. |
Head branch was pushed to by a user without write access
|
@pront - sorry, mind enabling automerge again? Had to push a fix for failing spelling check coming from master |
Sure. Spell checker failures do not block PRs. But thanks for fixing it anyway. |
Oops, good to know, will keep in mind for next time. |
Head branch was pushed to by a user without write access
|
@pront - I ended up having to push a fix for one of the nit-prompted changes |
Head branch was pushed to by a user without write access
… line size to be applied after merging instead of just before (vectordotdev#22582) * Add config for maximum allowed line size after merging * Add warns when we drop partial logs for being too big; shift some comments around * Add changelog * Format * Increment component_discarded_events_total on violation of max_line_size and max_merged_line_size * Update changelog.d/22581_max_merged_line_bytes.feature.md Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com> * Don't emit expired events that are too big nor ones that don't appear to be partial; fix test * Fix another test * Update src/sources/kubernetes_logs/mod.rs Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com> * Update src/sources/kubernetes_logs/mod.rs Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com> * Remove inadvertently added file * Include Value rather than Event in error struct * Rename field in bucket struct * Move max_merged_line_bytes from being a param to being a field on the state struct * Make new config field optional, defaulting to old behavior * Format * Appease check-events * docs regen * Tweak wording of doc; emit only first 1k bytes of dropped lines in error * Rename fields for clarity * Per PR feedback: copy just the initial 1000 bytes rather than cloning the whole buffer and then truncating, use more idiomatic Rust for handling both configured and unconfigured cases of max_merged_line_bytes * Allow spelling of already-merged changelog filename * Don't try to include more characters than there actually are in the slice * Don't just get enough capacity, make sure length matches too * Formatting --------- Co-authored-by: Orri Ganel <oganel@palantir.com> Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
|
Happy to see this merged! |
Summary
Add a new config field max_merged_line_bytes which allows limiting maximum line size even when auto_partial_merge is enabled (see #22581).
Change Type
Is this a breaking change?
How did you test this PR?
Added unit tests to exercise the new configuration.
Does this PR include user facing changes?
Checklist
make check-allis a good command to run locally. This check isdefined here. Some of these
checks might not be relevant to your PR. For Rust changes, at the very least you should run:
cargo fmt --allcargo clippy --workspace --all-targets -- -D warningscargo nextest run --workspace(alternatively, you can runcargo test --all)Cargo.lock), pleaserun
dd-rust-license-tool writeto regenerate the license inventory and commit the changes (if any). More details here.References