Conversation

@ganelo
Contributor

@ganelo ganelo commented Mar 3, 2025

Summary

Add a new config field max_merged_line_bytes which allows limiting maximum line size even when auto_partial_merge is enabled (see #22581).
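To illustrate the gap this closes, here is a minimal sketch, not the PR's actual code. It assumes the per-fragment limit (`max_line_bytes`) is checked on each partial line as it is read, so a merged line built from many small fragments has no bound unless a second limit is applied after merging. The `Limits` struct and `merge_fragments` function are hypothetical names used only for illustration.

```rust
/// Hypothetical stand-in for the two limits discussed in this PR.
struct Limits {
    /// Checked against each fragment as it is read.
    max_line_bytes: usize,
    /// Checked against the merged line; `None` keeps the old, unbounded behavior.
    max_merged_line_bytes: Option<usize>,
}

/// Merge fragments into one line. Returns `None` if any single fragment,
/// or the merged result so far, exceeds its limit.
fn merge_fragments(fragments: &[&[u8]], limits: &Limits) -> Option<Vec<u8>> {
    let mut merged = Vec::new();
    for fragment in fragments {
        // Per-fragment check: this alone does not bound the merged size.
        if fragment.len() > limits.max_line_bytes {
            return None;
        }
        merged.extend_from_slice(fragment);
        // Post-merge check: only enforced when the optional limit is set.
        if limits
            .max_merged_line_bytes
            .is_some_and(|max| merged.len() > max)
        {
            return None;
        }
    }
    Some(merged)
}
```

With three 4-byte fragments and `max_line_bytes = 4`, the merge yields 12 bytes when the merged limit is unset, and is rejected when `max_merged_line_bytes` is `Some(8)`.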

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

Added unit tests to exercise the new configuration.

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Please read our Vector contributor resources.
    • make check-all is a good command to run locally. This check is
      defined here. Some of these
      checks might not be relevant to your PR. For Rust changes, at the very least you should run:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.

References

@ganelo
Contributor Author

ganelo commented Mar 5, 2025

Tagging @pront / @jszwedko explicitly since it doesn't look like anyone was auto-added as a reviewer.

@pront pront self-assigned this Mar 7, 2025
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
@ganelo ganelo requested a review from pront March 14, 2025 14:18
@pront pront requested a review from Copilot March 27, 2025 20:00

This comment was marked as outdated.

Member

@pront pront left a comment

Thanks @ganelo, this mostly looks reasonable.

total_read += used;

if !discarding && buf.len() > max_size {
warn!(
Member

I am not very familiar with this code. I wonder why this warn! was moved only to be emitted later.

Contributor Author

This was part of plumbing through the actual contents that were being dropped so they could be included in the error.
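One way to picture that plumbing is the sketch below. It is an assumption-laden illustration, not the file-source code: instead of warning at the point where the limit is hit, the reader hands the oversized contents back to the caller, which can then include them in whatever error it emits. `ReadOutcome` and `read_line_limited` are hypothetical names, and for simplicity the sketch buffers the whole line before checking its size.

```rust
use std::io::{self, BufRead};

/// Outcome of reading one newline-delimited line (hypothetical type).
#[derive(Debug, PartialEq)]
enum ReadOutcome {
    /// A complete line within the limit.
    Line(Vec<u8>),
    /// The line exceeded `max_size`; carries the bytes that are being dropped
    /// so the caller can report them.
    Discarded(Vec<u8>),
    /// No more input.
    Eof,
}

/// Read one line and classify it. The caller, not this function, decides
/// how to log or count a discarded line.
fn read_line_limited<R: BufRead>(reader: &mut R, max_size: usize) -> io::Result<ReadOutcome> {
    let mut buf = Vec::new();
    let read = reader.read_until(b'\n', &mut buf)?;
    if read == 0 {
        return Ok(ReadOutcome::Eof);
    }
    // Strip the trailing delimiter, if present, before measuring.
    if buf.last() == Some(&b'\n') {
        buf.pop();
    }
    if buf.len() > max_size {
        return Ok(ReadOutcome::Discarded(buf));
    }
    Ok(ReadOutcome::Line(buf))
}
```

Returning the dropped bytes moves the decision about logging to the call site, which is consistent with the warning appearing later in the flow than it did before.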

let mut bytes_mut = BytesMut::new();
if let Some(bucket) = self.buckets.get_mut(file) {
// don't bother continuing to process new partial events that match existing ones that are already too big
if bucket.too_big {
Member

Hmm, I wonder if we can maintain a separate data structure for these buckets so that we don't mix them with the emitted ones.

Contributor Author

Not totally sure I follow - can you elaborate what the concern is that you're wanting to mitigate?
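For readers following this thread without the diff, the sketch below shows the general shape of a per-file bucket carrying a `too_big` flag. It is a simplified illustration under stated assumptions, not the merger's real implementation: `Bucket`, `Merger`, and `push` are hypothetical, files are keyed by a plain string, and a fragment is marked partial by a boolean argument.

```rust
use std::collections::HashMap;

/// Accumulated state for one file's in-progress merged line (hypothetical).
struct Bucket {
    merged: Vec<u8>,
    /// Set once the merged line exceeds the limit; later fragments are skipped.
    too_big: bool,
}

struct Merger {
    buckets: HashMap<String, Bucket>,
    max_merged_line_bytes: Option<usize>,
}

impl Merger {
    fn new(max_merged_line_bytes: Option<usize>) -> Self {
        Merger { buckets: HashMap::new(), max_merged_line_bytes }
    }

    /// Feed one fragment. Returns the merged line when the final (non-partial)
    /// fragment arrives and the line stayed within the limit.
    fn push(&mut self, file: &str, fragment: &[u8], is_partial: bool) -> Option<Vec<u8>> {
        let bucket = self
            .buckets
            .entry(file.to_string())
            .or_insert_with(|| Bucket { merged: Vec::new(), too_big: false });

        // Don't keep accumulating fragments for a line that is already too big.
        if !bucket.too_big {
            bucket.merged.extend_from_slice(fragment);
            if self
                .max_merged_line_bytes
                .is_some_and(|max| bucket.merged.len() > max)
            {
                bucket.too_big = true;
                bucket.merged.clear(); // release the memory held so far
            }
        }

        if is_partial {
            return None; // more fragments expected
        }
        // Final fragment: the bucket is finished either way.
        let bucket = self.buckets.remove(file)?;
        if bucket.too_big { None } else { Some(bucket.merged) }
    }
}
```

In this sketch, oversized and in-progress buckets share one map, which is the mixing the review comment asks about; a separate set of "poisoned" file keys would be one alternative layout.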

@pront
Member

pront commented Apr 8, 2025

This is starting to look great, will do a final review shortly.

@pront pront self-requested a review April 8, 2025 15:06
@ganelo
Contributor Author

ganelo commented May 27, 2025

Hey @pront - just checking whether there's anything further you need from me on this?

@pront
Member

pront commented Jun 10, 2025

> Hey @pront - just checking whether there's anything further you need from me on this?

Apologies for the long delay, I was away for 2+ weeks. My plan is to review this / give prompt feedback and include it in the next release.

@pront pront requested a review from Copilot June 10, 2025 20:12
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds a new configuration option, max_merged_line_bytes, to the kubernetes_logs source so that users can limit the size of merged log lines even when auto_partial_merge is enabled. Key changes include updates to the CUE reference documentation, modifications to the partial events merger logic in Rust to enforce the new limit (along with corresponding tests), and adjustments to internal events and file source modules for consistent error reporting.

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| website/cue/reference/components/sources/base/kubernetes_logs.cue | Added documentation for the max_merged_line_bytes field. |
| src/sources/kubernetes_logs/partial_events_merger.rs | Integrated checking for the new max_merged_line_bytes and adjusted event merging and filtering behavior. |
| src/sources/kubernetes_logs/mod.rs | Updated configuration parsing and adjusted max_line_bytes value based on the new max_merged_line_bytes setting. |
| src/internal_events/kubernetes_logs.rs | Added a new internal event for merged lines that exceed the configured limit. |
| src/internal_events/file.rs, lib/file-source/* | Made supportive changes in error emitting and tests to properly handle the updated behavior for oversized lines. |
| changelog.d/22581_max_merged_line_bytes.feature.md | Added a changelog entry describing the new feature. |

Member

@pront pront left a comment

Thanks and apologies for the delays. Left a couple of nits.

… the whole buffer and then truncating, use more idiomatic Rust for handling both configured and unconfigured cases of max_merged_line_bytes
@ganelo
Contributor Author

ganelo commented Jun 11, 2025

> Thanks and apologies for the delays. Left a couple of nits.

@pront - all good! Thanks for the review, pushed up requested changes.

@pront pront enabled auto-merge June 11, 2025 19:31
auto-merge was automatically disabled June 11, 2025 19:41

Head branch was pushed to by a user without write access

@ganelo
Contributor Author

ganelo commented Jun 11, 2025

@pront - sorry, mind enabling automerge again? Had to push a fix for failing spelling check coming from master

@pront pront enabled auto-merge June 11, 2025 19:55
@pront
Member

pront commented Jun 11, 2025

> @pront - sorry, mind enabling automerge again? Had to push a fix for failing spelling check coming from master

Sure. Spell checker failures do not block PRs. But thanks for fixing it anyway.

@ganelo
Contributor Author

ganelo commented Jun 11, 2025

> > @pront - sorry, mind enabling automerge again? Had to push a fix for failing spelling check coming from master
>
> Sure. Spell checker failures do not block PRs. But thanks for fixing it anyway.

Oops, good to know, will keep in mind for next time.

auto-merge was automatically disabled June 11, 2025 20:22

Head branch was pushed to by a user without write access

@ganelo
Contributor Author

ganelo commented Jun 12, 2025

@pront - I ended up having to push a fix for one of the nit-prompted changes

@pront pront enabled auto-merge June 12, 2025 17:57
auto-merge was automatically disabled June 12, 2025 20:13

Head branch was pushed to by a user without write access

@pront pront enabled auto-merge June 12, 2025 20:15
@pront pront added this pull request to the merge queue Jun 12, 2025
Merged via the queue into vectordotdev:master with commit f31839b Jun 12, 2025
43 checks passed
aramperes pushed a commit to aramperes/vector that referenced this pull request Jun 12, 2025
… line size to be applied after merging instead of just before (vectordotdev#22582)

* Add config for maximum allowed line size after merging

* Add warns when we drop partial logs for being too big; shift some comments around

* Add changelog

* Format

* Increment component_discarded_events_total on violation of max_line_size and max_merged_line_size

* Update changelog.d/22581_max_merged_line_bytes.feature.md

Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>

* Don't emit expired events that are too big nor ones that don't appear to be partial; fix test

* Fix another test

* Update src/sources/kubernetes_logs/mod.rs

Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>

* Update src/sources/kubernetes_logs/mod.rs

Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>

* Remove inadvertently added file

* Include Value rather than Event in error struct

* Rename field in bucket struct

* Move max_merged_line_bytes from being a param to being a field on the state struct

* Make new config field optional, defaulting to old behavior

* Format

* Appease check-events

* docs regen

* Tweak wording of doc; emit only first 1k bytes of dropped lines in error

* Rename fields for clarity

* Per PR feedback: copy just the initial 1000 bytes rather than cloning the whole buffer and then truncating, use more idiomatic Rust for handling both configured and unconfigured cases of max_merged_line_bytes

* Allow spelling of already-merged changelog filename

* Don't try to include more characters than there actually are in the slice

* Don't just get enough capacity, make sure length matches too

* Formatting

---------

Co-authored-by: Orri Ganel <oganel@palantir.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
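The last few commits above concern how much of a dropped line is copied into the error. A minimal sketch of that idea, assuming a fixed preview cap of 1000 bytes (the constant name `MAX_PREVIEW_BYTES` and the function `preview` are hypothetical), could look like this:

```rust
/// Hypothetical cap on how many bytes of a dropped line are reported.
const MAX_PREVIEW_BYTES: usize = 1000;

/// Copy at most `MAX_PREVIEW_BYTES` from the start of `dropped`.
///
/// Only the prefix is copied, so a very large line is never cloned in full
/// and then truncated.
fn preview(dropped: &[u8]) -> Vec<u8> {
    // Clamp to the slice length so a short line does not cause an
    // out-of-bounds slice.
    let end = dropped.len().min(MAX_PREVIEW_BYTES);
    // `to_vec` yields a Vec whose length (not just capacity) equals `end`.
    dropped[..end].to_vec()
}
```

Clamping with `min` covers the "more characters than there actually are" case, and building the result from a slice avoids the pitfall of reserving capacity with `Vec::with_capacity` while leaving the length at zero.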
@pront
Member

pront commented Jun 13, 2025

Happy to see this merged!

Labels

domain: external docs (Anything related to Vector's external, public documentation)
domain: sources (Anything related to the Vector's sources)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kubernetes_logs source permits arbitrarily large lines due to interaction of auto_partial_merge and max_line_bytes

4 participants