Description
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Use Cases
When using Vector's kubernetes_logs source to collect logs downstream of a container runtime such as CRI-O, which splits long log lines into partial chunks, setting auto_partial_merge to true causes max_line_bytes to be effectively ignored.
Attempted Solutions
Consider the following scenarios:
- Scenario 1:
  - max_line_bytes = 1 MiB
  - lines split by CRI-O into 2.5 MiB chunks
  - auto_partial_merge = N/A (Vector stops reading before it reaches the continuation character)
  - result: lines larger than 1 MiB are always dropped, including every line that was split by CRI-O
- Scenario 2:
  - max_line_bytes = 3 MiB
  - lines split by CRI-O into 2.5 MiB chunks
  - auto_partial_merge = true
  - result: no lines are ever dropped, because max_line_bytes is applied before merging; every partial line is a chunk of at most 2.5 MiB, so it always falls below the 3 MiB limit
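To make the interaction concrete, here is a small Python model of the two scenarios. It is a sketch under simplifying assumptions, not Vector's actual code: lines and chunks are represented only by their byte counts, split_into_chunks and current_behavior are hypothetical helpers, and a line is treated as dropped as soon as any of its chunks exceeds max_line_bytes.

```python
MIB = 1024 * 1024
CHUNK = 5 * MIB // 2  # 2.5 MiB, the split size used in both scenarios


def split_into_chunks(line_len, chunk_size):
    """Hypothetical model of a runtime such as CRI-O splitting one logical
    line into partial chunks; returns the chunk lengths in bytes."""
    chunks = []
    remaining = line_len
    while remaining > chunk_size:
        chunks.append(chunk_size)
        remaining -= chunk_size
    chunks.append(remaining)
    return chunks


def current_behavior(line_len, chunk_size, max_line_bytes):
    """Model of the behavior described above: max_line_bytes is checked per
    chunk, before partial merging. Returns the merged length, or None if the
    line is dropped (simplification: any oversized chunk drops the line)."""
    chunks = split_into_chunks(line_len, chunk_size)
    if any(c > max_line_bytes for c in chunks):
        return None
    # Merging simply concatenates the chunks; no limit is applied afterwards.
    return sum(chunks)
```

For example, current_behavior(100 * MIB, CHUNK, 3 * MIB) returns the full 100 MiB because every 2.5 MiB chunk passes the 3 MiB check (scenario 2), while current_behavior(2 * MIB, CHUNK, 1 * MIB) returns None (scenario 1).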
Proposal
It would be useful to have an additional configuration option that limits line size after merging, to protect downstream pipelines and consumers from very large lines. This would make it possible to benefit from auto_partial_merge without letting arbitrarily large lines into the pipeline.
Another option would be to change the behavior of max_line_bytes when auto_partial_merge is set to true, but for backward-compatibility reasons it is probably better not to change the behavior of an existing config field.
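A minimal Python sketch of what a post-merge limit could look like follows. It assumes the merger receives chunk lengths one at a time; merge_with_limits and its merged_limit parameter are hypothetical names for illustration only, not an existing Vector option or API.

```python
MIB = 1024 * 1024
CHUNK = 5 * MIB // 2  # 2.5 MiB partial chunks, as in the scenarios above


def merge_with_limits(chunk_lens, max_line_bytes, merged_limit):
    """Merge partial chunks (given as byte counts) and return the merged
    length, or None if the line is dropped.

    max_line_bytes: the existing per-chunk check, applied before merging.
    merged_limit: hypothetical limit on the size of the fully merged line.
    """
    # Existing behavior: each partial chunk is checked on its own.
    if any(c > max_line_bytes for c in chunk_lens):
        return None
    merged = 0
    for c in chunk_lens:
        merged += c
        # Proposed check: drop the line as soon as the running total exceeds
        # the post-merge limit, so an oversized line is never fully buffered.
        if merged > merged_limit:
            return None
    return merged
```

Checking the running total rather than the final sum means an oversized line can be rejected after only part of it has been buffered. For example, with max_line_bytes = 3 MiB and merged_limit = 8 MiB, a 10 MiB line split into four 2.5 MiB chunks is dropped, while a 5 MiB line split into two chunks is kept.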
References
No response
Version
0.45.0