-
Notifications
You must be signed in to change notification settings - Fork 2k
fix(loki sinks): Fix loki event timestamp out of range panic #20780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/sinks/loki/tests.rs
Outdated
| let mut sink = LokiSink::new(config, client).unwrap(); | ||
|
|
||
| let mut e1 = LogEvent::from("hello world"); | ||
| use vector_lib::config::log_schema; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we move this use statement to the beginning of this tests module?
jszwedko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @suikammd ! I left a question below about the behavior.
|
|
||
| impl InternalEvent for LokiEventTimestampOutOfRangeError { | ||
| fn emit(self) { | ||
| error!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this a warn! since it doesn't actually inhibit processing? Also, what happens when we send an event to Loki without a timestamp? Does it just use the current timestamp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to Loki api, timestamp is must.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, I missed that you are returning early. In that case we'd want to emit the metrics for errors and discarded events. See an example here:
vector/src/internal_events/splunk_hec.rs
Lines 55 to 77 in 8325300
| impl<'a> InternalEvent for SplunkInvalidMetricReceivedError<'a> { | |
| fn emit(self) { | |
| error!( | |
| message = "Invalid metric received.", | |
| error = ?self.error, | |
| error_type = error_type::INVALID_METRIC, | |
| stage = error_stage::PROCESSING, | |
| value = ?self.value, | |
| kind = ?self.kind, | |
| internal_log_rate_limit = true, | |
| ); | |
| counter!( | |
| "component_errors_total", 1, | |
| "error_type" => error_type::INVALID_METRIC, | |
| "stage" => error_stage::PROCESSING, | |
| ); | |
| counter!( | |
| "component_discarded_events_total", 1, | |
| "error_type" => error_type::INVALID_METRIC, | |
| "stage" => error_stage::PROCESSING, | |
| ); | |
| } | |
| } |
src/sinks/loki/tests.rs
Outdated
|
|
||
| #[tokio::test] | ||
| async fn timestamp_out_of_range() { | ||
| use vector_lib::config::log_schema; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the beggining of this module, about line 10. It is more idiomatic to have all the use statements at the begginning of the module, so we don't have to dive into each scope and find those statements.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get it, check the new commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thats it! Thanks :)
| Some(timestamp) => timestamp, | ||
| None => { | ||
| emit!(LokiEventTimestampOutOfRangeError); | ||
| return None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also need to use the finalizers here and explicitly nack the event via update_status like finalizers.update_status(EventStatus::Errored). Otherwise Vector will consider them to have been successfully processed.
src/sinks/loki/sink.rs
Outdated
| Some(Value::Timestamp(ts)) => match ts.timestamp_nanos_opt() { | ||
| Some(timestamp) => timestamp, | ||
| None => { | ||
| let finalizers = event.take_finalizers(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you already have the finalizers separated above on line 250.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I missed this line
jszwedko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @suikammd !
|
@jszwedko It seems like I don't have the permissions to add the |
|
Thanks for the bump @suikammd . This change should actually have a changelog entry. Do you mind adding one? See https://github.com/vectordotdev/vector/blob/master/changelog.d/README.md for details. |
Head branch was pushed to by a user without write access
I add a changelog, but I accidentally rebased the master branch :(, sorry for that. |
jszwedko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you still need to fill in the changelog fragment :) See https://github.com/vectordotdev/vector/blob/master/changelog.d/README.md for details about what the content should look like.
| @@ -0,0 +1,3 @@ | |||
| Loki sink drop events with timestamp can not parsed by chrono. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about something like this?
| Loki sink drop events with timestamp can not parsed by chrono. | |
| Loki sink now drops events with non-parseable timestamps. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not an english native speaker, maybe @jszwedko can help us here😁. I think the spelling is correct
| @@ -0,0 +1,3 @@ | |||
| Loki sink now drops events with non-processable timestamps. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Loki sink now drops events with non-processable timestamps. | |
| Loki sink now drops events with non-parsable timestamps. |
I think parsable is actually the correct spelling, but what you have here seems fine to me too. Thanks for adding the changelog!
Head branch was pushed to by a user without write access
Head branch was pushed to by a user without write access
Regression Detector ResultsRun ID: f0915b07-01f9-49db-8378-34607e979f24 Metrics dashboard Baseline: 5a10aa2 Performance changes are noted in the perf column of each table:
No significant changes in experiment optimization goalsConfidence level: 90.00% There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
|
| perf | experiment | goal | Δ mean % | Δ mean % CI | links |
|---|---|---|---|---|---|
| ❌ | file_to_blackhole | egress throughput | -9.48 | [-16.31, -2.64] |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | links |
|---|---|---|---|---|---|
| ➖ | http_elasticsearch | ingress throughput | +2.83 | [+2.60, +3.07] | |
| ➖ | datadog_agent_remap_datadog_logs | ingress throughput | +2.75 | [+2.55, +2.94] | |
| ➖ | http_to_http_acks | ingress throughput | +1.38 | [+0.04, +2.72] | |
| ➖ | fluent_elasticsearch | ingress throughput | +1.12 | [+0.62, +1.62] | |
| ➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | +0.70 | [+0.59, +0.81] | |
| ➖ | syslog_humio_logs | ingress throughput | +0.69 | [+0.56, +0.83] | |
| ➖ | otlp_http_to_blackhole | ingress throughput | +0.69 | [+0.51, +0.87] | |
| ➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | +0.44 | [+0.31, +0.56] | |
| ➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | +0.36 | [+0.16, +0.56] | |
| ➖ | datadog_agent_remap_blackhole | ingress throughput | +0.26 | [+0.16, +0.37] | |
| ➖ | socket_to_socket_blackhole | ingress throughput | +0.15 | [+0.08, +0.21] | |
| ➖ | http_text_to_http_json | ingress throughput | +0.12 | [-0.00, +0.25] | |
| ➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | +0.02 | [-0.07, +0.10] | |
| ➖ | http_to_s3 | ingress throughput | +0.01 | [-0.25, +0.28] | |
| ➖ | http_to_http_noack | ingress throughput | +0.01 | [-0.12, +0.14] | |
| ➖ | splunk_hec_route_s3 | ingress throughput | +0.01 | [-0.32, +0.34] | |
| ➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | -0.00 | [-0.13, +0.12] | |
| ➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.01 | [-0.10, +0.09] | |
| ➖ | http_to_http_json | ingress throughput | -0.14 | [-0.22, -0.05] | |
| ➖ | syslog_loki | ingress throughput | -0.42 | [-0.51, -0.33] | |
| ➖ | datadog_agent_remap_blackhole_acks | ingress throughput | -0.51 | [-0.65, -0.37] | |
| ➖ | otlp_grpc_to_blackhole | ingress throughput | -0.93 | [-1.06, -0.81] | |
| ➖ | syslog_splunk_hec_logs | ingress throughput | -1.02 | [-1.12, -0.92] | |
| ➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -1.63 | [-1.72, -1.53] | |
| ➖ | syslog_log2metric_humio_metrics | ingress throughput | -2.80 | [-2.95, -2.65] | |
| ❌ | file_to_blackhole | egress throughput | -9.48 | [-16.31, -2.64] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
…otdev#20780) * fix(loki sinks): Fix loki event timestamp out of range panic * change LokiEventTimestampOutOfRangeError error to warn * move use to begin of file * update skipped events finalizer to errored * fix: remove duplicate take finalizers * add changelog * fix changelog name * fix spelling issue * modify changelog contents * modify changelog contents * fix by check-events script error * fix word * fix word * fix word * fix cargo fmt
…otdev#20780) * fix(loki sinks): Fix loki event timestamp out of range panic * change LokiEventTimestampOutOfRangeError error to warn * move use to begin of file * update skipped events finalizer to errored * fix: remove duplicate take finalizers * add changelog * fix changelog name * fix spelling issue * modify changelog contents * modify changelog contents * fix by check-events script error * fix word * fix word * fix word * fix cargo fmt
…otdev#20780) * fix(loki sinks): Fix loki event timestamp out of range panic * change LokiEventTimestampOutOfRangeError error to warn * move use to begin of file * update skipped events finalizer to errored * fix: remove duplicate take finalizers * add changelog * fix changelog name * fix spelling issue * modify changelog contents * modify changelog contents * fix by check-events script error * fix word * fix word * fix word * fix cargo fmt
…otdev#20780) * fix(loki sinks): Fix loki event timestamp out of range panic * change LokiEventTimestampOutOfRangeError error to warn * move use to begin of file * update skipped events finalizer to errored * fix: remove duplicate take finalizers * add changelog * fix changelog name * fix spelling issue * modify changelog contents * modify changelog contents * fix by check-events script error * fix word * fix word * fix word * fix cargo fmt


Fix #20763