@ktff
Contributor

@ktff ktff commented Oct 19, 2022

Ref: #140

Retries the whole payload on a partial bulk failure if the partial failure contains a retriable status. At the moment, the retriable statuses are:

  • 429 : too many requests
  • 500 - 599 : any server error

Adds option request_retry_partial to enable this behavior. It's disabled by default.
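
The decision described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `ItemStatus`, `Action`, and `decide` names are hypothetical, and treating any per-item status of 400 or above as a failed item is an assumption made for the example.

```rust
/// Per-item outcome of a bulk request, reduced to its HTTP status code.
/// Hypothetical type for illustration; not Vector's actual representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ItemStatus(pub u16);

/// Statuses the PR description lists as retriable:
/// 429 (too many requests) and 500-599 (any server error).
pub fn is_retriable(status: u16) -> bool {
    status == 429 || (500..=599).contains(&status)
}

/// What the sink would do with a bulk response (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Success,
    RetryWholePayload,
    DontRetry,
}

/// Decide how to handle a bulk response given its per-item statuses.
/// `retry_partial` mirrors the `request_retry_partial` option (off by default).
pub fn decide(items: &[ItemStatus], retry_partial: bool) -> Action {
    // Assumption: an item counts as failed when its status is >= 400.
    let failed: Vec<u16> = items
        .iter()
        .map(|item| item.0)
        .filter(|status| *status >= 400)
        .collect();

    if failed.is_empty() {
        return Action::Success;
    }
    // Retry the whole payload if at least one failed item is retriable.
    if retry_partial && failed.iter().any(|status| is_retriable(*status)) {
        Action::RetryWholePayload
    } else {
        Action::DontRetry
    }
}
```

With the option enabled, a response mixing a 201 item and a 429 item would map to `Action::RetryWholePayload`; with the option left at its default, the same response would map to `Action::DontRetry`.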

Open questions

  • Better name/placement for option request_retry_partial? Adding it to request.* option set would also add it to all of the other http based sinks. (EDIT: Let's keep it at current placement for now.)

cc. @jszwedko , @sim0nx

ktff added 3 commits October 19, 2022 23:48
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
@netlify

netlify bot commented Oct 19, 2022

Deploy Preview for vector-project ready!

Name Link
🔨 Latest commit e783a70
🔍 Latest deploy log https://app.netlify.com/sites/vector-project/deploys/636135fbb0088a0008b82ea9
😎 Deploy Preview https://deploy-preview-14891--vector-project.netlify.app

@github-actions github-actions bot added domain: external docs Anything related to Vector's external, public documentation domain: sinks Anything related to the Vector's sinks labels Oct 19, 2022
@ktff ktff changed the title enhancement(elasticsearch sink): Retry on partial bulk failure enhancement(elasticsearch sink): Retry whole payload on partial bulk failure Oct 19, 2022
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
@jszwedko jszwedko requested a review from a team October 26, 2022 16:49
Contributor

@neuronull neuronull left a comment

Thanks for the contribution!

Looking good; I just dropped some mostly minor initial observations:

Co-authored-by: neuronull <kyle.criddle@datadoghq.com>
@netlify

netlify bot commented Oct 28, 2022

Deploy Preview for vrl-playground canceled.

Name Link
🔨 Latest commit e783a70
🔍 Latest deploy log https://app.netlify.com/sites/vrl-playground/deploys/636135fb987e92000839d9e2

ktff and others added 3 commits October 28, 2022 15:25
Co-authored-by: neuronull <kyle.criddle@datadoghq.com>
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
@neuronull
Contributor

I just noticed this question:

Better name/placement for option request_retry_partial? Adding it to request.* option set would also add it to all of the other http based sinks.

I think it is ok where it is? Tagging @spencergilbert and @jszwedko for thoughts on that.

@github-actions

Soak Test Results

Baseline: ac69a47
Comparison: 2ecda4b
Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector. What follows is a statistical summary of a brief vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether vector performance has changed, and to what degree, because of a pull request. Where appropriate, units are scaled per core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between baseline and comparison SHAs, with 90.0% confidence, OR have been detected as newly erratic. Negative values mean that the baseline is faster; positive values mean that the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table will be omitted if no interesting changes are observed.
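
As a rough illustration of the thresholds described above, the filter could be sketched like this. The `Side` struct and the function names are hypothetical, "newly erratic" is read here as "the comparison is erratic and the baseline is not", and the way the ±8.87% cutoff combines with the erratic check is an assumption; the soak rig's actual logic may differ.

```rust
/// Summary statistics for one side (baseline or comparison) of an experiment.
/// Hypothetical type for illustration.
pub struct Side {
    pub mean: f64,
    pub stdev: f64,
}

/// Coefficient of variation: standard deviation relative to the mean.
pub fn coefficient_of_variation(side: &Side) -> f64 {
    side.stdev / side.mean
}

/// An experiment side is erratic if its coefficient of variation exceeds 0.3.
pub fn is_erratic(side: &Side) -> bool {
    coefficient_of_variation(side) > 0.3
}

/// Change in mean throughput relative to the baseline, in percent.
/// Negative means the baseline is faster.
pub fn delta_mean_percent(baseline: &Side, comparison: &Side) -> f64 {
    (comparison.mean - baseline.mean) / baseline.mean * 100.0
}

/// Whether an experiment would appear in the abbreviated table.
/// Assumption: the 8.87% cutoff applies only to the significance branch.
pub fn is_interesting(baseline: &Side, comparison: &Side, confidence_percent: f64) -> bool {
    let significant = confidence_percent >= 90.0
        && delta_mean_percent(baseline, comparison).abs() > 8.87;
    let newly_erratic = is_erratic(comparison) && !is_erratic(baseline);
    significant || newly_erratic
}
```

Plugging in the rounded file_to_blackhole figures from this run (baseline mean 95.34 and stdev 3.38, comparison mean 60.22 and stdev 39.49, all in MiB) gives a change of roughly -36.8% and a comparison coefficient of variation of roughly 0.66, so that experiment passes both checks.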

Changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

| experiment | Δ mean | Δ mean % | confidence |
| --- | --- | --- | --- |
| file_to_blackhole | -35.12MiB | -36.83 | 100.00% |

Fine details of change detection per experiment.
| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| http_to_http_acks | 96.85KiB | 0.55 | 31.72% | 17.06MiB | 8.14MiB | 170.23KiB | 0 | 0.477399 | 17.15MiB | 7.9MiB | 164.93KiB | 0 | 0.46075 | True | True |
| http_pipelines_blackhole | 7.3KiB | 0.46 | 99.98% | 1.56MiB | 34.12KiB | 714.29B | 0 | 0.0213175 | 1.57MiB | 88.53KiB | 1.81KiB | 0 | 0.0550703 | False | False |
| syslog_regex_logs2metric_ddmetrics | 33.39KiB | 0.42 | 81.55% | 7.68MiB | 883.33KiB | 17.99KiB | 0 | 0.112334 | 7.71MiB | 863.49KiB | 17.58KiB | 0 | 0.109346 | False | False |
| syslog_log2metric_splunk_hec_metrics | 45.1KiB | 0.27 | 99.22% | 16.22MiB | 547.4KiB | 11.15KiB | 0 | 0.0329499 | 16.26MiB | 627.03KiB | 12.77KiB | 0 | 0.0376406 | False | False |
| http_pipelines_blackhole_acks | 2.45KiB | 0.2 | 59.58% | 1.21MiB | 112.75KiB | 2.29KiB | 0 | 0.0912978 | 1.21MiB | 89.98KiB | 1.83KiB | 0 | 0.0727204 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 22.97KiB | 0.09 | 93.38% | 23.82MiB | 510.14KiB | 10.41KiB | 0 | 0.0209126 | 23.84MiB | 339.46KiB | 6.93KiB | 0 | 0.0139025 | False | False |
| splunk_hec_indexer_ack_blackhole | 22.14KiB | 0.09 | 61.72% | 23.75MiB | 918.95KiB | 18.69KiB | 0 | 0.0377822 | 23.77MiB | 842.31KiB | 17.15KiB | 0 | 0.0345997 | False | False |
| http_pipelines_no_grok_blackhole | 4.4KiB | 0.04 | 16.03% | 10.41MiB | 309.88KiB | 6.33KiB | 0 | 0.0290544 | 10.42MiB | 1022.19KiB | 20.79KiB | 0 | 0.0958009 | False | False |
| enterprise_http_to_http | -1.52KiB | -0.01 | 16.44% | 23.85MiB | 252.32KiB | 5.15KiB | 0 | 0.0103307 | 23.85MiB | 253.41KiB | 5.18KiB | 0 | 0.0103762 | False | False |
| syslog_humio_logs | -9.73KiB | -0.06 | 99.90% | 16.17MiB | 102.98KiB | 2.1KiB | 0 | 0.00621805 | 16.16MiB | 102.34KiB | 2.1KiB | 0 | 0.00618258 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | -15.62KiB | -0.06 | 46.94% | 23.76MiB | 838.06KiB | 17.05KiB | 0 | 0.0344423 | 23.74MiB | 893.06KiB | 18.16KiB | 0 | 0.0367264 | False | False |
| http_to_http_json | -25.51KiB | -0.1 | 95.00% | 23.84MiB | 378.82KiB | 7.74KiB | 0 | 0.0155127 | 23.82MiB | 512.17KiB | 10.47KiB | 0 | 0.0209951 | False | False |
| http_to_http_noack | -67.34KiB | -0.28 | 99.59% | 23.83MiB | 519.87KiB | 10.63KiB | 0 | 0.0213016 | 23.76MiB | 1.0MiB | 20.87KiB | 0 | 0.0421037 | False | False |
| socket_to_socket_blackhole | -74.92KiB | -0.31 | 100.00% | 23.59MiB | 192.91KiB | 3.94KiB | 0 | 0.00798604 | 23.51MiB | 148.09KiB | 3.02KiB | 0 | 0.00614936 | False | False |
| syslog_loki | -72.83KiB | -0.46 | 99.99% | 15.39MiB | 382.63KiB | 7.84KiB | 0 | 0.0242761 | 15.32MiB | 796.99KiB | 16.2KiB | 0 | 0.0508006 | False | False |
| datadog_agent_remap_blackhole | -410.99KiB | -0.71 | 99.98% | 56.24MiB | 4.04MiB | 84.25KiB | 0 | 0.0718686 | 55.84MiB | 3.53MiB | 73.55KiB | 0 | 0.0631518 | False | False |
| datadog_agent_remap_blackhole_acks | -457.44KiB | -0.77 | 100.00% | 57.99MiB | 4.08MiB | 85.05KiB | 0 | 0.0704107 | 57.55MiB | 2.68MiB | 56.03KiB | 0 | 0.046511 | False | False |
| datadog_agent_remap_datadog_logs_acks | -515.55KiB | -0.84 | 99.99% | 60.26MiB | 3.98MiB | 83.33KiB | 0 | 0.0660866 | 59.76MiB | 4.85MiB | 100.93KiB | 0 | 0.0811254 | False | False |
| fluent_elasticsearch | -785.41KiB | -0.97 | 100.00% | 79.47MiB | 53.34KiB | 1.07KiB | 0 | 0.000655266 | 78.71MiB | 6.7MiB | 137.4KiB | 0 | 0.0851717 | False | False |
| splunk_hec_route_s3 | -219.56KiB | -1 | 99.97% | 21.54MiB | 2.12MiB | 44.1KiB | 0 | 0.0983001 | 21.33MiB | 2.0MiB | 41.76KiB | 0 | 0.0935417 | False | False |
| syslog_log2metric_humio_metrics | -116.31KiB | -1.17 | 100.00% | 9.68MiB | 237.78KiB | 4.85KiB | 0 | 0.0239761 | 9.57MiB | 385.25KiB | 7.85KiB | 0 | 0.0393074 | False | False |
| syslog_splunk_hec_logs | -326.49KiB | -2.08 | 100.00% | 15.32MiB | 966.28KiB | 19.67KiB | 0 | 0.0615832 | 15.0MiB | 1.05MiB | 21.85KiB | 0 | 0.0697082 | False | False |
| datadog_agent_remap_datadog_logs | -1.42MiB | -2.36 | 100.00% | 59.93MiB | 848.92KiB | 17.38KiB | 0 | 0.0138298 | 58.52MiB | 4.08MiB | 85.01KiB | 0 | 0.0697592 | False | False |
| http_text_to_http_json | -1.62MiB | -4.04 | 100.00% | 40.02MiB | 836.18KiB | 17.07KiB | 0 | 0.0203997 | 38.4MiB | 931.59KiB | 19.02KiB | 0 | 0.0236837 | False | False |
| file_to_blackhole | -35.12MiB | -36.83 | 100.00% | 95.34MiB | 3.38MiB | 70.01KiB | 0 | 0.035415 | 60.22MiB | 39.49MiB | 1.4MiB | 0 | 0.655317 | True | False |

@neuronull
Contributor

The file_to_blackhole soak regression is expected; it has been addressed on master, and this branch hasn't pulled that change in yet.

Let's roll with the current placement of request_retry_partial option and we can change it later if needed.

Thanks @ktff !

@neuronull
Contributor

I just noticed this question:

Better name/placement for option request_retry_partial? Adding it to request.* option set would also add it to all of the other http based sinks.

I think it is ok where it is? Tagging @spencergilbert and @jszwedko for thoughts on that.

We discussed this internally, and the consensus was to roll with the current placement for now, since future changes to it should still be able to maintain backwards compatibility.

We might consider adding a wrapper type that includes the shared config struct along with the ES specific option.
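
One possible shape for such a wrapper is sketched below. It is only an illustration of the idea: the struct names and the fields of the shared struct are hypothetical and do not reflect Vector's actual config types.

```rust
/// Stand-in for the request settings shared by HTTP-based sinks.
/// The fields here are hypothetical placeholders.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SharedRequestConfig {
    pub timeout_secs: Option<u64>,
    pub retry_attempts: Option<usize>,
}

/// Wrapper that carries the shared settings plus the ES-specific option,
/// so the option does not leak into other HTTP-based sinks.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ElasticsearchRequestConfig {
    pub shared: SharedRequestConfig,
    /// Retry the whole payload on a partial bulk failure.
    /// `bool::default()` is `false`, matching "disabled by default".
    pub retry_partial: bool,
}
```

Keeping the shared struct as a field leaves the other sinks untouched while giving the Elasticsearch sink a place for options of its own.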

@neuronull neuronull enabled auto-merge (squash) November 1, 2022 16:25
@neuronull neuronull merged commit 4271431 into vectordotdev:master Nov 1, 2022
@github-actions

github-actions bot commented Nov 1, 2022

Soak Test Results

Baseline: 943c616
Comparison: e783a70
Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.
| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| syslog_log2metric_humio_metrics | 173.88KiB | 1.84 | 100.00% | 9.24MiB | 187.7KiB | 3.83KiB | 0 | 0.0198241 | 9.41MiB | 415.89KiB | 8.46KiB | 0 | 0.0431332 | False | False |
| syslog_humio_logs | 133.73KiB | 0.81 | 100.00% | 16.07MiB | 99.85KiB | 2.04KiB | 0 | 0.0060668 | 16.2MiB | 100.26KiB | 2.05KiB | 0 | 0.00604306 | False | False |
| splunk_hec_route_s3 | 162.7KiB | 0.75 | 99.30% | 21.17MiB | 2.07MiB | 43.13KiB | 0 | 0.0978617 | 21.33MiB | 2.01MiB | 42.16KiB | 0 | 0.094411 | False | False |
| syslog_splunk_hec_logs | 108.13KiB | 0.68 | 100.00% | 15.61MiB | 642.9KiB | 13.1KiB | 0 | 0.0402166 | 15.71MiB | 545.5KiB | 11.13KiB | 0 | 0.0338944 | False | False |
| syslog_log2metric_splunk_hec_metrics | 102.63KiB | 0.61 | 100.00% | 16.33MiB | 489.88KiB | 9.99KiB | 0 | 0.029298 | 16.43MiB | 664.37KiB | 13.52KiB | 0 | 0.0394911 | False | False |
| syslog_loki | 49.74KiB | 0.32 | 99.56% | 14.96MiB | 361.65KiB | 7.4KiB | 0 | 0.0236012 | 15.01MiB | 778.59KiB | 15.83KiB | 0 | 0.0506457 | False | False |
| syslog_regex_logs2metric_ddmetrics | 15.01KiB | 0.2 | 41.19% | 7.35MiB | 970.43KiB | 19.77KiB | 0 | 0.12883 | 7.37MiB | 951.98KiB | 19.41KiB | 0 | 0.126129 | False | False |
| http_pipelines_blackhole_acks | 2.28KiB | 0.18 | 65.18% | 1.22MiB | 92.87KiB | 1.89KiB | 0 | 0.0743312 | 1.22MiB | 75.03KiB | 1.53KiB | 0 | 0.059944 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 32.22KiB | 0.13 | 98.46% | 23.81MiB | 554.7KiB | 11.32KiB | 0 | 0.0227453 | 23.84MiB | 340.63KiB | 6.96KiB | 0 | 0.013949 | False | False |
| splunk_hec_indexer_ack_blackhole | 3.66KiB | 0.02 | 11.41% | 23.75MiB | 902.62KiB | 18.36KiB | 0 | 0.0371023 | 23.76MiB | 870.8KiB | 17.72KiB | 0 | 0.035789 | False | False |
| enterprise_http_to_http | -548.89B | -0 | 5.78% | 23.85MiB | 254.32KiB | 5.19KiB | 0 | 0.0104131 | 23.85MiB | 257.58KiB | 5.27KiB | 0 | 0.0105466 | False | False |
| file_to_blackhole | -55.22KiB | -0.06 | 39.63% | 95.34MiB | 3.33MiB | 69.13KiB | 0 | 0.0349694 | 95.29MiB | 3.89MiB | 80.83KiB | 0 | 0.0407934 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | -29.37KiB | -0.12 | 77.40% | 23.77MiB | 779.51KiB | 15.87KiB | 0 | 0.0320134 | 23.75MiB | 902.75KiB | 18.35KiB | 0 | 0.0371193 | False | False |
| http_to_http_json | -32.02KiB | -0.13 | 98.26% | 23.84MiB | 378.15KiB | 7.72KiB | 0 | 0.0154845 | 23.81MiB | 539.69KiB | 11.02KiB | 0 | 0.022128 | False | False |
| fluent_elasticsearch | -150.14KiB | -0.18 | 100.00% | 79.47MiB | 52.91KiB | 1.07KiB | 0 | 0.000650077 | 79.33MiB | 1.5MiB | 30.95KiB | 0 | 0.0189408 | False | False |
| datadog_agent_remap_blackhole | -131.82KiB | -0.23 | 77.65% | 56.87MiB | 4.26MiB | 88.69KiB | 0 | 0.0748914 | 56.74MiB | 2.98MiB | 62.1KiB | 0 | 0.0524475 | False | False |
| http_to_http_acks | -42.79KiB | -0.24 | 14.27% | 17.18MiB | 8.09MiB | 169.23KiB | 0 | 0.471184 | 17.13MiB | 8.01MiB | 167.26KiB | 0 | 0.467617 | True | True |
| http_pipelines_no_grok_blackhole | -30.79KiB | -0.29 | 86.04% | 10.43MiB | 152.82KiB | 3.12KiB | 0 | 0.014303 | 10.4MiB | 1012.85KiB | 20.6KiB | 0 | 0.095073 | False | False |
| http_to_http_noack | -78.24KiB | -0.32 | 99.87% | 23.83MiB | 514.75KiB | 10.52KiB | 0 | 0.0210906 | 23.75MiB | 1.05MiB | 22.0KiB | 0 | 0.0444042 | False | False |
| http_pipelines_blackhole | -5.66KiB | -0.33 | 98.48% | 1.67MiB | 15.25KiB | 319.13B | 0 | 0.00893117 | 1.66MiB | 113.25KiB | 2.31KiB | 0 | 0.0665662 | False | False |
| datadog_agent_remap_datadog_logs | -290.0KiB | -0.53 | 99.94% | 53.78MiB | 1.33MiB | 27.97KiB | 0 | 0.0247982 | 53.5MiB | 3.82MiB | 79.54KiB | 0 | 0.0713816 | False | False |
| http_text_to_http_json | -311.99KiB | -0.8 | 100.00% | 38.31MiB | 811.59KiB | 16.57KiB | 0 | 0.0206825 | 38.01MiB | 808.98KiB | 16.52KiB | 0 | 0.0207813 | False | False |
| datadog_agent_remap_blackhole_acks | -535.61KiB | -0.89 | 100.00% | 58.62MiB | 4.17MiB | 86.82KiB | 0 | 0.0711375 | 58.1MiB | 2.52MiB | 52.69KiB | 0 | 0.0433219 | False | False |
| datadog_agent_remap_datadog_logs_acks | -535.06KiB | -0.98 | 100.00% | 53.12MiB | 2.25MiB | 47.2KiB | 0 | 0.0424179 | 52.59MiB | 3.81MiB | 79.24KiB | 0 | 0.0723621 | False | False |
| socket_to_socket_blackhole | -337.34KiB | -1.46 | 100.00% | 22.49MiB | 489.33KiB | 9.99KiB | 0 | 0.0212459 | 22.16MiB | 463.24KiB | 9.46KiB | 0 | 0.0204121 | False | False |

davidhuie-dd pushed a commit that referenced this pull request Nov 1, 2022
…failure (#14891)

Adds a new config option for the elasticsearch sink (request_retry_partial), which is disabled by default but when enabled, retries the whole payload on partial bulk failure if the partial failure contains a retriable status.