Fix data race on metricDescriptorsFunction start and end times#158

Merged
SuperQ merged 1 commit into prometheus-community:master from thepalbi:pablo/fix-data-race on May 19, 2022
Conversation

@thepalbi
Contributor

While doing some trials with the exporter, I found a data race in the goroutines launched by each metricDescriptorsFunction when they are about to pull time series.

The raw stacktrace goes as follows:

==================
WARNING: DATA RACE
Write at 0x00c00000f7b8 by goroutine 104:
  github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1()
      /Users/pablo/work/stackdriver_exporter/collectors/monitoring_collector.go:245 +0x834
  github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.2()
      /Users/pablo/work/stackdriver_exporter/collectors/monitoring_collector.go:283 +0x58

Previous read at 0x00c00000f7b8 by goroutine 41:
  github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1()
      /Users/pablo/work/stackdriver_exporter/collectors/monitoring_collector.go:260 +0x1240
  github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.2()
      /Users/pablo/work/stackdriver_exporter/collectors/monitoring_collector.go:283 +0x58

Goroutine 104 (running) created at:
  github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1()
      /Users/pablo/work/stackdriver_exporter/collectors/monitoring_collector.go:223 +0x3c8
  google.golang.org/api/monitoring/v3.(*ProjectsMetricDescriptorsListCall).Pages()
      /Users/pablo/go/pkg/mod/google.golang.org/api@v0.75.0/monitoring/v3/monitoring-gen.go:10545 +0x180
  github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func2()
      /Users/pablo/work/stackdriver_exporter/collectors/monitoring_collector.go:311 +0x564
  github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func3()
      /Users/pablo/work/stackdriver_exporter/collectors/monitoring_collector.go:314 +0x58

Goroutine 41 (running) created at:
  github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1()
      /Users/pablo/work/stackdriver_exporter/collectors/monitoring_collector.go:223 +0x3c8
  google.golang.org/api/monitoring/v3.(*ProjectsMetricDescriptorsListCall).Pages()
      /Users/pablo/go/pkg/mod/google.golang.org/api@v0.75.0/monitoring/v3/monitoring-gen.go:10545 +0x180
  github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func2()
      /Users/pablo/work/stackdriver_exporter/collectors/monitoring_collector.go:311 +0x564
  github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func3()
      /Users/pablo/work/stackdriver_exporter/collectors/monitoring_collector.go:314 +0x58

Diving a bit deeper, the data race arises when one goroutine, launched here, writes to the endTime object while another one reads it. Since the endTime date object is shared across all goroutines, this leads to a data race.
This is easily fixed by passing both startTime and endTime by value to each MetricDescriptor processor goroutine, so each one owns its own copy.

Note that this problem ONLY happens when the ingest delay flag is turned on, because the offending write path is gated behind that flag.

@SuperQ
Contributor

SuperQ commented May 19, 2022

Ohh, good catch.

This needs a DCO sign-off. You can use git commit -s --amend to add it.

Signed-off-by: Pablo Balbi <pablo.l.balbi@gmail.com>
@thepalbi thepalbi force-pushed the pablo/fix-data-race branch from 0b74d3f to 23fbbad Compare May 19, 2022 17:10
@thepalbi thepalbi changed the title Fix data race on List MetricDescriptors start and end times Fix data race on metricDescriptorsFunction start and end times May 19, 2022
@SuperQ SuperQ merged commit ed76fbf into prometheus-community:master May 19, 2022
@kgeckhart kgeckhart mentioned this pull request Jan 25, 2023