The results from this issue were broken up into several issues. The part that remains here is:
- graphite_exporter should implement a counter of rejected metrics and expose it as part of its own metrics.
- graphite_exporter should not panic and exit when an invalid metric is received.
- graphite_exporter in debug mode should log the offending metrics to a log file (or, for simplicity's sake, to the debug output of the web page).
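The three requirements above can be sketched together with only the standard library. This is a minimal sketch, not the exporter's code: `parseLine`, `handleLine`, `sample`, `debugMode` and `rejectedCount` are hypothetical names. In the real exporter the counter would be a client_golang counter registered on the exporter's own `/metrics`; an atomic integer stands in for it here, and "rejected" only covers malformed lines rather than names that fail label validation.

```go
package main

import (
	"fmt"
	"log"
	"strconv"
	"strings"
	"sync/atomic"
)

// rejectedMetrics counts lines that could not be turned into a sample.
// Stand-in for a Prometheus counter exposed by the exporter itself.
var rejectedMetrics int64

// debugMode is a hypothetical switch for "running with debug logging".
var debugMode = false

// sample is a hypothetical parsed Graphite plaintext line.
type sample struct {
	name      string
	value     float64
	timestamp int64
}

// parseLine parses "name value timestamp" and returns an error instead of panicking.
func parseLine(line string) (sample, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return sample{}, fmt.Errorf("expected 3 fields, got %d", len(fields))
	}
	value, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return sample{}, fmt.Errorf("invalid value %q: %w", fields[1], err)
	}
	ts, err := strconv.ParseInt(fields[2], 10, 64)
	if err != nil {
		return sample{}, fmt.Errorf("invalid timestamp %q: %w", fields[2], err)
	}
	return sample{name: fields[0], value: value, timestamp: ts}, nil
}

// handleLine processes one line: a bad line is counted, logged in debug
// mode, and skipped, so the exporter keeps running.
func handleLine(line string) (sample, bool) {
	s, err := parseLine(line)
	if err != nil {
		atomic.AddInt64(&rejectedMetrics, 1)
		if debugMode {
			log.Printf("rejected metric %q: %v", line, err)
		}
		return sample{}, false
	}
	return s, true
}

// rejectedCount returns the current number of rejected lines.
func rejectedCount() int64 {
	return atomic.LoadInt64(&rejectedMetrics)
}
```

The key design choice is that every failure path returns an error to a single place (`handleLine`), which is where counting and debug logging happen, instead of aborting the process.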
Original investigation:
==========
Most of our metrics contain `_` in various parts of the metric name.
labels.go has a validation function, `validateLabelValues`, that checks whether the number of label values matches the expected number of labels, i.e. the number of dot-delimited fields in the original metric.
With a metric like
`hostname_function.source.description.metric_blah.count 2.0 timestamp`
the exporter confuses `hostname_function` with `hostname function`, because it splits on the underscores.
`_` is valid in both Graphite and Prometheus names and should be handled.
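A small sketch of the failure mode described above, assuming (hypothetically) that the name is flattened by replacing `.` with `_` and later split on `_` again. The helper names `dotFields` and `underscoreFields` are illustrative and not taken from the exporter's source; the point is only that the two splits disagree on the field count, which is exactly the kind of mismatch `validateLabelValues` reports.

```go
package main

import "strings"

// dotFields returns the Graphite path components (split on ".").
func dotFields(name string) []string {
	return strings.Split(name, ".")
}

// underscoreFields flattens "." to "_" and then splits on "_": the
// underscores that were already in the name become extra, spurious
// field boundaries.
func underscoreFields(name string) []string {
	return strings.Split(strings.ReplaceAll(name, ".", "_"), "_")
}
```

For `hostname_function.source.description.metric_blah.count`, `dotFields` yields 5 fields while `underscoreFields` yields 7 (`hostname`, `function`, ..., `metric`, `blah`, `count`), so a check expecting 5 values would fail.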
UPDATE: When using matching, this problem can be bypassed by assigning parts of the Graphite metric name to labels and extracting a clean metric name (see the comment below). The issue is still valid, however: the exporter should not panic when it encounters an unexpected, unsupported, or supported-but-problematic metric name.
Suggested fix: substitute a temporary token for the underscores found in the original name, and restore them when creating the label names exposed to Prometheus.
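The suggested escape-and-restore fix could look roughly like this. It is a minimal sketch under stated assumptions: the token (`placeholder`, here a NUL byte) is a hypothetical choice that must never occur in incoming Graphite names, and `protectUnderscores`, `restoreUnderscores` and `splitPreservingUnderscores` are illustrative helpers, not existing exporter functions.

```go
package main

import "strings"

// placeholder is a hypothetical temporary token; it must be a string that
// never appears in incoming Graphite metric names.
const placeholder = "\x00"

// protectUnderscores hides the original underscores before the name is processed.
func protectUnderscores(name string) string {
	return strings.ReplaceAll(name, "_", placeholder)
}

// restoreUnderscores puts the original underscores back, e.g. when building label values.
func restoreUnderscores(s string) string {
	return strings.ReplaceAll(s, placeholder, "_")
}

// splitPreservingUnderscores flattens "." to "_" and splits on "_" like the
// problematic path does, but only the separators that came from "." are split on.
func splitPreservingUnderscores(name string) []string {
	protected := protectUnderscores(name)
	parts := strings.Split(strings.ReplaceAll(protected, ".", "_"), "_")
	for i, p := range parts {
		parts[i] = restoreUnderscores(p)
	}
	return parts
}
```

With this, `hostname_function.source.description.metric_blah.count` splits into the five fields `hostname_function`, `source`, `description`, `metric_blah`, `count`, matching the dot-delimited count.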
- 0.5 panics.
- 0.4.2 drops the offending metrics and exposes only the "valid" ones, but graphite_exporter is reported as down and errors are logged to syslog saying that the metric after HELP is INVALID.
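```go
package prometheus

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// Stand-in for the error defined elsewhere in labels.go, so the excerpt compiles on its own.
var errInconsistentCardinality = errors.New("inconsistent label cardinality")

// validateLabelValues returns an error if the number of label values does not
// match the expected number, or if any value is not valid UTF-8.
func validateLabelValues(vals []string, expectedNumberOfValues int) error {
	if len(vals) != expectedNumberOfValues {
		return fmt.Errorf(
			"%s: expected %d label values but got %d in %#v",
			errInconsistentCardinality, expectedNumberOfValues,
			len(vals), vals,
		)
	}
	for _, val := range vals {
		if !utf8.ValidString(val) {
			return fmt.Errorf("label value %q is not valid UTF-8", val)
		}
	}
	return nil
}
```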
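`validateLabelValues` itself only returns an error. In client_golang, the `Must*` helpers such as `MustNewConstMetric` panic where their non-`Must` counterparts (`NewConstMetric`) would return that error; whether the 0.5 panic comes from such a call is an assumption here. The cleaner fix is to call the error-returning variant and count and skip the metric. As a last-resort guard, a panic can be turned back into an error with `recover`; `safeCall` below is a hypothetical helper sketching that.

```go
package main

import "fmt"

// safeCall runs fn and converts a panic (for example one raised by a Must*
// helper given inconsistent label values) into an ordinary error, so a
// single bad metric cannot take the whole exporter down.
func safeCall(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	fn()
	return nil
}
```

The returned error can then feed the same rejected-metrics counter and debug logging as any other invalid metric.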