Allow HCP metrics collection for Envoy proxies #16511
3dd9c99 to 655443b
FYI @DanStough @wilkermichael we're thinking of holding off on merging this until after next week's patch releases. The follow-up PR will be chained off of this one and will have the changelog entry.
|
||
| if len(stats_sinks) > 0 { | ||
| args.StatsSinksJSON = "[\n" + strings.Join(stats_sinks, ",\n") + "\n]" | ||
| args.StatsSinksJSON = strings.Join(stats_sinks, ",\n") |
Moved the [ and ] to the template in order to add the HCP metrics sink and cluster configuration in the upper level function.
| wantArgs: BootstrapTplArgs{ | ||
| StatsConfigJSON: defaultStatsConfigJSON, | ||
| StatsSinksJSON: `[{ | ||
| StatsSinksJSON: `{ |
The tests below just remove the [ and ] which are now in the template, see this comment
| }, | ||
| {{- if .StatsSinksJSON }} | ||
| "stats_sinks": {{ .StatsSinksJSON }}, | ||
| "stats_sinks": [ |
Move the brackets into the template file
Related to https://github.com/hashicorp/consul/pull/16511/files#r1123725950
| return nil, fmt.Errorf("failed parsing Proxy.Config: %s", err) | ||
| } | ||
|
|
||
| if bsCfg.HCPMetricsBindPort < 0 || bsCfg.HCPMetricsBindPort > 65535 { |
trujillo-adam
left a comment
I left a suggestion on the docs, but I'm not convinced that it's the perfect way to describe the param.
| - `envoy_hcp_metrics_bind_port` - By setting this port, Envoy will be configured to | ||
| send metrics to an HCP collector. By default, this is disabled. |
| - `envoy_hcp_metrics_bind_port` - By setting this port, Envoy will be configured to | |
| send metrics to an HCP collector. By default, this is disabled. | |
| - `envoy_hcp_metrics_bind_port` - Specifies the port number for Envoy to send metrics to. HCP collectors can collect the metrics at this port. The port is not configured by default. |
Attempted to reword this so that we are more concrete about how to configure the param. Better? Worse? No different?
8dbbeae to 3896dfe
| // appendHCPMetricsConfig generates config to enable a socket at path: <hcpMetricsBindSocketDir>/<namespace>_<proxy_id>.sock | ||
| // or <hcpMetricsBindSocketDir>/<proxy_id>.sock, if namespace is empty. | ||
| func appendHCPMetricsConfig(args *BootstrapTplArgs, hcpMetricsBindSocketDir string) { | ||
| dir := hcpMetricsBindSocketDir |
Handle adding a trailing `/` if the user input lacks one.
| In cases where either assumption is violated this flag will prevent the | ||
| command attempting to resolve config from the local agent. | ||
|
|
||
| - `envoy_hcp_metrics_bind_socket_dir` - Specifies the directory of a unix socket |
Would "Specifies the directory where a unix socket is created" be better here?
What creates the directory? The passive voice sounds as if either Envoy or Consul creates the directory. If that's true, then we should say something like "Specifies the directory where <Envoy/Consul> creates a unix socket. Envoy sends metrics to the socket so that HCP collectors can connect and collect them."
Fixed, that's much better, thank you.
@freddygv had the idea of using a unix socket instead of a port to avoid port collisions, after we realized that services run on the same container. The PR has been updated to:
See this commit 3896dfe
See discussion on Windows support concern here: hashicorp/consul-dataplane#90 (comment). TL;DR: We'd like to move forward with this approach and revisit it at a later stage if needed.
Since this is a common situation we have acl.NamespaceOrDefault to handle defaulting:
path := fmt.Sprintf("%s%s_%s.sock", dir, acl.NamespaceOrDefault(args.Namespace), args.ProxyID)
Using this function fails the tests for a set namespace, as they all get normalized to default. I will add a test in enterprise to cover the other namespace case.
d5dc932 to 8c332b9
307c579 to 8cda1df
I would use the path package here:
| dir := hcpMetricsBindSocketDir | |
| if !strings.HasSuffix(dir, "/") { | |
| dir += "/" | |
| } | |
| // Normalize namespace to "default". This ensures we match the namespace behaviour in proxycfg package, | |
| // where a dynamic listener will be created at the same socket path via xDS. | |
| path := fmt.Sprintf("%s%s_%s.sock", dir, acl.NamespaceOrDefault(args.Namespace), args.ProxyID) | |
| sock := fmt.Sprintf("%s_%s.sock", acl.NamespaceOrDefault(args.Namespace), args.ProxyID) | |
| // Normalize namespace to "default". This ensures we match the namespace behaviour in proxycfg package, | |
| // where a dynamic listener will be created at the same socket path via xDS. | |
| path := path.Join(hcpMetricsBindSocketDir, sock) |
thanks for the suggestion, fixed! That looks prettier 💯
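For readers following the suggestion, here is a standalone sketch of the socket path construction with `path.Join`. The `namespaceOrDefault` helper is a hypothetical, simplified stand-in for Consul's `acl.NamespaceOrDefault` (it only defaults the empty namespace), and the directory and proxy ID values are made up for illustration.

```go
package main

import (
	"fmt"
	"path"
)

// namespaceOrDefault is a hypothetical stand-in for acl.NamespaceOrDefault.
// Assumption: only the empty namespace is mapped to "default" here.
func namespaceOrDefault(ns string) string {
	if ns == "" {
		return "default"
	}
	return ns
}

// socketPath builds <dir>/<namespace>_<proxy_id>.sock. path.Join cleans the
// result, so a trailing slash on dir needs no special handling.
func socketPath(dir, namespace, proxyID string) string {
	sock := fmt.Sprintf("%s_%s.sock", namespaceOrDefault(namespace), proxyID)
	return path.Join(dir, sock)
}

func main() {
	// Both inputs yield the same path, with or without the trailing slash.
	fmt.Println(socketPath("/tmp/consul/hcp-metrics", "", "web-sidecar-proxy"))
	fmt.Println(socketPath("/tmp/consul/hcp-metrics/", "", "web-sidecar-proxy"))
}
```

The sketch returns the joined string directly rather than assigning it to a variable named `path`, which would shadow the `path` package for the rest of the function.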
8c65c58 to 4eb82e1
Co-authored-by: Ashvitha Sridharan <ashvitha.sridharan@hashicorp.com>
This commit builds on top of the previous one by configuring a dynamic listener at that statically configured socket. The changes below inject the HCP metrics collector as an upstream for connect proxies since it is intended to be a mesh service deployed onto the user's cluster.
Why is there dynamic configuration when the cluster is statically defined at bootstrap time?
- We want to secure the metrics stream using TLS, but the stats sink can only be defined in bootstrap config. With dynamic listeners/clusters we can use certificates issued by the connect CA, which aren't available at bootstrap time.
- We want to intelligently route to the HCP collector. Configuring its address at bootstrap time limits our flexibility routing-wise compared to providing clusters/endpoints dynamically using xDS. More on this below.
Why define the collector as an upstream in `proxycfg`?
- Service discovery and routing logic is automatically taken care of, meaning that no code changes are required in the `xds` package.
- Certificate management is taken care of. Each proxy will dial using the certificate of the proxy it represents, and the HCP collector can present its own certificate as well as check intentions.
- Custom routing rules can be added for the collector using discovery chain config entries. Initially the collector is expected to be deployed to each admin partition, but in the future could be deployed centrally in the default partition. These config entries could be managed by HCP itself.
Description
Add a new envoy flag:
`envoy_hcp_metrics_bind_socket_dir`, a directory where a unix socket will be created with the name `<namespace>_<proxy_id>.sock` to forward Envoy metrics.
If set, this will configure a local `stats_sink` and `STATIC` cluster to forward Envoy metrics to an HCP collector.
The HCP Cloud team is working on enabling observability metrics in HCP.
There will be a follow-up PR to generate a listener and cluster dynamically that will be listening on this unix socket.
That listener and cluster will be configured to route to collector instances through the service mesh.
Caveat: we needed to add this local listener indirection as a workaround due to this issue
Testing & Reproduction steps
PR Checklist