Skip to content

Allow HCP metrics collection for Envoy proxies#16511

Merged
freddygv merged 6 commits into
mainfrom
cc-4361/hcp-metrics-bootstrap-config
Mar 10, 2023
Merged

Allow HCP metrics collection for Envoy proxies#16511
freddygv merged 6 commits into
mainfrom
cc-4361/hcp-metrics-bootstrap-config

Conversation

@Achooo
Copy link
Copy Markdown
Contributor

@Achooo Achooo commented Mar 2, 2023

Description

Add a new envoy flag: envoy_hcp_metrics_bind_socket_dir, a directory where a unix socket will be created with the name <namespace>_<proxy_id>.sock to forward Envoy metrics.
If set, this will configure a local stats_sink and STATIC cluster to forward Envoy metrics to an HCP collector.

The HCP Cloud team is working on enabling observability metrics in HCP.

There will be a follow-up PR to generate a listener and cluster dynamically that will be listening on this unix socket.
That listener and cluster will be configured to route to collector instances through the service mesh.

Caveat: we needed to add this local listener indirection as a workaround due to this issue

Testing & Reproduction steps

  • Added unit tests
  • Tested locally by running consul, registering a proxy with the flag configured and starting Envoy with consul connect command.

PR Checklist

  • updated test coverage
  • external facing docs updated
  • not a security concern

@Achooo Achooo requested a review from a team March 2, 2023 20:08
@Achooo Achooo requested a review from a team as a code owner March 2, 2023 20:08
@Achooo Achooo requested review from DanStough, freddygv and wilkermichael and removed request for a team March 2, 2023 20:08
@github-actions github-actions Bot added theme/cli Flags and documentation for the CLI interface type/docs Documentation needs to be created/updated/clarified labels Mar 2, 2023
@Achooo Achooo force-pushed the cc-4361/hcp-metrics-bootstrap-config branch from 3dd9c99 to 655443b Compare March 2, 2023 20:12
@freddygv
Copy link
Copy Markdown
Contributor

freddygv commented Mar 2, 2023

FYI @DanStough @wilkermichael we're thinking of holding off on merging this until after next week's patch releases.

The follow-up PR will be chained off of this one and will have the changelog entry.


if len(stats_sinks) > 0 {
args.StatsSinksJSON = "[\n" + strings.Join(stats_sinks, ",\n") + "\n]"
args.StatsSinksJSON = strings.Join(stats_sinks, ",\n")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the [ and ] to the template in order to add the HCP metrics sink and cluster configuration in the upper level function.

wantArgs: BootstrapTplArgs{
StatsConfigJSON: defaultStatsConfigJSON,
StatsSinksJSON: `[{
StatsSinksJSON: `{
Copy link
Copy Markdown
Contributor Author

@Achooo Achooo Mar 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests below just remove the [ and ] which are now in the template, see this comment

},
{{- if .StatsSinksJSON }}
"stats_sinks": {{ .StatsSinksJSON }},
"stats_sinks": [
Copy link
Copy Markdown
Contributor Author

@Achooo Achooo Mar 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the brackets into the template file

Related to https://github.com/hashicorp/consul/pull/16511/files#r1123725950

Comment thread command/connect/envoy/envoy.go Outdated
return nil, fmt.Errorf("failed parsing Proxy.Config: %s", err)
}

if bsCfg.HCPMetricsBindPort < 0 || bsCfg.HCPMetricsBindPort > 65535 {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate the port

@DanStough DanStough added the pr/no-changelog PR does not need a corresponding .changelog entry label Mar 2, 2023
Copy link
Copy Markdown
Contributor

@DanStough DanStough left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@trujillo-adam trujillo-adam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a suggestion on the docs, but I'm not convinced that it's the perfect way to describe the param.

Comment on lines +78 to +79
- `envoy_hcp_metrics_bind_port` - By setting this port, Envoy will be configured to
send metrics to an HCP collector. By default, this is disabled.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `envoy_hcp_metrics_bind_port` - By setting this port, Envoy will be configured to
send metrics to an HCP collector. By default, this is disabled.
- `envoy_hcp_metrics_bind_port` - Specifies the port number for Envoy to send metrics to. HCP collectors can collect the metrics at this port. The port is not configured by default.

Attempted to reword this so that we are more concrete about how to configure the param. Better? Worse? No different?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the review , and so sorry for the delay. We made some changes to our approach, see here. I tried to reword it based on your suggestions - which I liked.

I'm not 💯 on the new changes. Would love to get your feedback on the new version, and maybe @freddygv has thoughts here.

@Achooo Achooo force-pushed the cc-4361/hcp-metrics-bootstrap-config branch 3 times, most recently from 8dbbeae to 3896dfe Compare March 7, 2023 15:29
@Achooo Achooo requested a review from DanStough March 7, 2023 15:37
// appendHCPMetricsConfig generates config to enable a socket at path: <hcpMetricsBindSocketDir>/<namespace>_<proxy_id>.sock
// or <hcpMetricsBindSocketDir>/<proxy_id>.sock, if namespace is empty.
func appendHCPMetricsConfig(args *BootstrapTplArgs, hcpMetricsBindSocketDir string) {
dir := hcpMetricsBindSocketDir
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle / addition if needed from user input.

In cases where either assumption is violated this flag will prevent the
command attempting to resolve config from the local agent.

- `envoy_hcp_metrics_bind_socket_dir` - Specifies the directory of a unix socket
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would Specifies the directory where a unix socket is created be better here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What creates the directory? The passive voice sounds as if either Envoy or Consul creates the directory. If that's true, then we should say something like "Specifies the directory where <Envoy/Consul> creates a unix socket. Envoy sends metrics to the socket so that HCP collectors can connect collect them."

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed that's much better, thank you.

@Achooo
Copy link
Copy Markdown
Contributor Author

Achooo commented Mar 7, 2023

@freddygv had the idea of using a unix socket instead of a port to avoid ports collisions, and after realizing services run on the same container.

PR has been updated to:

  • Allow configuration of envoy_hcp_metrics_bind_socket_dir, a directory where the socket will be created with the name <namespace>_<proxy_id>.sock
  • Update to add a HCPMetricsBindSocketDir string type in the BootstrapConfig struct, the stats sink and cluster generation and fix relevant tests.
  • Remove port validation in envoy.go and update golden file
  • Update docs with the new flag name

See this commit 3896dfe

@Achooo
Copy link
Copy Markdown
Contributor Author

Achooo commented Mar 8, 2023

See discussion on Windows support concern here: hashicorp/consul-dataplane#90 (comment)

TLDR: We'd like to move forward with this approach, and revisit at a later stage if needed.

Comment on lines 819 to 823
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a common situation we have acl.NamespaceOrDefault to handle defaulting:
path := fmt.Sprintf("%s%s_%s.sock", dir, acl.NamespaceOrDefault(args.Namespace), args.ProxyID)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using this function fails the tests for a set namespace, as they all get normalized to default. I will add a test in enterprise to cover the other namespace case.

@Achooo Achooo force-pushed the cc-4361/hcp-metrics-bootstrap-config branch from d5dc932 to 8c332b9 Compare March 8, 2023 21:15
@freddygv freddygv requested a review from trujillo-adam March 9, 2023 17:03
@Achooo Achooo force-pushed the cc-4361/hcp-metrics-bootstrap-config branch from 307c579 to 8cda1df Compare March 10, 2023 13:30
@Achooo Achooo requested a review from hashi-derek March 10, 2023 13:34
Comment on lines 813 to 820
Copy link
Copy Markdown
Contributor

@loshz loshz Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use the path package here:

Suggested change
dir := hcpMetricsBindSocketDir
if !strings.HasSuffix(dir, "/") {
dir += "/"
}
// Normalize namespace to "default". This ensures we match the namespace behaviour in proxycfg package,
// where a dynamic listener will be created at the same socket path via xDS.
path := fmt.Sprintf("%s%s_%s.sock", dir, acl.NamespaceOrDefault(args.Namespace), args.ProxyID)
sock := fmt.Sprintf("%s_%s.sock", acl.NamespaceOrDefault(args.Namespace), args.ProxyID)
// Normalize namespace to "default". This ensures we match the namespace behaviour in proxycfg package,
// where a dynamic listener will be created at the same socket path via xDS.
path := path.Join(hcpMetricsBindSocketDir, sock)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the suggestion, fixed! That looks prettier 💯

Co-authored-by: Ashvitha Sridharan <ashvitha.sridharan@hashicorp.com>

This commit builds on top of the previous one by configuring a dynamic listener at that statically configured socket. The changes below inject the HCP metrics collector as an upstream for connect proxies since it is intended to be a mesh service deployed onto the user's cluster. 

Why is there dynamic configuration when the cluster is statically defined at bootstrap time?
- We want to secure the metrics stream using TLS, but the stats sink can only be defined in bootstrap config. With dynamic listeners/clusters we can use certificates issued by the connect CA, which aren't available at bootstrap time.
- We want to intelligently route to the HCP collector. Configuring its address at bootstrap time limits our flexibility routing-wise compared to providing clusters/endpoints dynamically using xDS. More on this below.

Why define the collector as an upstream in `proxycfg`?
- Service discovery and routing logic is automatically taken care of, meaning that no code changes are required in the `xds` package.
- Certificate management is taken care of. Each proxy will dial using the certificate of the proxy it represents, and the HCP collector can present its own certificate as well as check intentions.
- Custom routing rules can be added for the collector using discovery chain config entries. Initially the collector is expected to be deployed to each admin partition, but in the future could be deployed centrally in the default partition. These config entries could be managed by HCP itself.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr/no-changelog PR does not need a corresponding .changelog entry theme/cli Flags and documentation for the CLI interface type/docs Documentation needs to be created/updated/clarified

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants