feat: xds envoy proxy #175

Open
arcanez wants to merge 10 commits into feat/telemetryvalidation-crd from feat/xds-envoy-proxy
@arcanez arcanez commented Apr 24, 2026

Description

Add xDS reconciliation for Envoy.

  • ENG-1132

What type of PR is this? (only check one)

  • Feature (feat)
  • Bug Fix (fix)
  • Refactor (refactor)
  • Documentation Update (doc)
  • Other, please specify:

Does this PR introduce any breaking changes?

  • No
  • Yes, and this is why:
  • I need help determining if there are any breaking changes

Does the PR title type prefix match the selected type?

  • Yes
  • No, and this is why:
  • I need help selecting the right one

Added/updated tests?

Please keep the code coverage percentage at 80% and above.

  • Yes
  • No, and this is why:
  • I need help with writing tests

@arcanez arcanez changed the base branch from main to refactor/controller-reconcile-operations April 24, 2026 18:44
@arcanez arcanez changed the base branch from refactor/controller-reconcile-operations to feat/telemetryvalidation-crd April 24, 2026 18:48
@arcanez arcanez force-pushed the feat/telemetryvalidation-crd branch from 4f54582 to 8f75414 Compare April 24, 2026 21:23
arcanez added 10 commits April 24, 2026 14:43
Adds an Envoy xDS control-plane server that dynamically builds route
snapshots from OpenTelemetryCollector and TelemetryValidation objects.
Includes the xDS manager, gRPC server, controller, deployment config,
and Helm chart wiring.

Tune xDS-generated Envoy configuration for DataDog agent -> OTEL datadogreceiver workloads:

- HTTP/2: MaxConcurrentStreams=50, 1MiB stream / 4MiB conn window sizes, 30s/5s keepalive
- Cluster: 32MiB buffer (primary) / 10MiB (mirror); circuit breakers 100 conns/500 req
  (primary), 50/200 (mirror); gRPC connect timeout 15s primary / 10s mirror / 5s HTTP;
  outlier detection with 5x consecutive errors, 30s ejection, 50% cap
- Route: Timeout=0 so sender governs deadline; IdleTimeout=10m; retry only on
  connection-level failures (connect-failure,refused-stream,reset), NumRetries=1 with
  100ms-1s backoff and 5s per-try timeout
- Listener: PerConnectionBufferLimitBytes=32MiB
- Mirror: explicit RuntimeFraction=100% on RequestMirrorPolicy
- Export IsShadowCollector for use by the controller package
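
For reviewers who want the numbers in one place, here is a minimal sketch of the cluster and route values listed above as plain Go data. The type and function names (`ClusterTuning`, `RouteTuning`, `PrimaryCluster`, `MirrorCluster`, `DefaultRoute`) are hypothetical and stdlib-only; they are not the Envoy or go-control-plane types this PR actually populates, and the field layout is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// MiB is one mebibyte in bytes.
const MiB = 1 << 20

// ClusterTuning is a hypothetical stand-in for the per-cluster settings
// in the list above; it is not an Envoy or go-control-plane type.
type ClusterTuning struct {
	BufferLimitBytes   uint32
	MaxConnections     uint32
	MaxRequests        uint32
	GRPCConnectTimeout time.Duration
}

// RouteTuning is a hypothetical stand-in for the route-level settings.
type RouteTuning struct {
	Timeout       time.Duration // 0 leaves the deadline to the sender
	IdleTimeout   time.Duration
	RetryOn       string
	NumRetries    uint32
	BackoffBase   time.Duration
	BackoffMax    time.Duration
	PerTryTimeout time.Duration
}

// PrimaryCluster returns the values quoted for the primary cluster.
func PrimaryCluster() ClusterTuning {
	return ClusterTuning{
		BufferLimitBytes:   32 * MiB,
		MaxConnections:     100,
		MaxRequests:        500,
		GRPCConnectTimeout: 15 * time.Second,
	}
}

// MirrorCluster returns the smaller limits quoted for the mirror cluster.
func MirrorCluster() ClusterTuning {
	return ClusterTuning{
		BufferLimitBytes:   10 * MiB,
		MaxConnections:     50,
		MaxRequests:        200,
		GRPCConnectTimeout: 10 * time.Second,
	}
}

// DefaultRoute returns the route values quoted in the list.
func DefaultRoute() RouteTuning {
	return RouteTuning{
		Timeout:       0,
		IdleTimeout:   10 * time.Minute,
		RetryOn:       "connect-failure,refused-stream,reset",
		NumRetries:    1,
		BackoffBase:   100 * time.Millisecond,
		BackoffMax:    time.Second,
		PerTryTimeout: 5 * time.Second,
	}
}

func main() {
	fmt.Printf("primary: %+v\n", PrimaryCluster())
	fmt.Printf("mirror:  %+v\n", MirrorCluster())
	fmt.Printf("route:   %+v\n", DefaultRoute())
}
```

Keeping the mirror limits below the primary ones, as the list does, means a slow shadow collector saturates its own circuit breaker before it can hold as much buffered data as the primary path.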
…an up stale ports

Add a Namespace field to XDSReconciler populated from POD_NAMESPACE at startup.
Pass it as InNamespace to marked-service and marked-deployment list calls so the
controller only considers resources in its own namespace rather than cluster-wide.
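
The namespace scoping described above can be sketched without any Kubernetes dependencies. This is an illustration only: `Resource`, `controllerNamespace`, and `inNamespace` are hypothetical names, the in-memory filter stands in for the namespaced list call, and the fallback used when `POD_NAMESPACE` is unset is an assumption, not behavior taken from this PR.

```go
package main

import (
	"fmt"
	"os"
)

// Resource is a hypothetical stand-in for a marked Service or Deployment.
type Resource struct {
	Namespace string
	Name      string
}

// controllerNamespace reads POD_NAMESPACE once at startup.
// The fallback argument is an assumption made for this sketch.
func controllerNamespace(fallback string) string {
	if ns := os.Getenv("POD_NAMESPACE"); ns != "" {
		return ns
	}
	return fallback
}

// inNamespace keeps only the resources that live in ns, mimicking the
// effect of passing a namespace option to a list call.
func inNamespace(all []Resource, ns string) []Resource {
	out := make([]Resource, 0, len(all))
	for _, r := range all {
		if r.Namespace == ns {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	ns := controllerNamespace("default")
	all := []Resource{
		{Namespace: ns, Name: "collector-a"},
		{Namespace: "other-team", Name: "collector-b"},
	}
	for _, r := range inNamespace(all, ns) {
		fmt.Println(r.Namespace + "/" + r.Name)
	}
}
```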

Fix reconcileProxyServicePorts to clean up stale xds-* ports when no eligible
collectors remain, rather than returning early and leaving old ports in place.
Add findExistingProxyService for the zero-collector path which does a GET-only
lookup without creating a new service.
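
A rough sketch of the stale-port cleanup described above, assuming the proxy Service's ports can be modeled as a simple slice. `ServicePort` and `reconcilePorts` are hypothetical names and not the PR's `reconcileProxyServicePorts`; the sketch also assumes that ports without the `xds-` prefix belong to something else and should be left untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// ServicePort is a hypothetical stand-in for a Kubernetes Service port.
type ServicePort struct {
	Name string
	Port int32
}

// xdsPortPrefix marks ports managed by the xDS reconciler.
const xdsPortPrefix = "xds-"

// reconcilePorts drops every existing xds-* port and appends the desired
// ones. With zero desired ports it still prunes the stale xds-* entries
// instead of returning early and leaving them in place.
func reconcilePorts(existing, desired []ServicePort) []ServicePort {
	out := make([]ServicePort, 0, len(existing)+len(desired))
	for _, p := range existing {
		if !strings.HasPrefix(p.Name, xdsPortPrefix) {
			out = append(out, p) // not managed here, keep as-is
		}
	}
	return append(out, desired...)
}

func main() {
	existing := []ServicePort{
		{Name: "admin", Port: 9901},
		{Name: "xds-collector-a", Port: 4317},
	}
	// No eligible collectors remain, so no xds-* ports are desired.
	fmt.Println(reconcilePorts(existing, nil))
}
```

Treating the zero-collector case as an ordinary reconcile with an empty desired set keeps the cleanup on the same code path as the normal update.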

Remove isShadowCollectorForXDS and extractPortForXDS (duplicates of IsShadowCollector
from the xds package and extractPort from the validator file). Use xds.IsShadowCollector
and extractPort directly instead.

Remove discoveredCollectorPorts wrapper around eligibleCollectorsForXDS since callers
already pass a pre-filtered slice.
@arcanez arcanez force-pushed the feat/xds-envoy-proxy branch from 42be400 to c2aed14 Compare April 24, 2026 21:45
@arcanez arcanez marked this pull request as ready for review April 24, 2026 22:19