
Title: Active health check per-endpoint hostname override blocked by wildcard Host: "*" #8564

@wengyao04


Summary

When using active HTTP health checks with Backend FQDN endpoints on routes that match by headers only (no hostnames), the per-endpoint health check hostname override introduced in #8452 does not take effect. The health check Host header is set to "*", so every endpoint fails its active health check and Envoy enters panic mode, which defeats the purpose of health-check-based failover.

Environment

  • Envoy Gateway version: v1.7.1
  • Kubernetes version: v1.33.0

Setup

The HTTPRoute has no hostnames.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-route
  namespace: gateway
spec:
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: my-gateway
  rules:
    - matches:
        - headers:
            - name: x-model-id
              type: Exact
              value: my-model
      backendRefs:
        - group: gateway.envoyproxy.io
          kind: Backend
          name: my-backend-cluster-a
          weight: 1
        - group: gateway.envoyproxy.io
          kind: Backend
          name: my-backend-cluster-b
          weight: 1
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: my-backend-cluster-a
  namespace: gateway
spec:
  endpoints:
    - hostname: service.cluster-a.example.com
      fqdn:
        hostname: service.cluster-a.example.com
        port: 443
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: my-backend-cluster-b
  namespace: gateway
spec:
  endpoints:
    - hostname: service.cluster-b.example.com
      fqdn:
        hostname: service.cluster-b.example.com
        port: 443
---
apiVersion: gateway.networking.k8s.io/v1
kind: BackendTLSPolicy
metadata:
  name: my-backend-cluster-a
  namespace: gateway
spec:
  targetRefs:
    - group: gateway.envoyproxy.io
      kind: Backend
      name: my-backend-cluster-a
  validation:
    caCertificateRefs:
      - group: ""
        kind: ConfigMap
        name: ca-bundle
    hostname: service.cluster-a.example.com
---
apiVersion: gateway.networking.k8s.io/v1
kind: BackendTLSPolicy
metadata:
  name: my-backend-cluster-b
  namespace: gateway
spec:
  targetRefs:
    - group: gateway.envoyproxy.io
      kind: Backend
      name: my-backend-cluster-b
  validation:
    caCertificateRefs:
      - group: ""
        kind: ConfigMap
        name: ca-bundle
    hostname: service.cluster-b.example.com
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: my-route
  namespace: gateway
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: my-route
  healthCheck:
    active:
      type: HTTP
      http:
        path: /healthz
        method: GET
        expectedStatuses: [200]
        # hostname is NOT set — expecting per-endpoint hostname from Backend
      interval: 2s
      timeout: 2s
      healthyThreshold: 2
      unhealthyThreshold: 3

Expected Behavior

Each endpoint should have its own Host header in the health check request, derived from the Backend endpoint's hostname field via Endpoint.HealthCheckConfig.Hostname (the per-endpoint override path added in #8452).

Actual Behavior

The health check sends Host: "*" for all endpoints. The per-endpoint health_check_config is empty in the xDS config despite the Backend endpoint hostname field being properly populated.

This causes:

  1. All endpoints fail active health checks (upstream rejects Host: *)
  2. Envoy enters panic mode (0% healthy hosts is below the 50% healthy_panic_threshold)
  3. Traffic is load balanced blindly across all endpoints, including unhealthy ones
  4. Clients see errors when requests hit the downed backend — no failover
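The panic-mode step above can be sketched as a simple percentage check. This is a minimal illustration, not Envoy's implementation: the function name inPanicMode is hypothetical, the 50% threshold is the value quoted above, and treating an empty cluster as "not in panic" is an arbitrary choice made for the sketch.

```go
package main

import "fmt"

// inPanicMode is a hypothetical helper that reports whether the share of
// healthy hosts has dropped below the panic threshold (in percent).
// An empty cluster is treated as "not in panic" purely for this sketch.
func inPanicMode(healthy, total int, thresholdPercent float64) bool {
	if total == 0 {
		return false
	}
	healthyPercent := 100 * float64(healthy) / float64(total)
	return healthyPercent < thresholdPercent
}

func main() {
	// Both Backends fail the health check because of Host: "*".
	fmt.Println(inPanicMode(0, 2, 50)) // true
	// One Backend is healthy: 50% is not below the 50% threshold.
	fmt.Println(inPanicMode(1, 2, 50)) // false
}
```

With both endpoints marked unhealthy, the check is true and traffic is spread across every host regardless of health, which matches the behavior reported here.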

Root Cause

  1. internal/gatewayapi/helpers.go:311-316, computeHosts returns "*" when neither the HTTPRoute nor the Listener specifies a hostname:
  // No route hostnames specified: use the listener hostname if specified,
  // or else match all hostnames.
  if len(routeHostnames) == 0 {
      if len(listenerHostnameVal) > 0 {
          return []string{listenerHostnameVal}
      }
      return []string{"*"}  // ← origin of "*"
  }

This "*" is assigned to the IR route at internal/gatewayapi/route.go:1285:

  hostRoute.Hostname = host  // host = "*" for header-matched routes

  2. In internal/gatewayapi/backendtrafficpolicy.go:822, the "*" hostname is propagated to the health check Host field via SetHTTPHostIfAbsent:

  r.Traffic.HealthCheck.SetHTTPHostIfAbsent(r.Hostname)  // r.Hostname = "*"

which calls internal/ir/xds.go:2882-2886:

  func (h *HealthCheck) SetHTTPHostIfAbsent(host string) {
      if h != nil && h.Active != nil && h.Active.HTTP != nil && h.Active.HTTP.Host == "" {
          h.Active.HTTP.Host = host  // sets Host = "*"
      }
  }

  3. In internal/xds/translator/cluster.go:890, buildHealthCheckConfig treats "*" as an explicit hostname override, skipping the per-endpoint path from #8452:

  func buildHealthCheckConfig(hc *ir.HealthCheck, ep *ir.DestinationEndpoint) *endpointv3.Endpoint_HealthCheckConfig {
      // ...
      if hc.Active.HTTP != nil && hc.Active.HTTP.Host != "" {  // "*" != "" → true
          return nil  // skips per-endpoint hostname override
      }
      // This code is never reached:
      if ep == nil || ep.Hostname == nil {
          return nil
      }
      return &endpointv3.Endpoint_HealthCheckConfig{
          Hostname: *ep.Hostname,
      }
  }

The "*" wildcard is not an explicit user-provided hostname — it's an auto-generated default from computeHosts for routes that match by headers only. But buildHealthCheckConfig treats it the same as an intentional hostname override, blocking the per-endpoint path from #8452.
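The chain described above can be reproduced in isolation. The sketch below uses simplified stand-in types rather than the real Envoy Gateway ir and endpointv3 packages, so the type shapes, endpointHealthCheckHostname, and traceOverride are assumptions made for illustration. computeHosts here models only the "no route hostnames" branch quoted earlier, not the full hostname intersection logic.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the IR types referenced in the issue.
type HTTPHealthCheck struct{ Host string }
type ActiveHealthCheck struct{ HTTP *HTTPHealthCheck }
type HealthCheck struct{ Active *ActiveHealthCheck }
type Endpoint struct{ Hostname *string }

// computeHosts models only the branch quoted above: with no route hostnames,
// fall back to the listener hostname, or else to the "*" wildcard.
func computeHosts(routeHostnames []string, listenerHostname string) []string {
	if len(routeHostnames) == 0 {
		if listenerHostname != "" {
			return []string{listenerHostname}
		}
		return []string{"*"}
	}
	return routeHostnames
}

// SetHTTPHostIfAbsent copies the route hostname into an unset health check Host.
func (h *HealthCheck) SetHTTPHostIfAbsent(host string) {
	if h != nil && h.Active != nil && h.Active.HTTP != nil && h.Active.HTTP.Host == "" {
		h.Active.HTTP.Host = host
	}
}

// endpointHealthCheckHostname mirrors the current (unfixed) decision: any
// non-empty cluster-level Host suppresses the per-endpoint override.
// It returns "" where the real function would return nil.
func endpointHealthCheckHostname(hc *HealthCheck, ep *Endpoint) string {
	if hc == nil || hc.Active == nil {
		return ""
	}
	if hc.Active.HTTP != nil && hc.Active.HTTP.Host != "" {
		return ""
	}
	if ep == nil || ep.Hostname == nil {
		return ""
	}
	return *ep.Hostname
}

// traceOverride runs the three steps for a route and listener with no hostnames.
func traceOverride(endpointHostname string) (clusterHost, override string) {
	hc := &HealthCheck{Active: &ActiveHealthCheck{HTTP: &HTTPHealthCheck{}}}
	hc.SetHTTPHostIfAbsent(computeHosts(nil, "")[0])
	ep := &Endpoint{Hostname: &endpointHostname}
	return hc.Active.HTTP.Host, endpointHealthCheckHostname(hc, ep)
}

func main() {
	clusterHost, override := traceOverride("service.cluster-a.example.com")
	// Prints: cluster host="*" endpoint override=""
	fmt.Printf("cluster host=%q endpoint override=%q\n", clusterHost, override)
}
```

Even though the endpoint carries a hostname, the override comes back empty because the wildcard has already filled the cluster-level Host by the time the per-endpoint decision is made.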

Suggested Fix

One-line change in internal/xds/translator/cluster.go:890:

  // Before:
  if hc.Active.HTTP != nil && hc.Active.HTTP.Host != "" {

  // After:
  if hc.Active.HTTP != nil && hc.Active.HTTP.Host != "" && hc.Active.HTTP.Host != "*" {

This allows the per-endpoint health_check_config.hostname to be set when the cluster-level host is the wildcard default, while still respecting explicitly configured hostnames.
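To make the intent of the changed condition easier to review, the sketch below isolates it as a predicate and runs it against the three cases that matter: an unset host, the wildcard default, and an explicitly configured hostname. The names wildcardHost, hasExplicitHost, and endpointOverride are hypothetical and do not exist in Envoy Gateway; health.example.com is a made-up value.

```go
package main

import "fmt"

// wildcardHost is the default that computeHosts yields when neither the route
// nor the listener specifies a hostname.
const wildcardHost = "*"

// hasExplicitHost is a hypothetical helper expressing the proposed condition:
// only a non-empty, non-wildcard Host counts as a user-provided override.
func hasExplicitHost(clusterHost string) bool {
	return clusterHost != "" && clusterHost != wildcardHost
}

// endpointOverride returns the per-endpoint hostname that would be emitted,
// or "" when no override should be set.
func endpointOverride(clusterHost string, endpointHostname *string) string {
	if hasExplicitHost(clusterHost) || endpointHostname == nil {
		return ""
	}
	return *endpointHostname
}

func main() {
	ep := "service.cluster-a.example.com"
	for _, host := range []string{"", "*", "health.example.com"} {
		fmt.Printf("cluster host %q -> override %q\n", host, endpointOverride(host, &ep))
	}
}
```

Under these assumptions the empty and wildcard cases both yield the endpoint hostname, while an explicit cluster-level hostname still suppresses the override.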

Verification

With this fix applied:

  • Cluster-level health check still shows host: "*" (unchanged)
  • Each endpoint gets health_check_config.hostname set from its Backend endpoint hostname field
  • Envoy uses the per-endpoint hostname to override the cluster-level "*" for health check requests
  • Active health checks pass for healthy endpoints, fail for unhealthy ones
  • Failover works correctly without triggering panic mode
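The precedence that the verification relies on can be sketched as follows. This is an approximation based on Envoy's documented behavior, where a non-empty per-endpoint health check hostname overrides the cluster-level host, and an empty cluster-level host falls back to the cluster name. The function effectiveHost and the cluster name used below are hypothetical.

```go
package main

import "fmt"

// effectiveHost is a hypothetical model of how the Host header for an HTTP
// health check is chosen: endpoint override first, then the cluster-level
// host, then the cluster name as a last resort.
func effectiveHost(endpointHostname, clusterHost, clusterName string) string {
	if endpointHostname != "" {
		return endpointHostname
	}
	if clusterHost != "" {
		return clusterHost
	}
	return clusterName
}

func main() {
	// Before the fix: no per-endpoint hostname, so the wildcard is sent.
	fmt.Println(effectiveHost("", "*", "example-cluster"))
	// After the fix: the per-endpoint hostname overrides the wildcard.
	fmt.Println(effectiveHost("service.cluster-a.example.com", "*", "example-cluster"))
}
```

This is why the cluster-level host can remain "*" after the fix: once each endpoint carries its own hostname, the wildcard is no longer the value that reaches the upstream.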


Labels: kind/bug