
Improve Certificate Rotation for Metrics TLS in OpenShift Deployments #428

@frobware


The current OpenShift deployment configuration passes explicit certificate file paths (--metrics-cert-file and --metrics-key-file) to enable TLS on the metrics endpoint. With this approach, the controller must be restarted whenever OpenShift's Service CA renews the certificates. Until that restart happens, the metrics endpoint keeps serving the outdated certificate, and the restart itself can leave metrics briefly unavailable.

Current Implementation

  • Uses explicit certificate files with --metrics-cert-file=/etc/tls/metrics/tls.crt and --metrics-key-file=/etc/tls/metrics/tls.key
  • OpenShift Service CA annotation generates certificates in the metrics-server-certs secret
  • Certificates are mounted at /etc/tls/metrics/
  • controller-runtime only loads these certificates at startup

Proposed Change (possibly)

Replace the explicit certificate file approach with the certificate directory approach:

  1. Update the OpenShift manager_metrics_patch.yaml to use --cert-dir=/etc/tls/metrics instead of explicit file paths
  2. Keep the same volume mount and OpenShift Service CA annotation configuration
  3. With this change in place, controller-runtime can watch the certificate directory and reload certificates automatically when they are renewed
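As a rough illustration of step 1, the sketch below shows how a manager binary might resolve either flag style into concrete certificate and key paths. It is not taken from this repository: resolveMetricsTLS, the metricsTLSPaths type, and the rule that the two flag styles are mutually exclusive are all assumptions made for the example. The file names tls.crt and tls.key are the keys used by the Service CA secret.

```go
package main

import (
	"errors"
	"flag"
	"path"
)

// metricsTLSPaths is a hypothetical holder for the resolved TLS file locations.
type metricsTLSPaths struct {
	CertFile string
	KeyFile  string
	Watch    bool // true when the directory form is used and reloading is expected
}

// resolveMetricsTLS parses the three flags discussed in this issue and returns
// the certificate and key paths the metrics server would be configured with.
// Rejecting a mix of both styles is an assumption of this sketch.
func resolveMetricsTLS(args []string) (metricsTLSPaths, error) {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	certDir := fs.String("cert-dir", "", "directory containing tls.crt and tls.key")
	certFile := fs.String("metrics-cert-file", "", "explicit certificate file")
	keyFile := fs.String("metrics-key-file", "", "explicit private key file")
	if err := fs.Parse(args); err != nil {
		return metricsTLSPaths{}, err
	}

	switch {
	case *certDir != "" && (*certFile != "" || *keyFile != ""):
		return metricsTLSPaths{}, errors.New("--cert-dir cannot be combined with explicit file flags")
	case *certDir != "":
		// Directory form: file names follow the Service CA secret keys.
		return metricsTLSPaths{
			CertFile: path.Join(*certDir, "tls.crt"),
			KeyFile:  path.Join(*certDir, "tls.key"),
			Watch:    true,
		}, nil
	case *certFile != "" && *keyFile != "":
		// Explicit form: loaded once, no reload expected.
		return metricsTLSPaths{CertFile: *certFile, KeyFile: *keyFile}, nil
	default:
		return metricsTLSPaths{}, errors.New("no metrics TLS material configured")
	}
}
```

With --cert-dir=/etc/tls/metrics the resolved paths are the same two files the current patch names explicitly, so the volume mount and the Service CA annotation would not need to change.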

Certificate Rotation Mechanism

Our implementation supports two approaches for handling TLS certificates:

  1. Current approach (OpenShift): Uses explicit certificate files with the --metrics-cert-file and --metrics-key-file flags. These certificates are loaded once at startup using tls.LoadX509KeyPair() and set directly in the TLS config, so a renewed certificate is not picked up without a pod restart.

  2. Proposed approach: Use --cert-dir flag instead, which leverages controller-runtime's built-in certificate handling:

    • When --cert-dir is set, controller-runtime's metrics server sets up a certificate watcher automatically
    • The watcher uses fsnotify to monitor the certificate and key files for changes
    • The watcher provides a GetCertificate callback, which is attached to the TLS config
    • When a new connection is established, the TLS stack calls this callback to get the most current certificate
    • Renewed certificates are therefore detected and served without restarting the server or the pod
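The difference between the two approaches can be sketched with the standard library alone. This is not controller-runtime's code: certReloader, writeSelfSigned, and rotationDemo are hypothetical names, and an explicit Reload call stands in for the fsnotify-driven watcher. The point it illustrates is that a GetCertificate callback lets a running server hand out a renewed certificate, whereas a key pair loaded once with tls.LoadX509KeyPair() stays fixed.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"math/big"
	"os"
	"path/filepath"
	"sync"
	"time"
)

// certReloader is a hypothetical, simplified stand-in for a certificate watcher.
type certReloader struct {
	mu                sync.RWMutex
	cert              *tls.Certificate
	certPath, keyPath string
}

// Reload re-reads the key pair from disk; a real watcher would call it on file events.
func (r *certReloader) Reload() error {
	cert, err := tls.LoadX509KeyPair(r.certPath, r.keyPath)
	if err != nil {
		return err
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.cert = &cert
	return nil
}

// GetCertificate has the signature expected by tls.Config.GetCertificate, so
// each new handshake is served the certificate that was loaded most recently.
func (r *certReloader) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.cert, nil
}

// writeSelfSigned writes a throwaway self-signed pair as tls.crt and tls.key.
func writeSelfSigned(dir, commonName string) error {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: commonName},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return err
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return err
	}
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	if err := os.WriteFile(filepath.Join(dir, "tls.crt"), certPEM, 0o600); err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, "tls.key"), keyPEM, 0o600)
}

// rotationDemo returns the common name served before and after a simulated renewal.
func rotationDemo() (string, string, error) {
	dir, err := os.MkdirTemp("", "metrics-tls")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	r := &certReloader{certPath: filepath.Join(dir, "tls.crt"), keyPath: filepath.Join(dir, "tls.key")}
	var names [2]string
	for i, cn := range []string{"metrics-v1", "metrics-v2"} {
		if err := writeSelfSigned(dir, cn); err != nil {
			return "", "", err
		}
		if err := r.Reload(); err != nil {
			return "", "", err
		}
		cert, _ := r.GetCertificate(nil)
		leaf, err := x509.ParseCertificate(cert.Certificate[0])
		if err != nil {
			return "", "", err
		}
		names[i] = leaf.Subject.CommonName
	}
	return names[0], names[1], nil
}
```

A server would attach the callback with &tls.Config{GetCertificate: r.GetCertificate}. Connections that are already open keep the certificate they negotiated, and only new handshakes see the renewed one.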

Our implementation already includes code for both approaches, but the OpenShift configuration is currently set to use the explicit file path approach rather than the more robust certificate directory approach.

Benefits

  • Automatic certificate rotation without requiring pod restarts
  • Improved reliability when certificates are renewed
  • Consistent with Kubernetes best practices for certificate management
  • Prevents potential security issues with expired certificates

Related Configuration Files

  • /config/openshift/manager_metrics_patch.yaml
  • /config/openshift/metrics_service.yaml

The controller-runtime metrics server supports certificate hot-reloading when using the directory-based approach, which is the preferred method when certificates might be rotated during runtime.
