[Feature]: Startup and readiness probes for replicas #6621

@gbartolini

Is there an existing issue already for this feature request/idea?

  • I have searched for an existing issue, and could not find anything. I believe this is a new feature request to be evaluated.

What problem is this feature going to solve? Why should it be added?

The startup probe currently uses the same endpoint as the liveness probe, /healthz, which relies on the pg_isready command. The liveness probe reports success when pg_isready returns either 0 (the server is accepting connections) or 1 (the server is rejecting connections, for example while it is starting up or performing crash recovery).
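The exit-code handling described above can be sketched as follows. This is only an illustration, not the operator's actual code: the helper name `liveness_ok` and the constant names are hypothetical, while the exit-status meanings are those documented for `pg_isready`.

```python
# Exit statuses documented for pg_isready.
PG_ISREADY_ACCEPTING = 0    # server is accepting connections
PG_ISREADY_REJECTING = 1    # server is rejecting connections (e.g. starting up)
PG_ISREADY_NO_RESPONSE = 2  # no response to the connection attempt
PG_ISREADY_NO_ATTEMPT = 3   # no attempt was made (e.g. invalid parameters)


def liveness_ok(exit_code: int) -> bool:
    """Hypothetical helper: both 'accepting' and 'rejecting' count as alive."""
    return exit_code in (PG_ISREADY_ACCEPTING, PG_ISREADY_REJECTING)
```

Because the startup probe shares this check, a server that is still in recovery (exit status 1) already passes it.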

The readiness probe begins only once the startup probe has succeeded. In some cases, such as with replicas, this is a limitation: a replica can pass the startup probe while it is still catching up with the primary.

Describe the solution you'd like

We would like to differentiate the startup probe to define multiple waiting stages before the readiness probe kicks in:

  • pg_isready: pg_isready returns 0 (connection is possible)
  • streaming: for a replica connected via streaming replication, the WAL receiver is up and the lag is within a desired bound
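The two stages above could be combined along these lines. This is a minimal sketch under stated assumptions, not a proposed implementation: `InstanceState`, `startup_complete`, and `max_lag_bytes` are hypothetical names, and measuring lag in bytes is an assumption, since the request does not say how the bound is expressed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceState:
    """Hypothetical snapshot of what a startup check might observe."""
    pg_isready_exit_code: int
    is_streaming_replica: bool = False
    wal_receiver_up: bool = False
    lag_bytes: Optional[int] = None  # None means the lag is unknown (assumed unit: bytes)


def startup_complete(state: InstanceState, max_lag_bytes: int) -> bool:
    # Stage 1 (pg_isready): the server must accept connections (exit status 0).
    if state.pg_isready_exit_code != 0:
        return False
    # Instances that are not streaming replicas have no further stage.
    if not state.is_streaming_replica:
        return True
    # Stage 2 (streaming): WAL receiver up and lag within the desired bound.
    if not state.wal_receiver_up or state.lag_bytes is None:
        return False
    return state.lag_bytes <= max_lag_bytes
```

Under this sketch, a replica whose WAL receiver is down, or whose lag is unknown or above the bound, keeps failing the startup check, so the readiness probe would not start evaluating it yet.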

Describe alternatives you've considered

The issue arose from a request by one of our customers at EDB to "tune" the instances participating in the synchronous replication quorum, with a focus on prioritising data durability. Following the shutdown of their only synchronous replica, the replica was automatically added to the synchronous_standby_names list as "ready" when it restarted. This caused write operations on the primary to halt until the replica became fully synchronous.

Our initial approach was to address this issue at the level of synchronous replication, pushing the problem to PostgreSQL's handling of replica readiness. However, after extensive discussions and scenario analysis between @leonardoce and me, we realised that the readiness probe wasn't the right place to solve it. The better approach is to focus on the startup probe, which operates exclusively during the pod's startup phase.

Additional context

N/A

Backport?

No

Are you willing to actively contribute to this feature?

No

Code of Conduct

  • I agree to follow this project's Code of Conduct
