OME raw InferenceService can report Ready before deployment readiness is populated #545

@YouNeedCryDear

Summary

The readiness investigation split this into two related problems:

  1. Creation-time readiness bootstrap is broken for raw deployments.
  2. The top-level Ready condition model is structurally incomplete for some deployment shapes.

For the specific fresh-create symptom, the primary bug is the bootstrap path: the controller can make a top-level readiness decision before it has authoritative engine readiness from the live workload object.

What happens today

  • On create/update, the raw reconciler returns the desired Deployment or LeaderWorkerSet object instead of a live object with controller-populated status.
  • Status propagation then tries to derive readiness from a workload object whose .status.conditions is still empty.
  • Missing workload conditions currently degrade into a no-op instead of an explicit blocking condition.
  • The component-condition initialization check is inverted, so absent conditions are never initialized.
  • Top-level InferenceService conditions are not seeded at reconcile start.

That combination leaves a hole where a brand-new raw InferenceService can surface Ready before DeploymentAvailable / LeaderWorkerSetAvailable has actually been published.
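
To make the first two bullets concrete, here is a minimal, self-contained sketch. It is not the OME reconciler code: `Workload`, `store`, `reconcileDesired`, and `reconcileLive` are hypothetical names invented for illustration, and the in-memory map stands in for the API server. It only shows why returning the desired object loses controller-populated status, while a re-read does not.

```go
package main

import "errors"

// Workload is a hypothetical, trimmed-down stand-in for a Deployment or
// LeaderWorkerSet. Conditions holds the condition types that the workload
// controller has published so far.
type Workload struct {
	Name       string
	Replicas   int
	Conditions []string
}

// store is a hypothetical in-memory stand-in for the API server.
type store map[string]Workload

// apply writes the desired spec fields and leaves status untouched, since
// status is owned by the workload controller and not by the reconciler.
func (s store) apply(desired Workload) {
	live := s[desired.Name]
	live.Name, live.Replicas = desired.Name, desired.Replicas
	s[desired.Name] = live
}

// get returns the live object as currently stored.
func (s store) get(name string) (Workload, error) {
	w, ok := s[name]
	if !ok {
		return Workload{}, errors.New("workload not found: " + name)
	}
	return w, nil
}

// reconcileDesired mirrors the behaviour described above: it hands back the
// object it built, which never carries controller-populated status.
func reconcileDesired(s store, desired Workload) Workload {
	s.apply(desired)
	return desired
}

// reconcileLive re-reads the object after the write, so the caller sees
// whatever status has actually been published.
func reconcileLive(s store, desired Workload) (Workload, error) {
	s.apply(desired)
	return s.get(desired.Name)
}
```

On a fresh create, even the live object may still have no conditions. The re-read alone therefore does not close the hole: the missing-condition case also has to be handled explicitly.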

Important scope note

  • The broader structural issue is still real: top-level Ready formally depends only on IngressReady and EngineReady, which means decoder/router/routes/latest-deployment are still outside the living condition set.

Fix direction

  1. Re-fetch the live Deployment / LeaderWorkerSet after create or update before propagating status.
  2. Treat missing workload readiness conditions as explicit Unknown instead of silently doing nothing.
  3. Initialize absent component conditions as Unknown with an Initializing reason.
  4. Seed top-level InferenceService conditions on first reconcile.
  5. Follow up separately on expanding the condition-set topology so Ready is reliable for PD-disaggregated and serverless shapes as well.
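
Items 2 and 3 could look roughly like the sketch below. This is an assumption-laden illustration, not the actual patch: `Condition`, `findCondition`, `setCondition`, and `propagateEngineReady` are hypothetical helpers, the conditions are simplified structs rather than the real API types, and the workload condition type is passed in as a plain string. The `Initializing` reason comes from item 3 and is reused here for the missing-condition case.

```go
package main

// ConditionStatus mirrors the three-valued status used by Kubernetes conditions.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type    string
	Status  ConditionStatus
	Reason  string
	Message string
}

// findCondition returns a pointer to the condition of the given type, or nil
// when no such condition exists.
func findCondition(conds []Condition, condType string) *Condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// setCondition overwrites the condition with the same Type, or appends it.
func setCondition(conds []Condition, c Condition) []Condition {
	if existing := findCondition(conds, c.Type); existing != nil {
		*existing = c
		return conds
	}
	return append(conds, c)
}

// propagateEngineReady derives EngineReady from the live workload conditions.
// A missing workload condition is written as an explicit Unknown, so a stale
// or default Ready=True cannot survive the reconcile.
func propagateEngineReady(isvc, workload []Condition, workloadCondType string) []Condition {
	src := findCondition(workload, workloadCondType)
	if src == nil {
		return setCondition(isvc, Condition{
			Type:    "EngineReady",
			Status:  ConditionUnknown,
			Reason:  "Initializing",
			Message: workloadCondType + " has not been reported by the workload yet",
		})
	}
	return setCondition(isvc, Condition{
		Type:    "EngineReady",
		Status:  src.Status,
		Reason:  src.Reason,
		Message: src.Message,
	})
}
```

The design point is that the nil branch writes a condition instead of returning early, so every reconcile leaves EngineReady in a known, explicit state.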

Expected behavior

A newly created raw InferenceService must stay non-ready until the underlying deployment object actually reports availability. Missing source-of-truth status must block readiness instead of allowing or preserving Ready=True.
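
As a rough sketch of that rule, the aggregation below treats an absent dependency the same as Unknown, so Ready can only become True once every dependency has been explicitly reported True. The names `seedConditions` and `computeReady` and the map-based representation are hypothetical. The dependency list reflects the two conditions this issue says Ready currently depends on.

```go
package main

// Status is a three-valued condition status.
type Status string

const (
	True    Status = "True"
	False   Status = "False"
	Unknown Status = "Unknown"
)

// dependents lists the condition types that top-level Ready depends on,
// per the description in this issue.
var dependents = []string{"IngressReady", "EngineReady"}

// seedConditions sets Ready and every dependent to Unknown if absent,
// which is what seeding on first reconcile would amount to.
func seedConditions(conds map[string]Status) {
	for _, t := range append([]string{"Ready"}, dependents...) {
		if _, ok := conds[t]; !ok {
			conds[t] = Unknown
		}
	}
}

// computeReady aggregates the dependents into a top-level status.
// Any False wins; otherwise any missing or Unknown dependent blocks Ready.
func computeReady(conds map[string]Status) Status {
	blocked := false
	for _, t := range dependents {
		switch conds[t] {
		case False:
			return False
		case True:
			// This dependent is satisfied.
		default:
			// Unknown, or not present in the map at all.
			blocked = true
		}
	}
	if blocked {
		return Unknown
	}
	return True
}
```

With this shape, a brand-new object whose conditions were never written evaluates to Unknown rather than True.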

This issue tracks the phase-1 bootstrap fix for the raw readiness hole. The broader condition-set cleanup can follow as separate work if needed.
