Summary
The readiness investigation split this into two related problems:
- Creation-time readiness bootstrap is broken for raw deployments.
- The top-level Ready condition model is structurally incomplete for some deployment shapes.
For the specific fresh-create symptom, the primary bug is the bootstrap path: the controller can make a top-level readiness decision before it has authoritative engine readiness from the live workload object.
What happens today
- On create/update, the raw reconciler returns the desired Deployment or LeaderWorkerSet object instead of a live object with controller-populated status.
- Status propagation then tries to derive readiness from a workload object whose .status.conditions is still empty.
- Missing workload conditions currently degrade into a no-op instead of an explicit blocking condition.
- The component-condition initialization check is inverted, so absent conditions are never initialized.
- Top-level InferenceService conditions are not seeded at reconcile start.
That combination leaves a hole where a brand-new raw InferenceService can surface Ready before DeploymentAvailable / LeaderWorkerSetAvailable has actually been published.
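The desired-versus-live distinction can be illustrated with a minimal sketch. Everything here is hypothetical and simplified: `fakeCluster`, `Workload`, and the two `reconcile...` helpers are stand-ins invented for illustration, not the controller's actual types or functions.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// Workload is a simplified stand-in for a Deployment or LeaderWorkerSet.
type Workload struct {
	Name       string
	Conditions []Condition // models .status.conditions
}

// fakeCluster is a hypothetical in-memory API server used only for this sketch.
type fakeCluster struct {
	live map[string]Workload
}

// Apply stores the desired object. Status belongs to the workload controller,
// so the stored copy keeps whatever status was already published.
func (c *fakeCluster) Apply(desired Workload) {
	stored := desired
	stored.Conditions = c.live[desired.Name].Conditions
	c.live[desired.Name] = stored
}

// PublishStatus models the workload controller writing status some time later.
func (c *fakeCluster) PublishStatus(name string, conds ...Condition) {
	w := c.live[name]
	w.Conditions = conds
	c.live[name] = w
}

// Get returns the live object, including controller-populated status.
func (c *fakeCluster) Get(name string) (Workload, bool) {
	w, ok := c.live[name]
	return w, ok
}

// reconcileReturningDesired models the behaviour described above: the caller
// gets back the object it built, which never carries published status.
func reconcileReturningDesired(c *fakeCluster, desired Workload) Workload {
	c.Apply(desired)
	return desired
}

// reconcileReturningLive re-reads the object after the write.
func reconcileReturningLive(c *fakeCluster, desired Workload) (Workload, bool) {
	c.Apply(desired)
	return c.Get(desired.Name)
}

func main() {
	cluster := &fakeCluster{live: map[string]Workload{}}
	desired := Workload{Name: "engine"}

	got := reconcileReturningDesired(cluster, desired)
	fmt.Println("create, desired object conditions:", len(got.Conditions))

	cluster.PublishStatus("engine", Condition{Type: "Available", Status: "True"})

	got = reconcileReturningDesired(cluster, desired)
	fmt.Println("update, desired object conditions:", len(got.Conditions))

	live, _ := reconcileReturningLive(cluster, desired)
	fmt.Println("update, live object conditions:", len(live.Conditions))
}
```

In this model the desired object never carries conditions, while the re-read object does once status has been published. Immediately after a fresh create even the live object has no conditions yet, which is why the missing-condition case still needs explicit handling.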
Important scope note
- The broader structural issue is still real: top-level Ready formally depends only on IngressReady and EngineReady, which means decoder/router/routes/latest-deployment are still outside the living condition set.
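A small sketch of why a narrow dependent set matters. The aggregation rule below (any False wins, then any Unknown or missing, otherwise True) is an assumption loosely modelled on how living condition sets usually behave, and `DecoderReady` is a hypothetical condition name used only for illustration.

```go
package main

import "fmt"

type Status string

const (
	True    Status = "True"
	False   Status = "False"
	Unknown Status = "Unknown"
)

// readyDependents is the set that Ready formally depends on, per the note above.
var readyDependents = []string{"IngressReady", "EngineReady"}

// aggregateReady computes Ready from the listed dependents only.
// Conditions that are not in the list cannot influence the result.
func aggregateReady(conds map[string]Status, dependents []string) Status {
	result := True
	for _, name := range dependents {
		s, ok := conds[name]
		if ok && s == False {
			return False
		}
		if !ok || s != True {
			// Missing or Unknown dependents block readiness (sketch assumption).
			result = Unknown
		}
	}
	return result
}

func main() {
	conds := map[string]Status{
		"IngressReady": True,
		"EngineReady":  True,
		"DecoderReady": False, // hypothetical component outside the dependent set
	}

	fmt.Println("current set: ", aggregateReady(conds, readyDependents))

	widened := append([]string{"DecoderReady"}, readyDependents...)
	fmt.Println("widened set: ", aggregateReady(conds, widened))
}
```

With only the two dependents, the failing decoder is invisible to Ready; adding it to the set makes the same state report as not ready.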
Fix direction
- Re-fetch the live Deployment / LeaderWorkerSet after create or update before propagating status.
- Treat missing workload readiness conditions as explicit Unknown instead of silently doing nothing.
- Initialize absent component conditions as Unknown with an Initializing reason.
- Seed top-level InferenceService conditions on first reconcile.
- Follow up separately on expanding the condition-set topology so Ready is reliable for PD-disaggregated and serverless shapes as well.
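The condition-handling items above could look roughly like the following sketch. It is not the controller's code: `ConditionSet`, its methods, the `WorkloadConditionMissing` reason, and the mapping from a workload `Available` condition onto `EngineReady` are all assumptions made for illustration.

```go
package main

import "fmt"

type Status string

const (
	True    Status = "True"
	False   Status = "False"
	Unknown Status = "Unknown"
)

type Condition struct {
	Type   string
	Status Status
	Reason string
}

// ConditionSet is a hypothetical holder for InferenceService conditions.
type ConditionSet map[string]Condition

// initializeIfAbsent records Unknown/Initializing only when the condition has
// not been set yet, and never overwrites an existing value.
func (cs ConditionSet) initializeIfAbsent(condType string) {
	if _, exists := cs[condType]; !exists {
		cs[condType] = Condition{Type: condType, Status: Unknown, Reason: "Initializing"}
	}
}

// seed initializes the top-level conditions at reconcile start.
func (cs ConditionSet) seed(types ...string) {
	for _, t := range types {
		cs.initializeIfAbsent(t)
	}
}

// propagateWorkload copies workload condition `from` onto service condition `to`.
// A missing workload condition becomes an explicit Unknown rather than a no-op,
// so a stale True cannot survive.
func (cs ConditionSet) propagateWorkload(workload []Condition, from, to string) {
	for _, c := range workload {
		if c.Type == from {
			cs[to] = Condition{Type: to, Status: c.Status, Reason: c.Reason}
			return
		}
	}
	cs[to] = Condition{Type: to, Status: Unknown, Reason: "WorkloadConditionMissing"}
}

// ready reports whether both formal dependents of Ready are True.
func (cs ConditionSet) ready() bool {
	for _, t := range []string{"IngressReady", "EngineReady"} {
		if cs[t].Status != True {
			return false
		}
	}
	return true
}

func main() {
	cs := ConditionSet{}
	cs.seed("Ready", "IngressReady", "EngineReady")
	cs["IngressReady"] = Condition{Type: "IngressReady", Status: True}

	// Fresh create: the live workload has not published any conditions yet.
	cs.propagateWorkload(nil, "Available", "EngineReady")
	fmt.Println("before status is published:", cs["EngineReady"].Status, cs.ready())

	// Later reconcile: the workload controller has reported availability.
	live := []Condition{{Type: "Available", Status: True}}
	cs.propagateWorkload(live, "Available", "EngineReady")
	fmt.Println("after status is published: ", cs["EngineReady"].Status, cs.ready())
}
```

The key design choice is that absence of data is written down as Unknown with a reason, so the aggregate can only become ready after the workload itself has said so.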
Expected behavior
A newly created raw InferenceService must stay non-ready until the underlying deployment object actually reports availability. Missing source-of-truth status must block readiness instead of allowing or preserving Ready=True.
This issue tracks the phase-1 bootstrap fix for the raw readiness hole. The broader condition-set cleanup can follow as separate work if needed.