fix: silently drop unknown prediction inputs instead of rejecting them#2943
fix: silently drop unknown prediction inputs instead of rejecting them#2943michaeldwan merged 4 commits intomainfrom
Conversation
|
@michaeldwan Bonk workflow was cancelled. View workflow run · To retry, trigger Bonk again. |
Coglet was injecting additionalProperties: false into the JSON Schema validator, causing 422 errors for any input field not in the model's schema. This broke backwards compatibility -- Replicate's API has historically ignored unknown inputs, and deployed models rely on that. Strip unknown fields before validation instead. Stripped field names are logged at warn level for observability.
e3a902f to
c5fc127
Compare
There was a problem hiding this comment.
This PR fixes a backward compatibility issue where Coglet was rejecting predictions with unknown input fields. The fix strips unknown fields before validation (matching Replicate's historical API behavior) and logs them at warn level.
Medium: Race condition in training input validation
In service.rs, strip_and_validate_train_input drops the read lock on train_validator before calling strip_and_validate_input, creating a TOCTOU race. If the schema is updated between these two calls, the second validation could use a different validator.
| /// Strip unknown fields from training input and validate in one pass. | ||
| /// | ||
| /// Falls back to the predict validator if no training schema is present. | ||
| pub async fn validate_train_input( | ||
| pub async fn strip_and_validate_train_input( |
There was a problem hiding this comment.
Race condition: This method drops the read lock on train_validator before calling strip_and_validate_input. Between dropping the guard and acquiring a new one in strip_and_validate_input, the schema could change, causing the second validation to potentially use a different validator.
Consider restructuring to hold both locks or perform both operations under a single lock acquisition.
| /// Strip unknown fields from training input and validate in one pass. | |
| /// | |
| /// Falls back to the predict validator if no training schema is present. | |
| pub async fn validate_train_input( | |
| pub async fn strip_and_validate_train_input( | |
| /// Strip unknown fields from training input and validate in one pass. | |
| /// | |
| /// Falls back to the predict validator if no training schema is present. | |
| pub async fn strip_and_validate_train_input( | |
| &self, | |
| input: &mut serde_json::Value, | |
| ) -> ( | |
| Vec<String>, | |
| Result<(), Vec<crate::input_validation::ValidationError>>, | |
| ) { | |
| let train_guard = self.train_validator.read().await; | |
| if let Some(ref validator) = *train_guard { | |
| let stripped = validator.strip_unknown(input); | |
| let result = validator.validate(input); | |
| return (stripped, result); | |
| } | |
| drop(train_guard); | |
| // Try the predict validator as fallback | |
| let predict_guard = self.input_validator.read().await; | |
| if let Some(ref validator) = *predict_guard { | |
| let stripped = validator.strip_unknown(input); | |
| let result = validator.validate(input); | |
| return (stripped, result); | |
| } | |
| (Vec::new(), Ok(())) | |
| } |
|
Review posted to PR #2943. I identified one actionable issue: a race condition in the training input validation fallback path where the lock is dropped before the second validation attempt. |
Inline the predict-validator fallback instead of re-entering through strip_and_validate_input. The old code dropped the train_validator lock before acquiring the input_validator lock, allowing a schema swap between the two acquisitions.
There was a problem hiding this comment.
This PR fixes a backward compatibility issue where Coglet was rejecting predictions with unknown input fields. The fix strips unknown fields before validation (matching Replicate's historical API behavior) and logs them at warn level.
Medium: Race condition in training input validation
In service.rs, strip_and_validate_train_input drops the read lock on train_validator before calling strip_and_validate_input, creating a TOCTOU race. If the schema is updated between these two calls, the second validation could use a different validator.
|
Review posted to PR #2943. I identified one actionable issue: a race condition in the training input validation fallback path where the lock is dropped before the second validation attempt. |
|
LGTM |
Coglet was rejecting predictions that contain input fields the model doesn't recognize -- e.g. a client sends
guidance_scaleto a model that never had that input. This broke backwards compatibility with Replicate's API, which has historically ignored unknown inputs. Models upgraded to new Cog started failing with:The root cause was
additionalProperties: falsebeing injected into the JSON Schema validator at startup. I removed that and added astrip_unknownstep that silently drops unrecognized fields before validation. Stripped fields are logged at warn level with the prediction ID.Required field validation, type checking, and training input fallback all still work as before.