Summary
LLMProviders.analyzeImage() with the cloudflare provider returns { content: "", message: "" } when called from a Cloudflare Worker binding. The vision step completes without throwing, but produces no usable text, silently breaking downstream consumers.
Reproduction
Context: foodfiles POST /v2/recipes/analyze (apps/api/src/routes/recipes.ts)
```ts
const llm = new LLMProviders({
  cloudflare: { ai: c.env.AI },
  defaultProvider: "cloudflare",
  costOptimization: true,
  enableCircuitBreaker: true,
});

const visionResult = await llm.analyzeImage({
  image: { data: base64, mimeType: imageFile.type },
  prompt: "Describe this food image in detail...",
  maxTokens: 512,
});

// visionResult.content === ""
// visionResult.message === ""
```
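Until the provider is fixed, the route can fail fast on the empty result instead of passing it downstream. A minimal sketch; requireVisionContent is a hypothetical helper, not part of llm-providers:

```typescript
// Hypothetical consumer-side guard: reject an empty vision result up front
// rather than letting the empty string surface later as a confusing
// 400 "image_analysis is required" from the downstream service.
function requireVisionContent(result: { content?: string; message?: string }): string {
  const text = result.content || result.message || "";
  if (text.trim() === "") {
    throw new Error("cloudflare vision step returned empty content");
  }
  return text;
}
```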
Model selected by getDefaultVisionModel() for cloudflare: @cf/meta/llama-3.2-11b-vision-instruct
Root cause hypothesis
attachImagesToLastUserMessage() (cloudflare.ts) formats the image as a { type: "image_url", image_url: { url: "data:image/jpeg;base64,..." } } content part in the messages array.
Workers AI's llama-3.2-11b-vision-instruct appears to return { choices: [{ message: { content: null } }] } for this format when called via the Workers binding (not the REST API). The extractText() null-content path returns "".
The Workers AI binding expects the raw format for vision:
```ts
ai.run('@cf/meta/llama-3.2-11b-vision-instruct', {
  image: [...],   // number[] / Uint8Array
  prompt: "...",
  max_tokens: 512,
})
// → { response: "description text" }
```
The REST API and the binding have different input shapes for this model. The provider only implements the messages/image_url path.
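Bridging the two shapes means decoding the provider's base64 payload into raw bytes. A sketch of that conversion (the name base64ToByteArray is mine, not the provider's; atob is a global in both Workers and modern Node):

```typescript
// Decode a base64 image payload (as held in image.data) into the
// number[] that the Workers AI binding's raw vision input expects.
function base64ToByteArray(base64: string): number[] {
  const binary = atob(base64); // base64 -> binary string
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return Array.from(bytes);
}
```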
Evidence
- Direct external call to tarotscript-worker.blue-pine-edf6.workers.dev/v2/recipes/analyze with a hand-crafted image_analysis string succeeds — confirming downstream is fine.
- Tarotscript returned 400 image_analysis is required when the llm-providers vision step was allowed to pass through its empty string, confirming visionResult.content and visionResult.message are both "".
- extractText() handles chatContent === null by returning "" (cloudflare.ts ~line 562) — no error thrown, silent failure.
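One way to make this failure loud instead of silent is to have the null-content path throw. A sketch under assumptions: the CfChatResponse shape and the extractTextStrict name are illustrative, not the actual cloudflare.ts code:

```typescript
// Minimal shape of the chat-style Workers AI response this path handles.
interface CfChatResponse {
  choices?: { message?: { content?: string | null } }[];
}

// Variant of extractText() that surfaces null content as an error
// instead of collapsing it to "" and failing silently downstream.
function extractTextStrict(res: CfChatResponse): string {
  const content = res.choices?.[0]?.message?.content;
  if (content == null) {
    throw new Error(
      "Workers AI returned null content; likely an unsupported input format for this model",
    );
  }
  return content;
}
```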
Suggested fix
For Workers AI vision models, use the raw binding format:
```ts
ai.run(model, {
  image: Array.from(imageBytes), // number[]
  prompt: request.messages[lastUserIdx].textContent,
  max_tokens: request.maxTokens,
})
```
and map the { response: string } return back through formatResponse. The messages/image_url path can remain for REST API consumers.
Alternatively, detect at runtime whether the AI binding is present and branch the format accordingly.
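The runtime branch could look like the sketch below. The WorkersAiBinding interface and the runVision name are illustrative; the real provider internals differ:

```typescript
// Illustrative subset of the Workers AI binding surface.
interface WorkersAiBinding {
  run(model: string, input: Record<string, unknown>): Promise<{ response?: string }>;
}

interface VisionCall {
  imageBytes: number[];
  prompt: string;
  maxTokens: number;
}

// If an AI binding is present, use the raw { image, prompt } input shape;
// callers without a binding would keep the existing messages/image_url
// REST path (not shown here).
async function runVision(
  ai: WorkersAiBinding | undefined,
  model: string,
  call: VisionCall,
): Promise<string> {
  if (!ai) {
    throw new Error("no AI binding: fall back to the REST messages/image_url path");
  }
  const result = await ai.run(model, {
    image: call.imageBytes, // raw bytes, not a data: URL
    prompt: call.prompt,
    max_tokens: call.maxTokens,
  });
  return result.response ?? "";
}
```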
Version
@stackbilt/llm-providers ^1.5.0 (foodfiles dependency)