
Cloudflare provider analyzeImage() returns empty string for Workers AI vision models #53

@stackbilt-admin

Description

Summary

LLMProviders.analyzeImage() with the cloudflare provider returns { content: "", message: "" } when called through the Workers AI binding in a Cloudflare Worker. The vision step completes without throwing but produces no usable text, silently breaking downstream consumers.

Reproduction

Context: foodfiles POST /v2/recipes/analyze (apps/api/src/routes/recipes.ts)

const llm = new LLMProviders({
  cloudflare: { ai: c.env.AI },
  defaultProvider: "cloudflare",
  costOptimization: true,
  enableCircuitBreaker: true,
});
const visionResult = await llm.analyzeImage({
  image: { data: base64, mimeType: imageFile.type },
  prompt: "Describe this food image in detail...",
  maxTokens: 512,
});
// visionResult.content === ""
// visionResult.message === ""

Model selected by getDefaultVisionModel() for cloudflare: @cf/meta/llama-3.2-11b-vision-instruct

Root cause hypothesis

attachImagesToLastUserMessage() (cloudflare.ts) formats the image as a { type: "image_url", image_url: { url: "data:image/jpeg;base64,..." } } content part in the messages array.

Workers AI's llama-3.2-11b-vision-instruct appears to return { choices: [{ message: { content: null } }] } for this format when called via the Workers binding (not the REST API). The extractText() null-content path returns "".

The Workers AI binding expects the raw format for vision:

ai.run('@cf/meta/llama-3.2-11b-vision-instruct', {
  image: [...], // number[] / Uint8Array
  prompt: "...",
  max_tokens: 512,
})
// → { response: "description text" }

The REST API and the binding have different input shapes for this model. The provider only implements the messages/image_url path.
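A minimal sketch of the shape difference described above. The helper name base64ToByteArray and both literal payloads are illustrative, not actual provider code:

```typescript
// Illustrative helper: convert the base64 payload used by the
// messages/image_url path into the number[] the binding reportedly expects.
function base64ToByteArray(base64: string): number[] {
  // Strip a data-URL prefix if present (e.g. "data:image/jpeg;base64,")
  const raw = base64.includes(",") ? base64.split(",")[1] : base64;
  return Array.from(Buffer.from(raw, "base64"));
}

// Shape the provider sends today (messages/image_url):
const messagesInput = {
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "Describe this food image in detail..." },
      { type: "image_url", image_url: { url: "data:image/jpeg;base64,AAAA" } },
    ],
  }],
};

// Shape the binding appears to expect for this model (raw):
const rawInput = {
  image: base64ToByteArray("data:image/jpeg;base64,AAAA"),
  prompt: "Describe this food image in detail...",
  max_tokens: 512,
};
```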

Evidence

  • Direct external call to tarotscript-worker.blue-pine-edf6.workers.dev/v2/recipes/analyze with a hand-crafted image_analysis string succeeds — confirming downstream is fine.
  • Tarotscript returned 400 "image_analysis is required" when the llm-providers vision step was allowed to pass through its empty string, confirming visionResult.content and visionResult.message are both "".
  • extractText() handles chatContent === null by returning "" (cloudflare.ts ~line 562) — no error thrown, silent failure.
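The silent-failure path can be reconstructed roughly as below. This is an illustrative sketch of the hypothesized behavior, not the actual cloudflare.ts implementation:

```typescript
// Illustrative reconstruction of the hypothesized extractText() null path.
// The real implementation lives in cloudflare.ts (~line 562).
type ChatResponse = {
  choices?: Array<{ message?: { content?: string | null } }>;
};

function extractText(res: ChatResponse): string {
  const content = res.choices?.[0]?.message?.content;
  // null content falls through to "" — no error is thrown,
  // which is why the failure is silent downstream
  return content ?? "";
}
```

Logging or throwing on null content here would surface the incompatibility at the provider layer instead of as a downstream 400.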

Suggested fix

For Workers AI vision models, use the raw binding format:

ai.run(model, {
  image: Array.from(imageBytes), // number[]
  prompt: request.messages[lastUserIdx].textContent,
  max_tokens: request.maxTokens,
})

and map the { response: string } return back through formatResponse. The messages/image_url path can remain for REST API consumers.
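A sketch of that mapping, assuming the binding returns { response: string }. The function and result-type names are illustrative, not existing formatResponse internals:

```typescript
// Illustrative adapter: map the binding's { response: string } shape
// into the { content, message } result downstream consumers read.
interface VisionResult {
  content: string;
  message: string;
}

function fromRawBindingResponse(raw: { response?: string }): VisionResult {
  const text = raw.response ?? "";
  if (text === "") {
    // Fail loudly instead of propagating an empty string downstream
    throw new Error("Workers AI vision model returned an empty response");
  }
  return { content: text, message: text };
}
```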

Alternatively, detect at runtime whether the AI binding is present and branch the format accordingly.
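The runtime branch could look like the following sketch. The format labels and detection heuristic (presence of a run() method on the configured binding) are assumptions for illustration:

```typescript
// Illustrative runtime branch: if a Workers AI binding is configured,
// use the raw vision format; otherwise fall back to the REST
// messages/image_url format.
type VisionFormat = "raw-binding" | "messages-image-url";

function pickVisionFormat(config: { ai?: unknown; apiToken?: string }): VisionFormat {
  // A Workers AI binding exposes run(); its presence signals the raw path
  const hasBinding = typeof (config.ai as { run?: unknown })?.run === "function";
  return hasBinding ? "raw-binding" : "messages-image-url";
}
```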

Version

@stackbilt/llm-providers ^1.5.0 (foodfiles dependency)
