fix (telemetry): serialize UInt8Arrays as base64 for inner telemetry spans#6357

Merged
lgrammel merged 1 commit into vercel:main from lmnr-ai:fix/json-stringify-image
May 17, 2025

Conversation

Contributor

@dinmukhamedm dinmukhamedm commented May 16, 2025

Background

generateObject, generateText, streamText, and streamObject currently call JSON.stringify on the input messages. If the input messages contain an image, it is most likely normalized into a Uint8Array.

JSON.stringify does not do the most obvious thing with TypedArrays, including Uint8Array: it serializes them as index-keyed objects rather than arrays.

```javascript
// this returns '{"0":1,"1":2,"2":3}', where I'd expect '[1,2,3]'
JSON.stringify(new Uint8Array([1, 2, 3]));
```

In practice, this bloats images by roughly 5-15x depending on the original image size. For Laminar, for example, a span with three average-sized images cannot be sent at all, because it exceeds the (reasonably high) gRPC payload size limit for our traces endpoint.

From [MDN docs](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify#examples):

```javascript
// TypedArray
JSON.stringify([new Int8Array([1]), new Int16Array([1]), new Int32Array([1])]);
// '[{"0":1},{"0":1},{"0":1}]'
JSON.stringify([
  new Uint8Array([1]),
  new Uint8ClampedArray([1]),
  new Uint16Array([1]),
  new Uint32Array([1]),
]);
// '[{"0":1},{"0":1},{"0":1},{"0":1}]'
JSON.stringify([new Float32Array([1]), new Float64Array([1])]);
// '[{"0":1},{"0":1}]'
```

Summary

Added a function that maps over the messages in a LanguageModelV1Prompt and over the content parts in each message, replacing Uint8Arrays with raw base64 strings.

This function is called when calling recordSpan for the inner (doStream/doGenerate) span in generateObject, generateText, streamText, and streamObject.
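A minimal sketch of what such a mapping could look like; the function name and shape here are illustrative, not the actual SDK implementation:

```javascript
// Illustrative sketch only — not the actual SDK implementation.
// Walks the prompt's messages and replaces Uint8Array image parts with
// raw base64 strings, so JSON.stringify produces compact output.
function mapImagesToBase64(prompt) {
  return prompt.map(message => ({
    ...message,
    content: Array.isArray(message.content)
      ? message.content.map(part =>
          part.type === 'image' && part.image instanceof Uint8Array
            ? { ...part, image: Buffer.from(part.image).toString('base64') }
            : part,
        )
      : message.content,
  }));
}
```

Messages with plain string content (e.g. system messages) pass through unchanged; only array-form content is walked, and non-image parts are left as-is.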

Verification

Ran this small script against a local instance of Laminar and logged the Telemetry payloads (span attributes) on the backend to verify that they are indeed base64.

```javascript
import { Laminar, getTracer } from '@lmnr-ai/lmnr'

Laminar.initialize();

import { openai } from '@ai-sdk/openai'
import { generateText, generateObject, streamText, streamObject, tool } from "ai";
import { z } from "zod";
import dotenv from "dotenv";

dotenv.config();

const handle = async () => {
  const imageUrl = "https://upload.wikimedia.org/wikipedia/commons/b/bc/CoinEx.png"
  const imageData = await fetch(imageUrl)
    .then(response => response.arrayBuffer())
    .then(buffer => Buffer.from(buffer).toString('base64'));

  const o = streamObject({
    schema: z.object({
      text: z.string(),
      companyName: z.string().optional().nullable(),
    }),
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Describe this image briefly"
          },
          {
            type: "image",
            image: imageData,
            mimeType: "image/png"
          }
        ]
      }
    ],
    model: openai("gpt-4.1-nano"),
    experimental_telemetry: {
      isEnabled: true,
      tracer: getTracer()
    }
  });

  for await (const chunk of o.fullStream) {
    console.log(chunk);
  }
  await Laminar.shutdown();
};

handle().then((r) => {
    console.log(r);
});
```

Tasks

  • Tests have been added / updated (for bug fixes / features)
  • Documentation has been added / updated (for bug fixes / features)
    • telemetry is experimental, so I reckon a doc update for this small fix is not required
  • A patch changeset for relevant packages has been added (for bug fixes / features - run pnpm changeset in the project root)
  • Formatting issues have been fixed (run pnpm prettier-fix in the project root)

Future Work

Related Issues

Fixes #6210

```javascript
...part,
image:
  part.image instanceof Uint8Array
    ? convertDataContentToBase64String(part.image)
```
Contributor Author

@dinmukhamedm dinmukhamedm May 16, 2025


This will keep the raw base64 data, e.g. `IGcfqljA=`, and NOT `data:image/png;base64,IGcfqljA=`.

I am open to suggestions on this, but I decided not to add the data URL scheme (RFC 2397), because:

  • mimeType is optional on the ImagePart, and not always known
  • current Uint8Array is raw data as well and is not aware of mimeTypes
  • Telemetry backends can use the mimeType field, and there are plenty of other heuristics to infer that this is base64
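For backends that do want a data URL, reconstructing one from the two fields is trivial; a sketch of the RFC 2397 form (the fallback mime type here is an assumption, not anything the SDK specifies):

```javascript
// Sketch: a telemetry backend can rebuild an RFC 2397 data URL from the
// raw base64 string plus the separately stored mimeType field.
// The application/octet-stream fallback is an assumption for illustration.
function toDataUrl(base64, mimeType = 'application/octet-stream') {
  return `data:${mimeType};base64,${base64}`;
}

console.log(toDataUrl('IGcfqljA=', 'image/png'));
// data:image/png;base64,IGcfqljA=
```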

@dinmukhamedm dinmukhamedm marked this pull request as ready for review May 16, 2025 12:37
@lgrammel lgrammel changed the title serialize UInt8Arrays as base64 for inner telemetry spans fix (telemetry): serialize UInt8Arrays as base64 for inner telemetry spans May 17, 2025
@lgrammel lgrammel merged commit ed0ebeb into vercel:main May 17, 2025
8 of 9 checks passed
jacobkerber pushed a commit to jacobkerber/ai that referenced this pull request Jul 15, 2025
…spans (vercel#6357)

jacobkerber pushed a commit to jacobkerber/ai that referenced this pull request Jul 15, 2025
…spans (vercel#6357)


Development

Successfully merging this pull request may close these issues.

ImageParts sent via experimental_telemetry are bloated
