
Tts#97

Closed
AnthonyRonning wants to merge 7 commits into master from tts

Conversation

@AnthonyRonning
Contributor

@AnthonyRonning AnthonyRonning commented Sep 13, 2025

Summary by CodeRabbit

  • New Features

    • Public TTS (/v1/audio/speech) and Transcription (/v1/audio/transcriptions) APIs with streaming audio and multipart upload support.
  • Improvements

    • Multi-provider transcription with retries/fallbacks, guest access blocked for audio APIs, enhanced logging/tracing, and canonical Whisper Large V3 routing; default model selection for transcription.
  • New Utility

    • Audio splitting/merging for large WAV uploads enabling parallel chunked transcription.
  • Chores

    • Appended new PCR history entries to dev and prod.

@coderabbitai

coderabbitai Bot commented Sep 13, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds Rust handlers for TTS and transcription with multipart/base64 audio support, a WAV-aware AudioSplitter and transcription merge utilities, multi-provider transcription retries using a HyperBody HTTP client, model mapping for Whisper Large V3, Go tinfoil-proxy TTS/transcription endpoints and rotation logic, and appended PCR history entries.
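The WAV-aware AudioSplitter itself lives in src/web/audio_utils.rs and is not reproduced in this review. As a rough illustration of what WAV-aware splitting involves, here is a hedged Go sketch — it assumes a canonical 44-byte PCM header (real WAVs can carry extra chunks), cuts on sample-frame boundaries, and rewrites the RIFF/data sizes so each chunk decodes independently; it is not the actual Rust implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// makeTestWAV builds a minimal canonical-header PCM WAV:
// mono, 16-bit, 8 kHz, `seconds` seconds of silence.
func makeTestWAV(seconds int) []byte {
	const sampleRate = 8000
	dataLen := sampleRate * 2 * seconds // 16-bit mono
	wav := make([]byte, 44+dataLen)
	copy(wav[0:4], "RIFF")
	binary.LittleEndian.PutUint32(wav[4:8], uint32(36+dataLen))
	copy(wav[8:12], "WAVE")
	copy(wav[12:16], "fmt ")
	binary.LittleEndian.PutUint32(wav[16:20], 16) // fmt chunk size
	binary.LittleEndian.PutUint16(wav[20:22], 1)  // PCM
	binary.LittleEndian.PutUint16(wav[22:24], 1)  // mono
	binary.LittleEndian.PutUint32(wav[24:28], sampleRate)
	binary.LittleEndian.PutUint32(wav[28:32], sampleRate*2) // byte rate
	binary.LittleEndian.PutUint16(wav[32:34], 2)            // frame size
	binary.LittleEndian.PutUint16(wav[34:36], 16)           // bits/sample
	copy(wav[36:40], "data")
	binary.LittleEndian.PutUint32(wav[40:44], uint32(dataLen))
	return wav
}

// splitWAV splits a canonical 44-byte-header PCM WAV into standalone WAV
// chunks of at most maxSeconds each, aligned to sample-frame boundaries,
// rewriting the RIFF and data sizes so every chunk is a valid WAV.
func splitWAV(wav []byte, maxSeconds int) ([][]byte, error) {
	if len(wav) < 44 || string(wav[0:4]) != "RIFF" || string(wav[8:12]) != "WAVE" {
		return nil, fmt.Errorf("not a canonical WAV")
	}
	channels := int(binary.LittleEndian.Uint16(wav[22:24]))
	sampleRate := int(binary.LittleEndian.Uint32(wav[24:28]))
	bitsPerSample := int(binary.LittleEndian.Uint16(wav[34:36]))
	frameSize := channels * bitsPerSample / 8
	chunkBytes := sampleRate * frameSize * maxSeconds

	header, data := wav[:44], wav[44:]
	var chunks [][]byte
	for off := 0; off < len(data); off += chunkBytes {
		end := off + chunkBytes
		if end > len(data) {
			end = len(data)
		}
		out := make([]byte, 44+end-off)
		copy(out, header)
		copy(out[44:], data[off:end])
		binary.LittleEndian.PutUint32(out[4:8], uint32(36+end-off)) // RIFF size
		binary.LittleEndian.PutUint32(out[40:44], uint32(end-off))  // data size
		chunks = append(chunks, out)
	}
	return chunks, nil
}

func main() {
	chunks, err := splitWAV(makeTestWAV(3), 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(chunks)) // three one-second chunks
}
```

Each chunk could then be posted to /v1/audio/transcriptions in parallel and the results merged in timestamp order, which matches the flow in the sequence diagram below.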

Changes

Cohort / File(s) Summary
Rust: OpenAI web TTS & Transcription
src/web/openai.rs
Adds proxy_tts and proxy_transcription handlers (POST /v1/audio/speech, /v1/audio/transcriptions), send_transcription_request, send_transcription_with_retries; updates try_provider to use HyperBody; forwards selected headers; implements multi-provider retries, chunked audio processing, base64 encode/decode, encryption pipeline, and detailed logging.
Rust: Audio utilities
src/web/audio_utils.rs, src/web/mod.rs
New audio_utils module and export; adds AudioSplitter (WAV-aware split_wav), AudioChunk, TranscriptionSegment, MergedTranscription, merge_transcriptions, defaults, and unit tests; used to split/merge audio for transcription flows.
Rust: model routing updates
src/proxy_config.rs
Adds Whisper Large V3 constants and updates MODEL_EQUIVALENCIES to map canonical Whisper Large V3 to provider-specific model names for routing/translation.
Go: Tinfoil proxy TTS & Transcription
tinfoil-proxy/main.go
Adds TTSRequest/TranscriptionResponse, handlers handleTTS and handleTranscription, endpoints /v1/audio/speech and /v1/audio/transcriptions; streaming TTS (Kokoro/voice/format mapping), multipart transcription (defaults to whisper-large-v3*), client rotation/reinit on cert errors, logging, and TODO billing hook.
Configs / Data
pcrDevHistory.json, pcrProdHistory.json
Appends new PCR history entries (PCR0/PCR1/PCR2, timestamp, signature) and fixes JSON punctuation in pcrProdHistory.json.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Web as Rust Web (/v1/audio/...)
  participant AudioUtil as Audio Utils
  participant Proxy as Tinfoil Proxy
  participant Provider as Upstream Provider

  Client->>Web: POST /v1/audio/speech or /v1/audio/transcriptions
  Web->>Web: validate session/user, reject guests, forward selected headers
  alt transcription (large audio)
    Web->>AudioUtil: split_audio (WAV-aware)
    AudioUtil-->>Web: chunks
    loop per chunk (parallel)
      Web->>Proxy: POST /v1/audio/transcriptions (HyperBody)
      Proxy->>Provider: call provider SDK/API
      Provider-->>Proxy: transcription JSON
      Proxy-->>Web: JSON chunk result
    end
    Web->>Web: merge_transcriptions, encrypt, respond
  else tts
    Web->>Proxy: POST /v1/audio/speech (HyperBody)
    Proxy->>Provider: TTS stream (Kokoro)
    Provider-->>Proxy: streamed audio
    Proxy-->>Web: streamed audio
    Web->>Web: base64 encode, map format→MIME, encrypt, respond
  end
sequenceDiagram
  autonumber
  participant Web as Rust Web
  participant Upstream as Any Provider

  Note over Web: HTTP client body type changed to Client<..., HyperBody>
  Web->>Upstream: Request(Body = HyperBody)
  Upstream-->>Web: Response(Body = HyperBody)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Tinfoil proxy start #36 — Adds/updates the tinfoil-proxy Go service and endpoints (/v1/audio/speech, /v1/audio/transcriptions), directly matching the new proxied endpoints.
  • Feature/api key auth #94 — Modifies src/web/openai.rs and OpenAI web/auth flows; overlaps with added handlers, header forwarding, and client changes.
  • Append services to apple external oauth #32 — Appends PCR history entries; touches the same pcrDevHistory.json/pcrProdHistory.json files.

Poem

I nibble bytes beneath the moon,
I hum to Kokoro’s gentle tune. 🐇
Chunks stitched tight, transcripts bloom,
Headers forwarded, proxies zoom.
Encrypted carrots, routes take flight — hooray, code-night delight.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Tts" is directly related to a real and central change in the changeset (addition of text‑to‑speech endpoints and TTS plumbing), so it is not misleading; however it is extremely terse and omits other significant additions (transcription endpoints, audio utilities, and proxy updates), which reduces clarity for someone scanning PR history.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch tts

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Comment @coderabbitai help to get the list of available commands and usage tips.


@greptile-apps greptile-apps Bot left a comment


Greptile Summary

This PR introduces Text-to-Speech (TTS) functionality to the OpenSecret platform by adding a new /v1/audio/speech endpoint that follows OpenAI's API specification. The implementation spans two key components:

Rust Backend (opensecret):

  • Adds a new /v1/audio/speech route that mirrors the existing OpenAI proxy pattern
  • Implements proxy_tts function with similar authentication, validation, and encryption patterns as proxy_openai
  • Handles TTS request parameters (input text, model, voice, response format, speed) with validation
  • Returns base64-encoded audio data wrapped in encrypted responses
  • Introduces a type alias HyperBody to resolve naming conflicts with the hyper crate's Body type

Go Proxy Service (tinfoil-proxy):

  • Adds corresponding /v1/audio/speech endpoint with handleTTS function
  • Hardcodes 'kokoro' as the TTS model and provides voice mapping with 'af_sky' as default
  • Implements efficient audio streaming using a 4KB buffer to handle potentially large audio files
  • Follows the same error handling and certificate recovery patterns as existing chat endpoints

The implementation integrates seamlessly with the existing proxy infrastructure, maintaining consistency with authentication checks, billing validation, and the encrypt/decrypt middleware pattern. The choice to hardcode the 'kokoro' model suggests this is an initial implementation that establishes the foundation for TTS capabilities while leveraging the existing Tinfoil service's audio synthesis capabilities.
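The voice and format handling described above can be sketched in Go as follows. The proxy's full voice list is not shown in this PR, so every voice name other than the af_sky default is a placeholder, and the MIME table already incorporates the PCM-to-WAV alignment suggested later in this review — treat it as an illustration, not the shipped code.

```go
package main

import "fmt"

// mapVoice falls back to "af_sky" when the requested voice is unknown,
// as the proxy's default does; the other entries are invented placeholders.
func mapVoice(requested string) string {
	known := map[string]bool{"af_sky": true, "af_bella": true, "am_adam": true}
	if known[requested] {
		return requested
	}
	return "af_sky"
}

// contentTypeFor maps an OpenAI-style response_format to a MIME type.
// Since PCM is remapped to WAV upstream, "pcm" advertises audio/wav.
func contentTypeFor(format string) string {
	switch format {
	case "opus":
		return "audio/opus"
	case "aac":
		return "audio/aac"
	case "flac":
		return "audio/flac"
	case "wav", "pcm": // pcm bytes are actually WAV after the upstream remap
		return "audio/wav"
	default:
		return "audio/mpeg" // mp3 default
	}
}

func main() {
	fmt.Println(mapVoice("alloy"))     // af_sky
	fmt.Println(contentTypeFor("pcm")) // audio/wav
}
```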

Confidence score: 4/5

  • This PR introduces new functionality with well-structured code that follows existing patterns, presenting moderate risk due to the complexity of audio processing
  • Score reflects solid implementation following established patterns, but audio processing and streaming introduce additional complexity that warrants careful testing
  • Pay close attention to the audio streaming logic in tinfoil-proxy/main.go and response format handling in the Rust implementation

2 files reviewed, no comments

Edit Code Review Bot Settings | Greptile


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (6)
tinfoil-proxy/main.go (4)

117-124: Trim unused field; clarify model usage.

  • StreamFormat is unused.
  • Model is accepted but only Kokoro is supported; either validate or document.

Apply this diff to drop the unused field:

 type TTSRequest struct {
   Model          string  `json:"model"`
   Input          string  `json:"input"`
   Voice          string  `json:"voice"`
   ResponseFormat string  `json:"response_format,omitempty"`
   Speed          float32 `json:"speed,omitempty"`
-  StreamFormat   string  `json:"stream_format,omitempty"`
 }

631-634: Add basic validation for model and input.

Reject empty input and non-Kokoro models early.

Apply this diff:

   if req.Speed == 0 {
     req.Speed = 1.0
   }
 
+  // Validate model and input
+  if strings.TrimSpace(req.Input) == "" {
+    c.JSON(http.StatusBadRequest, gin.H{"error": "input text is required"})
+    return
+  }
+  if req.Model != "" && req.Model != "kokoro" {
+    c.JSON(http.StatusBadRequest, gin.H{"error": "unsupported TTS model; only 'kokoro' is supported"})
+    return
+  }

699-704: Disable reverse-proxy buffering for smoother streaming.

Apply this diff:

   // Stream the audio response back to client
   c.Header("Content-Type", contentType)
+  c.Header("X-Accel-Buffering", "no")
   c.Status(http.StatusOK)

703-720: Simplify streaming with io.CopyBuffer.

Fewer syscalls and simpler code; chunked transfer remains intact.

Apply this diff:

-  // Copy the response body to the client
-  buffer := make([]byte, 4096)
-  for {
-    n, err := response.Body.Read(buffer)
-    if n > 0 {
-      if _, writeErr := c.Writer.Write(buffer[:n]); writeErr != nil {
-        log.Printf("Error writing TTS response: %v", writeErr)
-        return
-      }
-      c.Writer.Flush()
-    }
-    if err != nil {
-      if err != io.EOF {
-        log.Printf("Error reading TTS response: %v", err)
-      }
-      break
-    }
-  }
+  // Copy the response body to the client
+  if _, err := io.CopyBuffer(c.Writer, response.Body, make([]byte, 32*1024)); err != nil && err != io.EOF {
+    log.Printf("Error streaming TTS response: %v", err)
+  }
src/web/openai.rs (2)

697-701: Add a request-level timeout to avoid hangs on slow TTS.

Wrap client.request(...) in tokio::time::timeout and map elapsed to 504/500.

Example:

let res = tokio::time::timeout(Duration::from_secs(30), client.request(req))
    .await
    .map_err(|_| {
        error!("TTS request timed out");
        ApiError::InternalServerError
    })?
    .map_err(|e| {
        error!("Failed to send TTS request: {:?}", e);
        ApiError::InternalServerError
    })?;

673-679: Reuse a shared Hyper client to reduce per-request overhead.

Consider storing a Client<HttpsConnector<_>, HyperBody> in AppState for reuse.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a08a08 and 76fc3f5.

⛔ Files ignored due to path filters (1)
  • tinfoil-proxy/dist/tinfoil-proxy is excluded by !**/dist/**
📒 Files selected for processing (2)
  • src/web/openai.rs (5 hunks)
  • tinfoil-proxy/main.go (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/web/openai.rs (3)
src/proxy_config.rs (2)
  • Client (232-234)
  • new (150-184)
src/web/health_routes.rs (1)
  • Client (60-62)
src/web/encryption_middleware.rs (1)
  • encrypt_response (91-105)
⏰ Context from checks skipped due to timeout of 100000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Development Reproducible Build
🔇 Additional comments (6)
tinfoil-proxy/main.go (2)

6-6: LGTM: io import is required for streaming.


782-786: LGTM: TTS route registration looks correct.

src/web/openai.rs (4)

23-23: LGTM: HyperBody aliasing aligns types across client/builders.


37-37: LGTM: New /v1/audio/speech route is correctly wired.


126-126: LGTM: Client builder updated to HyperBody.


735-739: LGTM: try_provider signature/body updates for HyperBody are consistent.

Also applies to: 771-771

Comment thread src/web/openai.rs
Comment on lines +614 to +621
let input = body_obj
.get("input")
.and_then(|i| i.as_str())
.ok_or_else(|| {
error!("Input text not specified in request");
ApiError::BadRequest
})?;


🛠️ Refactor suggestion

Reject blank input early.

Prevents proxying empty text to TTS.

Apply this diff:

-    let input = body_obj
-        .get("input")
-        .and_then(|i| i.as_str())
-        .ok_or_else(|| {
-            error!("Input text not specified in request");
-            ApiError::BadRequest
-        })?;
+    let input = body_obj
+        .get("input")
+        .and_then(|i| i.as_str())
+        .filter(|s| !s.trim().is_empty())
+        .ok_or_else(|| {
+            error!("Input text not specified or empty in request");
+            ApiError::BadRequest
+        })?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let input = body_obj
.get("input")
.and_then(|i| i.as_str())
.ok_or_else(|| {
error!("Input text not specified in request");
ApiError::BadRequest
})?;
let input = body_obj
.get("input")
.and_then(|i| i.as_str())
.filter(|s| !s.trim().is_empty())
.ok_or_else(|| {
error!("Input text not specified or empty in request");
ApiError::BadRequest
})?;
🤖 Prompt for AI Agents
In src/web/openai.rs around lines 614 to 621, the handler accepts an "input"
field but does not reject empty or whitespace-only strings; update the
extraction to reject blank input early by filtering the str to ensure it's
non-empty after trimming (e.g., use as_str().filter(|s| !s.trim().is_empty()) or
equivalent) and keep the same error branch (log "Input text not specified in
request" and return ApiError::BadRequest) when the check fails.

Comment thread src/web/openai.rs
Comment on lines +702 to +1157
if !res.status().is_success() {
error!("TTS proxy returned non-success status: {}", res.status());
return Err(ApiError::InternalServerError);
}

// Get response body as bytes
let body_bytes = to_bytes(res.into_body()).await.map_err(|e| {
error!("Failed to read TTS response body: {:?}", e);
ApiError::InternalServerError
})?;


⚠️ Potential issue

Honor upstream Content-Type and align PCM mapping with proxy.

Avoids MIME mismatches and leverages the proxy’s declared type.

Apply this diff:

-    // Get response body as bytes
+    // Get upstream content-type, then response body as bytes
+    let upstream_content_type = res
+        .headers()
+        .get(hyper::header::CONTENT_TYPE)
+        .and_then(|v| v.to_str().ok())
+        .map(|s| s.to_string());
     let body_bytes = to_bytes(res.into_body()).await.map_err(|e| {
         error!("Failed to read TTS response body: {:?}", e);
         ApiError::InternalServerError
     })?;
 
-    // Create a response object with the audio data and metadata
-    let audio_response = json!({
-        "content_base64": general_purpose::STANDARD.encode(&body_bytes),
-        "content_type": match response_format {
-            "mp3" => "audio/mpeg",
-            "opus" => "audio/opus",
-            "aac" => "audio/aac",
-            "flac" => "audio/flac",
-            "wav" => "audio/wav",
-            "pcm" => "audio/pcm",
-            _ => "audio/mpeg", // Default to MP3
-        },
-    });
+    // Compute content type from upstream header if present; otherwise map from requested format
+    let content_type = upstream_content_type.unwrap_or_else(|| {
+        match response_format {
+            "mp3" => "audio/mpeg",
+            "opus" => "audio/opus",
+            "aac" => "audio/aac",
+            "flac" => "audio/flac",
+            "wav" => "audio/wav",
+            "pcm" => "audio/wav", // aligned with proxy remap
+            _ => "audio/mpeg",
+        }
+        .to_string()
+    });
+
+    // Create a response object with the audio data and metadata
+    let audio_response = json!({
+        "content_base64": general_purpose::STANDARD.encode(&body_bytes),
+        "content_type": content_type,
+    });

Also applies to: 714-725

Comment thread tinfoil-proxy/main.go
Comment on lines +685 to +697
contentType := "audio/mpeg" // default to mp3
switch req.ResponseFormat {
case "opus":
contentType = "audio/opus"
case "aac":
contentType = "audio/aac"
case "flac":
contentType = "audio/flac"
case "wav":
contentType = "audio/wav"
case "pcm":
contentType = "audio/pcm"
}

⚠️ Potential issue

PCM mismatch: advertised MIME doesn't match requested upstream format.

You remap PCM to WAV upstream but return Content-Type: audio/pcm, which can break clients.

Apply this diff:

 case "pcm":
-    contentType = "audio/pcm"
+    // We remap PCM to WAV upstream; advertise as audio/wav to match bytes
+    contentType = "audio/wav"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
contentType := "audio/mpeg" // default to mp3
switch req.ResponseFormat {
case "opus":
contentType = "audio/opus"
case "aac":
contentType = "audio/aac"
case "flac":
contentType = "audio/flac"
case "wav":
contentType = "audio/wav"
case "pcm":
contentType = "audio/pcm"
}
contentType := "audio/mpeg" // default to mp3
switch req.ResponseFormat {
case "opus":
contentType = "audio/opus"
case "aac":
contentType = "audio/aac"
case "flac":
contentType = "audio/flac"
case "wav":
contentType = "audio/wav"
case "pcm":
// We remap PCM to WAV upstream; advertise as audio/wav to match bytes
contentType = "audio/wav"
}
🤖 Prompt for AI Agents
In tinfoil-proxy/main.go around lines 685 to 697, the code advertises
Content-Type: audio/pcm while the upstream request remaps PCM to WAV, causing a
MIME mismatch; update the mapping so that when req.ResponseFormat == "pcm" the
Content-Type is set to "audio/wav" (or change the upstream mapping to truly send
PCM) so the advertised MIME matches the actual upstream format, and ensure any
related comments or tests reflect this behavior.


@greptile-apps greptile-apps Bot left a comment


Greptile Summary

This review covers only the changes made since the last review, not the entire PR. The recent changes focus solely on updating Platform Configuration Register (PCR) values across development and production environments to reflect the deployment of the TTS (Text-to-Speech) feature.

The changes update PCR measurements in four files: pcrDev.json, pcrProd.json, pcrDevHistory.json, and pcrProdHistory.json. PCR values are cryptographic measurements used by AWS Nitro Enclaves for attestation and integrity verification. When new functionality like the TTS API is added to the enclave, these measurements change and must be updated to maintain the secure computing environment's chain of trust.

The development environment updates (pcrDev.json and pcrDevHistory.json) follow the documented PCR verification workflow correctly - new measurements are signed and appended to the history file for auditability. However, the production environment updates show a concerning pattern: while pcrProd.json has been updated with new PCR values, the corresponding pcrProdHistory.json entry exists but the workflow appears incomplete.

These PCR updates integrate with the existing secure enclave infrastructure, ensuring that clients can verify they're communicating with the correct, unmodified version of the service that includes the new TTS functionality. The changes maintain the append-only audit trail required for the PCR verification system documented in the codebase.

Confidence score: 3/5

  • This PR has mixed safety levels - development PCR updates are safe, but production changes raise concerns about incomplete workflow adherence
  • Score reflects proper development environment handling but questionable production deployment process that may not fully follow documented security procedures
  • Pay close attention to production PCR files and verify the complete PCR signing workflow was followed

4 files reviewed, 1 comment

Edit Code Review Bot Settings | Greptile

Comment thread pcrProd.json Outdated
Comment on lines +3 to +5
"PCR0": "8f016105d59f721b04728d3813532d207c618d0401feb23768e81fd4d211b62fdb9ea93a5031c3141f3778430b606d37",
"PCR1": "f004075c672258b499f8e88d59701031a3b451f65c7de60c81d09da2b0799272675481ec390527594dd7069cb7de59d7",
"PCR2": "87570d06f6385dfc640b2dd4bd053f725e8848352834b9a5fd55c13bab30eb8a3d6e35567b9e374137a03590ca83fef4"
"PCR2": "62d47e47734fdd9818c5865098256366734d325de75743c6a231cfc8cec2d675c2fcd26fe3320acdce553f563689779d"

logic: PCR values updated but no corresponding entry added to pcrProdHistory.json. According to the PCR verification documentation, new PCR measurements should be signed and appended to the history file using just append-pcr-prod to maintain audit trail and enable frontend verification.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76fc3f5 and 9544f5b.

⛔ Files ignored due to path filters (2)
  • pcrDev.json is excluded by !pcrDev.json
  • pcrProd.json is excluded by !pcrProd.json
📒 Files selected for processing (2)
  • pcrDevHistory.json (1 hunks)
  • pcrProdHistory.json (1 hunks)
🔇 Additional comments (2)
pcrProdHistory.json (2)

309-316: Prod entry appended cleanly; formatting and ordering LGTM.


309-316: Mirror dev validation on prod: run the same sanity checks against pcrProdHistory.json

Previous verification failed to execute in CI — run the corrected script below on the prod host (bash, jq installed) to validate timestamps and PCR fields.

#!/usr/bin/env bash
set -euo pipefail

file="pcrProdHistory.json"

command -v jq >/dev/null 2>&1 || { echo "ERROR: jq not installed"; exit 1; }

# 1) JSON validity
if ! jq -e . "$file" >/dev/null 2>&1; then
  echo "INVALID_JSON"
  jq . "$file" || true
  exit 1
fi

# 2) timestamps strictly increasing
mapfile -t timestamps < <(jq -r '.[].timestamp // empty' "$file")
if [ "${#timestamps[@]}" -eq 0 ]; then
  echo "NO_TIMESTAMPS"
  exit 1
fi
prev=""
for t in "${timestamps[@]}"; do
  if [ -n "$prev" ] && [ "$t" -le "$prev" ]; then
    echo "TIMESTAMPS_NOT_STRICTLY_INCREASING"
    jq -r 'to_entries[] | "\(.key)\t\(.value.timestamp)"' "$file"
    exit 1
  fi
  prev=$t
done

# 3) PCR hex / length / even checks
invalids=$(jq -r 'to_entries[]
 | select(
    ((.value.PCR0 // "") | test("^[0-9a-f]+$") | not)
    or (((.value.PCR0 // "") | length) < 64)
    or ((((.value.PCR0 // "") | length) % 2) == 1)
    or ((.value.PCR1 // "") | test("^[0-9a-f]+$") | not)
    or (((.value.PCR1 // "") | length) < 64)
    or ((((.value.PCR1 // "") | length) % 2) == 1)
    or ((.value.PCR2 // "") | test("^[0-9a-f]+$") | not)
    or (((.value.PCR2 // "") | length) < 64)
    or ((((.value.PCR2 // "") | length) % 2) == 1)
 )
 | "\(.key)\t\(.value.PCR0 // \"<missing>\")\t\(.value.PCR1 // \"<missing>\")\t\(.value.PCR2 // \"<missing>\")"' "$file")

if [ -n "$invalids" ]; then
  echo "PCR_FIELD_VALIDATION_FAILED"
  echo "$invalids"
  exit 1
fi

# 4) last 10 PCR1 values must be identical
unique_count=$(jq -r '.[].PCR1 // empty' "$file" | tail -n 10 | sort -u | wc -l | tr -d ' ')
if [ "$unique_count" -ne 1 ]; then
  echo "LAST10_PCR1_NOT_IDENTICAL unique_count=$unique_count"
  jq -r '.[].PCR1' "$file" | tail -n 10
  exit 1
fi

echo "pcrProdHistory.json ✅"


@greptile-apps greptile-apps Bot left a comment


Greptile Summary

This review covers only the changes made since the last review, not the entire PR.

This update implements audio transcription functionality by adding a new /v1/audio/transcriptions endpoint to complement the existing TTS capabilities. The implementation includes:

Audio Transcription API: A new endpoint in src/web/openai.rs that accepts base64-encoded audio files and forwards them to configured providers using multipart form data. The implementation follows the established pattern of guest user blocking, primary/fallback provider cycling (up to 3 attempts), and encrypted response handling.

Proxy Configuration Enhancement: Extended the model equivalency system in src/proxy_config.rs to support Whisper models across different providers (Continuum, Tinfoil, and canonical naming), maintaining the same multi-provider architecture used for chat models.
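The MODEL_EQUIVALENCIES structure itself is not shown in the diff. As a hedged sketch of the lookup idea only — the provider-specific model names below are invented placeholders, not the real Continuum/Tinfoil identifiers — the canonical-to-provider translation looks roughly like this:

```go
package main

import "fmt"

// Illustrative only: canonical model names mapped to provider-specific
// names, mirroring the MODEL_EQUIVALENCIES idea. The provider-side names
// here are assumptions, not values from the PR.
var modelEquivalencies = map[string]map[string]string{
	"whisper-large-v3": {
		"tinfoil":   "whisper-large-v3-turbo",
		"continuum": "whisper-large-v3",
	},
}

// resolveModel translates a canonical model name for a given provider,
// falling back to the canonical name when no mapping exists.
func resolveModel(canonical, provider string) string {
	if providers, ok := modelEquivalencies[canonical]; ok {
		if name, ok := providers[provider]; ok {
			return name
		}
	}
	return canonical
}

func main() {
	fmt.Println(resolveModel("whisper-large-v3", "tinfoil")) // provider-specific name
	fmt.Println(resolveModel("whisper-large-v3", "unknown")) // canonical fallback
}
```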

Tinfoil Proxy Integration: Added transcription handling in tinfoil-proxy/main.go with proper multipart form data processing, parameter validation, and OpenAI-compatible response formatting. The implementation includes client reinitialization on certificate errors and follows existing error handling patterns.

The transcription feature integrates seamlessly with the existing audio processing architecture, reusing the same provider routing system, encryption middleware, and authentication mechanisms. The API accepts standard transcription parameters (file, model, language, prompt, response format, temperature) and maintains consistency with OpenAI's transcription API specification.

Confidence score: 3/5

  • This PR introduces new functionality with some implementation gaps that could cause production issues
  • Score reflects incomplete billing integration and potential provider routing concerns in the transcription flow
  • Pay close attention to the transcription implementation in src/web/openai.rs and billing TODO comments

7 files reviewed, 1 comment

Edit Code Review Bot Settings | Greptile

Comment thread pcrDev.json Outdated
Comment on lines +3 to +5
"PCR0": "111cadaf1cafeaa231268bd146373f003c2f8060bede6e59c6dce8e41f777ab940989a03d0815bdc576bb08f38bea7c4",
"PCR1": "f004075c672258b499f8e88d59701031a3b451f65c7de60c81d09da2b0799272675481ec390527594dd7069cb7de59d7",
"PCR2": "20022cfe0d03c779396c8de4d6ba8db0f6c2e559f5ba5dbbd59b334203b8da55fcb7eaaca183868e9c4405d4af5306ad"
"PCR2": "e6bdce97195b9280302a0001cffacdca83730217558cf4065f5736dbe3603cb658be4f545f5c254c2be7894ef52e89df"

logic: New PCR measurements need to be signed and appended to pcrDevHistory.json. Run just append-pcr-dev after setting SIGNING_PRIVATE_KEY environment variable to maintain the cryptographic audit trail required for frontend verification.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
tinfoil-proxy/main.go (1)

789-801: PCM MIME mismatch: advertise audio/wav when returning WAV bytes.

You remap "pcm" to WAV upstream but still set Content-Type: audio/pcm. This breaks clients.

Apply:

 case "wav":
     contentType = "audio/wav"
 case "pcm":
-    contentType = "audio/pcm"
+    // We remap PCM to WAV upstream; advertise as audio/wav to match bytes
+    contentType = "audio/wav"
src/web/openai.rs (2)

927-935: Reject blank TTS input (trim).

Apply:

-    let input = body_obj
-        .get("input")
-        .and_then(|i| i.as_str())
-        .ok_or_else(|| {
-            error!("Input text not specified in request");
-            ApiError::BadRequest
-        })?;
+    let input = body_obj
+        .get("input")
+        .and_then(|i| i.as_str())
+        .map(|s| s.trim())
+        .filter(|s| !s.is_empty())
+        .ok_or_else(|| {
+            error!("Input text not specified or empty in request");
+            ApiError::BadRequest
+        })?;

1016-1039: Honor upstream Content-Type and align PCM mapping with proxy.

Use upstream header when available; default mapping should treat "pcm" as "audio/wav" to match the proxy’s remap.

-    // Get response body as bytes
+    // Capture upstream content-type, then get response body
+    let upstream_content_type = res
+        .headers()
+        .get(hyper::header::CONTENT_TYPE)
+        .and_then(|v| v.to_str().ok())
+        .map(|s| s.to_string());
     let body_bytes = to_bytes(res.into_body()).await.map_err(|e| {
         error!("Failed to read TTS response body: {:?}", e);
         ApiError::InternalServerError
     })?;
 
-    // Create a response object with the audio data and metadata
-    let audio_response = json!({
-        "content_base64": general_purpose::STANDARD.encode(&body_bytes),
-        "content_type": match response_format {
-            "mp3" => "audio/mpeg",
-            "opus" => "audio/opus",
-            "aac" => "audio/aac",
-            "flac" => "audio/flac",
-            "wav" => "audio/wav",
-            "pcm" => "audio/pcm",
-            _ => "audio/mpeg", // Default to MP3
-        },
-    });
+    // Prefer upstream type; otherwise map from requested format
+    let content_type = upstream_content_type.unwrap_or_else(|| {
+        match response_format {
+            "mp3" => "audio/mpeg",
+            "opus" => "audio/opus",
+            "aac" => "audio/aac",
+            "flac" => "audio/flac",
+            "wav" => "audio/wav",
+            "pcm" => "audio/wav", // proxy remaps PCM to WAV bytes
+            _ => "audio/mpeg",
+        }
+        .to_string()
+    });
+    // Create a response object with the audio data and metadata
+    let audio_response = json!({
+        "content_base64": general_purpose::STANDARD.encode(&body_bytes),
+        "content_type": content_type,
+    });
🧹 Nitpick comments (3)
src/proxy_config.rs (1)

351-373: Also map canonical name when only Continuum is present.

Canonical→route mapping is added only inside the Tinfoil branch. If Tinfoil isn’t configured, canonical names (e.g., "whisper-large-v3") won’t resolve via model_routes and will fall back to default routing. Prefer mapping canonical→Continuum in the Continuum-only pass as well.

Apply this diff inside the Continuum-models pass to map the canonical name too:

@@
-            for model_name in continuum_model_list {
-                if !model_routes.contains_key(model_name) {
-                    // This is a Continuum-only model
-                    let route = ModelRoute {
-                        primary: self.default_proxy.clone(),
-                        fallbacks: vec![],
-                    };
-                    model_routes.insert(model_name.clone(), route);
-                    debug!("Continuum-only model '{}' has no fallbacks", model_name);
-                }
-            }
+            for model_name in continuum_model_list {
+                if !model_routes.contains_key(model_name) {
+                    // This is a Continuum-only model
+                    let route = ModelRoute {
+                        primary: self.default_proxy.clone(),
+                        fallbacks: vec![],
+                    };
+                    // Map the Continuum name
+                    model_routes.insert(model_name.clone(), route.clone());
+                    // If we know a canonical name for this Continuum model, map it as well
+                    for (canonical_name, provider_names) in &*MODEL_EQUIVALENCIES {
+                        if let Some(continuum_equiv) = provider_names.get("continuum") {
+                            if *continuum_equiv == model_name {
+                                model_routes.insert(canonical_name.to_string(), route);
+                                break;
+                            }
+                        }
+                    }
+                    debug!("Continuum-only model '{}' has no fallbacks", model_name);
+                }
+            }
tinfoil-proxy/main.go (2)

711-716: Reject blank TTS input early.

Avoid proxying empty/whitespace-only text.

 if err := c.ShouldBindJSON(&req); err != nil {
     c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
     return
 }
+if strings.TrimSpace(req.Input) == "" {
+    c.JSON(http.StatusBadRequest, gin.H{"error": "input is required"})
+    return
+}

628-633: Clarify model handling (param is ignored).

You read model but always use "whisper-large-v3-turbo" in params. Either document this behavior or honor the provided model when supported.
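If honoring it, a minimal sketch of the resolution step (shown in Rust for brevity; the supported set and the default are illustrative, not the proxy's actual list):

```rust
use std::collections::HashSet;

// Hypothetical resolver: accept the caller's model when it is one the
// upstream supports, otherwise fall back to the current default.
fn resolve_model(requested: &str) -> &str {
    let supported: HashSet<&str> =
        ["whisper-large-v3-turbo", "whisper-large-v3"].into_iter().collect();
    if supported.contains(requested) {
        requested
    } else {
        "whisper-large-v3-turbo"
    }
}

fn main() {
    assert_eq!(resolve_model("whisper-large-v3"), "whisper-large-v3");
    assert_eq!(resolve_model("unknown-model"), "whisper-large-v3-turbo");
}
```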

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9544f5b and 07191ab.

⛔ Files ignored due to path filters (3)
  • pcrDev.json is excluded by !pcrDev.json
  • pcrProd.json is excluded by !pcrProd.json
  • tinfoil-proxy/dist/tinfoil-proxy is excluded by !**/dist/**
📒 Files selected for processing (5)
  • pcrDevHistory.json (1 hunks)
  • pcrProdHistory.json (1 hunks)
  • src/proxy_config.rs (4 hunks)
  • src/web/openai.rs (5 hunks)
  • tinfoil-proxy/main.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pcrDevHistory.json
  • pcrProdHistory.json
🧰 Additional context used
🧬 Code graph analysis (1)
src/web/openai.rs (3)
src/proxy_config.rs (2)
  • Client (243-245)
  • new (161-195)
src/web/health_routes.rs (1)
  • Client (60-62)
src/web/encryption_middleware.rs (1)
  • encrypt_response (91-105)
🪛 GitHub Actions: Rust CI
src/web/openai.rs

[error] 791-791: Command 'cargo clippy --all-targets --all-features -- -D warnings' failed: Clippy error: this function has too many arguments (10/7) in send_transcription_request (src/web/openai.rs:791:1).

🔇 Additional comments (4)
src/proxy_config.rs (2)

21-25: Whisper Large V3 constants — LGTM.

Names look consistent with the rest of the file.


44-49: Whisper Large V3 equivalency mapping — verify provider IDs.

Looks correct. Please double-check that the Tinfoil ID is exactly "whisper-large-v3-turbo" in upstream and that Continuum exposes "openai/whisper-large-v3".

src/web/openai.rs (2)

37-39: New routes wired — LGTM.

Endpoints registered for TTS and transcriptions.


127-128: HyperBody client usage — LGTM.

Consistent with the new client type.

Comment thread src/web/openai.rs
Comment thread tinfoil-proxy/main.go

@greptile-apps greptile-apps Bot left a comment


Greptile Summary

This review covers only the changes made since the last review, not the entire PR. The changes focus on completing the Text-to-Speech and Transcription API implementation by adding critical missing functionality and fixing identified issues:

Audio File Upload Validation: The tinfoil-proxy service now includes proper file size validation for transcription uploads, preventing potential abuse from oversized files. A maximum upload size limit (25 MiB) is enforced before reading files into memory.
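A minimal sketch of such a cap, enforced while reading rather than after buffering (the constant and error text are illustrative, not the service's actual values):

```rust
use std::io::Read;

const MAX_UPLOAD_BYTES: u64 = 25 * 1024 * 1024;

// Read at most the cap plus one byte; seeing the extra byte means the
// body was oversized, without ever buffering more than the limit.
fn read_capped<R: Read>(reader: R) -> Result<Vec<u8>, String> {
    let mut buf = Vec::new();
    reader
        .take(MAX_UPLOAD_BYTES + 1)
        .read_to_end(&mut buf)
        .map_err(|e| e.to_string())?;
    if buf.len() as u64 > MAX_UPLOAD_BYTES {
        return Err("upload exceeds 25 MiB limit".to_string());
    }
    Ok(buf)
}

fn main() {
    let small = vec![0u8; 16];
    assert!(read_capped(&small[..]).is_ok());
}
```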

Function Parameter Refactoring: The send_transcription_request function has been refactored from having 10 parameters (which violated Clippy's lint rules) to using a structured TranscriptionPayload approach, improving code maintainability and readability.

PCR History Maintenance: The PCR (Platform Configuration Register) history files have been properly updated to maintain the cryptographic audit trail required for frontend verification of the new TTS-enabled enclaves. Both development and production PCR measurements are now correctly signed and appended to their respective history files.

Content-Type Alignment: The audio response handling has been improved to properly align MIME types between the proxy service and the main application, ensuring that PCM format requests correctly map to WAV upstream while maintaining consistent Content-Type headers.

Input Validation Enhancement: The TTS endpoint now includes early rejection of empty or whitespace-only input text, preventing unnecessary API calls to downstream providers with invalid data.

These changes integrate with the existing OpenAI proxy infrastructure by following established patterns for authentication, encryption middleware, provider fallback logic, and error handling. The implementation maintains consistency with the existing chat completion endpoints while adapting to the unique requirements of audio processing (multipart form data, binary responses, and different provider APIs).

Confidence score: 4/5

  • This PR is safe to merge with only minor remaining considerations around testing and monitoring
  • Score reflects good implementation of previous feedback with proper error handling, validation, and security measures in place
  • Pay attention to the TODO comments regarding billing/usage tracking implementation for the new audio endpoints

7 files reviewed, no comments



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
pcrDevHistory.json (1)

309-315: PCR1 divergence in the most recent window — please verify intent

The appended entry keeps PCR1=f004… while several of the last 10 entries still use PCR1=e45…; the “recent PCR1 consistency” check will fail. Confirm the switch is intentional (new boot policy/firmware) or align recent entries for consistency.

Run:

#!/bin/bash
set -euo pipefail
jq -e '.' pcrDevHistory.json >/dev/null
echo "JSON valid"

# Timestamps strictly ascending
prev=""
jq -r '.[].timestamp' pcrDevHistory.json | while read -r ts; do
  [ -z "${prev:-}" ] || [ "$ts" -gt "$prev" ] || { echo "not ascending: $prev >= $ts"; exit 2; }
  prev="$ts"
done
echo "timestamps ok"

# Recent PCR1 uniformity over last 10
jq -r '.[].PCR1' pcrDevHistory.json | tail -n 10 | sort -u | wc -l

Expected: 1 (uniform). If not uniform, either document the deliberate change in release notes or adjust the divergent entry.

src/web/openai.rs (3)

1017-1041: Honor upstream Content-Type; align PCM to WAV

Use the proxy’s declared MIME type when available and map pcm to audio/wav to match proxy behavior.

-    // Get response body as bytes
-    let body_bytes = to_bytes(res.into_body()).await.map_err(|e| {
+    // Capture upstream content-type, then read body
+    let upstream_content_type = res
+        .headers()
+        .get(hyper::header::CONTENT_TYPE)
+        .and_then(|v| v.to_str().ok())
+        .map(|s| s.to_string());
+    let body_bytes = to_bytes(res.into_body()).await.map_err(|e| {
         error!("Failed to read TTS response body: {:?}", e);
         ApiError::InternalServerError
     })?;
 
-    // Create a response object with the audio data and metadata
-    let audio_response = json!({
-        "content_base64": general_purpose::STANDARD.encode(&body_bytes),
-        "content_type": match response_format {
-            "mp3" => "audio/mpeg",
-            "opus" => "audio/opus",
-            "aac" => "audio/aac",
-            "flac" => "audio/flac",
-            "wav" => "audio/wav",
-            "pcm" => "audio/pcm",
-            _ => "audio/mpeg", // Default to MP3
-        },
-    });
+    let content_type = upstream_content_type.unwrap_or_else(|| {
+        match response_format {
+            "mp3" => "audio/mpeg",
+            "opus" => "audio/opus",
+            "aac" => "audio/aac",
+            "flac" => "audio/flac",
+            "wav" => "audio/wav",
+            "pcm" => "audio/wav",
+            _ => "audio/mpeg",
+        }
+        .to_string()
+    });
+    let audio_response = json!({
+        "content_base64": general_purpose::STANDARD.encode(&body_bytes),
+        "content_type": content_type,
+    });

929-935: Reject blank TTS input early

Trim-check to prevent proxying empty text.

-    let input = body_obj
-        .get("input")
-        .and_then(|i| i.as_str())
-        .ok_or_else(|| {
-            error!("Input text not specified in request");
-            ApiError::BadRequest
-        })?;
+    let input = body_obj
+        .get("input")
+        .and_then(|i| i.as_str())
+        .filter(|s| !s.trim().is_empty())
+        .ok_or_else(|| {
+            error!("Input text not specified or empty in request");
+            ApiError::BadRequest
+        })?;

791-803: Refactor: collapse send_transcription_request args into a payload struct

Reduces complexity and removes clippy allow.

-#[allow(clippy::too_many_arguments)]
-async fn send_transcription_request(
+struct TranscriptionPayload<'a> {
+    model: &'a str,
+    file_bytes: &'a [u8],
+    filename: &'a str,
+    content_type: &'a str,
+    language: Option<&'a str>,
+    prompt: Option<&'a str>,
+    response_format: &'a str,
+    temperature: Option<f64>,
+}
+
+async fn send_transcription_request(
     client: &Client<HttpsConnector<hyper::client::HttpConnector>, HyperBody>,
     provider: &ProxyConfig,
-    model: &str,
-    file_bytes: &[u8],
-    filename: &str,
-    content_type: &str,
-    language: Option<&str>,
-    prompt: Option<&str>,
-    response_format: &str,
-    temperature: Option<f64>,
+    payload: TranscriptionPayload<'_>,
 ) -> Result<Value, String> {

And update body usages accordingly:

-    form_data.extend_from_slice(model.as_bytes());
+    form_data.extend_from_slice(payload.model.as_bytes());
@@
-            filename
+            payload.filename
@@
-    form_data.extend_from_slice(format!("Content-Type: {}\r\n\r\n", content_type).as_bytes());
-    form_data.extend_from_slice(file_bytes);
+    form_data.extend_from_slice(format!("Content-Type: {}\r\n\r\n", payload.content_type).as_bytes());
+    form_data.extend_from_slice(payload.file_bytes);
@@
-    if let Some(lang) = language {
+    if let Some(lang) = payload.language {
@@
-    if let Some(p) = prompt {
+    if let Some(p) = payload.prompt {
@@
-    form_data.extend_from_slice(response_format.as_bytes());
+    form_data.extend_from_slice(payload.response_format.as_bytes());
@@
-    if let Some(temp) = temperature {
+    if let Some(temp) = payload.temperature {

Update call sites (primary):

-        match send_transcription_request(
+        match send_transcription_request(
             &client,
             &route.primary,
-            &primary_model,
-            &file_bytes,
-            filename,
-            content_type,
-            language,
-            prompt,
-            response_format,
-            temperature,
+            TranscriptionPayload {
+                model: &primary_model,
+                file_bytes: &file_bytes,
+                filename,
+                content_type,
+                language,
+                prompt,
+                response_format,
+                temperature,
+            },
         )

And fallback:

-            match send_transcription_request(
+            match send_transcription_request(
                 &client,
                 fallback,
-                &fallback_model,
-                &file_bytes,
-                filename,
-                content_type,
-                language,
-                prompt,
-                response_format,
-                temperature,
+                TranscriptionPayload {
+                    model: &fallback_model,
+                    file_bytes: &file_bytes,
+                    filename,
+                    content_type,
+                    language,
+                    prompt,
+                    response_format,
+                    temperature,
+                },
             )
🧹 Nitpick comments (4)
src/proxy_config.rs (2)

44-49: Map canonical → provider names for Whisper L3; add unit tests

The mapping looks correct. Add assertions similar to llama tests to cover:

  • canonical → tinfoil/continuum
  • tinfoil/continuum → cross-provider and → canonical

Example (add to tests):

+let whisper = MODEL_EQUIVALENCIES.get(CANONICAL_WHISPER_LARGE_V3).unwrap();
+assert_eq!(whisper.get("continuum"), Some(&CONTINUUM_WHISPER_LARGE_V3));
+assert_eq!(whisper.get("tinfoil"), Some(&TINFOIL_WHISPER_LARGE_V3_TURBO));

242-246: Add request-level timeouts to model discovery

Pool timeouts don't cap request duration. Wrap requests with a deadline to avoid hangs on provider incidents.

Apply in fetch_models_from_proxy:

-        let res = client.request(req).await?;
+        let res = tokio::time::timeout(Duration::from_secs(15), client.request(req))
+            .await
+            .map_err(|_| "Fetch models timed out")??;
src/web/openai.rs (2)

662-667: Consider request deadlines for transcription calls

Add a timeout around client.request to avoid indefinite hangs on provider outages.

Example inside send_transcription_request:

-    match client.request(req).await {
+    match tokio::time::timeout(Duration::from_secs(60), client.request(req)).await {
+        Err(_) => Err(format!("Provider {} timed out", provider.provider_name)),
+        Ok(res) => match res {
+            Ok(res) => { /* existing handling */ }
+            Err(e) => Err(format!("Failed to send request to {}: {:?}", provider.provider_name, e)),
+        }
     }

1069-1083: Header forwarding: prefer allowlist to reduce risk

Forwarding nearly all headers can leak cookies or internal headers. Restrict to a curated set (e.g., x-request-id, x-trace*, user-agent) or specific prefixes.
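A minimal sketch of such an allowlist (the names and prefix are illustrative, not the service's actual policy):

```rust
use std::collections::HashSet;

// Hypothetical filter: keep only headers considered safe to forward
// to upstream providers; everything else (cookies, internal headers)
// is dropped.
fn filter_forward_headers(headers: &[(String, String)]) -> Vec<(String, String)> {
    let allowed: HashSet<&str> = ["x-request-id", "user-agent"].into_iter().collect();
    headers
        .iter()
        .filter(|(name, _)| {
            let lower = name.to_ascii_lowercase();
            allowed.contains(lower.as_str()) || lower.starts_with("x-trace")
        })
        .cloned()
        .collect()
}

fn main() {
    let incoming = vec![
        ("Cookie".to_string(), "session=secret".to_string()),
        ("X-Request-Id".to_string(), "abc123".to_string()),
        ("X-Trace-Span".to_string(), "span9".to_string()),
    ];
    let forwarded = filter_forward_headers(&incoming);
    assert_eq!(forwarded.len(), 2);
}
```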

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07191ab and cd41fae.

⛔ Files ignored due to path filters (3)
  • pcrDev.json is excluded by !pcrDev.json
  • pcrProd.json is excluded by !pcrProd.json
  • tinfoil-proxy/dist/tinfoil-proxy is excluded by !**/dist/**
📒 Files selected for processing (5)
  • pcrDevHistory.json (1 hunks)
  • pcrProdHistory.json (1 hunks)
  • src/proxy_config.rs (4 hunks)
  • src/web/openai.rs (5 hunks)
  • tinfoil-proxy/main.go (24 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pcrProdHistory.json
  • tinfoil-proxy/main.go
🧰 Additional context used
🧬 Code graph analysis (1)
src/web/openai.rs (2)
src/proxy_config.rs (2)
  • Client (243-245)
  • new (161-195)
src/web/encryption_middleware.rs (1)
  • encrypt_response (91-105)
⏰ Context from checks skipped due to timeout of 100000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Development Reproducible Build
🔇 Additional comments (3)
src/proxy_config.rs (2)

21-25: Whisper Large V3 constants: clear and consistent

Naming aligns with existing patterns and will simplify translations.


351-373: Route aliasing for tinfoil/continuum/canonical is correct

Cloning for provider IDs and moving for canonical key avoids ownership issues and ensures lookups by any name work.

src/web/openai.rs (1)

37-39: New audio routes wired correctly

Endpoints are registered and pass through the decryption middleware.

Comment thread src/web/openai.rs

@greptile-apps greptile-apps Bot left a comment


Greptile Summary

This review covers only the changes made since the last review, not the entire PR.

The most recent changes focus on implementing a TranscriptionParams struct to address the previous clippy warning about too many function parameters. This change refactors the send_transcription_request function to accept a structured parameter object instead of 10 individual arguments, improving code readability and satisfying Rust's clippy linter.

The TranscriptionParams struct contains all the necessary fields for transcription requests: model, file_bytes, filename, content_type, language, prompt, response_format, and temperature. The function signature has been simplified to take this struct as a parameter, and the function body has been updated to access fields through the struct rather than individual parameters.

All call sites have been updated to construct the new TranscriptionParams struct when calling send_transcription_request, both for primary and fallback provider attempts. This change maintains the exact same functionality while improving code organization and eliminating the clippy warning about excessive function parameters.

The refactoring follows Rust best practices for handling complex parameter sets and integrates cleanly with the existing audio processing infrastructure that was added in this PR, including the audio chunking, transcription merging, and retry logic with fallback providers.

Confidence score: 4/5

  • This change is safe to merge with minimal risk as it's a straightforward parameter restructuring
  • Score reflects the clean refactoring that addresses clippy warnings without changing functionality
  • Pay close attention to the PCR history files which still need proper cryptographic signing

7 files reviewed, 1 comment


Comment thread src/web/audio_utils.rs
Comment on lines +119 to +131
while pos < audio_data.len() - 8 {
let chunk_id = &audio_data[pos..pos + 4];
let chunk_size = u32::from_le_bytes([
audio_data[pos + 4],
audio_data[pos + 5],
audio_data[pos + 6],
audio_data[pos + 7],
]) as usize;



logic: Potential buffer overflow if chunk_size is maliciously large - should validate chunk_size against the remaining buffer length before using it

Suggested change
while pos < audio_data.len() - 8 {
let chunk_id = &audio_data[pos..pos + 4];
let chunk_size = u32::from_le_bytes([
audio_data[pos + 4],
audio_data[pos + 5],
audio_data[pos + 6],
audio_data[pos + 7],
]) as usize;
// Validate chunk_size to prevent buffer overflow
if pos + 8 + chunk_size > audio_data.len() {
return Err("Invalid WAV file: chunk extends beyond file boundary".to_string());
}


@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (4)
pcrDevHistory.json (1)

309-316: Recent PCR1 inconsistency in trailing window — reconcile or justify.

Last 10 entries include two distinct PCR1 values (e45de6… and f00407…), so the “recent PCR1 consistent” check still fails. Either align the outlier (around Line 269–274) with f00407… if that’s the intended baseline, or document why PCR1 legitimately changed so we can relax the check.

src/web/openai.rs (3)

1036-1044: Reject blank TTS input (empty/whitespace).

Prevents proxying empty text.

Apply:

-    let input = body_obj
-        .get("input")
-        .and_then(|i| i.as_str())
-        .ok_or_else(|| {
-            error!("Input text not specified in request");
-            ApiError::BadRequest
-        })?;
+    let input = body_obj
+        .get("input")
+        .and_then(|i| i.as_str())
+        .filter(|s| !s.trim().is_empty())
+        .ok_or_else(|| {
+            error!("Input text not specified or empty in request");
+            ApiError::BadRequest
+        })?;

751-756: Validate decoded audio is non-empty and within size limits.

Avoids empty/oversized payloads causing provider errors or memory pressure.

Apply:

     let file_bytes = general_purpose::STANDARD.decode(file_base64).map_err(|e| {
         error!("Failed to decode base64 audio file: {:?}", e);
         ApiError::BadRequest
     })?;
+    if file_bytes.is_empty() {
+        error!("Decoded audio is empty");
+        return Err(ApiError::BadRequest);
+    }
+    if file_bytes.len() > MAX_AUDIO_SIZE_BYTES {
+        error!("Decoded audio exceeds {} bytes (got {})", MAX_AUDIO_SIZE_BYTES, file_bytes.len());
+        return Err(ApiError::BadRequest);
+    }

Add near the top of this file:

const MAX_AUDIO_SIZE_BYTES: usize = 25 * 1024 * 1024;

1130-1149: Honor upstream Content-Type; map pcm to audio/wav.

Avoid MIME mismatches and defer to proxy header when available.

Apply:

-    // Get response body as bytes
-    let body_bytes = to_bytes(res.into_body()).await.map_err(|e| {
+    // Get upstream content-type, then response body as bytes
+    let upstream_content_type = res
+        .headers()
+        .get(hyper::header::CONTENT_TYPE)
+        .and_then(|v| v.to_str().ok())
+        .map(|s| s.to_string());
+    let body_bytes = to_bytes(res.into_body()).await.map_err(|e| {
         error!("Failed to read TTS response body: {:?}", e);
         ApiError::InternalServerError
     })?;
 
-    // Create a response object with the audio data and metadata
-    let audio_response = json!({
-        "content_base64": general_purpose::STANDARD.encode(&body_bytes),
-        "content_type": match response_format {
-            "mp3" => "audio/mpeg",
-            "opus" => "audio/opus",
-            "aac" => "audio/aac",
-            "flac" => "audio/flac",
-            "wav" => "audio/wav",
-            "pcm" => "audio/pcm",
-            _ => "audio/mpeg", // Default to MP3
-        },
-    });
+    // Compute content type from upstream header if present; otherwise map from requested format
+    let content_type = upstream_content_type.unwrap_or_else(|| {
+        match response_format {
+            "mp3" => "audio/mpeg",
+            "opus" => "audio/opus",
+            "aac" => "audio/aac",
+            "flac" => "audio/flac",
+            "wav" => "audio/wav",
+            "pcm" => "audio/wav", // aligned with proxy remap for PCM
+            _ => "audio/mpeg",
+        }
+        .to_string()
+    });
+
+    // Create a response object with the audio data and metadata
+    let audio_response = json!({
+        "content_base64": general_purpose::STANDARD.encode(&body_bytes),
+        "content_type": content_type,
+    });

🧹 Nitpick comments (3)
src/web/audio_utils.rs (1)

149-153: Don’t hardcode fmt position; scan chunks to locate “fmt ”.

Some WAVs contain extra chunks before fmt. Consider scanning like you did for “data”.

Example:

// Find fmt chunk first instead of assuming fmt_pos = 12
let mut pos = 12usize;
let fmt_pos = loop {
    if pos + 8 > audio_data.len() { return Err("WAV missing fmt chunk".into()); }
    let id = &audio_data[pos..pos+4];
    let size = u32::from_le_bytes(audio_data[pos+4..pos+8].try_into().unwrap()) as usize;
    if id == b"fmt " { break pos; }
    pos += 8 + size + (size % 2);
};
src/web/openai.rs (2)

812-823: Send a filename that matches the chunk content type.

Improves provider heuristics and logging.

Apply:

         let future = async move {
             info!("Processing chunk {}", chunk.index);
             
+            let effective_filename = if content_type == "audio/wav" {
+                format!("chunk-{}.wav", chunk.index)
+            } else if content_type == "audio/mpeg" {
+                format!("chunk-{}.mp3", chunk.index)
+            } else {
+                filename.clone()
+            };
+
             let params = TranscriptionParams {
                 audio_data: &chunk.data,
-                filename: &filename,
+                filename: &effective_filename,
                 content_type: &content_type,
                 language: language.as_deref(),
                 prompt: prompt.as_deref(),
                 response_format: &response_format,
                 temperature,
             };

660-697: Try all fallbacks per cycle, not just the first.

Iterate over route.fallbacks to increase success rate when a single fallback is down.

High-level change:

  • Replace first() with a for fallback in route.fallbacks.iter() loop inside each cycle.
  • Short-circuit on first success; keep last_error updated.
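A minimal sketch of the per-cycle loop (provider names and the try_provider stub are illustrative stand-ins for the real request path):

```rust
// Stand-in for the real provider request.
fn try_provider(name: &str) -> Result<String, String> {
    if name == "up" {
        Ok(format!("ok from {}", name))
    } else {
        Err(format!("{} down", name))
    }
}

// Try the primary, then every fallback in order; short-circuit on the
// first success and surface the last error if all providers fail.
fn transcribe_with_fallbacks(primary: &str, fallbacks: &[&str]) -> Result<String, String> {
    let mut last_error = String::new();
    for provider in std::iter::once(&primary).chain(fallbacks.iter()) {
        match try_provider(provider) {
            Ok(v) => return Ok(v),
            Err(e) => last_error = e,
        }
    }
    Err(last_error)
}

fn main() {
    let result = transcribe_with_fallbacks("down1", &["down2", "up"]);
    assert_eq!(result.unwrap(), "ok from up");
    assert!(transcribe_with_fallbacks("down1", &["down2"]).is_err());
}
```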
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd41fae and cd2d7fd.

⛔ Files ignored due to path filters (2)
  • pcrDev.json is excluded by !pcrDev.json
  • pcrProd.json is excluded by !pcrProd.json
📒 Files selected for processing (5)
  • pcrDevHistory.json (1 hunks)
  • pcrProdHistory.json (1 hunks)
  • src/web/audio_utils.rs (1 hunks)
  • src/web/mod.rs (1 hunks)
  • src/web/openai.rs (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pcrProdHistory.json
🧰 Additional context used
🧬 Code graph analysis (1)
src/web/openai.rs (2)
src/web/audio_utils.rs (2)
  • merge_transcriptions (248-308)
  • new (47-49)
src/proxy_config.rs (2)
  • Client (243-245)
  • new (161-195)
🪛 GitHub Actions: Rust CI
src/web/audio_utils.rs

[error] cargo fmt --all -- --check detected formatting issues. Run 'cargo fmt' to format this file.

src/web/openai.rs

[error] cargo fmt --all -- --check detected formatting issues. Run 'cargo fmt' to format this file.

⏰ Context from checks skipped due to timeout of 100000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Development Reproducible Build
🔇 Additional comments (4)
src/web/audio_utils.rs (2)

81-96: Confirm intent: large non‑WAV files bypass splitting.

Currently, even when should_split() is true, non‑WAV formats are returned as a single chunk. If that’s intentional (to avoid re-encode), document it; otherwise consider a decode/re-encode path or a server-side size cap.


1-7: Fix formatting to satisfy CI.

Rust CI reports cargo fmt issues on this file. Run cargo fmt --all.

src/web/mod.rs (1)

2-2: Module exposure looks good.

pub mod audio_utils cleanly wires the new utilities; no concerns.

src/web/openai.rs (1)

1-10: Fix formatting to satisfy CI.

Rust CI reports cargo fmt issues on this file. Run cargo fmt --all.

Comment thread src/web/audio_utils.rs
Comment on lines +145 to +151
let header = &audio_data[0..data_start];
let audio_samples = &audio_data[data_start..data_start + data_size];


Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Bounds-check WAV data slice to avoid panic on malformed chunk size.

data_size can exceed remaining bytes; slicing without clamping can panic.

Apply:

-        let header = &audio_data[0..data_start];
-        let audio_samples = &audio_data[data_start..data_start + data_size];
+        let header = &audio_data[0..data_start];
+        let data_end = data_start.saturating_add(data_size).min(audio_data.len());
+        let audio_samples = &audio_data[data_start..data_end];
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let header = &audio_data[0..data_start];
let data_end = data_start.saturating_add(data_size).min(audio_data.len());
let audio_samples = &audio_data[data_start..data_end];
🧰 Tools
🪛 GitHub Actions: Rust CI

[error] cargo fmt --all -- --check detected formatting issues. Run 'cargo fmt' to format this file.

🤖 Prompt for AI Agents
In src/web/audio_utils.rs around lines 145 to 147, the code unsafely slices
audio_data using data_start and data_size which can panic if data_size exceeds
remaining bytes; change to first validate that data_start <= audio_data.len()
and compute end = data_start.saturating_add(data_size).min(audio_data.len()),
then use header = &audio_data[0..data_start] and audio_samples =
&audio_data[data_start..end]; if data_start >= audio_data.len() or data_size ==
0 return an Err/early return for malformed WAV instead of panicking.

Comment thread src/web/audio_utils.rs
Comment on lines +173 to +186
// Calculate chunk size based on file size limit only
let chunk_duration_bytes = self.max_chunk_size - header.len() - 8; // Subtract header and data chunk header
let overlap_bytes = (self.overlap_seconds as usize) * bytes_per_second;

// Make sure we align to sample boundaries (usually 2 or 4 bytes)
let block_align =
u16::from_le_bytes([audio_data[fmt_pos + 20], audio_data[fmt_pos + 21]]) as usize;
let chunk_duration_bytes = (chunk_duration_bytes / block_align) * block_align;
let overlap_bytes = (overlap_bytes / block_align) * block_align;


⚠️ Potential issue

Prevent zero-length chunk loop; fix overlap rounding to samples.

With very small max_chunk_size, chunk_duration_bytes can become 0, causing an infinite loop. Also, overlap rounding should be based on f64 then aligned to block boundaries.

Apply:

-        // Calculate chunk size based on file size limit only
-        let chunk_duration_bytes = self.max_chunk_size - header.len() - 8; // Subtract header and data chunk header
-        let overlap_bytes = (self.overlap_seconds as usize) * bytes_per_second;
+        // Calculate chunk size based on file size limit only
+        let mut chunk_duration_bytes =
+            self.max_chunk_size.saturating_sub(header.len() + 8); // Header + data header
+        let overlap_bytes_f = (self.overlap_seconds * bytes_per_second as f64).round() as usize;
@@
-        let block_align =
-            u16::from_le_bytes([audio_data[fmt_pos + 20], audio_data[fmt_pos + 21]]) as usize;
-        let chunk_duration_bytes = (chunk_duration_bytes / block_align) * block_align;
-        let overlap_bytes = (overlap_bytes / block_align) * block_align;
+        let block_align =
+            u16::from_le_bytes([audio_data[fmt_pos + 20], audio_data[fmt_pos + 21]]) as usize;
+        chunk_duration_bytes = (chunk_duration_bytes / block_align) * block_align;
+        let mut overlap_bytes = (overlap_bytes_f / block_align) * block_align;
+        if chunk_duration_bytes == 0 {
+            return Err("max_chunk_size too small for WAV header + one block".to_string());
+        }
🧰 Tools
🪛 GitHub Actions: Rust CI

[error] cargo fmt --all -- --check detected formatting issues. Run 'cargo fmt' to format this file.

🤖 Prompt for AI Agents
In src/web/audio_utils.rs around lines 173 to 182, the current logic can produce
a zero-length chunk and an infinite loop when max_chunk_size is very small and
also rounds overlap using integer arithmetic; change the math so
chunk_duration_bytes is computed then aligned to block_align and clamped to at
least block_align to avoid zero-length chunks, and compute overlap_bytes using
floating-point seconds (overlap_seconds as f64 * bytes_per_second as f64)
rounding to usize before aligning to block_align; ensure both
chunk_duration_bytes and overlap_bytes are divided/multiplied using the correct
types, aligned by (value / block_align) * block_align, and that you handle the
case where even after alignment overlap_bytes may be zero but less than
chunk_duration_bytes.
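The sizing math the prompt describes can be checked in isolation. A minimal sketch with a hypothetical `chunk_sizes` helper mirroring the fields named in the diff (`max_chunk_size`, `overlap_seconds`, `block_align`):

```rust
// Sketch of the block-aligned sizing math the review proposes: compute the
// chunk budget with saturating subtraction, round overlap from f64 seconds,
// align both down to block_align, and reject a zero-length chunk.
fn chunk_sizes(
    max_chunk_size: usize,
    header_len: usize,
    overlap_seconds: f64,
    bytes_per_second: usize,
    block_align: usize,
) -> Result<(usize, usize), String> {
    if block_align == 0 || bytes_per_second == 0 {
        return Err("block_align or bytes_per_second is zero".to_string());
    }
    let budget = max_chunk_size.saturating_sub(header_len + 8); // header + "data" header
    let chunk_bytes = (budget / block_align) * block_align;
    if chunk_bytes == 0 {
        return Err("max_chunk_size too small for WAV header + one block".to_string());
    }
    let overlap = (overlap_seconds * bytes_per_second as f64).round() as usize;
    let overlap_bytes = (overlap / block_align) * block_align;
    Ok((chunk_bytes, overlap_bytes))
}

fn main() {
    // 2-byte blocks: a 100-byte budget minus 44 + 8 leaves 48, already aligned.
    let (c, o) = chunk_sizes(100, 44, 0.5, 32_000, 2).unwrap();
    assert_eq!((c, o), (48, 16_000));
    // A limit smaller than the header yields an error, not an underflow.
    assert!(chunk_sizes(40, 44, 0.5, 32_000, 2).is_err());
}
```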

Comment thread src/web/openai.rs Outdated
Comment thread src/web/openai.rs
Comment on lines +981 to +1032
// Send request
match client.request(req).await {
Ok(res) => {
if res.status().is_success() {
let body_bytes = to_bytes(res.into_body())
.await
.map_err(|e| format!("Failed to read response body: {:?}", e))?;

let response_json: Value = serde_json::from_slice(&body_bytes)
.map_err(|e| format!("Failed to parse response: {:?}", e))?;

Ok(response_json)
} else {
let status = res.status();
let body_bytes = to_bytes(res.into_body()).await.ok();
let error_msg = body_bytes
.map(|b| String::from_utf8_lossy(&b).to_string())
.unwrap_or_else(|| status.to_string());

Err(format!(
"Provider {} returned error: {} - {}",
provider.provider_name, status, error_msg
))
}
}
Err(e) => Err(format!(
"Failed to send request to {}: {:?}",
provider.provider_name, e
)),
}

🛠️ Refactor suggestion

Add request timeout for transcription provider call.

Prevents indefinite hangs on upstream.

Apply:

-    // Send request
-    match client.request(req).await {
+    // Send request with timeout
+    match tokio::time::timeout(Duration::from_secs(30), client.request(req)).await {
-        Ok(res) => {
+        Ok(Ok(res)) => {
             if res.status().is_success() {
@@
                 Ok(response_json)
             } else {
@@
             }
-        }
-        Err(e) => Err(format!(
-            "Failed to send request to {}: {:?}",
-            provider.provider_name, e
-        )),
+        }
+        Ok(Err(e)) => Err(format!(
+            "Failed to send request to {}: {:?}",
+            provider.provider_name, e
+        )),
+        Err(_) => Err(format!(
+            "Request to {} timed out after 30s",
+            provider.provider_name
+        )),
     }
📝 Committable suggestion


Suggested change
-    // Send request
-    match client.request(req).await {
-        Ok(res) => {
-            if res.status().is_success() {
-                let body_bytes = to_bytes(res.into_body())
-                    .await
-                    .map_err(|e| format!("Failed to read response body: {:?}", e))?;
-                let response_json: Value = serde_json::from_slice(&body_bytes)
-                    .map_err(|e| format!("Failed to parse response: {:?}", e))?;
-                Ok(response_json)
-            } else {
-                let status = res.status();
-                let body_bytes = to_bytes(res.into_body()).await.ok();
-                let error_msg = body_bytes
-                    .map(|b| String::from_utf8_lossy(&b).to_string())
-                    .unwrap_or_else(|| status.to_string());
-                Err(format!(
-                    "Provider {} returned error: {} - {}",
-                    provider.provider_name, status, error_msg
-                ))
-            }
-        }
-        Err(e) => Err(format!(
-            "Failed to send request to {}: {:?}",
-            provider.provider_name, e
-        )),
-    }
+    // Send request with timeout
+    match tokio::time::timeout(Duration::from_secs(30), client.request(req)).await {
+        Ok(Ok(res)) => {
+            if res.status().is_success() {
+                let body_bytes = to_bytes(res.into_body())
+                    .await
+                    .map_err(|e| format!("Failed to read response body: {:?}", e))?;
+                let response_json: Value = serde_json::from_slice(&body_bytes)
+                    .map_err(|e| format!("Failed to parse response: {:?}", e))?;
+                Ok(response_json)
+            } else {
+                let status = res.status();
+                let body_bytes = to_bytes(res.into_body()).await.ok();
+                let error_msg = body_bytes
+                    .map(|b| String::from_utf8_lossy(&b).to_string())
+                    .unwrap_or_else(|| status.to_string());
+                Err(format!(
+                    "Provider {} returned error: {} - {}",
+                    provider.provider_name, status, error_msg
+                ))
+            }
+        }
+        Ok(Err(e)) => Err(format!(
+            "Failed to send request to {}: {:?}",
+            provider.provider_name, e
+        )),
+        Err(_) => Err(format!(
+            "Request to {} timed out after 30s",
+            provider.provider_name
+        )),
+    }
🧰 Tools
🪛 GitHub Actions: Rust CI

[error] cargo fmt --all -- --check detected formatting issues. Run 'cargo fmt' to format this file.

🤖 Prompt for AI Agents
In src/web/openai.rs around lines 981 to 1010, the code awaits
client.request(req).await with no timeout which can hang indefinitely; wrap the
await in a tokio::time::timeout using a sensible Duration (from config or a
constant), await the timeout future and handle the timeout error by returning an
Err with a clear message that the transcription provider call timed out (include
provider.provider_name), keep the existing success/error handling for the inner
response when the timeout does not occur, and add the necessary use/import for
tokio::time::timeout and std::time::Duration (or fetch duration from runtime
config).
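The bounded-wait pattern the prompt asks for is easy to demonstrate in miniature. The sketch below is a stdlib-only, blocking analogue (the actual fix in async code uses `tokio::time::timeout` as shown in the diff); `request_with_timeout` and the fake providers are hypothetical:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Stdlib analogue of the tokio::time::timeout pattern: run the "request" on
// another thread and give up after a deadline instead of blocking forever.
fn request_with_timeout<F>(work: F, deadline: Duration) -> Result<String, String>
where
    F: FnOnce() -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(work());
    });
    rx.recv_timeout(deadline)
        .map_err(|_| format!("request timed out after {:?}", deadline))
}

fn main() {
    // A fast "provider" completes well within the deadline.
    let ok = request_with_timeout(|| "200 OK".to_string(), Duration::from_millis(200));
    assert_eq!(ok.unwrap(), "200 OK");

    // A slow "provider" trips the timeout and surfaces an error instead of hanging.
    let slow = request_with_timeout(
        || {
            thread::sleep(Duration::from_millis(500));
            "late".to_string()
        },
        Duration::from_millis(50),
    );
    assert!(slow.is_err());
}
```

The same structure carries over to the async version: the outer `Err` from the timeout becomes its own match arm, distinct from the inner request error.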

Comment thread src/web/openai.rs
Comment on lines +1119 to +1146
// Send request
let res = client.request(req).await.map_err(|e| {
error!("Failed to send TTS request: {:?}", e);
ApiError::InternalServerError
})?;


🛠️ Refactor suggestion

Add request timeout for TTS upstream call.

Prevents indefinite hangs.

Apply:

-    let res = client.request(req).await.map_err(|e| {
-        error!("Failed to send TTS request: {:?}", e);
-        ApiError::InternalServerError
-    })?;
+    let res = tokio::time::timeout(Duration::from_secs(30), client.request(req))
+        .await
+        .map_err(|_| {
+            error!("TTS request timed out after 30s");
+            ApiError::InternalServerError
+        })?
+        .map_err(|e| {
+            error!("Failed to send TTS request: {:?}", e);
+            ApiError::InternalServerError
+        })?;
📝 Committable suggestion


Suggested change
-    // Send request
-    let res = client.request(req).await.map_err(|e| {
-        error!("Failed to send TTS request: {:?}", e);
-        ApiError::InternalServerError
-    })?;
+    // Send request
+    let res = tokio::time::timeout(Duration::from_secs(30), client.request(req))
+        .await
+        .map_err(|_| {
+            error!("TTS request timed out after 30s");
+            ApiError::InternalServerError
+        })?
+        .map_err(|e| {
+            error!("Failed to send TTS request: {:?}", e);
+            ApiError::InternalServerError
+        })?;
🧰 Tools
🪛 GitHub Actions: Rust CI

[error] cargo fmt --all -- --check detected formatting issues. Run 'cargo fmt' to format this file.

🤖 Prompt for AI Agents
In src/web/openai.rs around lines 1119 to 1124, the upstream TTS request can
hang indefinitely because there is no timeout; wrap the
client.request(req).await call with a timeout (e.g.,
tokio::time::timeout(Duration::from_secs(30), client.request(req)).await) and
handle the Result/Err: if timeout elapsed, log a timeout-specific error and
return an appropriate ApiError (e.g., ApiError::GatewayTimeout or an
InternalServerError mapped to timeout), otherwise propagate the request error as
before; also add the necessary imports (tokio::time::timeout and
std::time::Duration).


@greptile-apps greptile-apps Bot left a comment


Greptile Summary

This review covers only the changes made since the last review, not the entire PR. The latest updates focus on correcting critical PCR (Platform Configuration Register) management issues and implementing timeout mechanisms for external API calls.

The key changes include properly adding signed PCR measurements to the production history file (pcrProdHistory.json), which maintains the cryptographic audit trail required for frontend verification of the Nitro Enclave. This follows the documented PCR workflow where new measurements must be signed and appended to enable client verification of enclave authenticity.

Additionally, timeout mechanisms were implemented for external API requests in the TTS and transcription endpoints to prevent indefinite hangs when upstream providers become unresponsive. The implementation uses tokio::time::timeout with 30-second limits for both TTS speech generation and transcription provider requests.

These changes integrate with the existing audio processing system by ensuring that the PCR values for the new TTS-enabled enclave build are properly recorded in the cryptographic history, maintaining the security properties of the attestation system while adding robustness through request timeouts.

6 files reviewed, no comments



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (9)
src/web/audio_utils.rs (3)

123-141: Validate chunk_size to prevent overflow/OOB in WAV chunk scan

Unchecked addition can wrap and/or overrun the buffer. Add bounds checks before advancing.

-        while pos < audio_data.len() - 8 {
+        while pos + 8 <= audio_data.len() {
             let chunk_id = &audio_data[pos..pos + 4];
             let chunk_size = u32::from_le_bytes([
                 audio_data[pos + 4],
                 audio_data[pos + 5],
                 audio_data[pos + 6],
                 audio_data[pos + 7],
             ]) as usize;
+            // Validate chunk_size to prevent overflow/OOB
+            let next_pos = match pos.checked_add(8).and_then(|v| v.checked_add(chunk_size)) {
+                Some(v) => v,
+                None => return Err("Invalid WAV file: chunk size overflow".to_string()),
+            };
+            if next_pos > audio_data.len() {
+                return Err("Invalid WAV file: chunk extends beyond file boundary".to_string());
+            }
@@
-            pos += 8 + chunk_size;
+            pos = next_pos;
             if chunk_size % 2 == 1 {
                 pos += 1; // Padding byte
             }
         }

148-151: Bounds-check data slice to avoid panic on malformed data_size

Slicing with data_start + data_size can panic if truncated. Clamp and validate.

-        // Extract header (everything before data)
-        let header = &audio_data[0..data_start];
-        let audio_samples = &audio_data[data_start..data_start + data_size];
+        // Extract header (everything before data)
+        if data_start >= audio_data.len() {
+            return Err("Invalid WAV file: data_start beyond file size".to_string());
+        }
+        let header = &audio_data[0..data_start];
+        let data_end = data_start.saturating_add(data_size).min(audio_data.len());
+        let audio_samples = &audio_data[data_start..data_end];

177-186: Prevent zero-length chunks; compute overlap with f64 and align; saturate math

Avoid underflow, infinite loops, and wrong overlap due to truncation.

-        // Calculate chunk size based on file size limit only
-        let chunk_duration_bytes = self.max_chunk_size - header.len() - 8; // Subtract header and data chunk header
-        let overlap_bytes = (self.overlap_seconds as usize) * bytes_per_second;
+        // Calculate chunk size based on file size limit only
+        let mut chunk_duration_bytes =
+            self.max_chunk_size.saturating_sub(header.len() + 8); // header + data header
+        let overlap_bytes_f = (self.overlap_seconds * bytes_per_second as f64).round() as usize;
@@
-        let block_align =
-            u16::from_le_bytes([audio_data[fmt_pos + 20], audio_data[fmt_pos + 21]]) as usize;
-        let chunk_duration_bytes = (chunk_duration_bytes / block_align) * block_align;
-        let overlap_bytes = (overlap_bytes / block_align) * block_align;
+        let block_align =
+            u16::from_le_bytes([audio_data[fmt_pos + 20], audio_data[fmt_pos + 21]]) as usize;
+        chunk_duration_bytes = (chunk_duration_bytes / block_align) * block_align;
+        let mut overlap_bytes = (overlap_bytes_f / block_align) * block_align;
+        if chunk_duration_bytes == 0 {
+            return Err("max_chunk_size too small for WAV header + one block".to_string());
+        }
src/web/openai.rs (6)

737-742: Validate decoded audio is non-empty and within reasonable limits

Prevents empty/oversized payloads from hitting providers.

     let file_bytes = general_purpose::STANDARD.decode(file_base64).map_err(|e| {
         error!("Failed to decode base64 audio file: {:?}", e);
         ApiError::BadRequest
     })?;
+    if file_bytes.is_empty() {
+        error!("Decoded audio is empty");
+        return Err(ApiError::BadRequest);
+    }
+    if file_bytes.len() > 25 * 1024 * 1024 {
+        error!("Decoded audio exceeds 25MB: {}", file_bytes.len());
+        return Err(ApiError::BadRequest);
+    }

775-781: Map client-caused WAV parse errors to 400

Invalid WAV should not be a 500.

-    let chunks = splitter
-        .split_audio(&file_bytes, content_type)
-        .map_err(|e| {
-            error!("Failed to split audio: {}", e);
-            ApiError::InternalServerError
-        })?;
+    let chunks = splitter
+        .split_audio(&file_bytes, content_type)
+        .map_err(|e| {
+            error!("Failed to split audio: {}", e);
+            if e.contains("Invalid WAV") || e.contains("Not a valid WAV") || e.contains("no data chunk") {
+                ApiError::BadRequest
+            } else {
+                ApiError::InternalServerError
+            }
+        })?;

915-1020: Add timeout around provider transcription request

Avoid hangs on upstream.

-    // Send request
-    match client.request(req).await {
-        Ok(res) => {
+    // Send request with timeout
+    match tokio::time::timeout(Duration::from_secs(30), client.request(req)).await {
+        Ok(Ok(res)) => {
             if res.status().is_success() {
@@
-        }
-        Err(e) => Err(format!(
-            "Failed to send request to {}: {:?}",
-            provider.provider_name, e
-        )),
+        }
+        Ok(Err(e)) => Err(format!(
+            "Failed to send request to {}: {:?}",
+            provider.provider_name, e
+        )),
+        Err(_) => Err(format!(
+            "Request to {} timed out after 30s",
+            provider.provider_name
+        )),
     }

1046-1054: Reject blank/whitespace-only TTS input early

Prevents proxying empty text.

     let input = body_obj
         .get("input")
         .and_then(|i| i.as_str())
+        .filter(|s| !s.trim().is_empty())
         .ok_or_else(|| {
-            error!("Input text not specified in request");
+            error!("Input text not specified or empty in request");
             ApiError::BadRequest
         })?;

1129-1134: Add timeout around TTS upstream call

Prevents indefinite hangs.

-    // Send request
-    let res = client.request(req).await.map_err(|e| {
-        error!("Failed to send TTS request: {:?}", e);
-        ApiError::InternalServerError
-    })?;
+    // Send request with timeout
+    let res = tokio::time::timeout(Duration::from_secs(30), client.request(req))
+        .await
+        .map_err(|_| {
+            error!("TTS request timed out after 30s");
+            ApiError::InternalServerError
+        })?
+        .map_err(|e| {
+            error!("Failed to send TTS request: {:?}", e);
+            ApiError::InternalServerError
+        })?;

1140-1158: Honor upstream Content-Type; map PCM to WAV when absent

Aligns MIME with proxy and avoids mismatches.

-    // Get response body as bytes
-    let body_bytes = to_bytes(res.into_body()).await.map_err(|e| {
+    // Get upstream content-type, then response body as bytes
+    let upstream_content_type = res
+        .headers()
+        .get(hyper::header::CONTENT_TYPE)
+        .and_then(|v| v.to_str().ok())
+        .map(|s| s.to_string());
+    let body_bytes = to_bytes(res.into_body()).await.map_err(|e| {
         error!("Failed to read TTS response body: {:?}", e);
         ApiError::InternalServerError
     })?;
 
-    // Create a response object with the audio data and metadata
-    let audio_response = json!({
-        "content_base64": general_purpose::STANDARD.encode(&body_bytes),
-        "content_type": match response_format {
-            "mp3" => "audio/mpeg",
-            "opus" => "audio/opus",
-            "aac" => "audio/aac",
-            "flac" => "audio/flac",
-            "wav" => "audio/wav",
-            "pcm" => "audio/pcm",
-            _ => "audio/mpeg", // Default to MP3
-        },
-    });
+    // Compute content type from upstream header if present; otherwise map from requested format
+    let content_type = upstream_content_type.unwrap_or_else(|| {
+        match response_format {
+            "mp3" => "audio/mpeg",
+            "opus" => "audio/opus",
+            "aac" => "audio/aac",
+            "flac" => "audio/flac",
+            "wav" => "audio/wav",
+            "pcm" => "audio/wav", // aligned with proxy remap
+            _ => "audio/mpeg",
+        }
+        .to_string()
+    });
+
+    // Create a response object with the audio data and metadata
+    let audio_response = json!({
+        "content_base64": general_purpose::STANDARD.encode(&body_bytes),
+        "content_type": content_type,
+    });
🧹 Nitpick comments (3)
src/web/audio_utils.rs (1)

314-412: Add tests for malformed WAVs and tiny max_chunk_size

Cover: data chunk beyond bounds, zero block_align/bytes_per_second, max_chunk_size < header, and very small overlap leading to alignment to 0.

Happy to draft focused tests for these cases.

src/web/openai.rs (2)

857-867: Don’t fail entire transcription if some chunks succeed

Return merged success if at least one chunk worked.

-    for result in results {
-        match result {
-            Ok(r) => successful_results.push(r),
-            Err(e) => {
-                error!("Chunk processing failed: {}", e);
-                return Err(ApiError::InternalServerError);
-            }
-        }
-    }
+    for result in results {
+        match result {
+            Ok(r) => successful_results.push(r),
+            Err(e) => {
+                error!("Chunk processing failed: {}", e);
+                // continue; we'll proceed if we have any successes
+            }
+        }
+    }

1079-1083: Clamp TTS speed to sane range

Avoid provider errors for out-of-range speed.

-    let speed = body_obj
+    let mut speed = body_obj
         .get("speed")
         .and_then(|s| s.as_f64())
         .unwrap_or(1.0) as f32;
+    if !(0.25..=4.0).contains(&speed) {
+        speed = speed.clamp(0.25, 4.0);
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd2d7fd and 61727e6.

⛔ Files ignored due to path filters (2)
  • pcrDev.json is excluded by !pcrDev.json
  • pcrProd.json is excluded by !pcrProd.json
📒 Files selected for processing (4)
  • pcrDevHistory.json (1 hunks)
  • pcrProdHistory.json (1 hunks)
  • src/web/audio_utils.rs (1 hunks)
  • src/web/openai.rs (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pcrProdHistory.json
  • pcrDevHistory.json
🧰 Additional context used
🧬 Code graph analysis (1)
src/web/openai.rs (3)
src/web/audio_utils.rs (2)
  • merge_transcriptions (252-312)
  • new (48-50)
src/proxy_config.rs (1)
  • new (161-195)
src/web/encryption_middleware.rs (1)
  • encrypt_response (91-105)
⏰ Context from checks skipped due to timeout of 100000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Development Reproducible Build
🔇 Additional comments (2)
src/web/audio_utils.rs (1)

1-249: Run cargo fmt and fix Clippy warnings

CI flagged formatting for src/web/audio_utils.rs (lines 1–249). Sandbox verification failed (cargo error: "Could not locate working directory: no /proc/self/exe"). Run locally: cargo fmt --all && cargo clippy --all-targets --all-features -- -D warnings and commit any fixes.

src/web/openai.rs (1)

1-1247: Run cargo fmt (format src/web/openai.rs)

CI flagged rustfmt issues; sandbox verification failed (no /proc/self/exe). Run locally or in CI: cargo fmt --all -- --check || cargo fmt --all and push changes to re-run CI.

Comment thread src/web/audio_utils.rs
Comment thread src/web/audio_utils.rs
Comment on lines +153 to +170
let fmt_pos = 12;
if &audio_data[fmt_pos..fmt_pos + 4] != b"fmt " {
return Err("WAV file has no fmt chunk".to_string());
}

let channels = u16::from_le_bytes([audio_data[fmt_pos + 10], audio_data[fmt_pos + 11]]);
let sample_rate = u32::from_le_bytes([
audio_data[fmt_pos + 12],
audio_data[fmt_pos + 13],
audio_data[fmt_pos + 14],
audio_data[fmt_pos + 15],
]);
let bytes_per_second = u32::from_le_bytes([
audio_data[fmt_pos + 16],
audio_data[fmt_pos + 17],
audio_data[fmt_pos + 18],
audio_data[fmt_pos + 19],
]) as usize;

⚠️ Potential issue

Find 'fmt ' chunk robustly and validate sizes; guard zero block_align/bytes_per_second

Assuming the fmt chunk sits at offset 12 is unsafe; scan for it and validate basic invariants.

-        let fmt_pos = 12;
-        if &audio_data[fmt_pos..fmt_pos + 4] != b"fmt " {
-            return Err("WAV file has no fmt chunk".to_string());
-        }
-
-        let channels = u16::from_le_bytes([audio_data[fmt_pos + 10], audio_data[fmt_pos + 11]]);
-        let sample_rate = u32::from_le_bytes([
-            audio_data[fmt_pos + 12],
-            audio_data[fmt_pos + 13],
-            audio_data[fmt_pos + 14],
-            audio_data[fmt_pos + 15],
-        ]);
-        let bytes_per_second = u32::from_le_bytes([
-            audio_data[fmt_pos + 16],
-            audio_data[fmt_pos + 17],
-            audio_data[fmt_pos + 18],
-            audio_data[fmt_pos + 19],
-        ]) as usize;
+        // Locate 'fmt ' chunk
+        let mut fmt_pos_opt = None;
+        let mut p = 12usize;
+        while p + 8 <= audio_data.len() {
+            let id = &audio_data[p..p + 4];
+            let sz = u32::from_le_bytes([
+                audio_data[p + 4],
+                audio_data[p + 5],
+                audio_data[p + 6],
+                audio_data[p + 7],
+            ]) as usize;
+            if id == b"fmt " {
+                fmt_pos_opt = Some((p, sz));
+                break;
+            }
+            let mut np = p.saturating_add(8).saturating_add(sz);
+            if sz % 2 == 1 {
+                np = np.saturating_add(1);
+            }
+            if np <= p { break; } // safety
+            p = np;
+        }
+        let (fmt_pos, fmt_size) = fmt_pos_opt.ok_or_else(|| "WAV file has no fmt chunk".to_string())?;
+        if fmt_size < 16 || fmt_pos + 22 > audio_data.len() {
+            return Err("WAV fmt chunk too small".to_string());
+        }
+
+        let channels = u16::from_le_bytes([audio_data[fmt_pos + 10], audio_data[fmt_pos + 11]]);
+        let sample_rate = u32::from_le_bytes([
+            audio_data[fmt_pos + 12],
+            audio_data[fmt_pos + 13],
+            audio_data[fmt_pos + 14],
+            audio_data[fmt_pos + 15],
+        ]);
+        let bytes_per_second = u32::from_le_bytes([
+            audio_data[fmt_pos + 16],
+            audio_data[fmt_pos + 17],
+            audio_data[fmt_pos + 18],
+            audio_data[fmt_pos + 19],
+        ]) as usize;
@@
-        let block_align =
-            u16::from_le_bytes([audio_data[fmt_pos + 20], audio_data[fmt_pos + 21]]) as usize;
-        let chunk_duration_bytes = (chunk_duration_bytes / block_align) * block_align;
-        let overlap_bytes = (overlap_bytes / block_align) * block_align;
+        let block_align =
+            u16::from_le_bytes([audio_data[fmt_pos + 20], audio_data[fmt_pos + 21]]) as usize;
+        if block_align == 0 || bytes_per_second == 0 {
+            return Err("Invalid WAV fmt: block_align or bytes_per_second is zero".to_string());
+        }

Also applies to: 182-185

🤖 Prompt for AI Agents
In src/web/audio_utils.rs around lines 153-170 (and also apply same checks to
lines ~182-185), do not assume the "fmt " chunk is fixed at offset 12: scan the
WAV chunks after the RIFF header to locate the 4-byte chunk id "fmt " and its
4-byte chunk size, validate that the slice operations stay within audio_data
bounds before reading any fields, and ensure the parsed block_align and
bytes_per_second (and channels/sample_rate) are non-zero and reasonable; if any
check fails, return a descriptive Err. Specifically: iterate chunk-by-chunk
using the 4-byte id + 4-byte size header, check available length for the
declared size, find the fmt chunk, read its size and only then read fields
(guard all index accesses), and add validations to reject zero
block_align/bytes_per_second (and sizes that are too small) with clear error
messages.
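The chunk-by-chunk scan the prompt asks for fits in a few lines. A minimal stand-alone sketch, with a hypothetical `find_chunk` helper that walks `id + size` headers with bounds checks before locating a wanted chunk:

```rust
// Minimal RIFF chunk walker sketching the bounds-checked scan the review asks
// for: iterate "id + size" headers, validate each advance, and locate a chunk.
fn find_chunk(audio_data: &[u8], wanted: &[u8; 4]) -> Result<(usize, usize), String> {
    let mut pos = 12usize; // skip "RIFF" + size + "WAVE"
    while pos + 8 <= audio_data.len() {
        let id = &audio_data[pos..pos + 4];
        let size = u32::from_le_bytes([
            audio_data[pos + 4],
            audio_data[pos + 5],
            audio_data[pos + 6],
            audio_data[pos + 7],
        ]) as usize;
        let body_start = pos + 8;
        let next = body_start
            .checked_add(size)
            .ok_or_else(|| "chunk size overflow".to_string())?;
        if next > audio_data.len() {
            return Err("chunk extends beyond file boundary".to_string());
        }
        if id == wanted {
            return Ok((body_start, size));
        }
        pos = next + (size % 2); // RIFF chunks are word-aligned
    }
    Err(format!("chunk {:?} not found", wanted))
}

fn main() {
    // Tiny synthetic WAV: RIFF header, a 16-byte "fmt " chunk, empty "data".
    let mut wav = Vec::new();
    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&36u32.to_le_bytes());
    wav.extend_from_slice(b"WAVE");
    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes());
    wav.extend_from_slice(&[0u8; 16]);
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&0u32.to_le_bytes());

    assert_eq!(find_chunk(&wav, b"fmt ").unwrap(), (20, 16));
    assert_eq!(find_chunk(&wav, b"data").unwrap(), (44, 0));
    // A buffer too short for any chunk header is rejected, not sliced.
    assert!(find_chunk(&[0u8; 4], b"fmt ").is_err());
}
```

Once `fmt ` is found this way, the zero checks on `block_align`/`bytes_per_second` apply to fields read from its body rather than from a fixed offset.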

Comment thread src/web/audio_utils.rs
Comment on lines +191 to +222
while offset < audio_samples.len() {
let start = if index > 0 && offset >= overlap_bytes {
offset - overlap_bytes
} else {
offset
};

let end = (start + chunk_duration_bytes).min(audio_samples.len());
let chunk_samples = &audio_samples[start..end];

// Create a new WAV file for this chunk
let chunk_wav = self.create_wav_from_samples(header, chunk_samples)?;

let start_time = start as f64 / bytes_per_second as f64;
let end_time = end as f64 / bytes_per_second as f64;

chunks.push(AudioChunk {
data: chunk_wav,
start_time,
end_time,
index,
});

let chunk_size = chunks.last().map(|c| c.data.len()).unwrap_or(0);
info!(
"Created WAV chunk {} ({:.1}s - {:.1}s), size: {} bytes",
index, start_time, end_time, chunk_size
);

offset = end;
index += 1;
}

⚠️ Potential issue

Guarantee forward progress in the chunking loop

If end <= offset, the loop can stall; ensure offset advances by at least one block.

@@
-            let end = (start + chunk_duration_bytes).min(audio_samples.len());
+            let end = (start + chunk_duration_bytes).min(audio_samples.len());
+            if end <= offset {
+                let step = block_align.max(1);
+                let new_offset = offset.saturating_add(step).min(audio_samples.len());
+                if new_offset == offset {
+                    break;
+                }
+                offset = new_offset;
+                index += 1;
+                continue;
+            }
📝 Committable suggestion


Suggested change
-        while offset < audio_samples.len() {
-            let start = if index > 0 && offset >= overlap_bytes {
-                offset - overlap_bytes
-            } else {
-                offset
-            };
-            let end = (start + chunk_duration_bytes).min(audio_samples.len());
-            let chunk_samples = &audio_samples[start..end];
-            // Create a new WAV file for this chunk
-            let chunk_wav = self.create_wav_from_samples(header, chunk_samples)?;
-            let start_time = start as f64 / bytes_per_second as f64;
-            let end_time = end as f64 / bytes_per_second as f64;
-            chunks.push(AudioChunk {
-                data: chunk_wav,
-                start_time,
-                end_time,
-                index,
-            });
-            let chunk_size = chunks.last().map(|c| c.data.len()).unwrap_or(0);
-            info!(
-                "Created WAV chunk {} ({:.1}s - {:.1}s), size: {} bytes",
-                index, start_time, end_time, chunk_size
-            );
-            offset = end;
-            index += 1;
-        }
+        while offset < audio_samples.len() {
+            let start = if index > 0 && offset >= overlap_bytes {
+                offset - overlap_bytes
+            } else {
+                offset
+            };
+            let end = (start + chunk_duration_bytes).min(audio_samples.len());
+            if end <= offset {
+                let step = block_align.max(1);
+                let new_offset = offset.saturating_add(step).min(audio_samples.len());
+                if new_offset == offset {
+                    break;
+                }
+                offset = new_offset;
+                index += 1;
+                continue;
+            }
+            let chunk_samples = &audio_samples[start..end];
+            // Create a new WAV file for this chunk
+            let chunk_wav = self.create_wav_from_samples(header, chunk_samples)?;
+            let start_time = start as f64 / bytes_per_second as f64;
+            let end_time = end as f64 / bytes_per_second as f64;
+            chunks.push(AudioChunk {
+                data: chunk_wav,
+                start_time,
+                end_time,
+                index,
+            });
+            let chunk_size = chunks.last().map(|c| c.data.len()).unwrap_or(0);
+            info!(
+                "Created WAV chunk {} ({:.1}s - {:.1}s), size: {} bytes",
+                index, start_time, end_time, chunk_size
+            );
+            offset = end;
+            index += 1;
+        }
🤖 Prompt for AI Agents
In src/web/audio_utils.rs around lines 191 to 222, the chunking loop can stall
when end <= offset; ensure forward progress by detecting that case and advancing
offset at least one byte (or one sample) before continuing. Modify the loop so
that after computing end if end <= offset you set end = (offset +
1).min(audio_samples.len()) (or offset + minimum_block_bytes if you have a more
appropriate block size), then proceed to create the chunk and update offset =
end; this guarantees the loop always advances and prevents an infinite loop.
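The forward-progress guard can be isolated and tested on plain byte ranges. A toy sketch with a hypothetical `chunk_ranges` helper, assuming the same `block_align`-sized minimum step as the diff:

```rust
// Toy chunking loop sketching the forward-progress guard: if a computed end
// does not move past the current offset, advance by at least one block so the
// loop always terminates.
fn chunk_ranges(len: usize, chunk_bytes: usize, block_align: usize) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    let mut offset = 0usize;
    while offset < len {
        let end = (offset + chunk_bytes).min(len);
        if end <= offset {
            // Degenerate sizing: force progress of one block instead of spinning.
            offset = offset.saturating_add(block_align.max(1)).min(len);
            continue;
        }
        ranges.push((offset, end));
        offset = end;
    }
    ranges
}

fn main() {
    // Normal sizing splits 10 bytes into 4 + 4 + 2.
    assert_eq!(chunk_ranges(10, 4, 2), vec![(0, 4), (4, 8), (8, 10)]);
    // chunk_bytes == 0 would previously spin forever; now the loop terminates
    // (producing no ranges, since no chunk can be formed).
    assert!(chunk_ranges(10, 0, 2).is_empty());
}
```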


@greptile-apps greptile-apps Bot left a comment


Greptile Summary

This review covers only the changes made since the last review, not the entire PR. The most recent changes in this PR focus solely on updating PCR (Platform Configuration Register) measurements for AWS Nitro Enclave attestation in both development and production environments. The changes update the cryptographic measurements in four files: pcrDev.json, pcrDevHistory.json, pcrProd.json, and pcrProdHistory.json, along with adding canonical model name mapping for Whisper Large V3 in proxy_config.rs. These PCR updates are necessary because the TTS (Text-to-Speech) feature additions modified the enclave code, resulting in different cryptographic measurements when the enclave image was rebuilt.

PCR values are deterministic hashes that change whenever the enclave binary changes - even small code modifications produce completely different PCR values. The changes update PCR0 and PCR2 measurements while PCR1 remains consistent across most entries. The proxy configuration changes add support for routing audio transcription requests across different providers (Tinfoil and Continuum) while presenting a unified API to users through canonical model naming (e.g., 'whisper-large-v3' maps to provider-specific names like 'whisper-large-v3-turbo' on Tinfoil).
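The canonical-naming scheme described above can be sketched with a plain map (a sketch only: the whisper entry is taken from this PR's description, while the helper names are hypothetical):

```rust
use std::collections::HashMap;

/// Canonical model name -> per-provider model IDs.
fn model_equivalencies() -> HashMap<&'static str, HashMap<&'static str, &'static str>> {
    let mut m = HashMap::new();
    // From this PR: canonical "whisper-large-v3" resolves to a
    // provider-specific ID on Tinfoil and the same ID on Continuum.
    let mut whisper = HashMap::new();
    whisper.insert("tinfoil", "whisper-large-v3-turbo");
    whisper.insert("continuum", "whisper-large-v3");
    m.insert("whisper-large-v3", whisper);
    m
}

/// Resolve the model ID to send to a given provider; fall back to the
/// requested name when no mapping exists.
fn model_for_provider<'a>(requested: &'a str, provider: &str) -> &'a str {
    model_equivalencies()
        .get(requested)
        .and_then(|per| per.get(provider).copied())
        .unwrap_or(requested)
}

fn main() {
    assert_eq!(model_for_provider("whisper-large-v3", "tinfoil"), "whisper-large-v3-turbo");
    assert_eq!(model_for_provider("whisper-large-v3", "continuum"), "whisper-large-v3");
    // Unmapped models pass through unchanged.
    assert_eq!(model_for_provider("some-other-model", "tinfoil"), "some-other-model");
    println!("ok");
}
```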

Confidence score: 3/5

  • This PR has moderate risk due to critical PCR measurement inconsistencies that could break attestation functionality
  • Score reflects properly signed PCR entries but concerning inconsistencies in the development environment and potential verification issues
  • Pay close attention to pcrDevHistory.json for PCR1 inconsistencies and verify all PCR measurements are properly signed and validated

5 files reviewed, no comments



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/proxy_config.rs (2)

63-69: Secrets leak risk: ProxyConfig’s Debug includes raw api_key. Redact in logs.

  • Deriving Debug on a struct with secrets will leak keys to logs (your test even asserts presence).
  • Implement Debug manually and redact the key; update the test accordingly.

Apply:

-#[derive(Debug, Clone)]
+#[derive(Clone)]
 pub struct ProxyConfig {
     pub base_url: String,
     pub api_key: Option<String>,
     /// Provider name for logging
     pub provider_name: String,
 }
+
+impl std::fmt::Debug for ProxyConfig {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        f.debug_struct("ProxyConfig")
+            .field("base_url", &self.base_url)
+            .field("api_key", &self.api_key.as_ref().map(|_| "<redacted>"))
+            .field("provider_name", &self.provider_name)
+            .finish()
+    }
+}

And update the test:

@@
-        // Should contain api_key but we're not testing the exact format
-        assert!(debug_str.contains("api_key"));
+        // api_key should be present but redacted
+        assert!(debug_str.contains("api_key"));
+        assert!(debug_str.contains("<redacted>"));
+        assert!(!debug_str.contains("secret"));

Also applies to: 748-761


470-506: Add request timeouts to avoid hangs on provider outages.

Wrap HTTP calls with tokio::time::timeout and return a clear error on timeout.

-        let req = req.body(Body::empty())?;
-        let res = client.request(req).await?;
+        let req = req.body(Body::empty())?;
+        let res = tokio::time::timeout(Duration::from_secs(10), client.request(req))
+            .await
+            .map_err(|_| "models request timed out")??;
🧹 Nitpick comments (5)
src/proxy_config.rs (5)

247-265: Incomplete model_to_proxy map (Continuum entries and canonical aliases missing).

  • You only populate model_to_proxy for Tinfoil models. If other parts of the code still rely on this map, Continuum-only models and canonical IDs won’t resolve.
  • Either remove model_to_proxy entirely (if deprecated) or populate it consistently for non-equivalent Continuum models and their canonical aliases.

Minimal fix (keeps map consistent without conflicting with Tinfoil primaries):

@@
-                            model_to_proxy.insert(model_id.to_string(), tinfoil_proxy.clone());
+                            model_to_proxy.insert(model_id.to_string(), tinfoil_proxy.clone());
+                            // Also map canonical alias (if any) to Tinfoil
+                            for (canonical_name, provider_names) in &*MODEL_EQUIVALENCIES {
+                                if let Some(tinfoil_name) = provider_names.get("tinfoil") {
+                                    if *tinfoil_name == model_id {
+                                        model_to_proxy
+                                            .insert(canonical_name.to_string(), tinfoil_proxy.clone());
+                                        break;
+                                    }
+                                }
+                            }
@@
-                        if !is_equivalent_to_tinfoil {
+                        if !is_equivalent_to_tinfoil {
+                            // Map Continuum provider ID
+                            model_to_proxy
+                                .insert(model_id.to_string(), self.default_proxy.clone());
                             // Check if this model has a canonical name
                             let mut canonical_model = model.clone();
@@
-                            all_models.push(canonical_model);
+                            // Map canonical alias (if any) to Continuum
+                            for (canonical_name, provider_names) in &*MODEL_EQUIVALENCIES {
+                                if let Some(continuum_name) = provider_names.get("continuum") {
+                                    if *continuum_name == model_id {
+                                        model_to_proxy.insert(
+                                            canonical_name.to_string(),
+                                            self.default_proxy.clone(),
+                                        );
+                                        break;
+                                    }
+                                }
+                            }
+                            all_models.push(canonical_model);

Also applies to: 296-358


223-236: Avoid concurrent refresh stampede.

Multiple callers can race and all trigger refresh_cache(). Gate with a tokio::sync::Mutex or a lightweight “in-progress” flag to serialize refreshes.
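The async code would use `tokio::sync::Mutex` as the comment suggests; the lightweight "in-progress" flag variant can be sketched std-only with an `AtomicBool` (illustrative; `RefreshGate` is a hypothetical name):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// A lightweight "in-progress" gate: only the first caller that flips the
/// flag runs the refresh; concurrent callers skip and reuse the stale cache.
struct RefreshGate {
    in_progress: AtomicBool,
}

impl RefreshGate {
    fn new() -> Self {
        Self { in_progress: AtomicBool::new(false) }
    }

    /// Returns true if this caller won the right to refresh.
    fn try_begin(&self) -> bool {
        self.in_progress
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Release the gate once the refresh completes (success or failure).
    fn end(&self) {
        self.in_progress.store(false, Ordering::Release);
    }
}

fn main() {
    let gate = RefreshGate::new();
    assert!(gate.try_begin());  // first caller refreshes
    assert!(!gate.try_begin()); // concurrent caller skips
    gate.end();
    assert!(gate.try_begin()); // next cycle can refresh again
    println!("ok");
}
```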


241-246: Reuse HTTP client instead of recreating per refresh.

Store the Client<HttpsConnector<_>> in ProxyRouter and reuse it to reduce TLS handshakes and connection churn.


145-159: Minor: parameter naming in get_model_name_for_provider.

canonical_model is misleading because the input can be either canonical or provider-specific. Consider renaming to model for clarity.


514-533: Test coverage: add assertions for Whisper V3 mapping.

Since you introduced Whisper V3 equivalencies, add unit tests mirroring the Llama checks to prevent regressions.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 61727e6 and 27c19e2.

⛔ Files ignored due to path filters (2)
  • pcrDev.json is excluded by !pcrDev.json
  • pcrProd.json is excluded by !pcrProd.json
📒 Files selected for processing (3)
  • pcrDevHistory.json (1 hunks)
  • pcrProdHistory.json (1 hunks)
  • src/proxy_config.rs (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pcrDevHistory.json
  • pcrProdHistory.json
🔇 Additional comments (2)
src/proxy_config.rs (2)

5-5: Good change: use serde_json::json where constructing fragments.


21-25: Whisper V3 constants added. Looks consistent; please confirm provider IDs.
If these IDs ever drift, routing/canonicalization will silently break. Verify they match Tinfoil/Continuum upstream.

Comment thread src/proxy_config.rs
AnthonyRonning and others added 7 commits September 15, 2025 14:38
- Add /v1/audio/speech endpoint to both Rust and Go proxies
- Integrate with Tinfoil's Kokoro TTS model
- Support OpenAI-compatible TTS API parameters (voice, speed, format)
- Pass voice parameters directly without mapping (client's responsibility)
- Support MP3, Opus, AAC, FLAC, WAV audio formats

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- TTS endpoint now returns encrypted audio data like all other endpoints
- Audio is base64-encoded and wrapped in JSON with metadata before encryption
- Validate that only 'kokoro' model is accepted for TTS
- Fix session_id parameter usage for encryption
- Return audio with content_base64 and content_type fields

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add /v1/audio/transcriptions endpoint with encrypted JSON input (base64 audio)
- Configure Whisper model routing: Tinfoil (whisper-large-v3-turbo) primary, Continuum fallback
- Update tinfoil-proxy to handle multipart form transcription requests
- Fix canonical model name mapping in proxy router for all equivalencies
- Implement 3-cycle retry pattern matching chat completions behavior
- Add TODO for SQS billing events for transcription usage

Co-Authored-By: Claude <noreply@anthropic.com>
Implement automatic audio file chunking when files exceed 0.5MB limit.
- Split large WAV files into overlapping chunks for better transcription
- Process chunks in parallel for improved performance
- Automatically merge transcription results from multiple chunks
- Support single-file passthrough for smaller files and non-WAV formats

This enables transcription of large audio files that would otherwise
exceed API size limits while maintaining transcription accuracy through
overlapping segments.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
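The merge step this commit describes can be sketched as sorting partial results by chunk index and joining (a simplified sketch; the real `merge_transcriptions` also has to reconcile the overlapping segments, and `ChunkResult`/`merge_chunk_texts` are hypothetical names):

```rust
/// A transcribed chunk: its position in the original audio plus its text.
struct ChunkResult {
    index: usize,
    text: String,
}

/// Merge per-chunk transcriptions back into one transcript. Chunks may
/// complete out of order when processed in parallel, so sort by index first.
fn merge_chunk_texts(mut results: Vec<ChunkResult>) -> String {
    results.sort_by_key(|r| r.index);
    results
        .iter()
        .map(|r| r.text.trim())
        .filter(|t| !t.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Parallel workers finished out of order; the merge restores order.
    let out = merge_chunk_texts(vec![
        ChunkResult { index: 1, text: " world".into() },
        ChunkResult { index: 0, text: "hello ".into() },
    ]);
    assert_eq!(out, "hello world");
    println!("{}", out);
}
```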
- Set MAX_CHUNK_SIZE back to 20MB from test value
- Add TINFOIL_MAX_SIZE constant (0.5MB) for Tinfoil provider limit
- Check chunk size and skip Tinfoil for files over 0.5MB
- Use case-insensitive comparison for provider name matching
- Route directly to Continuum fallback for larger files

This prevents 400 Bad Request errors from Tinfoil when processing
audio files that exceed its size limitations.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
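The size-based routing this commit describes (skip Tinfoil above 0.5MB, case-insensitive provider match, fall back to Continuum) can be sketched as a pure function (illustrative; `providers_for_chunk` is a hypothetical name, not the repository's API):

```rust
const TINFOIL_MAX_SIZE: usize = 512 * 1024; // 0.5 MB provider limit

/// Pick the provider order for one chunk: skip Tinfoil when the chunk is
/// over its size limit and a fallback exists; otherwise keep the route.
fn providers_for_chunk<'a>(
    primary: &'a str,
    fallbacks: &'a [&'a str],
    chunk_size: usize,
) -> Result<Vec<&'a str>, String> {
    if chunk_size > TINFOIL_MAX_SIZE && primary.eq_ignore_ascii_case("tinfoil") {
        match fallbacks.first() {
            Some(fb) => Ok(vec![*fb]),
            None => Err(format!(
                "chunk of {} bytes exceeds Tinfoil's limit and no fallback available",
                chunk_size
            )),
        }
    } else {
        let mut order = vec![primary];
        order.extend_from_slice(fallbacks);
        Ok(order)
    }
}

fn main() {
    // Oversized chunk: route straight to the fallback.
    assert_eq!(
        providers_for_chunk("Tinfoil", &["continuum"], 1_000_000).unwrap(),
        vec!["continuum"]
    );
    // Small chunk: normal primary + fallback order.
    assert_eq!(
        providers_for_chunk("tinfoil", &["continuum"], 100_000).unwrap(),
        vec!["tinfoil", "continuum"]
    );
    println!("ok");
}
```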

@greptile-apps greptile-apps Bot left a comment


Greptile Summary

This PR introduces comprehensive Text-to-Speech (TTS) and audio transcription capabilities to the OpenSecretCloud platform. The implementation adds two new public API endpoints: /v1/audio/speech for TTS and /v1/audio/transcriptions for speech-to-text conversion, following OpenAI API compatibility standards.

Core Changes:

  • New Audio Processing Module: Adds src/web/audio_utils.rs with sophisticated WAV file splitting functionality that breaks large audio files into manageable chunks with temporal overlap for better transcription accuracy. The module handles WAV format parsing, chunk creation with proper headers, and transcription result merging.

  • Multi-Provider Audio Routing: Extends the existing proxy configuration system (src/proxy_config.rs) to support Whisper Large V3 model routing between Continuum and Tinfoil providers, enabling the same fallback/retry patterns used for LLM models.

  • Enhanced OpenAI Proxy: Significantly expands src/web/openai.rs with comprehensive audio handling including guest user blocking, provider-specific optimizations (bypassing Tinfoil for chunks >0.5MB), parallel chunk processing, and audio merging functionality.

  • Tinfoil Proxy Integration: Updates tinfoil-proxy/main.go to support the new audio endpoints with proper multipart form handling for transcription uploads and streaming responses for TTS generation.

  • Infrastructure Updates: Updates PCR measurements in both development and production configuration files to reflect the new enclave builds containing audio functionality, maintaining the cryptographic audit trail required for Nitro Enclave verification.

The implementation follows established patterns for authentication, encryption, error handling, and multi-provider routing while introducing significant new complexity around audio processing, file size management, and format handling. The audio splitting capability is particularly important for handling large files that exceed individual provider limits, enabling parallel processing while maintaining quality through overlap mechanisms.
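For context on the "chunk creation with proper headers" step: each chunk must be re-wrapped as a standalone WAV file. A canonical 44-byte PCM header, per the RIFF/WAVE layout, looks like this (a sketch of the format, not the repository's `create_wav_from_samples`; `wav_header` is a hypothetical name):

```rust
/// Build a canonical 44-byte PCM WAV header for `data_len` bytes of samples.
fn wav_header(sample_rate: u32, channels: u16, bits_per_sample: u16, data_len: u32) -> Vec<u8> {
    let block_align = channels * bits_per_sample / 8;
    let byte_rate = sample_rate * block_align as u32;
    let mut h = Vec::with_capacity(44);
    h.extend_from_slice(b"RIFF");
    h.extend_from_slice(&(36 + data_len).to_le_bytes()); // file size minus 8
    h.extend_from_slice(b"WAVE");
    h.extend_from_slice(b"fmt ");
    h.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size (PCM)
    h.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    h.extend_from_slice(&channels.to_le_bytes());
    h.extend_from_slice(&sample_rate.to_le_bytes());
    h.extend_from_slice(&byte_rate.to_le_bytes());
    h.extend_from_slice(&block_align.to_le_bytes());
    h.extend_from_slice(&bits_per_sample.to_le_bytes());
    h.extend_from_slice(b"data");
    h.extend_from_slice(&data_len.to_le_bytes());
    h
}

fn main() {
    let h = wav_header(16_000, 1, 16, 32_000);
    assert_eq!(h.len(), 44);
    assert_eq!(&h[0..4], b"RIFF");
    assert_eq!(&h[8..12], b"WAVE");
    assert_eq!(&h[36..40], b"data");
    println!("header ok");
}
```

The two size fields (RIFF size and `data` size) are the ones that must be recomputed per chunk; copying the source file's header verbatim would misdeclare both.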

Confidence score: 2/5

  • This PR introduces substantial new functionality with complex audio processing logic that has multiple identified security vulnerabilities and implementation issues
  • Score lowered due to critical buffer overflow vulnerabilities in WAV parsing, missing input validation, infinite loop potential, and several unaddressed security concerns from previous reviews
  • Pay close attention to src/web/audio_utils.rs for WAV parsing security issues, src/web/openai.rs for input validation gaps, and ensure all previous review recommendations are addressed before deployment

9 files reviewed, 1 comment


Comment thread src/web/openai.rs
Comment on lines +817 to +837
// If chunk is over 0.5MB and primary is Tinfoil, skip Tinfoil and use fallback only
if chunk_size > TINFOIL_MAX_SIZE
&& route.primary.provider_name.to_lowercase() == "tinfoil"
{
info!(
"Chunk {} size {} bytes exceeds Tinfoil's 0.5MB limit, using fallback only",
chunk.index, chunk_size
);

// If we have a fallback (should be Continuum), use it as primary
if let Some(fallback) = route.fallbacks.first() {
route.primary = fallback.clone();
route.fallbacks.clear();
} else {
// No fallback available, this will fail
return Err(format!(
"Chunk {} size {} bytes exceeds Tinfoil's limit and no fallback available",
chunk.index, chunk_size
));
}
}


logic: Provider switching logic directly modifies the route configuration which could cause issues if the same route is reused. Consider creating a copy of the route instead of modifying in-place.
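That suggestion can be sketched by deriving a per-chunk route from the shared one instead of mutating it (illustrative; `Route`/`Provider` are simplified stand-ins for the repository's types, and `route_for_chunk` is a hypothetical name):

```rust
#[derive(Clone, Debug)]
struct Provider {
    provider_name: String,
}

#[derive(Clone, Debug)]
struct Route {
    primary: Provider,
    fallbacks: Vec<Provider>,
}

/// Derive a per-chunk route without mutating the shared one: when the
/// primary must be skipped, promote the first fallback on a clone.
fn route_for_chunk(route: &Route, skip_primary: bool) -> Option<Route> {
    if !skip_primary {
        return Some(route.clone());
    }
    let fallback = route.fallbacks.first()?.clone();
    Some(Route { primary: fallback, fallbacks: Vec::new() })
}

fn main() {
    let shared = Route {
        primary: Provider { provider_name: "tinfoil".into() },
        fallbacks: vec![Provider { provider_name: "continuum".into() }],
    };
    let per_chunk = route_for_chunk(&shared, true).unwrap();
    assert_eq!(per_chunk.primary.provider_name, "continuum");
    // The shared route is untouched and safe to reuse for the next chunk.
    assert_eq!(shared.primary.provider_name, "tinfoil");
    println!("ok");
}
```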


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (9)
src/proxy_config.rs (1)

366-476: Canonical routing now covered for primary/fallback and single‑provider cases.

This addresses the earlier gap where canonical IDs could miss routes. Good fix.

src/web/audio_utils.rs (5)

80-97: Non‑WAV path bypasses size cap.

Large non‑WAV files return as a single oversized chunk; enforce a hard limit.

Apply:

-            _ => {
-                // For MP3 and other formats return full file as 1 chunk
+            _ => {
+                // No splitting for non-WAV; enforce hard cap
                 info!(
-                    "Non-WAV format ({}), returning entire file as single chunk",
-                    content_type
+                    "Non-WAV format ({}), returning entire file as single chunk (size: {} bytes)",
+                    content_type, audio_data.len()
                 );
-                Ok(vec![AudioChunk {
+                if audio_data.len() > self.max_chunk_size {
+                    return Err("Non-WAV input exceeds max chunk size and cannot be split".to_string());
+                }
+                Ok(vec![AudioChunk {
                     data: audio_data.to_vec(),
                     start_time: 0.0,
                     end_time: 0.0, // Unknown duration
                     index: 0,
                 }])
             }

177-186: Prevent underflow/zero-length chunks; align with block size.

Current arithmetic can underflow and produce zero-length chunks -> infinite loop.

Apply:

-        // Calculate chunk size based on file size limit only
-        let chunk_duration_bytes = self.max_chunk_size - header.len() - 8; // Subtract header and data chunk header
-        let overlap_bytes = (self.overlap_seconds as usize) * bytes_per_second;
+        // Calculate chunk size based on file size limit only
+        let mut chunk_duration_bytes =
+            self.max_chunk_size.saturating_sub(header.len() + 8);
+        let overlap_bytes_f = (self.overlap_seconds * bytes_per_second as f64).round() as usize;
@@
-        let block_align =
-            u16::from_le_bytes([audio_data[fmt_pos + 20], audio_data[fmt_pos + 21]]) as usize;
-        let chunk_duration_bytes = (chunk_duration_bytes / block_align) * block_align;
-        let overlap_bytes = (overlap_bytes / block_align) * block_align;
+        let block_align =
+            u16::from_le_bytes([audio_data[fmt_pos + 20], audio_data[fmt_pos + 21]]) as usize;
+        if block_align == 0 || bytes_per_second == 0 {
+            return Err("Invalid WAV fmt: block_align or bytes_per_second is zero".to_string());
+        }
+        chunk_duration_bytes = (chunk_duration_bytes / block_align) * block_align;
+        let mut overlap_bytes = (overlap_bytes_f / block_align) * block_align;
+        if chunk_duration_bytes == 0 {
+            return Err("max_chunk_size too small for WAV header + one block".to_string());
+        }

148-151: Guard audio_samples slice end to prevent panic.

data_size may exceed remaining bytes.

Apply:

-        let header = &audio_data[0..data_start];
-        let audio_samples = &audio_data[data_start..data_start + data_size];
+        let header = &audio_data[0..data_start];
+        let data_end = data_start.saturating_add(data_size).min(audio_data.len());
+        if data_end <= data_start {
+            return Err("Invalid WAV file: empty data chunk".to_string());
+        }
+        let audio_samples = &audio_data[data_start..data_end];

191-222: Guarantee forward progress in chunk loop.

If end <= offset, loop can stall forever.

Apply:

-        while offset < audio_samples.len() {
+        while offset < audio_samples.len() {
@@
-            let end = (start + chunk_duration_bytes).min(audio_samples.len());
+            let end = (start + chunk_duration_bytes).min(audio_samples.len());
+            if end <= offset {
+                let step = block_align.max(1);
+                let new_offset = offset.saturating_add(step).min(audio_samples.len());
+                if new_offset == offset { break; }
+                offset = new_offset;
+                index += 1;
+                continue;
+            }

123-142: Bounds-check WAV chunk_size to avoid out-of-bounds read.

Without validating chunk_size, pos can jump past buffer end.

Apply:

-        while pos < audio_data.len() - 8 {
+        while pos + 8 <= audio_data.len() {
             let chunk_id = &audio_data[pos..pos + 4];
             let chunk_size = u32::from_le_bytes([
                 audio_data[pos + 4],
                 audio_data[pos + 5],
                 audio_data[pos + 6],
                 audio_data[pos + 7],
             ]) as usize;
+            if pos + 8 + chunk_size > audio_data.len() {
+                return Err("Invalid WAV file: chunk extends beyond file boundary".to_string());
+            }
tinfoil-proxy/main.go (2)

799-801: MIME mismatch for "pcm": advertise audio/wav.

You request WAV upstream for pcm; returning audio/pcm misleads clients.

Apply:

-    case "pcm":
-        contentType = "audio/pcm"
+    case "pcm":
+        // Upstream uses WAV bytes for "pcm" request; advertise correctly
+        contentType = "audio/wav"

617-647: Enforce upload size limits before reading multipart.

Guard against abuse and OOM by bounding body and validating fileHeader.Size.

Apply:

 func (s *TinfoilProxyServer) handleTranscription(c *gin.Context) {
-    // Parse multipart form data
+    // Parse multipart form data
+    const maxUploadBytes = int64(25 << 20) // 25 MiB
+    c.Request.Body = http.MaxBytesReader(c.Writer, c.Request.Body, maxUploadBytes)
     file, fileHeader, err := c.Request.FormFile("file")
     if err != nil {
         c.JSON(http.StatusBadRequest, gin.H{"error": "Audio file is required"})
         return
     }
     defer file.Close()
 
     log.Printf("Transcription request - file: %s, size: %d bytes", fileHeader.Filename, fileHeader.Size)
+    if fileHeader.Size > 0 && fileHeader.Size > maxUploadBytes {
+        c.JSON(http.StatusRequestEntityTooLarge, gin.H{"error": "file too large"})
+        return
+    }
src/web/openai.rs (1)

1152-1170: Honor upstream Content‑Type; align PCM to WAV if header missing.

Reduces MIME mismatches with the proxy.

Apply:

-    // Get response body as bytes
+    // Get upstream content-type, then response body as bytes
+    let upstream_content_type = res
+        .headers()
+        .get(hyper::header::CONTENT_TYPE)
+        .and_then(|v| v.to_str().ok())
+        .map(|s| s.to_string());
     let body_bytes = to_bytes(res.into_body()).await.map_err(|e| {
         error!("Failed to read TTS response body: {:?}", e);
         ApiError::InternalServerError
     })?;
 
-    // Create a response object with the audio data and metadata
-    let audio_response = json!({
-        "content_base64": general_purpose::STANDARD.encode(&body_bytes),
-        "content_type": match response_format {
-            "mp3" => "audio/mpeg",
-            "opus" => "audio/opus",
-            "aac" => "audio/aac",
-            "flac" => "audio/flac",
-            "wav" => "audio/wav",
-            "pcm" => "audio/pcm",
-            _ => "audio/mpeg", // Default to MP3
-        },
-    });
+    let content_type = upstream_content_type.unwrap_or_else(|| {
+        match response_format {
+            "mp3" => "audio/mpeg",
+            "opus" => "audio/opus",
+            "aac" => "audio/aac",
+            "flac" => "audio/flac",
+            "wav" => "audio/wav",
+            "pcm" => "audio/wav", // proxy returns WAV bytes for pcm
+            _ => "audio/mpeg",
+        }
+        .to_string()
+    });
+
+    // Create a response object with the audio data and metadata
+    let audio_response = json!({
+        "content_base64": general_purpose::STANDARD.encode(&body_bytes),
+        "content_type": content_type,
+    });
🧹 Nitpick comments (9)
src/proxy_config.rs (1)

507-543: Add request timeout to model fetch.

Network calls can hang; wrap client.request with a timeout.

Apply:

-        let res = client.request(req).await?;
+        let res = tokio::time::timeout(Duration::from_secs(15), client.request(req))
+            .await
+            .map_err(|_| format!("Timed out fetching models from {}", proxy_config.base_url))??;
tinfoil-proxy/main.go (1)

628-666: Unused form parameter.

The local variable model is parsed from the form but never used; either honor it or remove it to avoid confusion.

src/web/audio_utils.rs (1)

153-176: Do not assume fmt at fixed offset; scan and validate.

Assuming fmt at 12 is fragile; scan chunks to find "fmt " and validate sizes.

If desired, I can provide a full diff to scan for "fmt " safely.
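A bounds-checked scan over RIFF sub-chunks can be sketched like this (an illustrative, std-only sketch following the RIFF/WAVE layout; `find_riff_chunk` is a hypothetical name, not the repository's code):

```rust
/// Scan RIFF sub-chunks for the given 4-byte ID, with bounds checks, and
/// return the byte offset of the chunk's payload plus its declared size.
fn find_riff_chunk(data: &[u8], id: &[u8; 4]) -> Result<(usize, usize), String> {
    if data.len() < 12 || &data[0..4] != b"RIFF" || &data[8..12] != b"WAVE" {
        return Err("Not a valid WAV file".to_string());
    }
    let mut pos = 12;
    while pos + 8 <= data.len() {
        let chunk_id = &data[pos..pos + 4];
        let chunk_size =
            u32::from_le_bytes([data[pos + 4], data[pos + 5], data[pos + 6], data[pos + 7]])
                as usize;
        let payload = pos + 8;
        if payload + chunk_size > data.len() {
            return Err("Invalid WAV file: chunk extends beyond file boundary".to_string());
        }
        if chunk_id == id {
            return Ok((payload, chunk_size));
        }
        // RIFF chunks are word-aligned: odd sizes carry one pad byte.
        pos = payload + chunk_size + (chunk_size & 1);
    }
    Err(format!("Chunk {} not found", String::from_utf8_lossy(id)))
}

fn main() {
    // Minimal WAV: RIFF header + zeroed 16-byte "fmt " chunk + empty "data" chunk.
    let mut wav = b"RIFF".to_vec();
    wav.extend_from_slice(&36u32.to_le_bytes());
    wav.extend_from_slice(b"WAVE");
    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes());
    wav.extend_from_slice(&[0u8; 16]);
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&0u32.to_le_bytes());
    assert_eq!(find_riff_chunk(&wav, b"fmt ").unwrap(), (20, 16));
    assert_eq!(find_riff_chunk(&wav, b"data").unwrap(), (44, 0));
    println!("ok");
}
```

Scanning instead of assuming a fixed offset also tolerates files that carry a `LIST` or other metadata chunk before `fmt `.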

src/web/openai.rs (6)

743-747: Validate decoded audio length.

Reject empty or overly large inputs to avoid provider errors/memory pressure.

Apply:

     let file_bytes = general_purpose::STANDARD.decode(file_base64).map_err(|e| {
         error!("Failed to decode base64 audio file: {:?}", e);
         ApiError::BadRequest
     })?;
+    if file_bytes.is_empty() {
+        error!("Decoded audio is empty");
+        return Err(ApiError::BadRequest);
+    }
+    if file_bytes.len() > 25 * 1024 * 1024 {
+        error!("Decoded audio exceeds 25MB");
+        return Err(ApiError::BadRequest);
+    }

786-793: Map client-caused split errors to 400.

Invalid WAV inputs should return BadRequest, not 500.

Apply:

-        .map_err(|e| {
-            error!("Failed to split audio: {}", e);
-            ApiError::InternalServerError
-        })?;
+        .map_err(|e| {
+            error!("Failed to split audio: {}", e);
+            if e.contains("Invalid WAV")
+                || e.contains("Not a valid WAV")
+                || e.contains("no data chunk")
+            {
+                ApiError::BadRequest
+            } else {
+                ApiError::InternalServerError
+            }
+        })?;

1003-1032: Add timeout to transcription provider call.

Prevents indefinite hangs.

Apply:

-    // Send request
-    match client.request(req).await {
+    // Send request with timeout
+    match tokio::time::timeout(Duration::from_secs(30), client.request(req)).await {
-        Ok(res) => {
+        Ok(Ok(res)) => {
@@
-        }
-        Err(e) => Err(format!(
+        }
+        Ok(Err(e)) => Err(format!(
             "Failed to send request to {}: {:?}",
             provider.provider_name, e
         )),
+        Err(_) => Err(format!(
+            "Request to {} timed out after 30s",
+            provider.provider_name
+        )),
     }

1059-1065: Reject blank TTS input.

Ignore whitespace‑only strings.

Apply:

-    let input = body_obj
+    let input = body_obj
         .get("input")
         .and_then(|i| i.as_str())
+        .filter(|s| !s.trim().is_empty())
         .ok_or_else(|| {
-            error!("Input text not specified in request");
+            error!("Input text not specified or empty in request");
             ApiError::BadRequest
         })?;

1141-1146: Add timeout to TTS upstream call.

Avoids hanging requests.

Apply:

-    let res = client.request(req).await.map_err(|e| {
-        error!("Failed to send TTS request: {:?}", e);
-        ApiError::InternalServerError
-    })?;
+    let res = tokio::time::timeout(Duration::from_secs(30), client.request(req))
+        .await
+        .map_err(|_| {
+            error!("TTS request timed out after 30s");
+            ApiError::InternalServerError
+        })?
+        .map_err(|e| {
+            error!("Failed to send TTS request: {:?}", e);
+            ApiError::InternalServerError
+        })?;

796-865: Bound parallelism for many chunks.

Processing all chunks at once can overload providers; cap concurrency (e.g., buffer_unordered(4)).

If you want, I can provide a small refactor using futures::stream::iter(...).buffer_unordered(N).
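The async refactor would use futures' buffer_unordered as suggested; the same cap on concurrency can be sketched std-only with batched worker threads (illustrative only; `process_bounded` is a hypothetical name):

```rust
use std::thread;

/// Run `f` over `items` with at most `max_workers` threads at a time by
/// processing in batches: a simple std-only stand-in for buffer_unordered.
fn process_bounded<T, R, F>(items: Vec<T>, max_workers: usize, f: F) -> Vec<R>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(T) -> R + Send + Sync + Copy + 'static,
{
    let mut results = Vec::with_capacity(items.len());
    let mut iter = items.into_iter();
    loop {
        let batch: Vec<T> = iter.by_ref().take(max_workers.max(1)).collect();
        if batch.is_empty() {
            break;
        }
        // Spawn one worker per item in the batch, then join before the next
        // batch, so at most `max_workers` run concurrently.
        let handles: Vec<_> = batch
            .into_iter()
            .map(|item| thread::spawn(move || f(item)))
            .collect();
        for h in handles {
            results.push(h.join().expect("worker panicked"));
        }
    }
    results
}

fn main() {
    // Square eight "chunks" with at most 3 in flight at once; order is kept.
    let out = process_bounded((1..=8).collect::<Vec<u32>>(), 3, |x| x * x);
    assert_eq!(out, vec![1, 4, 9, 16, 25, 36, 49, 64]);
    println!("{:?}", out);
}
```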

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27c19e2 and a60ac66.

⛔ Files ignored due to path filters (3)
  • pcrDev.json is excluded by !pcrDev.json
  • pcrProd.json is excluded by !pcrProd.json
  • tinfoil-proxy/dist/tinfoil-proxy is excluded by !**/dist/**
📒 Files selected for processing (7)
  • pcrDevHistory.json (1 hunks)
  • pcrProdHistory.json (1 hunks)
  • src/proxy_config.rs (2 hunks)
  • src/web/audio_utils.rs (1 hunks)
  • src/web/mod.rs (1 hunks)
  • src/web/openai.rs (6 hunks)
  • tinfoil-proxy/main.go (24 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pcrDevHistory.json
  • src/web/mod.rs
🧰 Additional context used
🧬 Code graph analysis (1)
src/web/openai.rs (3)
src/web/audio_utils.rs (2)
  • merge_transcriptions (252-312)
  • new (48-50)
src/proxy_config.rs (2)
  • Client (235-237)
  • new (161-195)
src/web/encryption_middleware.rs (1)
  • encrypt_response (91-105)
⏰ Context from checks skipped due to timeout of 100000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Development Reproducible Build
🔇 Additional comments (3)
src/proxy_config.rs (2)

21-25: Whisper L3 constants: LGTM.

Adding provider-specific and canonical IDs looks correct and consistent with existing patterns.


44-49: MODEL_EQUIVALENCIES updated correctly.

Canonical→provider mapping for Whisper L3 is well-formed and aligns with the new constants.

pcrProdHistory.json (1)

316-318: No JSON comma issue — file validates.
Verified with jq: jq . pcrProdHistory.json parsed successfully (output: JSON OK).

@AnthonyRonning
Contributor Author

closed for #99
