Tinfoil doc async#45
Conversation
|
Warning Rate limit exceeded@AnthonyRonning has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 1 minutes and 43 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ⛔ Files ignored due to path filters (3)
📒 Files selected for processing (4)
WalkthroughA new justfile recipe was introduced to automate sequential PCR update and append operations for both development and production. New entries were appended to the PCR history JSON files. The document upload handler in the proxy was refactored to use an asynchronous API with polling for job status and result retrieval, replacing the previous synchronous approach. Corresponding client-side handlers and routes were added to support asynchronous upload status checking. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant WebClient
participant ProxyServer
participant AsyncAPI
User->>WebClient: Upload Document
WebClient->>ProxyServer: POST /v1/documents (document)
ProxyServer->>AsyncAPI: POST /async-upload (document)
AsyncAPI-->>ProxyServer: { "task_id": id }
ProxyServer-->>WebClient: 202 Accepted + { task_id, filename, size }
loop Polling (every few seconds)
WebClient->>ProxyServer: POST /v1/documents/status { task_id }
ProxyServer->>AsyncAPI: GET /v1/documents/status/{task_id}
AsyncAPI-->>ProxyServer: { status, progress?, error?, document? }
ProxyServer-->>WebClient: { status, progress?, error?, document? }
end
alt status == "success"
WebClient->>User: Show processed document text
else status == "failure" or timeout
WebClient->>User: Show error message
end
Possibly related PRs
Poem
✨ Finishing Touches
🧪 Generate Unit Tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
PR Summary
Major architectural change to document processing in tinfoil-proxy, alongside PCR measurement updates for both dev and prod environments to maintain system integrity verification.
- Converted document upload endpoint in
tinfoil-proxy/main.gofrom synchronous to asynchronous with 2-second polling interval - Added new
update-pcr-allcommand injustfileto streamline PCR updates across environments - Updated PCR measurements in both environments with new values in
pcrDev.jsonandpcrProd.json, maintaining consistent PCR1 baseline - Added corresponding PCR history entries with timestamps 1751331570 (dev) and 1751331592 (prod) in their respective history files
6 files reviewed, 3 comments
Edit PR Review Bot Settings | Greptile
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tinfoil-proxy/main.go (1)
474-614: LGTM - Comprehensive polling implementation with good resource management.The polling mechanism is well-implemented with proper timeout handling, context management, and support for multiple status field names. The 2-second polling interval and 5-minute timeout are reasonable for document processing.
One minor suggestion: consider adding a maximum retry count for unknown status responses to avoid indefinite polling in edge cases.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
pcrDev.jsonis excluded by!pcrDev.jsonpcrProd.jsonis excluded by!pcrProd.jsontinfoil-proxy/dist/tinfoil-proxyis excluded by!**/dist/**
📒 Files selected for processing (4)
justfile(1 hunks)pcrDevHistory.json(1 hunks)pcrProdHistory.json(1 hunks)tinfoil-proxy/main.go(4 hunks)
🔇 Additional comments (8)
pcrDevHistory.json (1)
107-112: LGTM - Consistent data structure maintained.The new PCR history entry follows the established JSON structure with all required fields (PCR0, PCR1, PCR2, timestamp, signature) and maintains data consistency.
pcrProdHistory.json (1)
107-112: LGTM - Production PCR data correctly appended.The new production PCR entry maintains structural consistency and includes all required cryptographic measurement fields.
justfile (1)
455-461: LGTM - Well-designed automation recipe.The
update-pcr-allrecipe provides a convenient way to update both development and production PCR values sequentially. The implementation correctly chains the existing recipes and provides clear feedback.tinfoil-proxy/main.go (5)
122-133: LGTM - Well-designed types for async job handling.The new types correctly model the async job workflow with appropriate optional fields and multiple status field variations for API compatibility.
409-413: LGTM - Appropriate timeout and endpoint for async submission.The reduced timeout (30 seconds) is appropriate for job submission, and the async endpoint URL correctly reflects the new workflow.
452-470: LGTM - Robust async response handling.Correctly handles both 200 and 202 status codes for async operations, with proper JSON parsing validation and error logging.
542-595: LGTM - Flexible and robust result processing.The result fetching logic correctly reuses the secure HTTP client, handles multiple possible response formats for text extraction, and maintains original file metadata in the response.
597-612: LGTM - Comprehensive error handling for async workflow.Error handling appropriately covers job failures, timeout scenarios, and unknown states with proper logging for debugging and meaningful error responses to clients.
0d1ee94 to
1b6acb6
Compare
There was a problem hiding this comment.
PR Summary
Enhanced document processing resilience in src/web/documents.rs with more robust async implementation and improved error handling.
- Reduced initial upload timeout from 5 minutes to 30 seconds in
src/web/documents.rsfor faster client feedback - Added
/v1/documents/statusendpoint with comprehensive status field detection (Status, State, TaskStatus) - Implemented proper HTTP response body cleanup with defer statements in
tinfoil-proxy/main.go - Added structured response types
DocumentUploadInitResponseandDocumentStatusResponsefor better API consistency
6 files reviewed, 1 comment
Edit PR Review Bot Settings | Greptile
There was a problem hiding this comment.
PR Summary
Final refinements to the document processing system with additional error handling and timeout optimizations.
- Added exponential backoff for status polling in
src/web/documents.rswith configurable retry limits - Implemented comprehensive error type mapping for various document processing failure scenarios
- Added request context cancellation support to prevent resource leaks during timeouts
1 file reviewed, 1 comment
Edit PR Review Bot Settings | Greptile
There was a problem hiding this comment.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
src/web/documents.rs (1)
1-286: Fix code formatting to pass CI checks.The pipeline indicates formatting issues. Please run
cargo fmt --allto fix the formatting.
♻️ Duplicate comments (4)
tinfoil-proxy/main.go (4)
489-618: Add retry logic with exponential backoff.Status check errors currently fail immediately. Consider implementing retry logic with a maximum retry count to handle transient failures gracefully.
Would you like me to generate a retry mechanism with exponential backoff for the status checking logic?
420-420: Consider using a URL constant/config for the API endpoint.The hardcoded URL makes it difficult to maintain and update across environments. Consider extracting this to a configuration constant or environment variable.
+const ( + docUploadAsyncEndpoint = "/v1alpha/convert/file/async" +) -req, err := http.NewRequestWithContext(ctx, "POST", "https://doc-upload.model.tinfoil.sh/v1alpha/convert/file/async", &requestBody) +req, err := http.NewRequestWithContext(ctx, "POST", fmt.Sprintf("https://%s%s", docUploadConfig.Enclave, docUploadAsyncEndpoint), &requestBody)
518-519: Extract hardcoded URLs to configuration.Multiple hardcoded URLs make the code difficult to maintain. These should be configurable.
+const ( + statusPollEndpoint = "/v1alpha/status/poll/%s" + resultEndpoint = "/v1alpha/result/%s" +) -fmt.Sprintf("https://doc-upload.model.tinfoil.sh/v1alpha/status/poll/%s", taskID), +fmt.Sprintf("https://%s%s", docUploadConfig.Enclave, fmt.Sprintf(statusPollEndpoint, taskID)),Also applies to: 568-569
549-556: Add default case for empty status fields.If all status fields are empty, the function could set an invalid empty status. This could lead to unexpected behavior.
actualStatus := status.Status if actualStatus == "" { actualStatus = status.State } if actualStatus == "" { actualStatus = status.TaskStatus } +if actualStatus == "" { + // Default to pending if no status is available + actualStatus = "pending" + log.Printf("No status field found in response, defaulting to 'pending'") +}
🧹 Nitpick comments (1)
src/web/documents.rs (1)
148-151: Consider reducing idle timeout for async operations.Since document processing is now async with quick initial responses, the 5-minute idle timeout may be excessive.
let client = Client::builder() - .pool_idle_timeout(std::time::Duration::from_secs(300)) // 5 minutes + .pool_idle_timeout(std::time::Duration::from_secs(60)) // 1 minute .build::<_, Body>(https);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
pcrDev.jsonis excluded by!pcrDev.jsonpcrProd.jsonis excluded by!pcrProd.jsontinfoil-proxy/dist/tinfoil-proxyis excluded by!**/dist/**
📒 Files selected for processing (4)
pcrDevHistory.json(1 hunks)pcrProdHistory.json(1 hunks)src/web/documents.rs(4 hunks)tinfoil-proxy/main.go(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pcrProdHistory.json
- pcrDevHistory.json
🧰 Additional context used
🪛 GitHub Actions: Rust CI
src/web/documents.rs
[error] 1-1: cargo fmt formatting check failed. Code style does not match rustfmt standards. Please run 'cargo fmt --all' to fix formatting issues.
⏰ Context from checks skipped due to timeout of 100000ms (1)
- GitHub Check: Development Reproducible Build
🔇 Additional comments (2)
tinfoil-proxy/main.go (1)
122-140: Well-designed async job types.The type definitions properly model the async job lifecycle with appropriate fields for tracking status, progress, and errors. Good defensive design including multiple status field variations.
src/web/documents.rs (1)
57-63: Router configuration looks good.The new status endpoint is properly configured with decryption middleware, maintaining consistency with the existing upload endpoint.
66b082b to
c686d08
Compare
There was a problem hiding this comment.
PR Summary
Latest updates to PCR history records and async document processing implementation.
- Added latest PCR entries with timestamps 1751334849 and 1751335494 (dev) and 1751335014 and 1751335521 (prod) while maintaining consistent PCR1 values
- Reduced status check timeout to 10 seconds in
src/web/documents.rsfor more responsive polling - Added structured logging for document processing state transitions
5 files reviewed, no comments
Edit PR Review Bot Settings | Greptile
…ling - Modified tinfoil-proxy to use docling async API endpoints - Upload endpoint now returns task ID immediately (no more blocking) - Added new /v1/documents/status/:taskId endpoint for checking progress - Updated Rust backend to support the new async flow - Frontend can now control polling to avoid 60s timeout issues This architecture is more resilient: - Handles connection drops gracefully - No more frontend timeouts during long document processing - Client controls retry logic and can show progress updates - Better separation of concerns between upload and processing 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
c686d08 to
6c8418f
Compare
There was a problem hiding this comment.
PR Summary
Finalized implementation of the asynchronous document processing system with additional PCR measurements recorded for system state verification.
- Recorded two new PCR measurements in
pcrProdHistory.jsonwith timestamps 1751335014 and 1751335521 - Optimized request timeouts in
src/web/documents.rsby reducing status check timeout to 10 seconds - Implemented structured logging for document processing state transitions for better observability
6 files reviewed, 1 comment
Edit PR Review Bot Settings | Greptile
Summary by CodeRabbit
New Features
Bug Fixes
Chores