
Model routing #53

Merged
AnthonyRonning merged 10 commits into master from model-routing-2
Jul 21, 2025

Conversation

@AnthonyRonning AnthonyRonning (Contributor) commented Jul 17, 2025

Summary by CodeRabbit

  • New Features

    • Enabled the "llama3-3-70b" model for use in chat completions and related operations.
    • Enhanced model routing with support for multiple providers and fallback options, allowing more flexible and reliable model selection.
    • Improved request routing to try primary and fallback providers sequentially with advanced retry strategies.
  • Bug Fixes

    • Improved error handling and retry logic when accessing models across providers.
  • Chores

    • Updated PCR development and production history records with new entries.

coderabbitai Bot commented Jul 17, 2025


Walkthrough

The updates introduce a multi-provider-aware proxy routing system with explicit model equivalency mappings and fallback logic, refactor the OpenAI proxy endpoint to cycle across primary and fallback providers, and activate the "llama3-3-70b" model in the Tinfoil proxy. Additionally, new PCR entries are appended to both development and production history JSON files.

Changes

File(s) — Change Summary
  • pcrDevHistory.json, pcrProdHistory.json: Appended a new PCR entry with updated hash values, timestamp, and signature.
  • src/proxy_config.rs: Introduced model equivalencies, multi-provider routing with fallbacks, new structs and methods, and updated cache logic.
  • src/web/openai.rs: Refactored proxy routing and retry logic to support multi-provider cycling and fallback; added a helper function for provider requests.
  • tinfoil-proxy/main.go: Activated the "llama3-3-70b" model by setting its Active flag to true.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ProxyRouter
    participant TinfoilProxy
    participant ContinuumProxy

    Client->>ProxyRouter: Request (model_name)
    ProxyRouter->>ProxyRouter: Lookup model route
    alt Route with primary and fallbacks
        ProxyRouter->>TinfoilProxy: Try primary provider
        alt Success
            TinfoilProxy-->>ProxyRouter: Response
            ProxyRouter-->>Client: Return response
        else Failure
            ProxyRouter->>ContinuumProxy: Try fallback provider
            alt Success
                ContinuumProxy-->>ProxyRouter: Response
                ProxyRouter-->>Client: Return response
            else Failure
                ProxyRouter->>TinfoilProxy: Retry (up to 3 cycles)
                ProxyRouter->>ContinuumProxy: Retry fallback (up to 3 cycles)
                ProxyRouter-->>Client: Return error if all fail
            end
        end
    else No special route
        ProxyRouter->>ContinuumProxy: Try default provider (3 retries)
        ContinuumProxy-->>ProxyRouter: Response or error
        ProxyRouter-->>Client: Return response or error
    end
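The cycling behavior in the diagram can be sketched as follows. This is a hedged sketch only: the function shape, provider names, and the closure-based provider call are assumptions based on the summary, not the actual src/web/openai.rs code.

```rust
// Minimal sketch of the provider-cycling logic shown in the diagram.
// The 3-cycle limit comes from the PR summary; everything else here
// (names, signatures) is illustrative.
fn route_request(
    providers: &[&str],
    try_provider: impl Fn(&str, usize) -> Result<String, String>,
    max_cycles: usize,
) -> Result<String, String> {
    let mut last_error = String::from("no providers configured");
    for cycle in 0..max_cycles {
        // In the real code an incremental delay precedes cycles after the first.
        for provider in providers {
            match try_provider(provider, cycle) {
                Ok(resp) => return Ok(resp),
                Err(err) => last_error = err,
            }
        }
    }
    Err(last_error)
}

fn main() {
    // Simulated outage: the primary fails on every attempt, the fallback succeeds.
    let result = route_request(
        &["tinfoil", "continuum"],
        |provider, _cycle| {
            if provider == "continuum" {
                Ok(format!("response from {provider}"))
            } else {
                Err(format!("{provider} unavailable"))
            }
        },
        3,
    );
    println!("{result:?}");
}
```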

Estimated code review effort

3 (~45 minutes)

Possibly related PRs

  • OpenAI retry #44: Both PRs modify the same function in src/web/openai.rs; this PR replaces the earlier single-provider retry logic with multi-provider routing, retries, and fallback support.
  • Tinfoil doc async #45: Both PRs append new PCR history entries to pcrDevHistory.json and pcrProdHistory.json in the same way.
  • Update to PCR verification flow with pcr0 #26: This PR appends PCR records with updated PCR0, PCR1, and PCR2 values and signatures to the JSON history files, while #26 updated the PCR verification flow and PCR0 signing/verification commands; both involve the PCR0 signature data and history files.

Poem

In the warren of code, new routes were spun,
Models now hop from many, not just one.
Llama3 awakened, ready to chat,
PCR logs lengthen—imagine that!
With fallbacks and hashes, we’re ready to play,
The rabbit approves: it’s a multi-proxy day!
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 28a8aa0 and 03ef6b3.

⛔ Files ignored due to path filters (3)
  • pcrDev.json is excluded by !pcrDev.json
  • pcrProd.json is excluded by !pcrProd.json
  • tinfoil-proxy/dist/tinfoil-proxy is excluded by !**/dist/**
📒 Files selected for processing (5)
  • pcrDevHistory.json (1 hunks)
  • pcrProdHistory.json (1 hunks)
  • src/proxy_config.rs (9 hunks)
  • src/web/openai.rs (4 hunks)
  • tinfoil-proxy/main.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • pcrProdHistory.json
  • tinfoil-proxy/main.go
  • src/web/openai.rs
  • pcrDevHistory.json
🔇 Additional comments (7)
src/proxy_config.rs (7)

12-31: Well-designed model equivalency mapping system.

The lazy static approach for MODEL_EQUIVALENCIES provides thread-safe initialization and the hierarchical structure (canonical -> provider -> model_name) is clean and extensible for future providers and models.


33-40: Clean routing structure with fallback support.

The ModelRoute struct provides a clear separation between primary and fallback providers, enabling robust multi-provider routing with ordered fallback chains.


46-47: Good addition of provider identification.

The provider_name field enhances observability by enabling clear provider identification in logging and debugging scenarios.


56-57: Cache structure properly updated for routing system.

The addition of model_routes and the updated initialization/update logic correctly support the new multi-provider routing architecture.

Also applies to: 64-75, 82-92


106-134: Comprehensive model name translation logic.

The method correctly handles both direct (canonical→provider) and reverse (provider→provider) lookups with appropriate fallbacks, covering all translation scenarios efficiently.
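The direct and reverse lookups described above can be illustrated with a small sketch. The map layout (canonical -> provider -> name) follows the review comment, but the function name and the concrete model names below are placeholders, not the real constants from src/proxy_config.rs.

```rust
use std::collections::HashMap;

// Illustrative translation: canonical -> provider, provider -> provider,
// and pass-through for unknown models.
fn translate_model_name(
    equivalencies: &HashMap<&str, HashMap<&str, &str>>,
    model: &str,
    target_provider: &str,
) -> String {
    // Direct lookup: the caller passed a canonical name.
    if let Some(providers) = equivalencies.get(model) {
        if let Some(name) = providers.get(target_provider) {
            return name.to_string();
        }
    }
    // Reverse lookup: the caller passed another provider's name for the model.
    for providers in equivalencies.values() {
        if providers.values().any(|&n| n == model) {
            if let Some(name) = providers.get(target_provider) {
                return name.to_string();
            }
        }
    }
    // Unknown model: pass the name through unchanged.
    model.to_string()
}

fn main() {
    let mut llama = HashMap::new();
    llama.insert("tinfoil", "llama3-3-70b");
    llama.insert("continuum", "continuum-llama-3.3"); // placeholder name
    let mut equivalencies = HashMap::new();
    equivalencies.insert("llama-3.3-70b", llama); // placeholder canonical name

    assert_eq!(translate_model_name(&equivalencies, "llama-3.3-70b", "tinfoil"), "llama3-3-70b");
    assert_eq!(translate_model_name(&equivalencies, "llama3-3-70b", "continuum"), "continuum-llama-3.3");
    assert_eq!(translate_model_name(&equivalencies, "unknown-model", "tinfoil"), "unknown-model");
    println!("translations ok");
}
```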


172-196: Robust model routing with graceful fallbacks.

The method design ensures every request gets a valid route, gracefully falling back to the default proxy for unknown models with appropriate logging.
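The "every request gets a valid route" behavior can be sketched as a lookup with a default. The Route struct and field names here are assumptions based on the review comments, not the actual implementation.

```rust
use std::collections::HashMap;

// Illustrative route with a primary provider and ordered fallbacks.
#[derive(Debug, Clone, PartialEq)]
struct Route {
    primary: String,
    fallbacks: Vec<String>,
}

// Unknown models fall back to a default single-provider route.
fn get_model_route(routes: &HashMap<String, Route>, model: &str, default_provider: &str) -> Route {
    routes.get(model).cloned().unwrap_or_else(|| {
        // The real code logs a warning for unknown models here.
        Route { primary: default_provider.to_string(), fallbacks: Vec::new() }
    })
}

fn main() {
    let mut routes = HashMap::new();
    routes.insert(
        String::from("llama3-3-70b"),
        Route { primary: String::from("tinfoil"), fallbacks: vec![String::from("continuum")] },
    );
    assert_eq!(get_model_route(&routes, "llama3-3-70b", "continuum").primary, "tinfoil");
    assert_eq!(get_model_route(&routes, "unknown-model", "continuum").primary, "continuum");
    println!("route lookup ok");
}
```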


453-772: Excellent comprehensive test coverage.

The test suite covers all major functionality including model equivalencies, cache behavior, routing logic, and edge cases. The balance of unit tests, integration tests, and async tests ensures robust validation of the multi-provider routing system.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
src/proxy_config.rs (1)

16-28: Consider using lazy_static for model equivalencies to avoid HashMap reconstruction.

The current implementation rebuilds the HashMap on every call. For better performance, consider using lazy_static or once_cell.

-/// Known model equivalencies across providers
-/// This maps a canonical model identifier to provider-specific names
-fn get_model_equivalencies() -> HashMap<&'static str, HashMap<&'static str, &'static str>> {
-    let mut equivalencies = HashMap::new();
-
-    // Llama 3.3 70B
-    let mut llama_33_70b = HashMap::new();
-    llama_33_70b.insert("continuum", CONTINUUM_LLAMA_33_70B);
-    llama_33_70b.insert("tinfoil", TINFOIL_LLAMA_33_70B);
-    equivalencies.insert(CANONICAL_LLAMA_33_70B, llama_33_70b);
-
-    equivalencies
-}
+use lazy_static::lazy_static;
+
+lazy_static! {
+    /// Known model equivalencies across providers
+    /// This maps a canonical model identifier to provider-specific names
+    static ref MODEL_EQUIVALENCIES: HashMap<&'static str, HashMap<&'static str, &'static str>> = {
+        let mut equivalencies = HashMap::new();
+
+        // Llama 3.3 70B
+        let mut llama_33_70b = HashMap::new();
+        llama_33_70b.insert("continuum", CONTINUUM_LLAMA_33_70B);
+        llama_33_70b.insert("tinfoil", TINFOIL_LLAMA_33_70B);
+        equivalencies.insert(CANONICAL_LLAMA_33_70B, llama_33_70b);
+
+        equivalencies
+    };
+}

Then update the references to use &MODEL_EQUIVALENCIES instead of calling get_model_equivalencies().

src/web/openai.rs (1)

99-293: Consider refactoring to reduce code duplication between routed and default proxy paths.

The retry logic is duplicated between the ModelRoute path (lines 115-227) and the default proxy path (lines 229-293). Consider extracting a common retry function.

Example refactor:

async fn retry_with_providers(
    client: &Client<HttpsConnector<hyper::client::HttpConnector>>,
    providers: Vec<(&ProxyConfig, String)>, // (config, body_json)
    headers: &HeaderMap,
    max_cycles: usize,
) -> Result<hyper::Response<Body>, String> {
    let mut last_error = None;
    
    for cycle in 0..max_cycles {
        if cycle > 0 {
            let delay = cycle as u64;
            debug!("Starting cycle {} after {}s delay", cycle + 1, delay);
            sleep(Duration::from_secs(delay)).await;
        }
        
        for (provider, body_json) in &providers {
            debug!("Cycle {}: Trying provider {}", cycle + 1, provider.provider_name);
            match try_provider(client, provider, body_json, headers).await {
                Ok(response) => {
                    info!("Successfully got response from {} on cycle {}", 
                         provider.provider_name, cycle + 1);
                    return Ok(response);
                }
                Err(err) => {
                    error!("Cycle {}: Provider {} failed: {:?}", 
                           cycle + 1, provider.provider_name, err);
                    last_error = Some(err);
                }
            }
        }
    }
    
    Err(last_error.unwrap_or_else(|| "All providers failed".to_string()))
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f4e2a5 and 993166e.

⛔ Files ignored due to path filters (3)
  • pcrDev.json is excluded by !pcrDev.json
  • pcrProd.json is excluded by !pcrProd.json
  • tinfoil-proxy/dist/tinfoil-proxy is excluded by !**/dist/**
📒 Files selected for processing (5)
  • pcrDevHistory.json (1 hunks)
  • pcrProdHistory.json (1 hunks)
  • src/proxy_config.rs (8 hunks)
  • src/web/openai.rs (4 hunks)
  • tinfoil-proxy/main.go (1 hunks)
⏰ Context from checks skipped due to timeout of 100000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Development Reproducible Build
  • GitHub Check: Development Reproducible Build
🔇 Additional comments (10)
pcrDevHistory.json (1)

149-155: LGTM! PCR history entry follows the established format.

The new entry maintains consistency with existing records, including all required fields and proper chronological ordering.

pcrProdHistory.json (1)

149-155: LGTM! Production PCR entry is properly formatted.

The new production PCR record maintains the expected format and chronological ordering, with appropriate differences from the development environment values.

src/proxy_config.rs (5)

30-37: Well-designed routing structure with fallback support.

The ModelRoute struct provides a clean abstraction for multi-provider routing with ordered fallback support. Good use of Vec for extensibility.


103-133: Model name translation logic is comprehensive and handles edge cases well.

The method correctly handles canonical-to-provider, provider-to-provider, and passthrough cases. Note that reverse lookup performance is O(n*m) but acceptable given the small dataset.


227-305: Cache refresh logic correctly implements provider priority but has a subtle routing consideration.

The implementation correctly prioritizes Tinfoil models and filters duplicates from the models list. However, note that model_to_proxy maps both provider-specific names even when equivalencies exist. This is handled later by the model_routes map but could be confusing.

Consider adding a comment explaining that model_to_proxy is used as a fallback when no special route exists.


317-355: Model route building correctly implements dual-provider routing.

The logic properly creates routes only when both providers offer equivalent models and correctly maps both provider-specific names to the same route configuration.


427-693: Excellent test coverage for the new routing functionality.

The tests comprehensively cover model equivalencies, cache behavior, router configurations, name translations, and route structures. Good attention to edge cases and proper test organization.

src/web/openai.rs (2)

485-565: Well-implemented provider request helper with comprehensive error handling.

The try_provider function properly handles request construction, header forwarding, and error cases with detailed logging for debugging.


120-125: Correct integration of model name translation for multi-provider support.

The code properly translates model names based on the target provider and updates the request body accordingly.

Also applies to: 237-242

tinfoil-proxy/main.go (1)

40-40: Model activation aligns with the new routing system.

Activating the llama3-3-70b model in Tinfoil proxy enables the multi-provider routing functionality implemented in the Rust code.


@greptile-apps greptile-apps Bot left a comment


Greptile Summary

This PR introduces significant improvements to the model routing system in the OpenSecret project, particularly in proxy_config.rs. The changes focus on enhancing reliability and performance when handling multiple AI model providers. Key improvements include:

  1. Implementation of a lazy_static map for model equivalencies, optimizing performance by ensuring one-time initialization
  2. Addition of fallback support between providers, allowing graceful degradation when primary providers are unavailable
  3. Enhanced model name translation system for consistent identification across different providers
  4. New support for the llama3-3-70b model

The changes appear to be part of a larger effort to make the system more robust when dealing with multiple providers (Tinfoil and Continuum).

Confidence score: 4/5

  1. This PR is safe to merge with normal code review attention
  2. The changes are well-structured with proper error handling, comprehensive test coverage, and clear implementation patterns
  3. The proxy_config.rs file deserves particular attention during review, especially the model equivalencies implementation and fallback logic

1 file reviewed, no comments
Edit PR Review Bot Settings | Greptile


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
src/proxy_config.rs (3)

12-31: Good foundation for model equivalency mapping.

The lazy static approach for model equivalencies is well-designed and provides a clean way to map canonical model names to provider-specific identifiers. The structure is extensible for additional providers and models.

Consider future scalability: as the number of models grows, you might want to move this configuration to an external file (JSON/YAML) to avoid frequent code changes when adding new models.


214-367: Consider refactoring the complex refresh_cache method.

While the logic is correct, this method is doing a lot of work and could benefit from being broken down into smaller, more focused methods for better maintainability and testability.

Consider extracting:

  • Model fetching and processing logic for each provider
  • Model equivalency checking logic
  • Route building logic

This would improve readability and make individual components easier to test.


426-690: Excellent test coverage for the new functionality.

The test suite comprehensively covers all the new features including model equivalencies, cache behavior, router configuration, and model name translation. The tests verify both positive and negative cases and use appropriate async testing where needed.

Consider adding a few more edge case tests, such as:

  • Behavior when MODEL_EQUIVALENCIES is empty
  • Error handling when both providers are unavailable
  • Cache refresh behavior under concurrent access
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 993166e and 848ab61.

📒 Files selected for processing (1)
  • src/proxy_config.rs (8 hunks)
⏰ Context from checks skipped due to timeout of 100000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Development Reproducible Build
🔇 Additional comments (5)
src/proxy_config.rs (5)

33-40: Well-designed routing structure.

The ModelRoute struct cleanly separates primary and fallback providers, with appropriate trait derivations and clear documentation.


42-48: Logical enhancement to ProxyConfig.

Adding provider_name improves logging and debugging capabilities while maintaining backward compatibility with the existing structure.


50-93: Cache updates are well-implemented.

The addition of model_routes and the shift to empty cache initialization align well with the dynamic model discovery approach. The updated update method maintains consistency with existing patterns.


107-134: Model name translation logic is well-implemented.

The bidirectional translation between canonical and provider-specific model names handles all the necessary cases correctly, including direct lookup, reverse lookup, and fallback to the original name.


172-197: Route-based proxy selection improves the architecture.

The new get_proxy_for_model method correctly prioritizes model routes over the legacy model-to-proxy mapping, providing a clean migration path for the enhanced routing system.


@greptile-apps greptile-apps Bot left a comment


Greptile Summary

The latest changes to this PR focus on the OAuth authentication system integration with the model routing improvements. The additions include proper error handling for OAuth authentication flows when interacting with different model providers and enhanced user authentication state management across provider transitions.

The OAuth-related code maintains proper separation of concerns between the authentication layer (oauth_routes.rs) and the model routing logic (proxy_config.rs), ensuring that user sessions remain stable even when requests are routed between different AI providers.

Confidence score: 4/5

  1. This PR maintains good architectural boundaries while adding new functionality
  2. The OAuth integration with model routing is well-designed with proper error handling and clear authentication flows
  3. Key areas for review: Ensure the OAuth token handling in oauth_routes.rs properly accounts for model routing transitions

1 file reviewed, no comments
Edit PR Review Bot Settings | Greptile


@greptile-apps greptile-apps Bot left a comment


Greptile Summary

The latest changes focus on streamlining the OpenAI proxy endpoint implementation in openai.rs. Key changes include:

  1. Unified model routing - removed the special case handling for different types of models, making the codebase more maintainable
  2. Enhanced error messaging - now clearly distinguishes between single-provider and multi-provider failures
  3. Improved debug logging for better visibility into routing decisions

This is a solid architectural improvement that removes special cases and standardizes how model routing works across the system.

Confidence score: 4/5

  1. The changes simplify the codebase while maintaining robust error handling
  2. The consolidation of routing logic and improved error messages suggest well-tested, production-ready code
  3. Focus review attention on error handling in src/web/openai.rs, particularly around the retry logic and error message generation

2 files reviewed, no comments
Edit PR Review Bot Settings | Greptile


@greptile-apps greptile-apps Bot left a comment


Greptile Summary

The latest changes focus on error handling and retry strategies in the model routing system. Significant updates include:

  1. Enhanced retry mechanism that now properly cycles through both primary and fallback providers with clear attempt limits
  2. More descriptive error messages that differentiate between single-provider failures and exhausted fallback scenarios
  3. Unified approach to model route configuration, ensuring ALL models have consistent routing behavior

The changes demonstrate a strong focus on production reliability and observability, particularly in failure scenarios.

Confidence score: 4/5

  1. The changes add robustness to the model routing system without introducing complexity
  2. Strong error handling and clear logging patterns indicate well-tested, production-ready code
  3. Key files for review: src/web/openai.rs - specifically the retry logic implementation and error message construction

2 files reviewed, no comments
Edit PR Review Bot Settings | Greptile


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/proxy_config.rs (1)

64-68: Consider the empty initial state.

Starting with an empty models list in the cache means the first request will always trigger a cache refresh. While this ensures fresh data, it could impact initial response times.

Consider adding a comment explaining this design choice:

     // Start with empty cache
+    // Initial empty state ensures first request triggers cache refresh for fresh data
     let models_response = serde_json::json!({
         "object": "list",
         "data": []
     });
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e0eba1 and 87d0a35.

📒 Files selected for processing (2)
  • src/proxy_config.rs (9 hunks)
  • src/web/openai.rs (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/web/openai.rs (1)
src/proxy_config.rs (1)
  • Client (219-221)
🪛 GitHub Actions: Rust CI
src/proxy_config.rs

[error] 195-195: cargo fmt formatting check failed. Code is not properly formatted according to rustfmt standards.

⏰ Context from checks skipped due to timeout of 100000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Development Reproducible Build
🔇 Additional comments (26)
src/proxy_config.rs (17)

4-4: LGTM: Appropriate dependency addition.

The lazy_static crate is correctly added to support the static MODEL_EQUIVALENCIES map initialization.


12-31: Well-structured model equivalency mapping system.

The static MODEL_EQUIVALENCIES map provides a clean way to handle canonical model names and their provider-specific equivalents. The structure is extensible for additional models and providers.


33-40: Excellent ModelRoute design for fallback support.

The ModelRoute struct cleanly encapsulates primary and fallback provider configurations, enabling robust multi-provider routing with graceful degradation.


46-48: Good addition of provider identification.

Adding the provider_name field to ProxyConfig enables proper logging and routing decisions. This enhances observability and debugging capabilities.


56-57: Proper cache structure evolution.

The addition of model_routes HashMap alongside the existing model_to_proxy map correctly supports the new routing system while maintaining backward compatibility for simple routing cases.

Also applies to: 73-73, 90-90


82-92: Enhanced cache update method is well-designed.

The update method correctly accepts and stores both the legacy model_to_proxy map and the new model_routes map, supporting both routing approaches.


101-103: Cleaner proxy configuration management.

Replacing the vector of additional proxies with an optional tinfoil_proxy configuration simplifies the architecture and makes the code more maintainable.


106-134: Robust model name translation logic.

The bidirectional translation between canonical and provider-specific model names handles both direct lookups and reverse lookups correctly. The fallback to return the input as-is for unknown models is appropriate.


151-156: Proper provider name assignment logic.

The conditional logic correctly identifies OpenAI vs Continuum providers based on the base URL, with appropriate provider names for logging and routing.


158-163: Clean Tinfoil proxy configuration.

The optional Tinfoil proxy setup is straightforward and follows the same pattern as the default proxy configuration.


172-196: Robust model route retrieval with fallback.

The get_model_route method ensures all models have routes and provides a sensible fallback for unknown models. The warning log for unknown models aids in debugging.


225-253: Well-structured Tinfoil model fetching.

The logic correctly fetches from Tinfoil first, marks those models as primary, and maintains proper provider separation. The error handling allows graceful degradation when Tinfoil is unavailable.


255-305: Intelligent Continuum model deduplication.

The algorithm correctly identifies equivalent models between providers and avoids duplicating them in the response. The logging provides good visibility into the deduplication process.


307-316: Helpful future-proofing comment.

The comment about potential third provider deduplication requirements is valuable documentation for future maintainers.


317-382: Comprehensive model route construction logic.

The routing logic correctly:

  1. Creates routes for Tinfoil models with Continuum fallbacks when equivalent models exist
  2. Maps both provider-specific names to the same route for equivalent models
  3. Creates single-provider routes when no equivalents exist

This provides optimal routing with appropriate fallbacks.
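The three-step construction above can be sketched as follows. This is a hypothetical rendering of the logic the review describes: struct names, function shape, and the equivalency tuple layout are guesses, not the actual src/proxy_config.rs code.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
struct Route {
    primary: &'static str,
    fallbacks: Vec<&'static str>,
}

// Builds the model -> route map: shared routes for equivalent models,
// single-provider routes for everything else.
fn build_routes(
    tinfoil: &HashSet<&'static str>,
    continuum: &HashSet<&'static str>,
    // canonical -> (tinfoil_name, continuum_name)
    equivalencies: &HashMap<&'static str, (&'static str, &'static str)>,
) -> HashMap<&'static str, Route> {
    let mut routes: HashMap<&'static str, Route> = HashMap::new();
    // Dual-provider routes: both provider-specific names share one route.
    for &(t_name, c_name) in equivalencies.values() {
        if tinfoil.contains(t_name) && continuum.contains(c_name) {
            let route = Route { primary: "tinfoil", fallbacks: vec!["continuum"] };
            routes.insert(t_name, route.clone());
            routes.insert(c_name, route);
        }
    }
    // Single-provider routes for models without an equivalent.
    for &m in tinfoil {
        routes.entry(m).or_insert(Route { primary: "tinfoil", fallbacks: vec![] });
    }
    for &m in continuum {
        routes.entry(m).or_insert(Route { primary: "continuum", fallbacks: vec![] });
    }
    routes
}

fn main() {
    let tinfoil: HashSet<_> = ["llama3-3-70b", "tinfoil-only"].into_iter().collect();
    let continuum: HashSet<_> = ["continuum-llama"].into_iter().collect();
    let mut eq = HashMap::new();
    eq.insert("llama-3.3-70b", ("llama3-3-70b", "continuum-llama"));

    let routes = build_routes(&tinfoil, &continuum, &eq);
    assert_eq!(routes["llama3-3-70b"], routes["continuum-llama"]);
    assert_eq!(routes["llama3-3-70b"].fallbacks, vec!["continuum"]);
    assert!(routes["tinfoil-only"].fallbacks.is_empty());
    println!("routes built: {}", routes.len());
}
```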


448-451: Useful utility method addition.

The get_tinfoil_base_url method provides clean access to the Tinfoil proxy configuration for external use.


454-773: Comprehensive test coverage.

The test suite thoroughly covers:

  • Model equivalency mappings
  • Cache initialization and updates
  • Proxy router configurations
  • Model name translation in both directions
  • Route structure and cloning
  • Edge cases like empty fallbacks and cache expiry

The tests provide excellent coverage of the new functionality.

src/web/openai.rs (9)

96-101: Improved model routing integration.

The code correctly retrieves the model route using the new routing system and handles the model name extraction properly.


111-122: Good logging for routing visibility.

The debug logging provides valuable insight into which providers are being used and the number of available fallbacks for each model.


123-150: Proper request body preparation for providers.

The code correctly prepares separate request bodies for primary and fallback providers using provider-specific model names. The error handling for serialization failures is appropriate.


152-163: Well-designed retry mechanism with backoff.

The 3-cycle retry approach with incremental delays (1s, 2s) strikes a good balance between resilience and responsiveness. The delay logic is clear and appropriate.


165-190: Robust primary provider retry logic.

The primary provider attempt logic correctly handles both success and failure cases, with appropriate logging for debugging and monitoring.


192-220: Effective fallback provider handling.

The fallback logic mirrors the primary provider logic appropriately, only attempting fallback when configured. The conditional execution based on fallback availability is correct.


222-240: Comprehensive error reporting.

The final error handling provides informative error messages that distinguish between single-provider failures and multi-provider failures, which aids in debugging.


432-512: Well-implemented provider helper function.

The try_provider function correctly:

  1. Builds requests with proper headers and authentication
  2. Forwards relevant headers while filtering sensitive ones
  3. Handles both connection and HTTP status errors
  4. Provides detailed error logging with response bodies
  5. Returns appropriate error messages

The error handling is comprehensive and the logging will aid in debugging provider issues.
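The header-forwarding step can be illustrated with a small filter over name/value pairs. The deny list below is illustrative, not the PR's exact set:

```rust
/// Forward request headers to the upstream provider, dropping hop-by-hop
/// and sensitive headers. Comparison is case-insensitive, as header
/// names are case-insensitive in HTTP.
fn forward_headers(headers: &[(String, String)]) -> Vec<(String, String)> {
    const DENY: &[&str] = &["authorization", "host", "content-length", "connection"];
    headers
        .iter()
        .filter(|(name, _)| !DENY.contains(&name.to_lowercase().as_str()))
        .cloned()
        .collect()
}

fn main() {
    let incoming = vec![
        ("Authorization".to_string(), "Bearer user-key".to_string()),
        ("Accept".to_string(), "text/event-stream".to_string()),
    ];
    let forwarded = forward_headers(&incoming);
    // The client's credentials are stripped; benign headers pass through.
    assert_eq!(forwarded, vec![("Accept".to_string(), "text/event-stream".to_string())]);
}
```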


194-194: Run rustfmt to resolve formatting error
CI is reporting a formatting issue in src/web/openai.rs at line 195. Please run the Rust formatter locally and re-commit the changes:

cargo fmt --all

• File: src/web/openai.rs
• Line: 195


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/proxy_config.rs (1)

212-394: Complex but well-structured cache refresh logic.

The multi-provider cache refresh logic is comprehensive and handles the equivalency filtering correctly. The sequential provider fetching with proper fallback mapping is well-implemented.

Consider extracting some helper methods to improve readability:

// Extract route building logic
fn build_routes_for_provider(&self, provider_name: &str, models: &[String], available_models: &HashMap<String, Vec<String>>) -> HashMap<String, ModelRoute> {
    // Route building logic here
}

// Extract equivalency checking logic  
fn find_equivalent_models(&self, model_id: &str, target_provider: &str) -> Option<String> {
    // Equivalency checking logic here
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87d0a35 and 951f347.

📒 Files selected for processing (2)
  • src/proxy_config.rs (9 hunks)
  • src/web/openai.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/web/openai.rs
⏰ Context from checks skipped due to timeout of 100000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Development Reproducible Build
🔇 Additional comments (11)
src/proxy_config.rs (11)

4-31: Well-structured model equivalency system.

The lazy-initialized static map approach is efficient and the constants follow clear naming conventions. The hierarchical structure (canonical → provider → model_name) provides good extensibility for future providers.
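As a sketch of the same idea using only the standard library: the PR uses lazy_static, but `std::sync::OnceLock` gives equivalent once-only initialization. The Continuum-side name in the table below is an assumption for illustration:

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

const CANONICAL_LLAMA_3_3_70B: &str = "llama-3.3-70b";

/// canonical name -> (provider -> provider-specific name)
fn model_equivalencies() -> &'static HashMap<&'static str, HashMap<&'static str, &'static str>> {
    static MAP: OnceLock<HashMap<&'static str, HashMap<&'static str, &'static str>>> =
        OnceLock::new();
    MAP.get_or_init(|| {
        HashMap::from([(
            CANONICAL_LLAMA_3_3_70B,
            // Provider-specific names; the Continuum entry is illustrative.
            HashMap::from([("tinfoil", "llama3-3-70b"), ("continuum", "llama-3.3-70b")]),
        )])
    })
}

fn main() {
    let equiv = model_equivalencies();
    assert_eq!(equiv[CANONICAL_LLAMA_3_3_70B]["tinfoil"], "llama3-3-70b");
    // The map is built once on first access; later calls return the same instance.
    assert!(std::ptr::eq(equiv, model_equivalencies()));
}
```

This avoids rebuilding the HashMap on every call, which was the performance concern the lazy_static refactor addressed.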


33-40: Clean routing structure design.

The ModelRoute struct provides a clear separation between primary and fallback providers. Using Vec<ProxyConfig> for fallbacks allows for flexible extension to multiple fallback providers in the future.


42-48: Good addition of provider identification.

The provider_name field enhances logging and debugging capabilities. The documentation comment clearly explains its purpose.


50-93: Cache structure properly updated for routing support.

The addition of model_routes field and the updated update method signature correctly support the new multi-provider routing system. Starting with empty cache data makes sense for dynamic population.


95-170: Robust model name translation implementation.

The get_model_name_for_provider method handles both direct and reverse lookups elegantly. The constructor properly identifies provider types and the fallback logic for unknown models is solid.
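The direct-then-reverse lookup can be sketched as a pure function over the equivalency table (the table contents here are illustrative):

```rust
use std::collections::HashMap;

/// Translate `model` into `provider`'s naming: direct lookup first (the
/// caller passed a canonical name), then a reverse lookup (the caller
/// passed another provider's specific name), else pass through unchanged.
fn model_name_for_provider<'a>(
    equivalencies: &'a HashMap<&'a str, HashMap<&'a str, &'a str>>,
    model: &'a str,
    provider: &str,
) -> &'a str {
    // Direct lookup: `model` is a canonical name.
    if let Some(names) = equivalencies.get(model) {
        if let Some(&name) = names.get(provider) {
            return name;
        }
    }
    // Reverse lookup: `model` is some provider's specific name.
    for names in equivalencies.values() {
        if names.values().any(|&n| n == model) {
            if let Some(&name) = names.get(provider) {
                return name;
            }
        }
    }
    model // unknown model: pass through unchanged
}

fn main() {
    let equiv = HashMap::from([(
        "llama-3.3-70b",
        HashMap::from([("tinfoil", "llama3-3-70b"), ("continuum", "llama-3.3-70b")]),
    )]);
    assert_eq!(model_name_for_provider(&equiv, "llama-3.3-70b", "tinfoil"), "llama3-3-70b");
    // Reverse lookup from the Tinfoil-specific name.
    assert_eq!(model_name_for_provider(&equiv, "llama3-3-70b", "continuum"), "llama-3.3-70b");
    // Unknown models pass through unchanged.
    assert_eq!(model_name_for_provider(&equiv, "gpt-4o", "tinfoil"), "gpt-4o");
}
```

Doing the direct lookup first avoids the double-translation edge case the commits mention, where the canonical name itself was not translated correctly.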


172-196: Well-implemented routing method with proper fallbacks.

The get_model_route method provides comprehensive routing with appropriate fallback handling for unknown models. The logging helps with debugging and monitoring.


447-451: Clean utility method implementation.

The get_tinfoil_base_url method provides clean access to the Tinfoil configuration with proper Option handling.


458-470: Good test coverage for model equivalencies.

The test properly verifies the static MODEL_EQUIVALENCIES initialization and specific mappings. This ensures the equivalency data is correctly configured.


472-527: Comprehensive cache behavior testing.

The cache tests provide excellent coverage of initialization, update, and expiry behavior. The timing tolerance in the expiry test is appropriate for test reliability.
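The expiry behavior under test can be modeled with a tiny TTL check; the field names below are illustrative, not the PR's exact struct:

```rust
use std::time::{Duration, Instant};

/// Models cache with a 5-minute TTL, matching the expiry behavior the tests cover.
struct ModelsCache {
    updated_at: Instant,
    ttl: Duration,
}

impl ModelsCache {
    /// Taking `now` as a parameter keeps the check deterministic in tests.
    fn is_expired(&self, now: Instant) -> bool {
        now.duration_since(self.updated_at) >= self.ttl
    }
}

fn main() {
    let cache = ModelsCache {
        updated_at: Instant::now(),
        ttl: Duration::from_secs(300), // 5-minute TTL
    };
    assert!(!cache.is_expired(cache.updated_at + Duration::from_secs(299)));
    assert!(cache.is_expired(cache.updated_at + Duration::from_secs(300)));
}
```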


529-614: Thorough proxy configuration testing.

The tests cover all major proxy configuration scenarios including OpenAI, Continuum, and Tinfoil setups. The verification of provider names and API key handling is comprehensive.


616-772: Excellent test coverage for remaining functionality.

The tests comprehensively cover model name translation, structure behavior, debug implementations, and edge cases. The test scenarios include both positive and negative cases, ensuring robust functionality.


@greptile-apps greptile-apps Bot left a comment


Greptile Summary

The latest changes focus on updating the Platform Configuration Register (PCR) values across development and production environments to reflect the new model routing system implementation. The PCR updates indicate proper security attestation for the new changes:

  1. PCR0 and PCR2 values have been updated in both dev and prod environments, reflecting changes to the system state and configuration
  2. PCR1 remains consistent across all updates, indicating core security components were not affected
  3. All PCR updates include proper cryptographic signatures and timestamps, maintaining the security attestation chain

These changes properly document and attest to the security state of the system after the model routing improvements.

Confidence score: 5/5

  1. The PCR updates follow best practices for secure attestation and are properly signed
  2. The consistency of PCR1 across changes indicates core security was maintained
  3. PCR-related files need attention during review to ensure attestation chain integrity: pcrDev.json, pcrProd.json, and their respective history files

6 files reviewed, 1 comment

Comment thread src/proxy_config.rs
Comment on lines +326 to +349
for provider_names in MODEL_EQUIVALENCIES.values() {
    if let (Some(tinfoil_equiv), Some(continuum_equiv)) = (
        provider_names.get("tinfoil"),
        provider_names.get("continuum"),
    ) {
        if *tinfoil_equiv == model_name {
            // Check if Continuum has the equivalent model
            if let Some(continuum_models) =
                available_models_by_provider.get("continuum")
            {
                if continuum_models.contains(&continuum_equiv.to_string()) {
                    // This model has a Continuum fallback
                    let route = ModelRoute {
                        primary: tinfoil_proxy.clone(),
                        fallbacks: vec![self.default_proxy.clone()],
                    };
                    model_routes.insert(model_name.clone(), route.clone());
                    // Also map the Continuum name to the same route
                    model_routes.insert(continuum_equiv.to_string(), route);
                    has_continuum_fallback = true;
                    info!("Model available from both providers - Tinfoil ({}) primary, Continuum ({}) fallback",
                        model_name, continuum_equiv);
                    break;
                }

style: consider extracting this nested conditional block into a separate function for better readability

@AnthonyRonning AnthonyRonning changed the title Model routing 2 Model routing Jul 18, 2025

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/web/openai.rs (1)

432-512: Well-structured helper function with comprehensive error handling.

The try_provider function is well-designed with:

  • Proper request construction
  • Secure header forwarding (excluding sensitive headers)
  • Detailed error logging and response body capture
  • Clear error messages for debugging

One minor consideration: the function reads the entire response body into memory for error logging, which could be problematic for very large error responses.

Consider adding a size limit when reading error response bodies:

                // Try to get error body for logging
                if let Ok(body_bytes) = to_bytes(response.into_body()).await {
-                    let body_str = String::from_utf8_lossy(&body_bytes);
+                    let body_str = if body_bytes.len() > 1024 {
+                        format!("{}... (truncated)", String::from_utf8_lossy(&body_bytes[..1024]))
+                    } else {
+                        String::from_utf8_lossy(&body_bytes).to_string()
+                    };
                    error!("Response body: {}", body_str);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 951f347 and 28a8aa0.

⛔ Files ignored due to path filters (2)
  • pcrDev.json is excluded by !pcrDev.json
  • pcrProd.json is excluded by !pcrProd.json
📒 Files selected for processing (4)
  • pcrDevHistory.json (1 hunks)
  • pcrProdHistory.json (1 hunks)
  • src/proxy_config.rs (9 hunks)
  • src/web/openai.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pcrDevHistory.json
  • pcrProdHistory.json
🔇 Additional comments (10)
src/web/openai.rs (3)

96-101: LGTM: Clean model name extraction and routing.

The code properly extracts the model name and retrieves the routing configuration. The change to always have routes for all models is a good architectural decision.


152-240: Excellent retry logic with proper error handling.

The cycling logic is well-implemented with:

  • Proper delay between cycles (progressive backoff)
  • Clear logging for debugging
  • Graceful fallback handling
  • Comprehensive error messages

The implementation correctly tries primary first, then fallback, and cycles up to 3 times total.


115-150: Fallback handling is correct for the current config
All ModelRoute instances in src/proxy_config.rs initialize fallbacks with at most one entry, so using route.fallbacks.first() here matches the current design. If you plan to support multiple fallback providers in the future, you could map over route.fallbacks and prepare each request body in turn.

Locations to note:

  • src/web/openai.rs lines 115–150
src/proxy_config.rs (7)

4-31: Well-structured model equivalency mapping.

The use of lazy_static for model equivalencies is appropriate, and the mapping structure is clear and extensible. The constants make the code maintainable.


33-48: Clean and well-documented data structures.

The ModelRoute and ProxyConfig structs are well-designed with clear field documentation. The addition of provider_name for logging is helpful.


106-134: Robust model name translation logic.

The get_model_name_for_provider function handles both forward and reverse lookups correctly, with proper fallback to the original name when no translation is found.


227-304: Complex but correct cache refresh logic.

The cache refresh implementation properly handles:

  • Tinfoil models as primary when available
  • Continuum models as fallback or standalone
  • Deduplication based on model equivalencies
  • Error handling for provider failures

The logic correctly prioritizes Tinfoil models and avoids duplicates in the combined model list.


316-381: Comprehensive route building logic.

The route building logic correctly:

  • Creates routes for all models
  • Sets up fallbacks based on equivalencies
  • Handles provider-only models appropriately
  • Provides clear logging for debugging

The implementation ensures every model has a route, which aligns with the OpenAI proxy changes.


447-450: Simple and effective utility method.

The get_tinfoil_base_url method provides a clean interface for accessing the Tinfoil proxy URL when needed.


458-771: Comprehensive test coverage.

The test suite covers:

  • Model equivalency mappings
  • Cache behavior and expiration
  • Proxy router configuration
  • Model name translation
  • Route structure and creation
  • Edge cases and error conditions

The tests are well-structured and provide good coverage of the new functionality.

AnthonyRonning and others added 10 commits July 21, 2025 16:38
- Implement dynamic model fetching from provider /v1/models endpoints
- Add intelligent routing with Tinfoil as primary for Llama 3.3 70B
- Support automatic fallback to Continuum on API errors
- Add model name translation between providers
- Remove hardcoded model configurations
- Add constants for model name equivalencies
- Fix all clippy warnings and format code

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Change retry strategy from 3 attempts per provider to cycling between providers
- Retry pattern: primary -> fallback -> primary (up to 3 cycles)
- Prevents getting stuck on a failing provider with connection issues
- Add progressive delays between retry cycles (0s, 1s, 2s)
- Simplify try_provider function to make single attempts
- Maintain same total retry count but distribute across providers

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add 'OpenAI API returned non-success status' prefix to final error logs
- Ensures metric alarms can properly detect when all providers have failed
- Individual provider failures during retries remain unchanged

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix edge case where canonical name (llama-3.3-70b) wasn't properly translated
- Simplify logic with direct lookup first, then reverse lookup
- Add comprehensive tests for all translation scenarios
- Address code review feedback about double-translation risk

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Test model equivalencies mapping function
- Test ModelsCache initialization and update behavior
- Test cache expiration logic with 5-minute TTL
- Test ProxyRouter initialization with different configurations
- Test model name translation with all edge cases
- Test ModelRoute and ProxyConfig structures
- Test get_tinfoil_base_url helper function
- Improve code coverage and ensure reliability

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Replace get_model_equivalencies() function with lazy_static MODEL_EQUIVALENCIES
- Avoid HashMap reconstruction on every call
- Initialize once on first access, then reuse
- Update all references and tests accordingly
- Address performance-focused code review feedback

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Test ModelRoute with empty fallbacks array
- Test cache expiry edge case with past timestamp
- Test get_all_models when cache is empty and providers unavailable
- Address code review suggestion for better test coverage
- Improve confidence in error handling paths

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Create routes for ALL models during initialization, not just equivalencies
- Tinfoil models get Tinfoil as primary (with Continuum fallback if equivalent)
- Continuum models get Continuum as primary (no fallback)
- Simplify proxy_openai to use single code path for all models
- Remove unused get_proxy_for_model function
- Update tests to use get_model_route
- Address code review feedback about duplication

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

@greptile-apps greptile-apps Bot left a comment


Greptile Summary

The latest changes complete the model routing implementation by focusing on the Tinfoil proxy component and its integration. Key updates include:

  1. Simplified the Tinfoil proxy implementation by removing document upload functionality and streamlining the client initialization process
  2. Updated Go dependencies to support the new routing architecture, particularly upgrading tinfoil-go to v0.1.2 and verifier to v0.1.16
  3. Added Intel TDX guest support alongside existing AMD SEV capabilities, broadening secure enclave platform support

The changes demonstrate a thoughtful simplification of the architecture while expanding its capabilities and security features.

Confidence score: 4/5

  1. Changes maintain a clear focus on reliability while reducing complexity
  2. The removal of document handling suggests good separation of concerns
  3. Key files for review: tinfoil-proxy/main.go for the simplified client initialization and tinfoil-proxy/go.mod for dependency updates

9 files reviewed, 1 comment


@AnthonyRonning AnthonyRonning merged commit 1f6dc4b into master Jul 21, 2025
9 checks passed
@AnthonyRonning AnthonyRonning deleted the model-routing-2 branch July 21, 2025 21:51
@coderabbitai coderabbitai Bot mentioned this pull request Aug 14, 2025