Agent params #792

alexmanle · 2026-01-27T22:03:42Z

Summary

Adds a simple interface to override agents hyperparameters.

Test Plan

Tested using AIConfigurator workload and GA agent.
Test scenario

Example TOML:

[agent_config]
population_size = 0
n_offsprings = 0
crossover_prob = 0.0
mutation_prob = 0.0
random_seed = 0

New terminal output:

2026-01-27 13:05:11,633 - INFO - Applying agent config overrides for 'ga': {'n_offsprings': 0, 'crossover_prob': 0.0, 'mutation_prob': 0.0, 'random_seed': 0}

coderabbitai · 2026-01-27T22:04:28Z

📝 Walkthrough

Walkthrough

Added Pydantic agent configuration models, extended TestDefinition with an optional agent_config field, and implemented runtime validation of agent overrides in CLI handlers that validates per-agent config types and conditionally supplies validated kwargs to agent constructors.

Changes

Cohort / File(s)	Summary
Agent configuration models `src/cloudai/models/agent_config.py`	New Pydantic module introducing `AgentConfig` (base, extra="forbid") and specialized models: `GeneticAlgorithmConfig`, `BayesianOptimizationConfig`, `MultiArmedBanditConfig` with typed, constrained optional fields and metadata.
Workload model update `src/cloudai/models/workload.py`	Added `agent_config: Optional[dict[str, Any]] = None` to `TestDefinition` to carry agent override data.
Runtime validation & handler changes `src/cloudai/cli/handlers.py`	Added `validate_agent_overrides(agent_type, agent_config) -> dict[str, Any]`, imported `ValidationError` and agent config models, and updated agent construction flow to validate overrides per-agent type, log field-level ValidationError details, skip invalid overrides, and pass validated kwargs when present.
Miscellaneous `manifest_file`, `requirements.txt`	Updated manifest/requirements to reflect new module/dependencies (lines added).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐇 I nibble on configs, tidy and neat,
Seeds for GA, BO, and arms that compete,
Pydantic keeps rules, no surprises today,
Validated overrides guide each agent's play,
Hop, ship, and repeat — carrots all the way! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'Agent params' is vague and generic, using non-descriptive language that doesn't convey the specific nature of the changeset.	Use a more specific title that describes the main feature, such as 'Add agent hyperparameter override configuration' or 'Support agent config overrides in DSE jobs'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description clearly explains the purpose, implementation, and testing of the agent hyperparameter override feature with concrete examples.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

greptile-apps · 2026-01-27T22:07:09Z

Greptile Overview

Greptile Summary

This PR adds a clean interface for overriding agent hyperparameters through TOML configuration files. The implementation introduces Pydantic models (GeneticAlgorithmConfig, BayesianOptimizationConfig, MultiArmedBanditConfig) to validate agent-specific parameters with proper constraints, and integrates validation into the DSE job handler with helpful error messages.

Key changes:

Added agent_config field to TestDefinition model to store hyperparameter overrides
Created typed config classes with validation constraints (e.g., population_size >= 2, probability ranges 0.0-1.0)
Implemented validate_agent_overrides() function that validates configs and logs applied overrides
Added proper error handling with detailed validation messages when configs are invalid

Notable behaviors:

Unknown agent types (like grid_search) with agent_config silently ignore the config with only debug-level logging
Empty override dicts are correctly handled to avoid passing kwargs to agents that don't accept them
The conditional logic if agent_overrides else relies on empty dict being falsy, which works but could be more explicit

Confidence Score: 4/5

This PR is safe to merge with minor style improvements recommended
The implementation is well-structured with proper validation and error handling. Pydantic models provide type safety and constraint validation. The integration is straightforward and the previous thread's concerns about GridSearchAgent are addressed by the empty dict check. Minor points for improvement include making the conditional logic more explicit and improving user feedback for ignored configs.
No files require special attention - all changes are straightforward with good validation

Important Files Changed

Filename	Overview
src/cloudai/models/agent_config.py	New file defining Pydantic config models for GA, BO, and MAB agent hyperparameter validation
src/cloudai/models/workload.py	Added `agent_config` field to `TestDefinition` to store agent hyperparameter overrides
src/cloudai/cli/handlers.py	Added `validate_agent_overrides` function and validation logic to apply agent config overrides with error handling

Sequence Diagram

sequenceDiagram
    participant User
    participant Handler as handle_dse_job
    participant Validator as validate_agent_overrides
    participant ConfigClass as GeneticAlgorithmConfig/<br/>BayesianOptimizationConfig/<br/>MultiArmedBanditConfig
    participant Registry
    participant Agent as GA/BO/MAB/GridSearch Agent
    participant Env as CloudAIGymEnv

    User->>Handler: Run DSE job with agent_config in TOML
    Handler->>Registry: Get agent_class for agent_type
    Registry-->>Handler: Return agent class
    Handler->>Env: Create CloudAIGymEnv(test_run, runner)
    Env-->>Handler: Environment instance
    Handler->>Validator: validate_agent_overrides(agent_type, agent_config)
    alt agent_config is None/empty
        Validator-->>Handler: Return {}
    else agent_type not in map (e.g., grid_search)
        Validator->>Validator: Log debug message
        Validator-->>Handler: Return {}
    else agent_type in map (ga/bo/mab)
        Validator->>ConfigClass: model_validate(agent_config)
        ConfigClass->>ConfigClass: Validate constraints (ge, le, etc.)
        alt Validation fails
            ConfigClass-->>Validator: Raise ValidationError
            Validator-->>Handler: Propagate ValidationError
            Handler->>Handler: Log detailed error messages
            Handler-->>User: Return error code, skip agent
        else Validation succeeds
            ConfigClass-->>Validator: Validated config object
            Validator->>Validator: model_dump(exclude_none=True)
            Validator->>Validator: Log info with overrides
            Validator-->>Handler: Return agent_kwargs dict
        end
    end
    alt agent_overrides is empty dict
        Handler->>Agent: agent_class(env)
    else agent_overrides has values
        Handler->>Agent: agent_class(env, **agent_overrides)
    end
    Agent-->>Handler: Agent instance with configured parameters
    Handler->>Agent: Run agent steps (select_action, update_policy)

greptile-apps

_{3 files reviewed, 3 comments}

_{Edit Code Review Agent Settings | Greptile}

src/cloudai/cli/handlers.py

src/cloudai/models/agent_config.py

src/cloudai/cli/handlers.py

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@src/cloudai/cli/handlers.py`:
- Around line 198-201: The code silently drops a user-supplied agent_config when
no validator exists; update the branch that checks config_class =
config_class_map.get(agent_type) to emit a logging.warning (not debug) when an
agent_config was provided and will be ignored, referencing agent_type and
agent_config in the message so callers understand overrides were ignored; keep
returning {} but ensure the warning clearly states the agent_type and that
agent_config will be ignored (optionally include a truncated/summary of
agent_config) to aid user visibility.

In `@src/cloudai/models/agent_config.py`:
- Around line 51-54: The algorithm Field currently accepts any string which
risks invalid runtime values; restrict and validate it by replacing the loose
Optional[str] with a strict set of allowed values (use a Python Enum or
typing.Literal for "ucb1", "ts" (thompson_sampling), "epsilon_greedy",
"softmax", "random") or add a Pydantic validator on the algorithm field in the
AgentConfig class to raise a clear validation error when an unsupported
algorithm is provided, and update the Field description to match the enforced
choices.
- Around line 42-45: The botorch_num_trials field currently allows any integer
but should only accept -1 or integers >= 1 (matching the semantic described and
the sibling sobol_num_trials). Add a Pydantic validator for botorch_num_trials
on the AgentConfig model (e.g., a method named validate_botorch_num_trials
decorated with `@validator`("botorch_num_trials")) that returns the value if it's
None or equals -1 or is >= 1, and raises a ValueError for other values (e.g., 0
or < -1).

src/cloudai/cli/handlers.py

coderabbitai · 2026-01-27T22:12:42Z

src/cloudai/models/agent_config.py

+    sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")
+    botorch_num_trials: Optional[int] = Field(
+        default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"
+    )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the actual file structure and find agent_config.py fd -t f "agent_config.py" | head -20

Repository: NVIDIA/cloudai

Length of output: 92

🏁 Script executed:

# Read the agent_config.py file to see current state cat -n src/cloudai/models/agent_config.py

Repository: NVIDIA/cloudai

Length of output: 3888

🏁 Script executed:

# Search for botorch_num_trials usage throughout the codebase rg "botorch_num_trials" -B 2 -A 2

Repository: NVIDIA/cloudai

Length of output: 499

🏁 Script executed:

# Search for files that use BayesianOptimizationConfig or botorch_num_trials rg "BayesianOptimizationConfig|botorch_num_trials" -t py --files-with-matches

Repository: NVIDIA/cloudai

Length of output: 120

🏁 Script executed:

# Look for the BO agent implementation fd -t f ".*agent.*" -path "*/bayesian*" -o -path "**/optimization*" | grep -i agent

Repository: NVIDIA/cloudai

Length of output: 226

🏁 Script executed:

# Broader search for agent implementations fd -t f "agent" -type f

Repository: NVIDIA/cloudai

Length of output: 228

🏁 Script executed:

# Search more broadly for agent implementations fd -e py "agent" | head -30

Repository: NVIDIA/cloudai

Length of output: 152

🏁 Script executed:

# Also search for where BayesianOptimizationConfig is instantiated rg "BayesianOptimizationConfig" -B 3 -A 10 -t py

Repository: NVIDIA/cloudai

Length of output: 2822

🏁 Script executed:

# Search for botorch usage in the codebase rg "botorch" -i -B 3 -A 3

Repository: NVIDIA/cloudai

Length of output: 673

🏁 Script executed:

# Find the base_agent.py file and search for BO implementation cat src/cloudai/configurator/base_agent.py | head -100

Repository: NVIDIA/cloudai

Length of output: 2224

🏁 Script executed:

# Search for where botorch_num_trials is actually used in the agent rg "botorch_num_trials" -A 5 -B 5 src/cloudai/configurator/

Repository: NVIDIA/cloudai

Length of output: 40

🏁 Script executed:

# Look for any BO agent or related files fd -e py | xargs rg -l "class.*Bayesian|botorch_num_trials"

Repository: NVIDIA/cloudai

Length of output: 92

🏁 Script executed:

# Search for actual agent implementations/classes rg "class.*Agent.*:" -A 5 | head -100

Repository: NVIDIA/cloudai

Length of output: 4155

🏁 Script executed:

# Look at test_agents.py to see how these configs are tested cat tests/test_agents.py

Repository: NVIDIA/cloudai

Length of output: 2749

🏁 Script executed:

# Check if there are any actual BO agent implementations elsewhere fd -e py -x rg -l "num_trials" {} \;

Repository: NVIDIA/cloudai

Length of output: 94

🏁 Script executed:

# Look for any BO agent implementations or references fd -e py | xargs rg -l "BayesianOptimization|BoTorch" 2>/dev/null

Repository: NVIDIA/cloudai

Length of output: 120

🏁 Script executed:

# Check if there's a bayesian agent file fd -e py -path "*/agent*" -o -path "*/bayesian*"

Repository: NVIDIA/cloudai

Length of output: 226

🏁 Script executed:

# Search for where the agent configs are actually used rg "botorch_num_trials|sobol_num_trials" -B 5 -A 5

Repository: NVIDIA/cloudai

Length of output: 905

🏁 Script executed:

# Check what agent types are actually implemented ls -la src/cloudai/configurator/

Repository: NVIDIA/cloudai

Length of output: 559

🏁 Script executed:

# Check if BO agent exists or is planned fd "\.py$" | xargs rg -l "class.*Bayesian" 2>/dev/null

Repository: NVIDIA/cloudai

Length of output: 92

🏁 Script executed:

# Look at handlers.py more carefully to see how config is used cat src/cloudai/cli/handlers.py | grep -A 20 "BayesianOptimizationConfig"

Repository: NVIDIA/cloudai

Length of output: 1467

🏁 Script executed:

# Check if there's external documentation or plugin system for agents rg "entry_points\|entrypoint" -A 5 -B 5

Repository: NVIDIA/cloudai

Length of output: 40

🏁 Script executed:

# Search for any test or documentation mentioning BO agent rg "bayesian|bo agent" -i -B 2 -A 2

Repository: NVIDIA/cloudai

Length of output: 1069

Add validation constraint to botorch_num_trials.
The field currently accepts any integer, but the description specifies "(-1 for unlimited until max_steps)", implying only -1 or ≥1 are valid. Without validation, invalid values like 0 or -2 pass Pydantic validation. This is inconsistent with the sibling field sobol_num_trials (line 42), which has ge=1 constraint.

🛠 Proposed fix

-from pydantic import BaseModel, ConfigDict, Field +from pydantic import BaseModel, ConfigDict, Field, field_validator @@ class BayesianOptimizationConfig(AgentConfig): """Configuration overrides for Bayesian Optimization agent.""" @@ botorch_num_trials: Optional[int] = Field( default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)" ) + + `@field_validator`("botorch_num_trials") + `@classmethod` + def _validate_botorch_num_trials(cls, v: Optional[int]) -> Optional[int]: + if v is None: + return v + if v != -1 and v < 1: + raise ValueError("botorch_num_trials must be -1 or >= 1") + return v

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")

botorch_num_trials: Optional[int] = Field(

default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"

)

sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")

botorch_num_trials: Optional[int] = Field(

default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"

)

`@field_validator`("botorch_num_trials")

`@classmethod`

def _validate_botorch_num_trials(cls, v: Optional[int]) -> Optional[int]:

if v is None:

return v

if v != -1 and v < 1:

raise ValueError("botorch_num_trials must be -1 or >= 1")

return v

🤖 Prompt for AI Agents

In `@src/cloudai/models/agent_config.py` around lines 42 - 45, The botorch_num_trials field currently allows any integer but should only accept -1 or integers >= 1 (matching the semantic described and the sibling sobol_num_trials). Add a Pydantic validator for botorch_num_trials on the AgentConfig model (e.g., a method named validate_botorch_num_trials decorated with `@validator`("botorch_num_trials")) that returns the value if it's None or equals -1 or is >= 1, and raises a ValueError for other values (e.g., 0 or < -1).

@alexmanle would be good to consider this. wdyt?

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Also would be good to test it with BO agent w/ AIConfigurator along with GA.

src/cloudai/models/agent_config.py

greptile-apps

_{2 files reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-01-27T22:15:12Z

src/cloudai/cli/handlers.py

+            err = 1
+            continue
+
+        agent = agent_class(env, **agent_overrides) if agent_overrides else agent_class(env)


Conditional passes kwargs only when non-empty, but empty dict is falsy. If agent_overrides = {}, the condition evaluates False and takes the else branch. Consider using is not None or explicit length check for clarity:

Suggested change

agent = agent_class(env, **agent_overrides) if agent_overrides else agent_class(env)

agent = agent_class(env, **agent_overrides) if agent_overrides is not None else agent_class(env)

However, given validate_agent_overrides always returns a dict, the current logic works but is subtle.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

src/cloudai/cli/handlers.py

alexmanle added 2 commits January 27, 2026 13:46

agent hyperparam interface

1012aa0

fix formatting

bd551e4

alexmanle requested review from amaslenn, jeffnvidia and srivatsankrishnan as code owners January 27, 2026 22:03

greptile-apps bot reviewed Jan 27, 2026

View reviewed changes

src/cloudai/cli/handlers.py Outdated Show resolved Hide resolved

src/cloudai/models/agent_config.py Show resolved Hide resolved

src/cloudai/cli/handlers.py Show resolved Hide resolved

default pass no kwargs

c1ecf25

coderabbitai bot reviewed Jan 27, 2026

View reviewed changes

greptile-apps bot reviewed Jan 27, 2026

View reviewed changes

jeffnvidia approved these changes Jan 28, 2026

View reviewed changes

alexmanle marked this pull request as draft January 28, 2026 20:31

alexmanle changed the base branch from main to agent-dev January 29, 2026 17:51

-    sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")
-    botorch_num_trials: Optional[int] = Field(
-        default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"
-    )
+    sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")
+    botorch_num_trials: Optional[int] = Field(
+        default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"
+    )
+    `@field_validator`("botorch_num_trials")
+    `@classmethod`
+    def _validate_botorch_num_trials(cls, v: Optional[int]) -> Optional[int]:
+        if v is None:
+            return v
+        if v != -1 and v < 1:
+            raise ValueError("botorch_num_trials must be -1 or >= 1")
+        return v

	agent = agent_class(env, **agent_overrides) if agent_overrides else agent_class(env)
	agent = agent_class(env, **agent_overrides) if agent_overrides is not None else agent_class(env)

Agent params #792

Are you sure you want to change the base?

Agent params #792

Uh oh!

Conversation

alexmanle commented Jan 27, 2026

Summary

Test Plan

Uh oh!

coderabbitai bot commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

greptile-apps bot commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

srivatsankrishnan Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

srivatsankrishnan Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coderabbitai bot commented Jan 27, 2026 •

edited

Loading

greptile-apps bot commented Jan 27, 2026 •

edited

Loading

coderabbitai bot Jan 27, 2026 •

edited

Loading