Conversation
- Add agent.rs module with AgentRegistry for managing custom agents
- Support loading agent configurations from ~/.codex/agents.toml
- Enable agent-specific system prompts with file-based loading
- Add comprehensive subagents documentation with usage examples
- Include example-agents.toml with 10 specialized agent templates
- Update README with multi-agent system documentation links
- Integrate agent system into core Codex architecture
- Support inheritance of tools/permissions from workspace context
- Prevent recursive agent spawning for system stability

The multi-agent system allows users to define specialized AI agents with custom system prompts for tasks like code review, testing, refactoring, security auditing, and performance analysis.
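As a rough illustration of what an entry in `~/.codex/agents.toml` might look like: the `prompt`/`prompt_file` fields are described later in this PR, but the table name and `description` key here are assumptions, not the actual schema (see `example-agents.toml` in the repo for the real templates).

```toml
# Hypothetical agent entry; field names other than prompt/prompt_file are guesses.
[agents.code-reviewer]
description = "Reviews diffs for correctness and style"
prompt_file = "prompts/code-reviewer.md"  # alternatively an inline `prompt = "..."`
```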
Added a new function `validate_prompt_path` in the `AgentRegistry` to ensure that prompt file paths do not escape allowed directories, preventing potential path traversal attacks. This function checks if the provided path is within the user's home directory or the base directory, returning an error if it is not. Updated the agent loading logic to utilize this validation, enhancing security during agent initialization.
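The core of such a containment check can be sketched as follows. This is an illustrative version, not the actual `validate_prompt_path` signature: canonicalize both paths so `..` components and symlinks are resolved, then require the candidate to remain under the allowed base directory.

```rust
use std::io;
use std::path::Path;

// Hypothetical helper mirroring the validation described above: returns true
// only if `candidate` resolves to a location inside `base`. Both paths must
// exist for canonicalize() to succeed.
fn is_within(base: &Path, candidate: &Path) -> io::Result<bool> {
    let base = base.canonicalize()?;
    let candidate = candidate.canonicalize()?;
    Ok(candidate.starts_with(&base))
}
```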
Updated the agent context creation to improve clarity and functionality. The task extraction logic from the agent prompt has been refined to ensure that the user's task is correctly identified and handled. This change includes better error handling for cases where no task is found, and improved logging for agent execution. Additionally, comments have been updated to reflect the new structure and intent of the code.
Updated the agent ID generation logic to safely truncate the call ID, ensuring unique identifiers are created without risk of overflow. Additionally, introduced an `AgentContextGuard` struct to manage the agent context flag more effectively, ensuring it is reset appropriately after use. This change enhances clarity and reliability in agent execution tracking.
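The guard pattern described here can be sketched as a minimal RAII type (a simplified stand-in for the real `AgentContextGuard`, which may carry different state): the flag is set on construction and cleared in `Drop`, so it resets even on early returns or panics.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Illustrative RAII guard: marks "we are inside an agent context" while alive
// and unmarks it when dropped, regardless of how the scope exits.
struct AgentContextGuard<'a> {
    flag: &'a AtomicBool,
}

impl<'a> AgentContextGuard<'a> {
    fn new(flag: &'a AtomicBool) -> Self {
        flag.store(true, Ordering::SeqCst);
        Self { flag }
    }
}

impl Drop for AgentContextGuard<'_> {
    fn drop(&mut self) {
        self.flag.store(false, Ordering::SeqCst);
    }
}
```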
Enhanced the `AgentContextGuard` struct to ensure the agent context flag is reset appropriately after use, improving reliability in agent execution tracking. Updated comments for clarity and added logging to indicate when the agent context is marked and unmarked, ensuring better traceability during agent operations.
Refined the logic for sending the `AgentBegin` and `AgentEnd` events to ensure they are only sent under appropriate conditions. Improved the handling of agent context creation and task extraction, ensuring better clarity and reliability in agent execution tracking. Updated comments for better understanding and added error handling for agent stream errors, enhancing overall robustness in agent operations.
Added validation logic to ensure that the AgentConfig struct requires either a prompt or a prompt_file, but not both. Implemented functionality to load the prompt from a specified file if prompt_file is provided. Enhanced the get_prompt method to cache the loaded prompt for efficiency. Updated tests to cover various scenarios for prompt validation and retrieval, ensuring robust agent configuration management.
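The "exactly one of `prompt` or `prompt_file`" rule can be sketched like this; the field names match the description above, but the struct is simplified relative to the real `AgentConfig`.

```rust
// Simplified stand-in for AgentConfig, showing only the two prompt fields.
struct AgentConfig {
    prompt: Option<String>,
    prompt_file: Option<String>,
}

impl AgentConfig {
    // Reject configs that set both fields or neither, per the rule above.
    fn validate(&self) -> Result<(), String> {
        match (&self.prompt, &self.prompt_file) {
            (Some(_), Some(_)) => Err("specify either prompt or prompt_file, not both".into()),
            (None, None) => Err("one of prompt or prompt_file is required".into()),
            _ => Ok(()),
        }
    }
}
```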
Enhanced the handling of agent tool calls by introducing a structure to manage pending tool calls, allowing for parallel execution of agent and non-agent tool calls. Updated the logic to categorize and process tool calls efficiently, ensuring that agent calls are executed concurrently while maintaining proper context management. Improved error handling and logging for better traceability during agent operations.
Enhanced the agent execution logic to support true parallel execution of agents, ensuring that all agents run concurrently rather than sequentially. Introduced thread-safe wrappers for Session and TurnContext to facilitate safe concurrent access. Updated comments for clarity and improved logging to reflect the parallel execution status, enhancing performance tracking during agent operations.
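The "spawn all, then join all" shape can be sketched with scoped threads; the real implementation runs async futures, so this is only an analogy for the concurrency structure, with a toy task body.

```rust
// Run every task concurrently and collect results in submission order.
// Placeholder work; the real code drives agent turns instead.
fn run_agents_in_parallel(tasks: Vec<String>) -> Vec<String> {
    std::thread::scope(|s| {
        let handles: Vec<_> = tasks
            .iter()
            .map(|t| s.spawn(move || format!("done: {t}")))
            .collect();
        // All agents are in flight before we join any of them.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```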
Refactored the mutex handling in the codex module to replace `lock_unchecked()` with `lock_or_recover()`, enhancing the robustness of the code by allowing recovery from poisoned locks. Updated various functions to ensure safe access to shared state, improving error handling and overall clarity. This change aims to provide a cleaner API for managing mutex locks while maintaining performance during concurrent operations.
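A poison-recovering lock helper along these lines can be written as an extension trait; the real `lock_or_recover()` may differ, but the key mechanism is `PoisonError::into_inner`, which yields the guard even after a panic poisoned the mutex.

```rust
use std::sync::{Mutex, MutexGuard};

// Sketch of a lock_or_recover-style helper: never panics on poison,
// instead recovering the guard from the PoisonError.
trait LockOrRecover<T> {
    fn lock_or_recover(&self) -> MutexGuard<'_, T>;
}

impl<T> LockOrRecover<T> for Mutex<T> {
    fn lock_or_recover(&self) -> MutexGuard<'_, T> {
        self.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
    }
}
```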
Refactored the handling of tool calls in the codex module to implement a more efficient concurrency strategy. Agent calls are executed in parallel, while other tool calls are processed sequentially to respect dependencies.
Refactored the agent message structure in the codex module to replace `OutputText` with `InputText` for better clarity and alignment with input processing. This change enhances the handling of user messages, ensuring that the content is appropriately categorized and processed as input. The update aims to improve the overall readability and maintainability of the code.
Refactored the agent tool handling in the codex module to streamline the processing of agent calls and non-agent tool calls. This update emphasizes true parallel execution for agent calls while ensuring that other tool calls are processed sequentially, respecting dependencies. Improved the clarity of comments and the overall structure of the code to enhance maintainability and performance tracking during agent operations.
Enhanced the agent execution logic in the codex module by introducing a flag to track whether the execution context is an agent context. This update prevents recursion during agent calls, ensuring that agents do not spawn other agents. The changes include setting and resetting the agent context flag appropriately, improving the clarity of the code and maintaining the integrity of parallel execution. Updated comments for better understanding of the context management process.
Summary of Changes (from Gemini Code Assist): This pull request integrates a robust multi-agent orchestration system into Codex, enabling users to leverage specialized AI agents for various tasks. The system allows for flexible agent configuration via ~/.codex/agents.toml, supporting custom system prompts and tool/permission overrides. A key enhancement is the ability to execute multiple agent calls in parallel, optimizing performance for complex workflows. User interaction is streamlined through a new @agent mention syntax and a /agents command in the TUI, providing real-time feedback and automatic plan tracking. The changes span core logic, protocol definitions, UI components, and extensive documentation, making the multi-agent capabilities a central feature of Codex.
Code Review
This is an impressive pull request that introduces a powerful multi-agent orchestration system. The feature is well-designed, with custom agent configurations, parallel execution, and good user-facing features like @agent mentions and the /agents command. The addition of comprehensive documentation and examples is also excellent.
My review focuses on a few key areas:
- Concurrency Safety: There is a critical issue regarding the use of `unsafe` to implement `Sync` for `SessionWrapper`. This is a potential source of data races and should be addressed with a safer pattern like `Weak` pointers.
- Robustness: A couple of medium-severity suggestions to improve cross-platform compatibility and robustness in path handling within the new agent module.
- Maintainability: A medium-severity suggestion to refactor a loop in `codex.rs` to reduce code duplication.
Overall, this is a fantastic contribution. Addressing the safety concern around concurrency is the highest priority.
```rust
/// Thread-safe wrapper for Session to enable concurrent execution
struct SessionWrapper {
    // We use a raw pointer here because:
    // 1. We need to share &Session across threads but Session isn't Clone
    // 2. The Session is guaranteed to outlive this wrapper (created in same scope)
    // 3. Session has internal thread-safety via Mutex fields
    ptr: *const Session,
    _phantom: std::marker::PhantomData<Session>,
}

// SAFETY: Session has internal synchronization via Mutex fields
// The wrapper ensures the Session outlives all uses
unsafe impl Send for SessionWrapper {}
unsafe impl Sync for SessionWrapper {}
```
The manual unsafe implementation of Send and Sync for SessionWrapper is extremely dangerous and can lead to data races. The Session struct is not Sync because it contains fields that are not Sync (due to a reference cycle with AgentTask holding an Arc<Session>). Bypassing the compiler's safety checks with unsafe is a major security and correctness risk.
A safer approach to break the reference cycle would be to use std::sync::Weak<Session> inside AgentTask instead of Arc<Session>. This is the idiomatic way to handle cycles in Rust.
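The suggested `Weak` pattern can be sketched as follows, with simplified stand-ins for the real `Session` and `AgentTask` types: the task holds a `Weak<Session>` so no `Arc` cycle forms, and callers `upgrade()` when they need the session.

```rust
use std::sync::{Arc, Weak};

// Simplified stand-in types to show the cycle-breaking shape.
struct Session {
    id: u32,
}

struct AgentTask {
    // Weak instead of Arc<Session>: does not keep the session alive,
    // so Session -> AgentTask -> Session no longer forms a cycle.
    session: Weak<Session>,
}

impl AgentTask {
    fn session_id(&self) -> Option<u32> {
        // upgrade() returns None once the session has been dropped.
        self.session.upgrade().map(|s| s.id)
    }
}
```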
```rust
let path = if prompt_file.starts_with('/') {
    PathBuf::from(prompt_file)
} else {
    base_dir.join(prompt_file)
};
```
The check prompt_file.starts_with('/') is not cross-platform for detecting absolute paths. On Windows, absolute paths can start with a drive letter (e.g., C:\). Using Path::is_absolute() would make this function more robust and portable.
Suggested change:

```rust
let path = PathBuf::from(prompt_file);
let path = if path.is_absolute() {
    path
} else {
    base_dir.join(path)
};
```
```rust
fn get_agents_directory() -> Option<PathBuf> {
    std::env::var("HOME")
        .or_else(|_| std::env::var("USERPROFILE"))
        .ok()
        .map(|home| PathBuf::from(home).join(".codex"))
}
```
For consistency and robustness, get_agents_directory should use dirs::home_dir() to locate the user's home directory, just like validate_prompt_path does. The current implementation using std::env::var("HOME").or_else(|_| std::env::var("USERPROFILE")) is less reliable across different platforms and configurations. The dirs crate is already a dependency.
```rust
fn get_agents_directory() -> Option<PathBuf> {
    dirs::home_dir().map(|home| home.join(".codex"))
}
```

```rust
for item in collected_items {
    match &item {
        ResponseItem::FunctionCall {
            name,
            call_id,
            arguments,
            ..
        } if name == "agent" => {
            // Collect agent calls for parallel execution
            pending_agent_calls.push(PendingToolCall {
                item: item.clone(),
                call_id: call_id.clone(),
                name: name.clone(),
                arguments: Some(arguments.clone()),
            });
        }
        ResponseItem::FunctionCall { .. } => {
            // Process non-agent function calls immediately in order
            let response = handle_response_item(
                sess,
                turn_context,
                turn_diff_tracker,
                sub_id,
                item.clone(),
            )
            .await?;
            if let Some(resp) = response.clone() {
                output.push(resp);
            }
            processed_items.push(ProcessedResponseItem {
                item: item.clone(),
                response,
            });
        }
        ResponseItem::LocalShellCall {
            call_id: Some(id), ..
        } => {
            // Process shell calls immediately in order
            let response = handle_response_item(
                sess,
                turn_context,
                turn_diff_tracker,
                sub_id,
                item.clone(),
            )
            .await?;
            if let Some(resp) = response.clone() {
                output.push(resp);
            }
            processed_items.push(ProcessedResponseItem {
                item: item.clone(),
                response,
            });
        }
        ResponseItem::CustomToolCall { name, call_id, .. } if name == "agent" => {
            // Collect agent calls for parallel execution
            pending_agent_calls.push(PendingToolCall {
                item: item.clone(),
                call_id: call_id.clone(),
                name: name.clone(),
                arguments: None,
            });
        }
        ResponseItem::CustomToolCall { .. } => {
            // Process non-agent custom tool calls immediately in order
            let response = handle_response_item(
                sess,
                turn_context,
                turn_diff_tracker,
                sub_id,
                item.clone(),
            )
            .await?;
            if let Some(resp) = response.clone() {
                output.push(resp);
            }
            processed_items.push(ProcessedResponseItem {
                item: item.clone(),
                response,
            });
        }
        _ => {
            // Process non-tool items immediately in order
            let response = handle_response_item(
                sess,
                turn_context,
                turn_diff_tracker,
                sub_id,
                item.clone(),
            )
            .await?;
            if let Some(resp) = response.clone() {
                output.push(resp);
            }
            processed_items.push(ProcessedResponseItem { item, response });
        }
    }
}
```
The loop for processing collected_items contains significant code duplication for handling different types of tool calls (FunctionCall, LocalShellCall, CustomToolCall). The logic for processing these items is nearly identical. This can be refactored to reduce repetition and improve maintainability. You can check if an item is an agent call first, and then have a common block for all other tool calls.
```rust
for item in collected_items {
    match &item {
        ResponseItem::FunctionCall { name, call_id, arguments, .. } if name == "agent" => {
            // Collect agent calls for parallel execution
            pending_agent_calls.push(PendingToolCall {
                item: item.clone(),
                call_id: call_id.clone(),
                name: name.clone(),
                arguments: Some(arguments.clone()),
            });
        }
        ResponseItem::CustomToolCall { name, call_id, .. } if name == "agent" => {
            // Collect agent calls for parallel execution
            pending_agent_calls.push(PendingToolCall {
                item: item.clone(),
                call_id: call_id.clone(),
                name: name.clone(),
                arguments: None,
            });
        }
        _ => {
            // Process non-agent items immediately in order
            let response = handle_response_item(
                sess,
                turn_context,
                turn_diff_tracker,
                sub_id,
                item.clone(),
            )
            .await?;
            if let Some(resp) = response.clone() {
                output.push(resp);
            }
            processed_items.push(ProcessedResponseItem { item, response });
        }
    }
}
```

/gemini review
Code Review
This pull request introduces a significant and well-architected feature: a multi-agent orchestration system. The changes are extensive, touching core logic, the communication protocol, UI components, and documentation to support this new capability. The highlight is the implementation of parallel agent execution, which is a complex but powerful addition. My review focuses on the core implementation in codex-rs/core, where I've identified a potential security vulnerability, some opportunities for code simplification and cleanup, and a risky use of unsafe code that warrants careful consideration for future maintenance. Overall, this is a very impressive piece of work.
```rust
pub fn get_prompt(&mut self, agents_dir: Option<&Path>) -> anyhow::Result<String> {
    if let Some(prompt) = &self.prompt {
        return Ok(prompt.clone());
    }

    if let Some(prompt_file) = &self.prompt_file {
        let full_path = if let Some(dir) = agents_dir {
            dir.join(prompt_file)
        } else {
            PathBuf::from(prompt_file)
        };

        let prompt_content = std::fs::read_to_string(&full_path).map_err(|e| {
            anyhow::anyhow!("Cannot read prompt file '{}': {}", full_path.display(), e)
        })?;

        // Cache the loaded prompt
        self.prompt = Some(prompt_content.clone());
        Ok(prompt_content)
    } else {
        Err(anyhow::anyhow!("No prompt or prompt_file specified"))
    }
}
```
This public function get_prompt is vulnerable to path traversal. It constructs a path by joining agents_dir and prompt_file without validating prompt_file. An attacker could craft a prompt_file like "../../../../etc/passwd" to read arbitrary files on the system. The path validation logic from AgentRegistry::validate_prompt_path should be used here to prevent this. The logic for loading prompts from files is duplicated between this function and AgentRegistry::new, which is also a maintainability issue. It would be better to consolidate this logic.
```rust
fn get_agents_directory() -> Option<PathBuf> {
    std::env::var("HOME")
        .or_else(|_| std::env::var("USERPROFILE"))
        .ok()
        .map(|home| PathBuf::from(home).join(".codex"))
}
```
For consistency and robustness, it's better to use dirs::home_dir() to get the home directory, as it's already used in validate_prompt_path. The dirs crate handles more edge cases across different operating systems than manually checking HOME and USERPROFILE environment variables.
```rust
fn get_agents_directory() -> Option<PathBuf> {
    dirs::home_dir().map(|home| home.join(".codex"))
}
```

```rust
let agent_infos =
    if let Ok(Some(registry)) = sess.agent_registry.lock().map(|g| g.as_ref().cloned()) {
        Some(registry.list_agent_details())
    } else {
        None
    };
```
This logic for getting agent_infos can be simplified by using the lock_or_recover extension method, which is consistent with other mutex locks in this file. This makes the code more concise and readable.
```rust
let agent_infos = sess.agent_registry.lock_or_recover()
    .as_ref()
    .map(|registry| registry.list_agent_details());
```

```rust
"agent" => {
    // Agent calls are now handled in parallel at the turn level
    // Return a placeholder response that will be replaced by parallel execution
    ResponseInputItem::FunctionCallOutput {
        call_id,
        output: FunctionCallOutputPayload {
            content: "Agent execution deferred for parallel processing".to_string(),
            success: Some(false),
        },
    }
}
```
This "agent" match arm appears to be dead code. The try_run_turn function explicitly separates agent tool calls into pending_agent_calls and only calls handle_response_item (which in turn calls handle_function_call) for non-agent tool calls. This branch will therefore never be reached. It should be removed to avoid confusion and code rot.
```rust
/// Thread-safe wrapper for Session to enable concurrent execution
struct SessionWrapper {
    // We use a raw pointer here because:
    // 1. We need to share &Session across threads but Session isn't Clone
    // 2. The Session is guaranteed to outlive this wrapper (created in same scope)
    // 3. Session has internal thread-safety via Mutex fields
    ptr: *const Session,
    _phantom: std::marker::PhantomData<Session>,
}

// SAFETY: Session has internal synchronization via Mutex fields
// The wrapper ensures the Session outlives all uses
unsafe impl Send for SessionWrapper {}
unsafe impl Sync for SessionWrapper {}
```
The use of raw pointers and unsafe blocks to share &Session across threads is a high-risk pattern. While the comments justify its use and the current implementation seems to manage lifetimes correctly via join_all, this is a potential footgun. Any future changes to how agent futures are executed (e.g., moving from join_all to tokio::spawn) could easily lead to use-after-free bugs. Consider adding more prominent warnings or exploring alternative, safer patterns if possible, even if they are more verbose (e.g., refactoring Session to be Sync or passing owned data to threads).
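The "pass owned data" alternative mentioned here can be sketched as follows. The `Session` type is a simplified stand-in (the real one is not `Sync`, which is the whole problem); the point is only the shape: each spawned task receives its own `Arc` clone, so the compiler, not a comment, enforces the lifetime.

```rust
use std::sync::Arc;
use std::thread;

// Simplified stand-in for the real Session type.
struct Session {
    name: String,
}

// Each task owns an Arc clone, so the session cannot be freed while any
// task is still running; no raw pointers or unsafe impls needed.
fn spawn_with_owned(session: Arc<Session>) -> thread::JoinHandle<String> {
    thread::spawn(move || format!("agent ran in session {}", session.name))
}
```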
```rust
/// Thread-safe wrapper for TurnContext to enable concurrent execution
struct TurnContextWrapper {
    // We use a raw pointer here for the same reasons as SessionWrapper
    ptr: *const TurnContext,
    _phantom: std::marker::PhantomData<TurnContext>,
}

// SAFETY: TurnContext contains only thread-safe types
// The wrapper ensures the TurnContext outlives all uses
unsafe impl Send for TurnContextWrapper {}
unsafe impl Sync for TurnContextWrapper {}
```
…ai#8950) Agent wouldn't "see" attached images and would instead try to use the view_file tool.

[screenshot: agent calling view_file instead of reading the attached image]

In this PR, we wrap image content items in XML tags with the name of each image (now just a numbered name like `[Image #1]`), so that the model can understand inline image references (based on name). We also put the image content items above the user message, which the model seems to prefer (maybe it's more used to definitions being before references). We also tweak the view_file tool description, which seemed to help a bit.

Results on a simple eval set of images:

Before: [screenshot of eval results]
After: [screenshot of eval results]

```json
[
  { "id": "single_describe", "prompt": "Describe the attached image in one sentence.", "images": ["image_a.png"] },
  { "id": "single_color", "prompt": "What is the dominant color in the image? Answer with a single color word.", "images": ["image_b.png"] },
  { "id": "orientation_check", "prompt": "Is the image portrait or landscape? Answer in one sentence.", "images": ["image_c.png"] },
  { "id": "detail_request", "prompt": "Look closely at the image and call out any small details you notice.", "images": ["image_d.png"] },
  { "id": "two_images_compare", "prompt": "I attached two images. Are they the same or different? Briefly explain.", "images": ["image_a.png", "image_b.png"] },
  { "id": "two_images_captions", "prompt": "Provide a short caption for each image (Image 1, Image 2).", "images": ["image_c.png", "image_d.png"] },
  { "id": "multi_image_rank", "prompt": "Rank the attached images from most colorful to least colorful.", "images": ["image_a.png", "image_b.png", "image_c.png"] },
  { "id": "multi_image_choice", "prompt": "Which image looks more vibrant? Answer with 'Image 1' or 'Image 2'.", "images": ["image_b.png", "image_d.png"] }
]
```
I've seen this test fail with:

```
- Mock #1. Expected range of matching incoming requests: == 2
  Number of matched incoming requests: 1
```

This is because we pop the wrong task_complete events and then the test exits. I think this is because the MCP events are now buffered after openai#8874. So:
1. clear the buffer before we do any user message sending
2. additionally listen for task start before task complete
3. use the ID from task start to find the correct task complete event.
Fixes openai#9050

When a draft is stashed with Ctrl+C, we now persist the full draft state (text elements, local image paths, and pending paste payloads) in local history. Up/Down recall rehydrates placeholder elements and attachments so styling remains correct and large pastes still expand on submit. Persistent (cross‑session) history remains text‑only. Backtrack prefills now reuse the selected user message's text elements and local image paths, so image placeholders/attachments rehydrate when rolling back. External editor replacements keep only attachments whose placeholders remain and then normalize image placeholders to `[Image #1]..[Image #N]` to keep the attachment mapping consistent.

Docs:
- docs/tui-chat-composer.md

Testing:
- just fix -p codex-tui
- cargo test -p codex-tui

Co-authored-by: Eric Traut <etraut@openai.com>
…i#10590)

## Summary
This PR makes app-server-provided image URLs first-class attachments in TUI, so they survive resume/backtrack/history recall and are resubmitted correctly.

[screenshot: remote image rendered as an attachment in the composer]

Can delete the attached image upon backtracking:

[screenshot: deleting the attached image during backtrack]

In both history and composer, remote images are rendered as normal `[Image #N]` placeholders, with numbering unified with local images.

## What changed
- Plumb remote image URLs through TUI message state:
  - `UserHistoryCell`
  - `BacktrackSelection`
  - `ChatComposerHistory::HistoryEntry`
  - `ChatWidget::UserMessage`
- Show remote images as placeholder rows inside the composer box (above textarea), and in history cells.
- Support keyboard selection/deletion for remote image rows in composer (`Up`/`Down`, `Delete`/`Backspace`).
- Preserve remote-image-only turns in local composer history (Up/Down recall), including restore after backtrack.
- Ensure submit/queue/backtrack resubmit include remote images in model input (`UserInput::Image`), and keep request shape stable for remote-image-only turns.
- Keep image numbering contiguous across remote + local images:
  - remote images occupy `[Image #1]..[Image #M]`
  - local images start at `[Image #M+1]`
  - deletion renumbers consistently.
- In protocol conversion, increment shared image index for remote images too, so mixed remote/local image tags stay in a single sequence.
- Simplify restore logic to trust in-memory attachment order (no placeholder-number parsing path).
- Backtrack/replay rollback handling now queues trims through `AppEvent::ApplyThreadRollback` and syncs transcript overlay/deferred lines after trims, so overlay/transcript state stays consistent.
- Trim trailing blank rendered lines from user history rendering to avoid oversized blank padding.

## Docs + tests
- Updated: `docs/tui-chat-composer.md` (remote image flow, selection/deletion, numbering offsets)
- Added/updated tests across `tui/src/chatwidget/tests.rs`, `tui/src/app.rs`, `tui/src/app_backtrack.rs`, `tui/src/history_cell.rs`, and `tui/src/bottom_pane/chat_composer.rs`
- Added snapshot coverage for remote image composer states, including deleting the first of two remote images.

## Validation
- `just fmt`
- `cargo test -p codex-tui`

## Codex author
`codex fork 019c2636-1571-74a1-8471-15a3b1c3f49d`
Port of upstream PR openai#3655 (originally by @cognitive-glitch).