Skip to content

feat: Tool Integration with PolicyEngine (PR 2 of #7231)#8078

Closed
allenhutchison wants to merge 12 commits intomainfrom
adh/issue-7231-pr2-tool-integration
Closed

feat: Tool Integration with PolicyEngine (PR 2 of #7231)#8078
allenhutchison wants to merge 12 commits intomainfrom
adh/issue-7231-pr2-tool-integration

Conversation

@allenhutchison
Copy link
Copy Markdown
Contributor

Summary

This PR implements the second phase of the Tool Confirmation Message Bus architecture (#7231) by integrating the PolicyEngine with an actual tool.

  • Integrates PolicyEngine with the web-search tool as a proof of concept
  • Implements shouldConfirmExecute method that respects policy decisions
  • Adds comprehensive test coverage for all policy scenarios

Implementation Details

The web-search tool now:

  • Checks PolicyEngine for tool execution decisions
  • Handles ALLOW (immediate execution), DENY (block with error), and ASK_USER (request confirmation)
  • Provides a working example of the integration pattern

Test Coverage

Added tests that mock:

  • Policy decisions (ALLOW/DENY/ASK_USER)
  • UI confirmation responses
  • Different query patterns triggering different policies

Future Work

As noted, as we integrate this functionality into other tools, we will:

  • Extract common confirmation logic to the base tool class
  • Generalize the test patterns for reuse across all tools
  • Apply this pattern to more critical tools like shell

This PR intentionally keeps the implementation in the specific tool to demonstrate the pattern clearly before abstraction.

Related

Introduces a pub/sub message bus architecture to decouple tool confirmation
logic from core tool implementation. This is the first PR in a series to
implement the Tool Confirmation Message Bus as described in issue #7231.

## Changes
- Add MessageBus class with event-driven publish/subscribe pattern
- Implement PolicyEngine with configurable rules for tool execution control
- Integrate MessageBus and PolicyEngine into Config class
- Add comprehensive unit tests for both components

## Features
- Priority-based policy rules with tool name and args pattern matching
- Non-interactive mode support (ASK_USER becomes DENY)
- Error handling and message validation
- Support for multiple message types (confirmation, rejection, execution)

## Testing
- 25 new unit tests covering all functionality
- Full test suite passes (1964 total tests)
- TypeScript compilation and linting checks pass
Fixes the security vulnerability identified by Gemini Code Assist where
JSON.stringify() could produce different output for the same arguments
due to non-deterministic property ordering. This could allow security
policies to be bypassed.

Changes:
- Implement stableStringify() method with sorted keys for consistent output
- Ensure pattern matching works regardless of argument property order
- Add comprehensive tests for order-independent matching
- Test nested objects to ensure deep stability

This ensures PolicyEngine rules cannot be bypassed by reordering
object properties in tool arguments.
…eStringify

Addresses the denial-of-service vulnerability identified in code review where
circular references in tool arguments could cause infinite recursion and
stack overflow in the stableStringify method.

Changes:
- Implement WeakSet tracking to detect and handle circular references
- Replace recursive calls with inner stringify function that tracks visited objects
- Return "[Circular]" placeholder when circular reference detected
- Add comprehensive tests for both shallow and deep circular references
- Use proper TypeScript types instead of 'any' in tests

Security impact:
- Prevents potential DoS attacks via crafted tool arguments
- Ensures PolicyEngine remains robust against malformed input
- Maintains stable stringification while preventing infinite recursion
…ringify

Fixed multiple security vulnerabilities identified in code review:
- Correctly handle repeated non-circular objects (no false positives)
- Properly handle undefined and function values per JSON spec
- Use Set with ancestor chain tracking instead of WeakSet
- Ensure generated JSON is always valid
- Add comprehensive test coverage for edge cases

The stableStringify method now:
- Tracks ancestor chain per branch to detect true circular references
- Omits undefined/function values from objects (per JSON spec)
- Converts undefined/functions to null in arrays (per JSON spec)
- Produces deterministic output with sorted keys for consistent pattern matching
- Wrap array test inputs in objects to match Record<string, unknown> type
- Add explicit type annotation for testCases array
- Ensure all test inputs have correct types for args parameter
…mentation

Fixes based on code review:
- Remove readonly modifier from rules array since it's mutated by addRule/removeRulesForTool
- Add support for toJSON methods per JSON.stringify specification
- Add comprehensive documentation for stableStringify method
- Add tests for toJSON handling including edge cases

The stableStringify method now fully respects the JSON specification while
maintaining deterministic output for security policy matching.
Replace JSON.stringify with safeJsonStringify to handle potential
circular references in message objects. This prevents TypeError
exceptions when logging invalid messages with circular references
in toolCall.args.
Implements PR 2 of issue #7231 - Tool Implementation & Integration

- Add PolicyEngine integration to WebSearchTool
- Implement shouldConfirmExecute method that checks policies
- Support ALLOW, DENY, and ASK_USER policy decisions
- Add comprehensive tests for policy-based confirmation flow
- Mock both policy decisions and UI responses in tests

The web-search tool now respects PolicyEngine rules:
- ALLOW: Execute immediately without confirmation
- DENY: Block execution with error message
- ASK_USER: Request user confirmation via UI

This lays the groundwork for integrating more tools with the
new MessageBus and PolicyEngine architecture.
@allenhutchison allenhutchison requested a review from a team as a code owner September 9, 2025 13:53
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @allenhutchison, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request marks the second phase of the Tool Confirmation Message Bus architecture by integrating the PolicyEngine into the web-search tool. This integration enables the system to enforce predefined policies for tool execution, allowing for immediate execution, blocking, or prompting the user for confirmation. The changes provide a working example of this integration pattern, laying the groundwork for future abstraction and application to other critical tools.

Highlights

  • PolicyEngine Integration: The PolicyEngine has been integrated with the web-search tool, serving as a proof of concept for controlled tool execution.
  • Confirmation Logic: The shouldConfirmExecute method has been implemented within the web-search tool, allowing it to respect policy decisions (ALLOW, DENY, ASK_USER) before execution.
  • Comprehensive Test Coverage: Extensive test coverage has been added to validate all policy scenarios, including mocking policy decisions and UI confirmation responses.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request successfully integrates the PolicyEngine with the web-search tool, providing a solid proof-of-concept for the new tool confirmation architecture. The implementation correctly handles ALLOW, DENY, and ASK_USER policy decisions, and the accompanying tests are comprehensive. My review focuses on improving security by preventing potential information leaks in error messages and console logs, where sensitive query parameters could be exposed. Addressing these points will help establish a secure pattern for other tools to follow.

Comment on lines +94 to +96
if (decision === PolicyDecision.DENY) {
throw new Error(`Web search blocked by policy for query: "${this.params.query}"`);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Potential information leak. Throwing an error that includes the full search query can leak sensitive information if the query contains personal data. This information might end up in logs, which could be a security risk. It's safer to throw a more generic error message. The specific details of the blocked call can be logged internally by the policy engine if needed for debugging.

This change would also require updating the test in packages/core/src/tools/web-search.test.ts at line 294 to expect the new, generic error message.

Suggested change
if (decision === PolicyDecision.DENY) {
throw new Error(`Web search blocked by policy for query: "${this.params.query}"`);
}
if (decision === PolicyDecision.DENY) {
throw new Error('Web search blocked by policy.');
}

Comment on lines +104 to +111
onConfirm: async (outcome: ToolConfirmationOutcome) => {
// In the future, we might want to update rules based on the outcome
// For now, just log the decision
if (outcome === ToolConfirmationOutcome.ProceedAlways) {
// Could add a rule to always allow web searches with similar patterns
console.log(`User chose to always allow web searches like: "${this.params.query}"`);
}
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using console.log in library code is generally discouraged as it can pollute the consumer application's logs and is not easily configurable. More importantly, logging the full query here could be an information leak if the query contains sensitive data. While this is a placeholder, it would be better to avoid logging potentially sensitive arguments directly. Consider logging a truncated or hashed version of the query, or emitting an event that the application can handle.

This change would also require updating the corresponding test in packages/core/src/tools/web-search.test.ts at lines 341-343.

      onConfirm: async (outcome: ToolConfirmationOutcome) => {
        // In the future, we might want to update rules based on the outcome
        // For now, just log the decision
        if (outcome === ToolConfirmationOutcome.ProceedAlways) {
          // Could add a rule to always allow web searches with similar patterns
          // Avoid logging the full query to prevent leaking sensitive information.
          console.log(`User chose to always allow similar web searches.`);
        }
      }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. generally remove this log before landing. If you want it, tweak this to console.debug so it is only shown when the app is run in debug mode. console.log will show for all users if they open the console.

@gemini-cli gemini-cli bot added kind/enhancement priority/p2 Important but can be addressed in a future release. area/agent Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality labels Sep 9, 2025
}

if (decision === PolicyDecision.DENY) {
throw new Error(`Web search blocked by policy for query: "${this.params.query}"`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional nit for discussion: don't love that this is throwing an exception. all things being equal Gemini CLI should not throw exceptions as part of normal operations (improves experience debugging with pause on caught exceptions). Fine with leaving as is but would be ideal if instead we could report this in the return value rather throwing errors to indicate rejections. Having rejections be throws of generic errors also increases the risk that internal bugs causing errors are treated the same as intentional policy rejections.

// For now, just log the decision
if (outcome === ToolConfirmationOutcome.ProceedAlways) {
// Could add a rule to always allow web searches with similar patterns
console.log(`User chose to always allow web searches like: "${this.params.query}"`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about avoiding console.log

This commit addresses several comments from PR #7835.

- Refactors `removeRulesForTool` in `PolicyEngine` to use `.filter()` for better readability.
- Extracts `stableStringify` into its own utility file and makes `ruleMatches` a file-level function.
- Adds structural types to message objects in `message-bus.test.ts` to improve type safety.
- Makes `ToolExecutionSuccess` and `ToolExecutionFailure` interfaces generic for more flexible typing.
- Fixes failing tests in `message-bus.test.ts`.
Copy link
Copy Markdown
Contributor

@abhipatel12 abhipatel12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great in general! Just had a few questions!

prompt: `Allow web search for: "${this.params.query}"?`,
onConfirm: async (outcome: ToolConfirmationOutcome) => {
// In the future, we might want to update rules based on the outcome
// For now, just log the decision
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering.

For Always Allow, why aren't we updating the policy engine so that during the next web_fetch, it's run automatically? Why in the future?

return {
type: 'info',
title: 'Confirm Web Search',
prompt: `Allow web search for: "${this.params.query}"?`,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking on this PR and purely for discussion, I think it would be ideal if we could eventually transition our permission request to be event-driven as well.

In that world, we can separate concerns between the core and cli. For example, the title and prompt above, imo, seem like a CLI concern. In the core we can focus on just providing the base data required and the CLI can decide how it wants to render that information.

Base automatically changed from adh/issue-7231 to main September 11, 2025 16:52
@allenhutchison allenhutchison deleted the adh/issue-7231-pr2-tool-integration branch October 16, 2025 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/agent Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality priority/p2 Important but can be addressed in a future release.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants