feat: Tool Integration with PolicyEngine (PR 2 of #7231)#8078
feat: Tool Integration with PolicyEngine (PR 2 of #7231)#8078allenhutchison wants to merge 12 commits intomainfrom
Conversation
Introduces a pub/sub message bus architecture to decouple tool confirmation logic from core tool implementation. This is the first PR in a series to implement the Tool Confirmation Message Bus as described in issue #7231. ## Changes - Add MessageBus class with event-driven publish/subscribe pattern - Implement PolicyEngine with configurable rules for tool execution control - Integrate MessageBus and PolicyEngine into Config class - Add comprehensive unit tests for both components ## Features - Priority-based policy rules with tool name and args pattern matching - Non-interactive mode support (ASK_USER becomes DENY) - Error handling and message validation - Support for multiple message types (confirmation, rejection, execution) ## Testing - 25 new unit tests covering all functionality - Full test suite passes (1964 total tests) - TypeScript compilation and linting checks pass
Fixes the security vulnerability identified by Gemini Code Assist where JSON.stringify() could produce different output for the same arguments due to non-deterministic property ordering. This could allow security policies to be bypassed. Changes: - Implement stableStringify() method with sorted keys for consistent output - Ensure pattern matching works regardless of argument property order - Add comprehensive tests for order-independent matching - Test nested objects to ensure deep stability This ensures PolicyEngine rules cannot be bypassed by reordering object properties in tool arguments.
…eStringify Addresses the denial-of-service vulnerability identified in code review where circular references in tool arguments could cause infinite recursion and stack overflow in the stableStringify method. Changes: - Implement WeakSet tracking to detect and handle circular references - Replace recursive calls with inner stringify function that tracks visited objects - Return "[Circular]" placeholder when circular reference detected - Add comprehensive tests for both shallow and deep circular references - Use proper TypeScript types instead of 'any' in tests Security impact: - Prevents potential DoS attacks via crafted tool arguments - Ensures PolicyEngine remains robust against malformed input - Maintains stable stringification while preventing infinite recursion
…ringify Fixed multiple security vulnerabilities identified in code review: - Correctly handle repeated non-circular objects (no false positives) - Properly handle undefined and function values per JSON spec - Use Set with ancestor chain tracking instead of WeakSet - Ensure generated JSON is always valid - Add comprehensive test coverage for edge cases The stableStringify method now: - Tracks ancestor chain per branch to detect true circular references - Omits undefined/function values from objects (per JSON spec) - Converts undefined/functions to null in arrays (per JSON spec) - Produces deterministic output with sorted keys for consistent pattern matching
- Wrap array test inputs in objects to match Record<string, unknown> type - Add explicit type annotation for testCases array - Ensure all test inputs have correct types for args parameter
…mentation Fixes based on code review: - Remove readonly modifier from rules array since it's mutated by addRule/removeRulesForTool - Add support for toJSON methods per JSON.stringify specification - Add comprehensive documentation for stableStringify method - Add tests for toJSON handling including edge cases The stableStringify method now fully respects the JSON specification while maintaining deterministic output for security policy matching.
Replace JSON.stringify with safeJsonStringify to handle potential circular references in message objects. This prevents TypeError exceptions when logging invalid messages with circular references in toolCall.args.
Implements PR 2 of issue #7231 - Tool Implementation & Integration - Add PolicyEngine integration to WebSearchTool - Implement shouldConfirmExecute method that checks policies - Support ALLOW, DENY, and ASK_USER policy decisions - Add comprehensive tests for policy-based confirmation flow - Mock both policy decisions and UI responses in tests The web-search tool now respects PolicyEngine rules: - ALLOW: Execute immediately without confirmation - DENY: Block execution with error message - ASK_USER: Request user confirmation via UI This lays the groundwork for integrating more tools with the new MessageBus and PolicyEngine architecture.
There was a problem hiding this comment.
Summary of Changes
Hello @allenhutchison, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request marks the second phase of the Tool Confirmation Message Bus architecture by integrating the PolicyEngine into the web-search tool. This integration enables the system to enforce predefined policies for tool execution, allowing for immediate execution, blocking, or prompting the user for confirmation. The changes provide a working example of this integration pattern, laying the groundwork for future abstraction and application to other critical tools.
Highlights
- PolicyEngine Integration: The PolicyEngine has been integrated with the web-search tool, serving as a proof of concept for controlled tool execution.
- Confirmation Logic: The
shouldConfirmExecutemethod has been implemented within the web-search tool, allowing it to respect policy decisions (ALLOW, DENY, ASK_USER) before execution. - Comprehensive Test Coverage: Extensive test coverage has been added to validate all policy scenarios, including mocking policy decisions and UI confirmation responses.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
This pull request successfully integrates the PolicyEngine with the web-search tool, providing a solid proof-of-concept for the new tool confirmation architecture. The implementation correctly handles ALLOW, DENY, and ASK_USER policy decisions, and the accompanying tests are comprehensive. My review focuses on improving security by preventing potential information leaks in error messages and console logs, where sensitive query parameters could be exposed. Addressing these points will help establish a secure pattern for other tools to follow.
| if (decision === PolicyDecision.DENY) { | ||
| throw new Error(`Web search blocked by policy for query: "${this.params.query}"`); | ||
| } |
There was a problem hiding this comment.
Potential information leak. Throwing an error that includes the full search query can leak sensitive information if the query contains personal data. This information might end up in logs, which could be a security risk. It's safer to throw a more generic error message. The specific details of the blocked call can be logged internally by the policy engine if needed for debugging.
This change would also require updating the test in packages/core/src/tools/web-search.test.ts at line 294 to expect the new, generic error message.
| if (decision === PolicyDecision.DENY) { | |
| throw new Error(`Web search blocked by policy for query: "${this.params.query}"`); | |
| } | |
| if (decision === PolicyDecision.DENY) { | |
| throw new Error('Web search blocked by policy.'); | |
| } |
| onConfirm: async (outcome: ToolConfirmationOutcome) => { | ||
| // In the future, we might want to update rules based on the outcome | ||
| // For now, just log the decision | ||
| if (outcome === ToolConfirmationOutcome.ProceedAlways) { | ||
| // Could add a rule to always allow web searches with similar patterns | ||
| console.log(`User chose to always allow web searches like: "${this.params.query}"`); | ||
| } | ||
| }, |
There was a problem hiding this comment.
Using console.log in library code is generally discouraged as it can pollute the consumer application's logs and is not easily configurable. More importantly, logging the full query here could be an information leak if the query contains sensitive data. While this is a placeholder, it would be better to avoid logging potentially sensitive arguments directly. Consider logging a truncated or hashed version of the query, or emitting an event that the application can handle.
This change would also require updating the corresponding test in packages/core/src/tools/web-search.test.ts at lines 341-343.
onConfirm: async (outcome: ToolConfirmationOutcome) => {
// In the future, we might want to update rules based on the outcome
// For now, just log the decision
if (outcome === ToolConfirmationOutcome.ProceedAlways) {
// Could add a rule to always allow web searches with similar patterns
// Avoid logging the full query to prevent leaking sensitive information.
console.log(`User chose to always allow similar web searches.`);
}
}There was a problem hiding this comment.
agreed. generally remove this log before landing. If you want it, tweak this to console.debug so it is only shown when the app is run in debug mode. console.log will show for all users if they open the console.
| } | ||
|
|
||
| if (decision === PolicyDecision.DENY) { | ||
| throw new Error(`Web search blocked by policy for query: "${this.params.query}"`); |
There was a problem hiding this comment.
optional nit for discussion: don't love that this is throwing an exception. all things being equal Gemini CLI should not throw exceptions as part of normal operations (improves experience debugging with pause on caught exceptions). Fine with leaving as is but would be ideal if instead we could report this in the return value rather throwing errors to indicate rejections. Having rejections be throws of generic errors also increases the risk that internal bugs causing errors are treated the same as intentional policy rejections.
| // For now, just log the decision | ||
| if (outcome === ToolConfirmationOutcome.ProceedAlways) { | ||
| // Could add a rule to always allow web searches with similar patterns | ||
| console.log(`User chose to always allow web searches like: "${this.params.query}"`); |
There was a problem hiding this comment.
same comment about avoiding console.log
This commit addresses several comments from PR #7835. - Refactors `removeRulesForTool` in `PolicyEngine` to use `.filter()` for better readability. - Extracts `stableStringify` into its own utility file and makes `ruleMatches` a file-level function. - Adds structural types to message objects in `message-bus.test.ts` to improve type safety. - Makes `ToolExecutionSuccess` and `ToolExecutionFailure` interfaces generic for more flexible typing. - Fixes failing tests in `message-bus.test.ts`.
abhipatel12
left a comment
There was a problem hiding this comment.
Looks great in general! Just had a few questions!
| prompt: `Allow web search for: "${this.params.query}"?`, | ||
| onConfirm: async (outcome: ToolConfirmationOutcome) => { | ||
| // In the future, we might want to update rules based on the outcome | ||
| // For now, just log the decision |
There was a problem hiding this comment.
Just wondering.
For Always Allow, why aren't we updating the policy engine so that during the next web_fetch, it's run automatically? Why in the future?
| return { | ||
| type: 'info', | ||
| title: 'Confirm Web Search', | ||
| prompt: `Allow web search for: "${this.params.query}"?`, |
There was a problem hiding this comment.
Non-blocking on this PR and purely for discussion, I think it would be ideal if we could eventually transition our permission request to be event-driven as well.
In that world, we can separate concerns between the core and cli. For example, the title and prompt above, imo, seem like a CLI concern. In the core we can focus on just providing the base data required and the CLI can decide how it wants to render that information.
Summary
This PR implements the second phase of the Tool Confirmation Message Bus architecture (#7231) by integrating the PolicyEngine with an actual tool.
shouldConfirmExecutemethod that respects policy decisionsImplementation Details
The web-search tool now:
Test Coverage
Added tests that mock:
Future Work
As noted, as we integrate this functionality into other tools, we will:
This PR intentionally keeps the implementation in the specific tool to demonstrate the pattern clearly before abstraction.
Related