Agent Supervision System and Open Source Model Enhancements#206
Merged
…dates Bumps the npm_and_yarn group with 2 updates in the /frontend directory: [minimatch](https://github.com/isaacs/minimatch) and [rollup](https://github.com/rollup/rollup).
Updates `minimatch` from 3.1.2 to 3.1.5
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](isaacs/minimatch@v3.1.2...v3.1.5)
Updates `rollup` from 4.53.1 to 4.59.0
- [Release notes](https://github.com/rollup/rollup/releases)
- [Changelog](https://github.com/rollup/rollup/blob/master/CHANGELOG.md)
- [Commits](rollup/rollup@v4.53.1...v4.59.0)
---
updated-dependencies:
- dependency-name: minimatch
  dependency-version: 3.1.5
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: rollup
  dependency-version: 4.59.0
  dependency-type: indirect
  dependency-group: npm_and_yarn
...
Signed-off-by: dependabot[bot] <support@github.com>
The performAgentChain loop has no iteration cap, allowing infinite loops when a model repeatedly calls the same tool. The repeating detector returns a message (not an error), so the loop never breaks.
Add two safety mechanisms:
- Hard cap of 100 iterations on the main agent chain loop
- Escalation to error after 5 consecutive repeating detections (8 total identical calls: 3 before first detection + 5 detections)
The soft "please try another tool" response is preserved for the first 4 detections, giving the LLM a chance to course-correct before aborting.
Closes #175
Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
- Rename maxRepeatingDetectionsBeforeErr to maxSoftDetectionsBeforeAbort
for clarity (name now matches behavior: 4 soft warnings before abort)
- Adjust threshold value from 5 to 4 and remove -1 from condition
(same runtime behavior: abort on 7th consecutive identical call)
- Use errors.New() instead of fmt.Errorf("%s", msg) for non-formatted
error strings (more idiomatic Go)
Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
Detached terminal commands (detach=true) inherit the parent context. When the parent context is canceled (e.g., agent delegation timeout), the detached goroutine's ctx.Done() fires and kills the background command, even though it has its own timeout.
Use context.WithoutCancel(ctx) for the detached goroutine. This preserves context values (tracing, logging) but prevents parent cancellation from propagating. The command's own timeout via context.WithTimeout in getExecResult continues to work.
Non-detached commands are unchanged and still respect parent cancellation.
Closes #176
Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
Validates the core fix: detached goroutine must survive parent context cancellation (context.WithoutCancel behavior).
TestExecCommandDetachSurvivesParentCancel:
- Starts detach=true command, cancels parent ctx after quick return
- Asserts goroutine does NOT see cancellation (ctxWasCanceled=false)
- This test would FAIL without context.WithoutCancel
TestExecCommandNonDetachRespectsParentCancel:
- Starts detach=false command, cancels parent ctx after 200ms
- Asserts command DOES fail with context error
- Ensures WithoutCancel was NOT applied to non-detach path
Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
Add comprehensive test coverage for the repeating tool call detection logic that guards against infinite agent chain loops (related to #175).
TestRepeatingDetector (9 cases):
- nil function call, first/second/third identical calls
- threshold triggering at RepeatingToolCallThreshold (3)
- funcCalls reset on different call
- escalation threshold validation (6 vs 7 consecutive calls)
- argument normalization (message field stripping, key ordering)
TestRepeatingDetectorEscalationThreshold:
- Validates escalation math: abort at len >= threshold + 4 = 7
TestClearCallArguments (3 cases):
- message field stripping, key sorting, invalid JSON passthrough
Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
- Replace hardcoded +4 with testMaxSoftDetectionsBeforeAbort constant with sync comment pointing to performer.go
- Add test case for same function name with different non-message args resetting funcCalls (covers the other reset condition in detect())
Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
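The argument normalization these tests exercise (message field stripping, key sorting, invalid JSON passthrough) could look roughly like the sketch below. `normalizeCallArguments` is a hypothetical stand-in. The real `clearCallArguments` may differ in signature and detail.

```go
package main

import "encoding/json"

// normalizeCallArguments returns a canonical form of a tool call's JSON
// arguments so that two calls differing only in their free-text "message"
// field, or in key order, compare as equal.
func normalizeCallArguments(raw string) string {
	var args map[string]any
	if err := json.Unmarshal([]byte(raw), &args); err != nil {
		return raw // invalid JSON passes through untouched
	}
	delete(args, "message")
	out, err := json.Marshal(args) // encoding/json emits map keys in sorted order
	if err != nil {
		return raw
	}
	return string(out)
}
```

Stripping `message` matters because a model that is stuck tends to reword its explanation on every attempt while repeating the same command. Without normalization each attempt would look like a new call to the detector.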
Add config loading, provider type, models loading, model prefix, missing API key, and usage tests for the four Chinese LLM providers that were expanded in PR #185 but had no test coverage. GLM tests use non-negative price assertions to accommodate free-tier models (glm-4.7-flash, glm-4.5-flash).
Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
…nd/npm_and_yarn-8a6d6a6aaf chore(deps): bump the npm_and_yarn group across 1 directory with 2 updates
fix: prevent infinite loop in performAgentChain on repeating tool calls
…tion fix: isolate detached command context from parent cancellation
…eration-cap test: add unit tests for repeatingDetector and clearCallArguments
test: add unit tests for DeepSeek, GLM, Kimi, and Qwen providers
…anning
- Added new environment variables for execution monitoring and agent planning in `.env.example` and updated `docker-compose.yml` to include these configurations.
- Implemented execution monitoring features, including thresholds for tool call limits and automatic mentor intervention.
- Introduced task planning capabilities for agents to generate structured execution plans.
- Updated documentation to reflect new agent supervision settings and their usage.
This update aims to improve the reliability and efficiency of agent operations in complex scenarios.
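As a rough illustration of the threshold-based monitoring mentioned in this commit, the sketch below counts tool calls and signals when a mentor review would be due. The limits of 5 (same tool) and 10 (total) are the values given in the PR description. Everything else is an assumption made for the example: the type and field names, treating "same tool" as consecutive calls, and resetting the counters after a review.

```go
package main

// monitorConfig mirrors the documented settings. The field names are assumed.
type monitorConfig struct {
	Enabled       bool
	SameToolLimit int // e.g. 5
	TotalLimit    int // e.g. 10
}

// executionMonitor is a hypothetical stand-in for executionMonitorDetector.
type executionMonitor struct {
	cfg      monitorConfig
	lastTool string
	sameRun  int // consecutive calls to lastTool
	total    int // calls since the last review
}

// observe records one tool call and reports whether a mentor review is due.
func (m *executionMonitor) observe(tool string) bool {
	if !m.cfg.Enabled {
		return false
	}
	if tool == m.lastTool {
		m.sameRun++
	} else {
		m.lastTool, m.sameRun = tool, 1
	}
	m.total++
	if m.sameRun >= m.cfg.SameToolLimit || m.total >= m.cfg.TotalLimit {
		// Assumed behavior: counters restart once the mentor has been consulted.
		m.lastTool, m.sameRun, m.total = "", 0, 0
		return true
	}
	return false
}
```

Unlike the repeating detector, which compares full call arguments, a monitor of this shape only looks at tool names and volume, so it can flag an agent that keeps varying its arguments without making progress.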
Description of the Change
Problem
Smaller open source models (< 32B parameters) required additional supervision to match cloud API quality. Agents could get stuck in loops or miss optimal attack paths. Vector store searches returned insufficient results because each search accepted only a single query. Target information leaking into vector memory could cause agents to switch targets mid-attack. No production-grade guide existed for air-gapped deployments with local LLM inference.
Solution
Implemented multi-layered agent supervision system with:
Closes #175, #176
Type of Change
Areas Affected
Testing and Verification
Test Configuration
Test Steps
go test ./...)
Test Results
Security Considerations
Enhanced Security:
No New Attack Vectors:
Performance Impact
With Supervision Features Enabled (for models < 32B):
Multi-Query Vector Search:
Infinite Loop Prevention:
vLLM Performance (Qwen3.5-27B-FP8 on 4× RTX 5090):
Documentation Updates
Deployment Notes
New Environment Variables (Optional - Beta Features Disabled by Default):
Database Migrations:
- `20260310_153000_agent_supervision.sql`: Adds supervision-related columns and indexes
Recommended Configuration for Open Source Models < 32B:
Enable supervision features for 2x quality improvement:
Configure adviser with enhanced settings (see examples/configs/vllm-qwen3.5-27b-fp8.provider.yml)
Compatibility:
Checklist
Code Quality
- `go fmt` and `go vet` (for Go code)
- `npm run lint` (for TypeScript/JavaScript code)
Security
Compatibility
Documentation
Additional Notes
Key Changes by Category
🤖 Agent Supervision System (New - Beta)
Execution Monitoring (PRs #178, #179, #180):
- `executionMonitorDetector` tracks tool call patterns (same tool: 5, total: 10 thresholds)
- `performMentor` invokes adviser agent for execution analysis
- `<mentor_analysis>` and `<original_result>` sections
- `EXECUTION_MONITOR_ENABLED`, `EXECUTION_MONITOR_SAME_TOOL_LIMIT`, `EXECUTION_MONITOR_TOTAL_TOOL_LIMIT`
Task Planning:
- `performPlanner` generates 3-7 step execution plans via adviser in planning mode
- `question_task_planner.tmpl` prompt template for structured planning
- `task_assignment_wrapper.tmpl` wraps requests with execution plans
- `AGENT_PLANNING_STEP_ENABLED`
Tool Call Limits:
- `MAX_GENERAL_AGENT_TOOL_CALLS`, `MAX_LIMITED_AGENT_TOOL_CALLS`
Prompt Templates:
- `question_execution_monitor.tmpl` for mentor invocations
- `question_task_planner.tmpl` for plan generation
- `task_assignment_wrapper.tmpl` for wrapping specialist requests
🧠 Memory System Enhancements
Multi-Query Vector Search:
- `search_in_memory`, `search_guide`, `search_answer`, `search_code` now support up to 5 simultaneous queries
- `queries` array parameter replaces single `query` string
Target Anonymization:
- `memory_utils.go` with IP/domain anonymization functions
- `203.0.113.1` → `TARGET_IP_1`, `example.com` → `TARGET_DOMAIN_1`)
- `memory_utils_test.go`
Enhanced Storage:
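To make the target anonymization item in this section concrete, here is a small sketch of the mapping it describes (`203.0.113.1` → `TARGET_IP_1`, `example.com` → `TARGET_DOMAIN_1`). It is not the code in `memory_utils.go`: the function names and regular expressions are assumptions, and the patterns are deliberately simple (IPv4 only, and anything shaped like `name.tld` is treated as a domain).

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Loose IPv4 pattern: four dot-separated groups of 1-3 digits.
	ipv4Re = regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)
	// Loose domain pattern: one or more labels followed by an alphabetic TLD.
	domainRe = regexp.MustCompile(`\b(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}\b`)
)

// anonymizeTargets replaces IPs and then domains with numbered placeholders.
func anonymizeTargets(text string) string {
	text = replaceNumbered(text, ipv4Re, "TARGET_IP")
	return replaceNumbered(text, domainRe, "TARGET_DOMAIN")
}

// replaceNumbered gives each distinct match a stable placeholder, numbered
// in order of first appearance within this text.
func replaceNumbered(text string, re *regexp.Regexp, prefix string) string {
	seen := map[string]string{}
	return re.ReplaceAllStringFunc(text, func(m string) string {
		if _, ok := seen[m]; !ok {
			seen[m] = fmt.Sprintf("%s_%d", prefix, len(seen)+1)
		}
		return seen[m]
	})
}
```

Numbering per distinct value keeps relationships readable in stored memories (the same host always maps to the same placeholder within one text) while removing the concrete target. A production version would need to avoid false positives such as file names and version strings, which this sketch does not attempt.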
🐛 Critical Bug Fixes
Infinite Loop Prevention (PR #178, Closes #175):
- `performAgentChain` main loop
- `maxSoftDetectionsBeforeAbort` with sync comments
Context Isolation for Detached Commands (PR #179, Closes #176):
- `context.WithoutCancel()` for detached goroutines
- `TestExecCommandDetachSurvivesParentCancel`, `TestExecCommandNonDetachRespectsParentCancel`
🧪 Test Coverage
Chinese Provider Tests (PR #189):
Repeating Detector Tests (PR #180):
- `TestRepeatingDetector`: 9 cases covering threshold triggering, reset logic, escalation
- `TestRepeatingDetectorEscalationThreshold`: validates abort math (threshold + 4 = 7)
- `TestClearCallArguments`: message field stripping, key sorting, invalid JSON passthrough
Config Tests:
- `backend/pkg/config/config_test.go` with supervision settings validation
📚 Documentation & Guides
vLLM + Qwen3.5-27B-FP8 Guide (New):
- `examples/guides/vllm-qwen35-27b-fp8.md`
Provider Configurations:
- `vllm-qwen3.5-27b-fp8.provider.yml` with optimal sampling parameters
- `vllm-qwen3.5-27b-fp8-no-think.provider.yml` for faster inference without thinking
Configuration Documentation:
- `backend/docs/config.md`: New Agent Supervision Settings section with usage details, recommended settings, supervision system integration
- `backend/docs/flow_execution.md`: Expanded Advanced Agent Supervision section with execution monitoring, task planning, integration diagrams
- `false` for beta features
README Updates:
🔧 Installer Enhancements
AI Agents Settings Form Expansion:
- `createBooleanField`, `createIntegerField`, `validateBooleanField`, `validateIntegerField`, `formatNumber`
- `GetAIAgentsConfig`, `UpdateAIAgentsConfig`, `ResetAIAgentsConfig`
Ollama Enhancements:
- `OLLAMA_SERVER_API_KEY` for Ollama Cloud support
- `PENTAGI_OLLAMA_DIR` volume configuration
- `min_p` parameter support across all providers
📦 Dependencies
langchaingo v0.1.14-update.5:
🎨 Code Quality
ftester Improvements:
Provider Enhancements:
Merged Pull Requests
Issues Addressed
Migration Path
For existing deployments:
- `git pull origin feature/next_release`
- `docker compose build`
- `docker compose up -d`
- `EXECUTION_MONITOR_ENABLED=true` and `AGENT_PLANNING_STEP_ENABLED=true` in `.env`
- `examples/guides/vllm-qwen35-27b-fp8.md`
No manual intervention required. Existing deployments continue working without changes.
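For readers wiring these flags into their own tooling, the sketch below shows one way the supervision variables named in this PR could be read with safe fallbacks. It is not the project's config loader. The struct, the function names, and the fallback behavior on malformed values are assumptions. The defaults (features disabled, limits of 5 and 10) follow the values stated in this description.

```go
package main

import (
	"os"
	"strconv"
)

// supervisionConfig is a hypothetical holder for the supervision settings.
type supervisionConfig struct {
	MonitorEnabled  bool
	SameToolLimit   int
	TotalToolLimit  int
	PlanningEnabled bool
}

// loadSupervisionConfig reads the settings through lookup and falls back to
// the defaults when a value is missing or malformed.
func loadSupervisionConfig(lookup func(string) (string, bool)) supervisionConfig {
	boolVar := func(key string, def bool) bool {
		if v, ok := lookup(key); ok {
			if b, err := strconv.ParseBool(v); err == nil {
				return b
			}
		}
		return def
	}
	intVar := func(key string, def int) int {
		if v, ok := lookup(key); ok {
			if n, err := strconv.Atoi(v); err == nil && n > 0 {
				return n
			}
		}
		return def
	}
	return supervisionConfig{
		MonitorEnabled:  boolVar("EXECUTION_MONITOR_ENABLED", false),
		SameToolLimit:   intVar("EXECUTION_MONITOR_SAME_TOOL_LIMIT", 5),
		TotalToolLimit:  intVar("EXECUTION_MONITOR_TOTAL_TOOL_LIMIT", 10),
		PlanningEnabled: boolVar("AGENT_PLANNING_STEP_ENABLED", false),
	}
}

// loadSupervisionConfigFromEnv reads the process environment.
func loadSupervisionConfigFromEnv() supervisionConfig {
	return loadSupervisionConfig(os.LookupEnv)
}
```

Taking the lookup function as a parameter keeps the parsing testable without touching the process environment. With nothing set, both beta features stay off, consistent with existing deployments continuing to work unchanged.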