Component
API or orchestration
Describe the feature
When the Bedrock Guardrail blocks a task during hydration, the error is completely generic: "Task context blocked by content policy". The Bedrock response includes assessment details (filter type, confidence level), but screenWithGuardrail() discards all of it and returns only a string. The guardrail_blocked event and bgagent status show only the generic message, so the user has no idea what triggered the block or how to work around it.
Use case
I submitted a task from a GitHub issue that was literally "create a uv scaffolder, project name xyz, version 0.0.1, license MIT" and it got blocked by the PROMPT_ATTACK filter at HIGH as a false positive. I had no way to understand what triggered it, no way to debug, and no way to move forward. I had to dig into the code to figure out what was going on.
Proposed solution
The API is authenticated and bgagent events <task-id> already exists, so there's a safe channel to surface this:
- Pipe the Bedrock assessment through: screenWithGuardrail() should return the filter type and confidence from the response and include them in the guardrail_blocked event metadata, so bgagent events shows something like "PROMPT_ATTACK filter triggered at HIGH confidence" instead of just "Task context blocked by content policy"
- Better error in bgagent status: something actionable, like suggesting the user resubmit with --task instead of --issue to bypass the issue content that triggered the filter
I saw that the roadmap (Iteration 5) already mentions per-repo guardrail configuration. Even when it lands, I think the assessment details would help users understand and work around false positives.
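To make the first bullet concrete, here is a rough sketch of how the assessment could be turned into a structured result. It is only an illustration: I don't know the current signature of screenWithGuardrail(), so ScreenResult and summarizeGuardrailResponse are hypothetical names, and the response interface is a hand-written mirror of the fields I believe the Bedrock ApplyGuardrail response exposes (action, assessments[].contentPolicy.filters[] with type, confidence and action). Please check it against the SDK version in use.

```typescript
// Hand-written mirror of the ApplyGuardrail response fields used below.
// Assumption: field names match the Bedrock response; verify against the SDK.
interface ContentFilter {
  type: string; // e.g. "PROMPT_ATTACK"
  confidence: string; // e.g. "HIGH"
  action?: string; // e.g. "BLOCKED"
}

interface GuardrailResponse {
  action: string; // "GUARDRAIL_INTERVENED" or "NONE"
  assessments?: Array<{ contentPolicy?: { filters?: ContentFilter[] } }>;
}

// Hypothetical return type; the real screenWithGuardrail() currently returns a string.
interface ScreenResult {
  blocked: boolean;
  message: string;
  filters: Array<{ type: string; confidence: string }>;
}

const GENERIC_MESSAGE = "Task context blocked by content policy";

// Hypothetical helper: extracts the filters that blocked the content and
// builds a specific message, falling back to the generic one if none are found.
function summarizeGuardrailResponse(response: GuardrailResponse): ScreenResult {
  if (response.action !== "GUARDRAIL_INTERVENED") {
    return { blocked: false, message: "", filters: [] };
  }

  const filters: Array<{ type: string; confidence: string }> = [];
  for (const assessment of response.assessments ?? []) {
    for (const filter of assessment.contentPolicy?.filters ?? []) {
      // Keep filters that blocked, or whose action was not reported at all.
      if (filter.action === undefined || filter.action === "BLOCKED") {
        filters.push({ type: filter.type, confidence: filter.confidence });
      }
    }
  }

  const message =
    filters.length === 0
      ? GENERIC_MESSAGE
      : filters
          .map((f) => `${f.type} filter triggered at ${f.confidence} confidence`)
          .join("; ");

  return { blocked: true, message, filters };
}
```

The filters array could go into the guardrail_blocked event metadata as-is, and bgagent status could append the --task hint whenever blocked is true and the task came from an issue.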
Other information
No response
Acknowledgements