fix(canister-security): correct inspect_message bypass wording; add evals #185
Open
… evals

Replaces the incorrect "malicious boundary node" wording. inspect_message runs on a single replica, so it is a malicious replica node (not a boundary node) that can skip the check. The wording now matches the official IC security docs.

Adds evaluations/canister-security.json with 8 output evals and 17 trigger evals. Strongest signal: the callback-trap/finally eval scores 4/4 with the skill vs. 1/4 without. Without the skill, Claude defaults to try/catch (wrong) and never surfaces the IC-specific cleanup-context semantics of finally.
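A minimal plain-Rust sketch of what this means for canister code, with no IC dependencies: because only a single replica runs inspect_message, any access check there is an optimization at best, and the update method must repeat it. Canister, require_admin, inspect_message and set_config are hypothetical names for illustration, and principals are simulated as &'static str rather than a real Principal type.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a principal; a real canister would use
// candid::Principal obtained from the system API.
type Principal = &'static str;

struct Canister {
    admins: HashSet<Principal>,
    config: String,
}

impl Canister {
    /// Shared check used by both the inspect hook and the update method.
    fn require_admin(&self, caller: Principal) -> Result<(), String> {
        if self.admins.contains(&caller) {
            Ok(())
        } else {
            Err(format!("{caller} is not an admin"))
        }
    }

    /// Stand-in for inspect_message: an early rejection performed by the
    /// single replica that receives the message. A malicious replica can
    /// skip it, so it must never be the only check.
    fn inspect_message(&self, caller: Principal) -> bool {
        self.require_admin(caller).is_ok()
    }

    /// Stand-in for the update method. It repeats the check because the
    /// inspect hook may never have run.
    fn set_config(&mut self, caller: Principal, value: String) -> Result<(), String> {
        self.require_admin(caller)?;
        self.config = value;
        Ok(())
    }
}

fn main() {
    let mut c = Canister { admins: HashSet::from(["alice"]), config: String::new() };
    // An honest replica would reject this caller up front...
    assert!(!c.inspect_message("mallory"));
    // ...but simulate a malicious replica that skips the hook and delivers
    // the message straight to the update method.
    assert!(c.set_config("mallory", "pwned".to_string()).is_err());
    assert_eq!(c.config, "");
    assert!(c.set_config("alice", "ok".to_string()).is_ok());
}
```

The design choice is to keep one shared predicate so the cheap early rejection and the authoritative check cannot drift apart.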
Skill Validation Report
Validating skill: /home/runner/work/icskills/icskills/skills/canister-security
Structure
Frontmatter
Markdown
Tokens
Content Analysis
Contamination Analysis
Result: passed
Project Checks
Replace three code-writing prompts with adversarial code-review prompts that lead agents toward wrong answers without the skill:
- Global boolean reentrancy guard (misses finally + per-caller locking)
- Balance deducted after await (TOCTOU, misses trap handling)
- Serializing heap data in preupgrade (misses instruction limit + persistent actor)
All three now show skill uplift: 5/5 vs 4/5, 5/5 vs 4/5, 4/4 vs 2/4.
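A minimal, synchronous Rust sketch of the fixes the first two review prompts are looking for: a per-caller lock instead of one global boolean, released on every exit path, and a balance deducted before the await point with a refund if the call fails. It assumes a simplified in-memory State, &'static str as a stand-in for principals, and a closure standing in for the awaited inter-canister call; State, CallerGuard and withdraw are hypothetical names. It cannot model a trapping callback. In ic-cdk a guard like this releases its lock in Drop (the usual Rust counterpart of Motoko's finally), and whether Drop runs when a callback traps depends on the cleanup behavior of the ic-cdk version in use, so check the IC security guidance before relying on it.

```rust
use std::cell::RefCell;
use std::collections::{HashMap, HashSet};
use std::rc::Rc;

// Hypothetical stand-ins: a real canister would use candid::Principal and
// keep State in a thread_local!.
type Principal = &'static str;

#[derive(Default)]
struct State {
    balances: HashMap<Principal, u64>,
    in_flight: HashSet<Principal>, // callers with a withdrawal in progress
}

/// Per-caller lock: fails only if *this* caller already has a call in
/// flight; other callers are not blocked (unlike a global boolean).
struct CallerGuard {
    state: Rc<RefCell<State>>,
    caller: Principal,
}

impl CallerGuard {
    fn new(state: &Rc<RefCell<State>>, caller: Principal) -> Result<Self, String> {
        if !state.borrow_mut().in_flight.insert(caller) {
            return Err(format!("{caller} already has a call in flight"));
        }
        Ok(Self { state: Rc::clone(state), caller })
    }
}

impl Drop for CallerGuard {
    // Runs on every exit path (success, error, early return).
    fn drop(&mut self) {
        self.state.borrow_mut().in_flight.remove(&self.caller);
    }
}

/// Deduct first, then "await". `transfer` simulates the inter-canister
/// call; on failure the deduction is refunded.
fn withdraw(
    state: &Rc<RefCell<State>>,
    caller: Principal,
    amount: u64,
    transfer: impl FnOnce(u64) -> Result<(), String>,
) -> Result<(), String> {
    let _guard = CallerGuard::new(state, caller)?;
    {
        let mut s = state.borrow_mut();
        let bal = s.balances.get_mut(&caller).ok_or("no account")?;
        if *bal < amount {
            return Err("insufficient balance".to_string());
        }
        *bal -= amount; // committed before the await point
    }
    if let Err(e) = transfer(amount) {
        *state.borrow_mut().balances.entry(caller).or_insert(0) += amount; // refund
        return Err(e);
    }
    Ok(())
}

fn main() {
    let state = Rc::new(RefCell::new(State::default()));
    state.borrow_mut().balances.insert("alice", 100);

    // A failed transfer refunds the deduction and releases the lock.
    let r = withdraw(&state, "alice", 40, |_| Err("ledger rejected".into()));
    assert!(r.is_err());
    assert_eq!(state.borrow().balances["alice"], 100);

    // A re-entrant withdrawal by the same caller during the "await" is refused.
    let inner = Rc::clone(&state);
    let r = withdraw(&state, "alice", 60, move |_| {
        assert!(withdraw(&inner, "alice", 60, |_| Ok(())).is_err());
        Ok(())
    });
    assert!(r.is_ok());
    assert_eq!(state.borrow().balances["alice"], 40);
    assert!(state.borrow().in_flight.is_empty());
}
```

Deducting before the call means a concurrent message from the same caller already sees the reduced balance, and the per-caller lock stops it from racing the first withdrawal at all.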
Summary
Eval results
Output evals — WITH skill vs WITHOUT skill baseline
Trigger evals