Exploratory testing discovered that the logs and audit tools return different behavior_fingerprint values for the exact same workflow run, indicating a data consistency issue.
Problem Description
When comparing the behavior_fingerprint for the same run ID (23701814578) between the logs and audit tools, the values differ significantly:
| Field | logs tool output | audit tool output |
|---|---|---|
| execution_style | "directed" | "exploratory" |
| resource_profile | "lean" | "heavy" |
| agentic_fraction | 0 | 0.5 |
| tool_breadth | "narrow" | "narrow" ✅ |
| actuation_style | "read_only" | "read_only" ✅ |
| dispatch_mode | "standalone" | "standalone" ✅ |
Three out of six fields differ for the same run. This suggests the fingerprint computation is not deterministic or uses different data sources depending on which tool is called.
Tool
- Tool: audit and logs (both affected)
- Run: §23701814578 — "GPL Dependency Cleaner (gpclean)"
Steps to Reproduce
- Call the logs tool with workflow_name: "GPL Dependency Cleaner (gpclean)" and count: 1
- Observe behavior_fingerprint for run 23701814578:
{"execution_style":"directed","tool_breadth":"narrow","actuation_style":"read_only","resource_profile":"lean","dispatch_mode":"standalone","agentic_fraction":0}
- Call the audit tool with run_id_or_url: "23701814578"
- Observe behavior_fingerprint for the same run:
{"execution_style":"exploratory","tool_breadth":"narrow","actuation_style":"read_only","resource_profile":"heavy","dispatch_mode":"standalone","agentic_fraction":0.5}
Expected Behavior
Both tools should return identical behavior_fingerprint values for the same run, since the fingerprint should be a deterministic function of the run's actual execution data.
Actual Behavior
Three fields (execution_style, resource_profile, agentic_fraction) differ between the two tools for the same run.
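The mismatch can be checked mechanically. The following is a minimal sketch that diffs the two payloads quoted in the reproduction steps. The helper name `diff_fingerprints` is illustrative and not part of gh-aw.

```python
import json

# Payloads copied verbatim from the reproduction steps for run 23701814578.
LOGS_FINGERPRINT = json.loads(
    '{"execution_style":"directed","tool_breadth":"narrow",'
    '"actuation_style":"read_only","resource_profile":"lean",'
    '"dispatch_mode":"standalone","agentic_fraction":0}'
)
AUDIT_FINGERPRINT = json.loads(
    '{"execution_style":"exploratory","tool_breadth":"narrow",'
    '"actuation_style":"read_only","resource_profile":"heavy",'
    '"dispatch_mode":"standalone","agentic_fraction":0.5}'
)


def diff_fingerprints(logs_fp, audit_fp):
    """Return {field: (logs_value, audit_value)} for every field that differs.

    Hypothetical helper for this report; a field missing on one side is
    reported with None on that side.
    """
    fields = sorted(set(logs_fp) | set(audit_fp))
    return {
        field: (logs_fp.get(field), audit_fp.get(field))
        for field in fields
        if logs_fp.get(field) != audit_fp.get(field)
    }


if __name__ == "__main__":
    for field, (logs_value, audit_value) in diff_fingerprints(
        LOGS_FINGERPRINT, AUDIT_FINGERPRINT
    ).items():
        print(f"{field}: logs={logs_value!r} audit={audit_value!r}")
```

A check like this could serve as a regression test once both tools are expected to agree: an empty result means the fingerprints match.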
Impact
- Severity: High. Users who use logs for quick scanning and audit for deep-dives will see conflicting signals about a run's execution behavior
- Frequency: Observed consistently for the same run ID
- Affected: Any analysis or alerting built on top of behavior_fingerprint data
- Workaround: None, since the values differ per tool call
Hypothesis
The logs tool may be computing the fingerprint from a cached/lightweight summary, while the audit tool recomputes from raw log data. The discrepancy could be due to:
- Different data sources (cached vs. live computation)
- Different algorithms or weighting for field calculations
- A caching issue where the logs summary was generated before full log processing completed
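To show how the caching hypothesis could produce this kind of divergence, here is a toy sketch. Everything in it is invented for illustration: `toy_fingerprint`, the event shape, and the 0.5 threshold are assumptions, not the actual gh-aw fingerprint algorithm. It only demonstrates that one deterministic function can return different values when fed a partial snapshot versus the complete event stream.

```python
def toy_fingerprint(events):
    """Hypothetical stand-in for a fingerprint function (NOT the real algorithm).

    Computes the share of events that are agent turns and derives a style label
    from an arbitrary, assumed threshold.
    """
    total = len(events)
    agentic = sum(1 for event in events if event["kind"] == "agent_turn")
    fraction = agentic / total if total else 0.0
    return {
        "agentic_fraction": fraction,
        "execution_style": "exploratory" if fraction >= 0.5 else "directed",
    }


# Invented event stream: two setup steps followed by two agent turns.
full_events = [
    {"kind": "setup"},
    {"kind": "setup"},
    {"kind": "agent_turn"},
    {"kind": "agent_turn"},
]

# Assumed scenario: a summary is cached before the agent turns are processed.
partial_events = full_events[:2]

early = toy_fingerprint(partial_events)  # computed from the incomplete snapshot
late = toy_fingerprint(full_events)      # recomputed from the complete data

if __name__ == "__main__":
    print("cached summary:", early)
    print("recomputed:", late)
```

If something like this is happening, the fix would lie in when the summary is generated or invalidated rather than in the fingerprint function itself. That remains a hypothesis until the two code paths are compared.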
Additional Observations from Testing Session
Other observations from the same testing session (Run ID: 23702146707):
- audit with an invalid run ID returns a raw MCP error (-32603: failed to fetch run metadata) instead of a user-friendly error message
- logs with a non-existent workflow name returns MCP error -32602 instead of a structured empty response with an informative message
- compile tool requires the .md suffix (e.g., ace-editor.md), while logs and status tools use display names or IDs without an extension, which is an inconsistent naming convention across tools
- Failed run audit (run 23701844529) does not surface the specific error message (Authentication failed) in the errors field, even though it is visible in the downloaded detection.log
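For the first two observations, one possible direction is to translate the raw JSON-RPC codes into structured, readable responses. In JSON-RPC 2.0, -32602 means "Invalid params" and -32603 means "Internal error". The sketch below is only an assumption about what a friendlier response could look like. The function name, message wording, and response shape are hypothetical and not taken from gh-aw.

```python
# Standard JSON-RPC 2.0 error codes seen during the testing session.
JSONRPC_INVALID_PARAMS = -32602
JSONRPC_INTERNAL_ERROR = -32603

# Hypothetical per-tool message templates; the wording is illustrative only.
FRIENDLY_MESSAGES = {
    ("audit", JSONRPC_INTERNAL_ERROR): (
        "Could not fetch metadata for run {arg}. "
        "Check that the run ID exists and is accessible."
    ),
    ("logs", JSONRPC_INVALID_PARAMS): (
        "No workflow named {arg} was found. "
        "Check the workflow name and try again."
    ),
}


def describe_error(tool, code, raw_message, arg):
    """Build a structured error response, falling back to the raw message."""
    template = FRIENDLY_MESSAGES.get((tool, code))
    message = template.format(arg=arg) if template else raw_message
    return {"tool": tool, "code": code, "message": message, "raw": raw_message}


if __name__ == "__main__":
    print(describe_error("audit", -32603, "failed to fetch run metadata", "123"))
```

Keeping the raw message alongside the friendly one preserves the original detail for debugging.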
Environment
- Repository: github/gh-aw
- Testing Run ID: 23702146707
- Date: 2026-03-29
- Affected Run: §23701814578
Generated by Daily CLI Tools Exploratory Tester