Enable conversation transcript access for Copilot Session Insights #14414
…hts workflow Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Pull request overview
Updates the Copilot Session Insights workflow to fetch and analyze Copilot agent conversation transcripts (via gh agent-task view --log) instead of relying on GitHub Actions run logs, enabling behavior-focused analysis (reasoning, tool usage, recovery patterns).
Changes:
- Extend the shared session data fetch module to download `*-conversation.txt` transcripts using `gh agent-task view --log`, with an Actions-logs fallback.
- Update the Session Insights workflow guidance to incorporate conversation transcript analysis.
- Regenerate the compiled workflow lockfile to reflect the new fetch behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| .github/workflows/shared/copilot-session-data-fetch.md | Switches log acquisition to `gh agent-task view --log`, adds fallback behavior, and updates documentation/output expectations. |
| .github/workflows/copilot-session-insights.md | Updates analysis instructions to leverage conversation transcript files and outlines what behavioral signals to extract. |
| .github/workflows/copilot-session-insights.lock.yml | Recompiled workflow output reflecting the updated shared fetch module logic. |
| gh agent-task view --repo "${{ github.repository }}" "$session_number" --log \ | ||
| > "/tmp/gh-aw/session-data/logs/${session_number}-conversation.txt" 2>&1 || { | ||
| echo "Warning: Could not fetch conversation log for session #$session_number" | ||
| # If gh agent-task fails, fall back to downloading GitHub Actions logs | ||
| # This ensures we have some data even if agent-task command is unavailable | ||
| run_id=$(jq -r ".[] | select(.head_branch == \"$branch\") | .id" /tmp/gh-aw/session-data/sessions-list.json) | ||
| if [ -n "$run_id" ]; then | ||
| echo "Falling back to GitHub Actions logs for run ID: $run_id" | ||
| gh api "repos/${{ github.repository }}/actions/runs/${run_id}/logs" \ | ||
| > "/tmp/gh-aw/session-data/logs/${session_number}-actions.zip" 2>&1 || true | ||
| | ||
| if [ -f "/tmp/gh-aw/session-data/logs/${session_number}-actions.zip" ] && [ -s "/tmp/gh-aw/session-data/logs/${session_number}-actions.zip" ]; then | ||
| unzip -q "/tmp/gh-aw/session-data/logs/${session_number}-actions.zip" -d "/tmp/gh-aw/session-data/logs/${session_number}/" 2>/dev/null || true | ||
| rm "/tmp/gh-aw/session-data/logs/${session_number}-actions.zip" | ||
| fi | ||
| fi | ||
| } |
The gh agent-task view ... > ".../${session_number}-conversation.txt" 2>&1 redirection will create/overwrite the conversation file even when the command fails, leaving an error message in place of a transcript. Since the fallback path doesn’t remove or distinguish that file, downstream analysis may treat an error output as a valid transcript. Write to a temp file and only move it into place on success (or remove the output file on failure).
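The temp-file approach suggested in this comment could look roughly like the sketch below. `fetch_to_file` is a hypothetical helper name, not something from the PR; it captures stdout in a temporary file next to the destination, keeps stderr separate, and moves the result into place only when the command exits successfully.

```shell
# Hypothetical helper (not part of the PR): run a command and publish its
# stdout to DEST only if the command succeeds.
# Usage: fetch_to_file DEST COMMAND [ARGS...]
fetch_to_file() {
  local dest="$1"
  shift
  local tmp
  # Create the temp file beside the destination so the final mv is a rename.
  tmp="$(mktemp "${dest}.XXXXXX")" || return 1
  if "$@" > "$tmp" 2> "${tmp}.err"; then
    mv "$tmp" "$dest"
    rm -f "${tmp}.err"
    return 0
  fi
  # On failure, surface stderr as a warning and leave no file at DEST.
  echo "Warning: command failed, no output written to $dest" >&2
  cat "${tmp}.err" >&2
  rm -f "$tmp" "${tmp}.err"
  return 1
}

# Assumed call site, mirroring the command in the diff above:
#   fetch_to_file "/tmp/gh-aw/session-data/logs/${session_number}-conversation.txt" \
#     gh agent-task view --repo "$repo" "$session_number" --log || run_fallback
# where $repo and run_fallback are placeholders for the existing workflow logic.
```

With this shape, a `*-conversation.txt` file exists only when the fetch succeeded, so downstream analysis does not need to tell an error message apart from a transcript.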
| @@ -25,6 +26,7 @@ tools: | |||
| - "unzip *" | |||
| - "find *" | |||
| - "rm *" | |||
| - "cat *" | |||
The new tooling permissions are very broad ("gh agent-task *" and especially "cat *"). Since this workflow is already using an allowlist approach, consider tightening these patterns to only the subcommands/paths actually needed (e.g., gh agent-task view ... and cat /tmp/gh-aw/session-data/logs/*) to reduce the blast radius if the agent is compromised or mis-prompted.
| @@ -155,13 +190,15 @@ This shared component fetches GitHub Copilot agent session data by analyzing wor | |||
| - Multiple workflows running on the same day share the same session data | |||
| - Reduces GitHub API rate limit usage | |||
| - Faster workflow execution after first fetch of the day | |||
| - Avoids need for `gh agent-task` extension | |||
| - Includes conversation transcript cache | |||
| | |||
| ### Output Files | |||
| | |||
| - **`/tmp/gh-aw/session-data/sessions-list.json`**: Full session data including run ID, name, branch, timestamps, status, conclusion, and URL | |||
| - **`/tmp/gh-aw/session-data/sessions-schema.json`**: JSON schema showing the structure of the session data | |||
| - **`/tmp/gh-aw/session-data/logs/`**: Directory containing extracted workflow run logs | |||
| - **`/tmp/gh-aw/session-data/logs/`**: Directory containing session conversation logs | |||
| - **`{session_number}-conversation.txt`**: Agent conversation transcript with internal monologue and tool usage (primary) | |||
| - **`{session_number}/`**: GitHub Actions infrastructure logs (fallback only) | |||
| - **`/tmp/gh-aw/cache-memory/copilot-sessions-YYYY-MM-DD.json`**: Cached session data with date | |||
| - **`/tmp/gh-aw/cache-memory/copilot-sessions-YYYY-MM-DD-schema.json`**: Cached schema with date | |||
| - **`/tmp/gh-aw/cache-memory/session-logs-YYYY-MM-DD/`**: Cached log files with date | |||
Because the cache key stays the same (copilot-session-data) while the cached log contents/format changed (now transcripts instead of extracted zip directories), workflows may reuse an existing same-day cache that contains only the old Actions log data and never attempt to fetch transcripts. Consider bumping the cache key/version or adding a cache-hit validation step that fetches transcripts when *-conversation.txt files are missing.
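One way to sketch the cache-hit validation mentioned here is a small check that the restored logs directory actually contains at least one non-empty transcript. `has_transcripts` is a hypothetical name, and how the workflow would trigger a re-fetch is an assumption left to the caller.

```shell
# Hypothetical check (not part of the PR): succeed only if DIR contains at
# least one non-empty *-conversation.txt file, at any depth.
has_transcripts() {
  local dir="$1"
  [ -d "$dir" ] || return 1
  # -size +0 skips empty files; head stops after the first match.
  [ -n "$(find "$dir" -type f -name '*-conversation.txt' -size +0 | head -n 1)" ]
}

# Assumed use after a cache restore:
#   if ! has_transcripts /tmp/gh-aw/session-data/logs; then
#     echo "Cached logs contain no transcripts; fetching again"
#     # ...run the transcript fetch step here...
#   fi
```

Bumping the cache key, the other option named in the comment, avoids the extra check but discards same-day caches that are otherwise still usable.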
| # List available conversation logs | ||
| find /tmp/gh-aw/session-data/logs -type f -name "*-conversation.txt" | ||
| | ||
| # Read a specific conversation log | ||
| cat /tmp/gh-aw/session-data/logs/123-conversation.txt | ||
| | ||
| # Count conversation logs | ||
| find /tmp/gh-aw/session-data/logs -type f -name "*-conversation.txt" | wc -l |
The workflow instructions suggest running find ... | wc -l and cat /tmp/gh-aw/session-data/logs/123-conversation.txt, but the tools.bash allowlist currently doesn’t include wc and may not allow these more specific find/cat invocations (e.g., it only allows find /tmp -type f and cat /tmp/*). Update the allowlist to include the exact commands/patterns needed to read and count conversation transcripts so the agent can follow the documented process.
| # List available conversation logs | |
| find /tmp/gh-aw/session-data/logs -type f -name "*-conversation.txt" | |
| # Read a specific conversation log | |
| cat /tmp/gh-aw/session-data/logs/123-conversation.txt | |
| # Count conversation logs | |
| find /tmp/gh-aw/session-data/logs -type f -name "*-conversation.txt" | wc -l | |
| # List available files under /tmp (includes conversation logs under /tmp/gh-aw/session-data/logs) | |
| find /tmp -type f | |
| # Read a specific conversation log | |
| # Replace PATH_WITH_CONVERSATION_LOG with an actual path from the find output | |
| cat /tmp/* | |
| # Count conversation logs | |
| # Run find and count the matching conversation log paths from its output (without using wc) | |
| find /tmp -type f |
The Copilot Session Insights workflow was analyzing GitHub Actions infrastructure logs, which lack the agent conversation content needed for behavioral pattern analysis.
Changes
shared/copilot-session-data-fetch.md
- Fetches conversation logs with `gh agent-task view --log` instead of workflow run logs
- Derives the session number from the branch name (`copilot/issue-123` → `123`)
- Saves each transcript as `{session_number}-conversation.txt` with agent internal monologue, reasoning, and tool usage
- Falls back to GitHub Actions logs when `gh agent-task` is unavailable (requires gh CLI v2.80.0+)
copilot-session-insights.md
Example
Conversation logs contain:
Enables true behavioral analysis instead of infrastructure metrics.
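The branch-to-session-number mapping shown in the description (`copilot/issue-123` → `123`) can be sketched with plain parameter expansion. The exact rule used in the PR is not shown on this page, so the function below is an assumption: it takes the trailing run of digits in the branch name and fails when there is none. `session_number_from_branch` is a hypothetical name.

```shell
# Hypothetical helper (the PR's actual extraction rule is not shown here):
# print the trailing digits of a branch name, or fail if it has none.
session_number_from_branch() {
  branch="$1"
  # Strip the longest prefix that ends in a non-digit character,
  # leaving only the digits at the very end of the name.
  num="${branch##*[!0-9]}"
  [ -n "$num" ] || return 1
  printf '%s\n' "$num"
}

# Example call:
#   session_number="$(session_number_from_branch "copilot/issue-123")"
```

Branches without a trailing number make the function return non-zero, so a caller can skip them rather than request a transcript with an empty session number.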