From 1b9a35c89e71f2b55c1cc4b8b820bf5add33592d Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sun, 4 Jan 2026 05:56:15 +0000 Subject: [PATCH 1/5] Initial plan From e7d4de403090e34582144484afe4c9b85ae4829e Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sun, 4 Jan 2026 06:04:23 +0000 Subject: [PATCH 2/5] =?UTF-8?q?Refactor=20copilot-session-insights=20workf?= =?UTF-8?q?low=20(748=E2=86=92403=20lines)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Extract chart generation to shared/session-analysis-charts.md - Extract analysis strategies to shared/session-analysis-strategies.md - Remove 345 lines of redundant content (46% reduction) - Workflow compiles successfully Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com> --- .../copilot-session-insights.lock.yml | 936 +++++++----------- .github/workflows/copilot-session-insights.md | 385 +------ .../shared/session-analysis-charts.md | 118 +++ .../shared/session-analysis-strategies.md | 241 +++++ 4 files changed, 749 insertions(+), 931 deletions(-) create mode 100644 .github/workflows/shared/session-analysis-charts.md create mode 100644 .github/workflows/shared/session-analysis-strategies.md diff --git a/.github/workflows/copilot-session-insights.lock.yml b/.github/workflows/copilot-session-insights.lock.yml index 6d670fbec03..43083e2e4d1 100644 --- a/.github/workflows/copilot-session-insights.lock.yml +++ b/.github/workflows/copilot-session-insights.lock.yml @@ -24,8 +24,9 @@ # Resolved workflow manifest: # Imports: # - shared/copilot-session-data-fetch.md +# - shared/session-analysis-charts.md +# - shared/session-analysis-strategies.md # - shared/reporting.md -# - shared/trends.md # - shared/jqschema.md # - shared/python-dataviz.md @@ -505,188 +506,346 @@ jobs: cat << 'PROMPT_EOF' > "$GH_AW_PROMPT" - ## Report Structure + # Session Analysis 
Chart Generation - 1. **Overview**: 1-2 paragraphs summarizing key findings - 2. **Details**: Use `
Full Report` for expanded content + You are an expert at creating session analysis trend charts that reveal insights about Copilot agent session patterns over time. - ## Workflow Run References + ## ๐Ÿ“Š Chart Generation Requirements - - Format run IDs as links: `[ยง12345](https://github.com/owner/repo/actions/runs/12345)` - - Include up to 3 most relevant run URLs at end under `**References:**` - - Do NOT add footer attribution (system adds automatically) + **IMPORTANT**: Generate exactly 2 trend charts that showcase Copilot agent session patterns over time. - # Trends Visualization Guide + ### Chart Generation Process - You are an expert at creating compelling trend visualizations that reveal insights from data over time. + **Phase 1: Data Collection** - ## Trending Chart Best Practices + Collect data for the past 30 days (or available data) from cache memory and session logs: - When generating trending charts, focus on: + 1. **Session Completion Data**: + - Count of sessions completed successfully per day + - Count of sessions failed/abandoned per day + - Completion rate percentage per day - ### 1. **Time Series Excellence** - - Use line charts for continuous trends over time - - Add trend lines or moving averages to highlight patterns - - Include clear date/time labels on the x-axis - - Show confidence intervals or error bands when relevant + 2. **Session Duration Data**: + - Average session duration per day (in minutes) + - Median session duration per day + - Number of sessions with loops/retries - ### 2. **Comparative Trends** - - Use multi-line charts to compare multiple trends - - Apply distinct colors for each series with a clear legend - - Consider using area charts for stacked trends - - Highlight key inflection points or anomalies + **Phase 2: Data Preparation** - ### 3. 
**Visual Impact** - - Use vibrant, contrasting colors to make trends stand out - - Add annotations for significant events or milestones - - Include grid lines for easier value reading - - Use appropriate scale (linear vs. logarithmic) + 1. Create CSV files in `/tmp/gh-aw/python/data/` with the collected data: + - `session_completion.csv` - Daily completion counts and rates + - `session_duration.csv` - Daily duration statistics - ### 4. **Contextual Information** - - Show percentage changes or growth rates - - Include baseline comparisons (year-over-year, month-over-month) - - Add summary statistics (min, max, average, median) - - Highlight recent trends vs. historical patterns + 2. Each CSV should have a date column and metric columns with appropriate headers - ## Example Trend Chart Types + **Phase 3: Chart Generation** - ### Temporal Trends - ```python - # Line chart with multiple trends - fig, ax = plt.subplots(figsize=(12, 7), dpi=300) - for column in data.columns: - ax.plot(data.index, data[column], marker='o', label=column, linewidth=2) - ax.set_title('Trends Over Time', fontsize=16, fontweight='bold') - ax.set_xlabel('Date', fontsize=12) - ax.set_ylabel('Value', fontsize=12) - ax.legend(loc='best') - ax.grid(True, alpha=0.3) - plt.xticks(rotation=45) - ``` + Generate exactly **2 high-quality trend charts**: - ### Growth Rates - ```python - # Bar chart showing period-over-period growth - fig, ax = plt.subplots(figsize=(10, 6), dpi=300) - growth_data.plot(kind='bar', ax=ax, color=sns.color_palette("husl")) - ax.set_title('Growth Rates by Period', fontsize=16, fontweight='bold') - ax.axhline(y=0, color='black', linestyle='-', linewidth=0.8) - ax.set_ylabel('Growth %', fontsize=12) + **Chart 1: Session Completion Trends** + - Multi-line chart showing: + - Successful completions (line, green) + - Failed/abandoned sessions (line, red) + - Completion rate percentage (line with secondary y-axis) + - X-axis: Date (last 30 days) + - Y-axis: Count (left), Percentage 
(right) + - Save as: `/tmp/gh-aw/python/charts/session_completion_trends.png` + + **Chart 2: Session Duration & Efficiency** + - Dual visualization showing: + - Average session duration (line) + - Median session duration (line) + - Sessions with loops (bar chart overlay) + - X-axis: Date (last 30 days) + - Y-axis: Duration in minutes + - Save as: `/tmp/gh-aw/python/charts/session_duration_trends.png` + + **Chart Quality Requirements**: + - DPI: 300 minimum + - Figure size: 12x7 inches for better readability + - Use seaborn styling with a professional color palette + - Include grid lines for easier reading + - Clear, large labels and legend + - Title with context (e.g., "Session Completion Rates - Last 30 Days") + - Annotations for significant changes or anomalies + + **Phase 4: Upload Charts** + + 1. Upload both charts using the `upload asset` tool + 2. Collect the returned URLs for embedding in the discussion + + **Phase 5: Embed Charts in Discussion** + + Include the charts in your analysis report with this structure: + + ```markdown + ## ๐Ÿ“ˆ Session Trends Analysis + + ### Completion Patterns + ![Session Completion Trends](URL_FROM_UPLOAD_ASSET_CHART_1) + + [Brief 2-3 sentence analysis of completion trends, highlighting improvements in success rates or concerning patterns] + + ### Duration & Efficiency + ![Session Duration Trends](URL_FROM_UPLOAD_ASSET_CHART_2) + + [Brief 2-3 sentence analysis of session duration patterns, noting efficiency improvements or areas needing attention] ``` - ### Moving Averages - ```python - # Trend with moving average overlay - fig, ax = plt.subplots(figsize=(12, 7), dpi=300) - ax.plot(dates, values, label='Actual', alpha=0.5, linewidth=1) - ax.plot(dates, moving_avg, label='7-day Moving Average', linewidth=2.5) - ax.fill_between(dates, values, moving_avg, alpha=0.2) + ### Python Implementation Notes + + - Use pandas for data manipulation and date handling + - Use matplotlib.pyplot and seaborn for visualization + - Set appropriate 
date formatters for x-axis labels + - Use `plt.xticks(rotation=45)` for readable date labels + - Apply `plt.tight_layout()` before saving + - Handle cases where data might be sparse or missing + + ### Error Handling + + If insufficient data is available (less than 7 days): + - Generate the charts with available data + - Add a note in the analysis mentioning the limited data range + - Consider using a bar chart instead of line chart for very sparse data + + # Session Analysis Strategies + + Comprehensive strategies for analyzing Copilot agent sessions to extract insights, identify patterns, and recommend improvements. + + ## Standard Analysis Strategies + + These strategies should be applied to every session analysis: + + ### 1. Completion Analysis + - Did the session complete successfully? + - Was the task abandoned or aborted? + - Look for error messages or failure indicators + - Track completion rate + + ### 2. Loop Detection + - Identify repetitive agent responses + - Detect circular reasoning or stuck patterns + - Count iteration loops without progress + - Flag sessions with excessive retries + + ### 3. Prompt Structure Analysis + - Analyze task description clarity + - Identify effective prompt patterns + - Cluster prompts by keywords or structure + - Correlate prompt quality with success + + ### 4. Context Confusion Detection + - Look for signs of missing context + - Identify requests for clarification + - Track contextual misunderstandings + - Note when agent asks for more information + + ### 5. Error Recovery Analysis + - How does the agent handle errors? + - Track error types and recovery strategies + - Measure time to recover from failures + - Identify successful vs. failed recoveries + + ### 6. Tool Usage Patterns + - Which tools are used most frequently? + - Are tools used effectively? 
+ - Identify missing or unavailable tools + - Track tool execution success rates + + ## Experimental Strategies (30% of runs) + + **Determine if this is an experimental run**: + ```bash + # Generate random number between 0-100 + RANDOM_VALUE=$((RANDOM % 100)) + # If value < 30, this is an experimental run ``` - ## Data Preparation for Trends + **Novel Analysis Methods to Try** (rotate through these): + + ### 1. Semantic Clustering + - Group prompts by semantic similarity + - Identify common themes across sessions + - Find outlier prompts that perform differently + - Use keyword extraction and comparison + + ### 2. Temporal Analysis + - Analyze session duration patterns + - Identify time-of-day effects + - Track performance trends over time + - Correlate timing with success rates + + ### 3. Code Quality Metrics + - If sessions produce code, analyze quality + - Check for test coverage mentions + - Look for documentation updates + - Track code review feedback + + ### 4. User Interaction Patterns + - Analyze back-and-forth exchanges + - Measure clarification request frequency + - Track user guidance effectiveness + - Identify optimal interaction patterns + + ### 5. Cross-Session Learning + - Compare similar tasks across sessions + - Identify improvement over time + - Track recurring issues + - Find evolving solution strategies - ### Time-Based Indexing - ```python - # Convert to datetime and set as index - data['date'] = pd.to_datetime(data['date']) - data.set_index('date', inplace=True) - data = data.sort_index() + **Record Experimental Results**: + - Store strategy name and description + - Record what was measured + - Note insights discovered + - Save to cache for future reference + + ## Data Collection per Session + + For each session, collect: + - **Session ID**: Unique identifier + - **Timestamp**: When the session occurred + - **Task Type**: Category of task (bug fix, feature, refactor, etc.) 
+ - **Duration**: Time from start to completion + - **Status**: Success, failure, abandoned, in-progress + - **Loop Count**: Number of repetitive cycles detected + - **Tool Usage**: List of tools used and their success + - **Error Count**: Number of errors encountered + - **Prompt Quality Score**: Assessed quality (1-10) + - **Context Issues**: Boolean flag for confusion detected + - **Notes**: Any notable observations + + ## Cache Memory Management + + ### Cache Structure + ``` + /tmp/gh-aw/cache-memory/session-analysis/ + โ”œโ”€โ”€ history.json # Historical analysis results + โ”œโ”€โ”€ strategies.json # Discovered analytical strategies + โ””โ”€โ”€ patterns.json # Known behavioral patterns ``` - ### Resampling and Aggregation - ```python - # Resample daily data to weekly - weekly_data = data.resample('W').mean() + ### Initialize Cache + + If cache files don't exist, create them with initial structure: + ```bash + mkdir -p /tmp/gh-aw/cache-memory/session-analysis/ - # Calculate rolling statistics - data['rolling_mean'] = data['value'].rolling(window=7).mean() - data['rolling_std'] = data['value'].rolling(window=7).std() + cat > /tmp/gh-aw/cache-memory/session-analysis/history.json << 'EOF' + { + "analyses": [], + "last_updated": "YYYY-MM-DD", + "version": "1.0" + } + EOF ``` - ### Growth Calculations - ```python - # Calculate percentage change - data['pct_change'] = data['value'].pct_change() * 100 + ### Update Historical Data - # Calculate year-over-year growth - data['yoy_growth'] = data['value'].pct_change(periods=365) * 100 + Update cache memory with today's analysis: + ```bash + # Update history.json with today's results + # Include: date, sessions_analyzed, completion_rate, average_duration_minutes + # Include: experimental_strategy (if applicable), key_insights array ``` - ## Color Palettes for Trends + ### Store Discovered Strategies - Use these palettes for impactful trend visualizations: + If this was an experimental run, save the new strategy: + - 
Strategy name and description + - Results and effectiveness metrics + - Save to strategies.json - - **Sequential trends**: `sns.color_palette("viridis", n_colors=5)` - - **Diverging trends**: `sns.color_palette("RdYlGn", n_colors=7)` - - **Multiple series**: `sns.color_palette("husl", n_colors=8)` - - **Categorical**: `sns.color_palette("Set2", n_colors=6)` + ### Update Pattern Database - ## Annotation Best Practices + Add newly discovered patterns: + - Pattern type and frequency + - Correlation with success/failure + - Save to patterns.json - ```python - # Annotate key points - max_idx = data['value'].idxmax() - max_val = data['value'].max() - ax.annotate(f'Peak: {max_val:.2f}', - xy=(max_idx, max_val), - xytext=(10, 20), - textcoords='offset points', - arrowprops=dict(arrowstyle='->', color='red'), - fontsize=10, - fontweight='bold') - ``` + ### Maintain Cache Size + + Keep cache manageable: + - Retain last 90 days of analysis history + - Keep top 20 most effective strategies + - Maintain comprehensive pattern database - ## Styling for Awesome Charts + ## Insight Synthesis - ```python - import matplotlib.pyplot as plt - import seaborn as sns + Aggregate observations across all analyzed sessions: - # Set professional style - sns.set_style("whitegrid") - sns.set_context("notebook", font_scale=1.2) + ### Success Factors - # Custom color palette - custom_colors = ["#FF6B6B", "#4ECDC4", "#45B7D1", "#FFA07A", "#98D8C8"] - sns.set_palette(custom_colors) + Identify patterns associated with successful completions: + - Common prompt characteristics + - Effective tool combinations + - Optimal context provision + - Successful error recovery strategies + - Clear task descriptions - # Figure with optimal dimensions - fig, ax = plt.subplots(figsize=(14, 8), dpi=300) + **Example Analysis**: + ``` + SUCCESS PATTERNS: + - Sessions with specific file references: 85% success rate + - Prompts including expected outcomes: 78% success rate + - Tasks under 100 lines of change: 90% 
success rate + ``` - # ... your plotting code ... + ### Failure Signals - # Tight layout for clean appearance - plt.tight_layout() + Identify common indicators of confusion or inefficiency: + - Vague or ambiguous prompts + - Missing context clues + - Circular reasoning patterns + - Repeated failed attempts + - Tool unavailability - # Save with high quality - plt.savefig('/tmp/gh-aw/python/charts/trend_chart.png', - dpi=300, - bbox_inches='tight', - facecolor='white', - edgecolor='none') + **Example Analysis**: ``` + FAILURE INDICATORS: + - Prompts with "just fix it": 45% success rate + - Missing file paths: 40% success rate + - Tasks requiring >5 iterations: 30% success rate + ``` + + ### Prompt Quality Indicators + + Analyze what makes prompts effective: + - Specific vs. general instructions + - Context richness + - Clear acceptance criteria + - File/code references + - Expected behavior descriptions + + **Categorize Prompts**: + - **High Quality**: Specific, contextual, clear outcomes + - **Medium Quality**: Some clarity but missing details + - **Low Quality**: Vague, ambiguous, lacking context + + ## Recommendations Format + + Based on the analysis, generate actionable recommendations: + + 1. **For Users**: How to write better task descriptions + 2. **For System**: Potential improvements to agent behavior + 3. **For Tools**: Missing capabilities or integrations - ## Tips for Trending Charts + Include: + - Prompt improvement templates + - Best practice guidelines + - Tool usage suggestions + - Context provision tips + - Error handling strategies - 1. **Start with the story**: What trend are you trying to show? - 2. **Choose the right timeframe**: Match granularity to the pattern - 3. **Smooth noise**: Use moving averages for volatile data - 4. **Show context**: Include historical baselines or benchmarks - 5. **Highlight insights**: Use annotations to draw attention - 6. **Test readability**: Ensure labels and legends are clear - 7. 
**Optimize colors**: Use colorblind-friendly palettes - 8. **Export high quality**: Always use DPI 300+ for presentations + ## Report Structure - ## Common Trend Patterns to Visualize + 1. **Overview**: 1-2 paragraphs summarizing key findings + 2. **Details**: Use `
Full Report` for expanded content

  ## Workflow Run References

  - Format run IDs as links: `[§12345](https://github.com/owner/repo/actions/runs/12345)`
  - Include up to 3 most relevant run URLs at end under `**References:**`
  - Do NOT add footer attribution (system adds automatically)

  ## jqschema - JSON Schema Discovery

@@ -864,33 +1023,62 @@ jobs:
  plt.savefig('/tmp/gh-aw/python/charts/my_chart.png', dpi=300, bbox_inches='tight')
  ```

-  ### Step 2: Upload as Asset
-  Use the `upload asset` tool to upload the chart file. The tool will return a GitHub raw content URL.
-
-  ### Step 3: Include in Markdown Report
-  When creating your discussion or issue, include the image using markdown:
-
-  ```markdown
-  ## Visualization Results
-
-  ![Chart Description](https://raw.githubusercontent.com/owner/repo/assets/workflow-name/my_chart.png)
-
-  The chart above shows...
-  ```
-
-  **Important**: Assets are published to an orphaned git branch and become URL-addressable after workflow completion.
- - ## Cache Memory Integration - - The cache memory at `/tmp/gh-aw/cache-memory/` is available for storing reusable code: - - **Helper Functions to Cache:** - - Data loading utilities: `data_loader.py` - - Chart styling functions: `chart_utils.py` - - Common data transformations: `transforms.py` - - **Check Cache Before Creating:** - ```bash + PROMPT_EOF + - name: Substitute placeholders + uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0 + env: + GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt + GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} + GH_AW_GITHUB_RUN_ID: ${{ github.run_id }} + GH_AW_GITHUB_WORKFLOW: ${{ github.workflow }} + with: + script: | + const substitutePlaceholders = require('/tmp/gh-aw/actions/substitute_placeholders.cjs'); + + // Call the substitution function + return await substitutePlaceholders({ + file: process.env.GH_AW_PROMPT, + substitutions: { + GH_AW_GITHUB_REPOSITORY: process.env.GH_AW_GITHUB_REPOSITORY, + GH_AW_GITHUB_RUN_ID: process.env.GH_AW_GITHUB_RUN_ID, + GH_AW_GITHUB_WORKFLOW: process.env.GH_AW_GITHUB_WORKFLOW + } + }); + - name: Append prompt (part 2) + env: + GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt + GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} + GH_AW_GITHUB_RUN_ID: ${{ github.run_id }} + GH_AW_GITHUB_WORKFLOW: ${{ github.workflow }} + run: | + cat << 'PROMPT_EOF' >> "$GH_AW_PROMPT" + ### Step 2: Upload as Asset + Use the `upload asset` tool to upload the chart file. The tool will return a GitHub raw content URL. + + ### Step 3: Include in Markdown Report + When creating your discussion or issue, include the image using markdown: + + ```markdown + ## Visualization Results + + ![Chart Description](https://raw.githubusercontent.com/owner/repo/assets/workflow-name/my_chart.png) + + The chart above shows... + ``` + + **Important**: Assets are published to an orphaned git branch and become URL-addressable after workflow completion. 
+ + ## Cache Memory Integration + + The cache memory at `/tmp/gh-aw/cache-memory/` is available for storing reusable code: + + **Helper Functions to Cache:** + - Data loading utilities: `data_loader.py` + - Chart styling functions: `chart_utils.py` + - Common data transformations: `transforms.py` + + **Check Cache Before Creating:** + ```bash # Check if helper exists in cache if [ -f /tmp/gh-aw/cache-memory/data_loader.py ]; then cp /tmp/gh-aw/cache-memory/data_loader.py /tmp/gh-aw/python/ @@ -1024,137 +1212,8 @@ jobs: # Copilot Agent Session Analysis - PROMPT_EOF - - name: Substitute placeholders - uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0 - env: - GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt - GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} - GH_AW_GITHUB_RUN_ID: ${{ github.run_id }} - GH_AW_GITHUB_WORKFLOW: ${{ github.workflow }} - with: - script: | - const substitutePlaceholders = require('/tmp/gh-aw/actions/substitute_placeholders.cjs'); - - // Call the substitution function - return await substitutePlaceholders({ - file: process.env.GH_AW_PROMPT, - substitutions: { - GH_AW_GITHUB_REPOSITORY: process.env.GH_AW_GITHUB_REPOSITORY, - GH_AW_GITHUB_RUN_ID: process.env.GH_AW_GITHUB_RUN_ID, - GH_AW_GITHUB_WORKFLOW: process.env.GH_AW_GITHUB_WORKFLOW - } - }); - - name: Append prompt (part 2) - env: - GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt - GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} - GH_AW_GITHUB_RUN_ID: ${{ github.run_id }} - GH_AW_GITHUB_WORKFLOW: ${{ github.workflow }} - run: | - cat << 'PROMPT_EOF' >> "$GH_AW_PROMPT" You are an AI analytics agent specializing in analyzing Copilot agent sessions to extract insights, identify behavioral patterns, and recommend improvements. - ## ๐Ÿ“Š Trend Charts Requirement - - **IMPORTANT**: Generate exactly 2 trend charts that showcase Copilot agent session patterns over time. 
- - ### Chart Generation Process - - **Phase 1: Data Collection** - - Collect data for the past 30 days (or available data) from cache memory and session logs: - - 1. **Session Completion Data**: - - Count of sessions completed successfully per day - - Count of sessions failed/abandoned per day - - Completion rate percentage per day - - 2. **Session Duration Data**: - - Average session duration per day (in minutes) - - Median session duration per day - - Number of sessions with loops/retries - - **Phase 2: Data Preparation** - - 1. Create CSV files in `/tmp/gh-aw/python/data/` with the collected data: - - `session_completion.csv` - Daily completion counts and rates - - `session_duration.csv` - Daily duration statistics - - 2. Each CSV should have a date column and metric columns with appropriate headers - - **Phase 3: Chart Generation** - - Generate exactly **2 high-quality trend charts**: - - **Chart 1: Session Completion Trends** - - Multi-line chart showing: - - Successful completions (line, green) - - Failed/abandoned sessions (line, red) - - Completion rate percentage (line with secondary y-axis) - - X-axis: Date (last 30 days) - - Y-axis: Count (left), Percentage (right) - - Save as: `/tmp/gh-aw/python/charts/session_completion_trends.png` - - **Chart 2: Session Duration & Efficiency** - - Dual visualization showing: - - Average session duration (line) - - Median session duration (line) - - Sessions with loops (bar chart overlay) - - X-axis: Date (last 30 days) - - Y-axis: Duration in minutes - - Save as: `/tmp/gh-aw/python/charts/session_duration_trends.png` - - **Chart Quality Requirements**: - - DPI: 300 minimum - - Figure size: 12x7 inches for better readability - - Use seaborn styling with a professional color palette - - Include grid lines for easier reading - - Clear, large labels and legend - - Title with context (e.g., "Session Completion Rates - Last 30 Days") - - Annotations for significant changes or anomalies - - **Phase 4: Upload Charts** - - 1. 
Upload both charts using the `upload asset` tool - 2. Collect the returned URLs for embedding in the discussion - - **Phase 5: Embed Charts in Discussion** - - Include the charts in your analysis report with this structure: - - ```markdown - ## ๐Ÿ“ˆ Session Trends Analysis - - ### Completion Patterns - ![Session Completion Trends](URL_FROM_UPLOAD_ASSET_CHART_1) - - [Brief 2-3 sentence analysis of completion trends, highlighting improvements in success rates or concerning patterns] - - ### Duration & Efficiency - ![Session Duration Trends](URL_FROM_UPLOAD_ASSET_CHART_2) - - [Brief 2-3 sentence analysis of session duration patterns, noting efficiency improvements or areas needing attention] - ``` - - ### Python Implementation Notes - - - Use pandas for data manipulation and date handling - - Use matplotlib.pyplot and seaborn for visualization - - Set appropriate date formatters for x-axis labels - - Use `plt.xticks(rotation=45)` for readable date labels - - Apply `plt.tight_layout()` before saving - - Handle cases where data might be sparse or missing - - ### Error Handling - - If insufficient data is available (less than 7 days): - - Generate the charts with available data - - Add a note in the analysis mentioning the limited data range - - Consider using a bar chart instead of line chart for very sparse data - - --- - ## Mission Analyze approximately 50 Copilot agent sessions to identify: @@ -1170,12 +1229,13 @@ jobs: - **Repository**: __GH_AW_GITHUB_REPOSITORY__ - **Analysis Period**: Most recent ~50 agent sessions - **Cache Memory**: `/tmp/gh-aw/cache-memory/` + - **Pre-fetched Data**: Available at `/tmp/gh-aw/session-data/` ## Task Overview ### Phase 0: Setup and Prerequisites - **Pre-fetched Data Available**: This workflow includes a shared component (`copilot-session-data-fetch.md`) that fetches Copilot agent session data. 
The data should be available at: + **Pre-fetched Data Available**: Session data has been fetched by the `copilot-session-data-fetch` shared module: - `/tmp/gh-aw/session-data/sessions-list.json` - List of sessions with metadata - `/tmp/gh-aw/session-data/logs/` - Individual session log files @@ -1184,282 +1244,35 @@ jobs: 2. Initialize or restore cache-memory from `/tmp/gh-aw/cache-memory/` 3. Load historical analysis data if available - **Cache Memory Structure**: - ``` - /tmp/gh-aw/cache-memory/ - โ”œโ”€โ”€ session-analysis/ - โ”‚ โ”œโ”€โ”€ history.json # Historical analysis results - โ”‚ โ”œโ”€โ”€ strategies.json # Discovered analytical strategies - โ”‚ โ””โ”€โ”€ patterns.json # Known behavioral patterns - ``` - - ### Phase 1: Data Acquisition - - The session data has already been fetched in the preparation step. You should: - - 1. **Verify Downloaded Data**: - ```bash - # Check sessions list - jq '.' /tmp/gh-aw/session-data/sessions-list.json - - # Count sessions - jq 'length' /tmp/gh-aw/session-data/sessions-list.json - - # List log files - find /tmp/gh-aw/session-data/logs/ -maxdepth 1 -ls - ``` - - 2. **Extract Session Metadata**: - - Session IDs - - Creation timestamps - - Task titles and descriptions - - Current state (open, completed, failed) - - Pull request numbers (if available) - - 3. **Sample Strategy**: - - Use all available sessions (up to 50) - - If more than 50 sessions exist, they were already limited in the fetch step - - Record which sessions are being analyzed - - ### Phase 2: Session Analysis + ### Phase 1: Session Analysis For each downloaded session log in `/tmp/gh-aw/session-data/logs/`: - #### 2.1 Load Historical Context + 1. 
**Load Historical Context**: Check cache memory for previous analysis results, known strategies, and identified patterns (see `session-analysis-strategies` shared module) - Check cache memory for: - - Previous analysis results (`/tmp/gh-aw/cache-memory/session-analysis/history.json`) - - Known strategies (`/tmp/gh-aw/cache-memory/session-analysis/strategies.json`) - - Identified patterns (`/tmp/gh-aw/cache-memory/session-analysis/patterns.json`) + 2. **Apply Analysis Strategies**: Use the standard and experimental strategies defined in the imported `session-analysis-strategies` module - If cache files don't exist, create them with initial structure: - ```json - { - "analyses": [], - "last_updated": "YYYY-MM-DD", - "version": "1.0" - } - ``` - - #### 2.2 Apply Analysis Strategies - - **Standard Analysis Strategies** (Always Apply): - - 1. **Completion Analysis**: - - Did the session complete successfully? - - Was the task abandoned or aborted? - - Look for error messages or failure indicators - - Track completion rate + 3. **Collect Session Data**: Gather metrics for each session as defined in the shared module - 2. **Loop Detection**: - - Identify repetitive agent responses - - Detect circular reasoning or stuck patterns - - Count iteration loops without progress - - Flag sessions with excessive retries + ### Phase 2: Generate Trend Charts - 3. **Prompt Structure Analysis**: - - Analyze task description clarity - - Identify effective prompt patterns - - Cluster prompts by keywords or structure - - Correlate prompt quality with success + Follow the chart generation process defined in the `session-analysis-charts` shared module to create: + - Session completion trends chart + - Session duration & efficiency chart - 4. **Context Confusion Detection**: - - Look for signs of missing context - - Identify requests for clarification - - Track contextual misunderstandings - - Note when agent asks for more information - - 5. 
**Error Recovery Analysis**: - - How does the agent handle errors? - - Track error types and recovery strategies - - Measure time to recover from failures - - Identify successful vs. failed recoveries - - 6. **Tool Usage Patterns**: - - Which tools are used most frequently? - - Are tools used effectively? - - Identify missing or unavailable tools - - Track tool execution success rates - - #### 2.3 Experimental Strategies (30% of runs) - - **Determine if this is an experimental run**: - ```bash - # Generate random number between 0-100 - RANDOM_VALUE=$((RANDOM % 100)) - # If value < 30, this is an experimental run - ``` - - **Novel Analysis Methods to Try** (rotate through these): - - 1. **Semantic Clustering**: - - Group prompts by semantic similarity - - Identify common themes across sessions - - Find outlier prompts that perform differently - - Use keyword extraction and comparison - - 2. **Temporal Analysis**: - - Analyze session duration patterns - - Identify time-of-day effects - - Track performance trends over time - - Correlate timing with success rates - - 3. **Code Quality Metrics**: - - If sessions produce code, analyze quality - - Check for test coverage mentions - - Look for documentation updates - - Track code review feedback - - 4. **User Interaction Patterns**: - - Analyze back-and-forth exchanges - - Measure clarification request frequency - - Track user guidance effectiveness - - Identify optimal interaction patterns - - 5. 
**Cross-Session Learning**: - - Compare similar tasks across sessions - - Identify improvement over time - - Track recurring issues - - Find evolving solution strategies - - **Record Experimental Results**: - - Store strategy name and description - - Record what was measured - - Note insights discovered - - Save to cache for future reference - - #### 2.4 Data Collection - - For each session, collect: - - **Session ID**: Unique identifier - - **Timestamp**: When the session occurred - - **Task Type**: Category of task (bug fix, feature, refactor, etc.) - - **Duration**: Time from start to completion - - **Status**: Success, failure, abandoned, in-progress - - **Loop Count**: Number of repetitive cycles detected - - **Tool Usage**: List of tools used and their success - - **Error Count**: Number of errors encountered - - **Prompt Quality Score**: Assessed quality (1-10) - - **Context Issues**: Boolean flag for confusion detected - - **Notes**: Any notable observations + Upload charts and collect URLs for embedding in the report. 
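As a rough illustration of what the chart-generation phase asks for, here is a minimal Python sketch of Chart 1 (completion trends with a secondary y-axis for the rate series, 12x7 inches, 300 DPI, saved under `/tmp/gh-aw/python/charts/`). The data below is synthetic and the series names are assumptions; in the real workflow they would be read from `session_completion.csv`:

```python
import os
from datetime import date, timedelta

import matplotlib
matplotlib.use("Agg")  # headless rendering on the CI runner
import matplotlib.pyplot as plt

# Synthetic 30-day series standing in for session_completion.csv rows.
days = [date(2026, 1, 1) + timedelta(days=i) for i in range(30)]
succeeded = [(i * 7) % 10 + 5 for i in range(30)]  # placeholder counts
failed = [(i * 3) % 4 + 1 for i in range(30)]
rate = [100.0 * s / (s + f) for s, f in zip(succeeded, failed)]

fig, ax = plt.subplots(figsize=(12, 7))
ax.plot(days, succeeded, color="green", label="Succeeded")
ax.plot(days, failed, color="red", label="Failed/abandoned")
ax.set_ylabel("Sessions per day")
ax.grid(True)
ax.legend(loc="upper left")

ax2 = ax.twinx()  # secondary y-axis for the percentage series
ax2.plot(days, rate, color="steelblue", linestyle="--",
         label="Completion rate (%)")
ax2.set_ylabel("Completion rate (%)")
ax2.legend(loc="upper right")

ax.set_title("Session Completion Rates - Last 30 Days")
fig.autofmt_xdate(rotation=45)
fig.tight_layout()

out_dir = "/tmp/gh-aw/python/charts"
os.makedirs(out_dir, exist_ok=True)
fig.savefig(os.path.join(out_dir, "session_completion_trends.png"), dpi=300)
```

Seaborn styling (e.g. `sns.set_theme()`) could be layered on top as the quality requirements suggest; it is left out here only to keep the sketch's dependencies minimal.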
### Phase 3: Insight Synthesis - Aggregate observations across all analyzed sessions: - - #### 3.1 Success Factors - - Identify patterns associated with successful completions: - - Common prompt characteristics - - Effective tool combinations - - Optimal context provision - - Successful error recovery strategies - - Clear task descriptions - - **Example Analysis**: - ``` - SUCCESS PATTERNS: - - Sessions with specific file references: 85% success rate - - Prompts including expected outcomes: 78% success rate - - Tasks under 100 lines of change: 90% success rate - ``` - - #### 3.2 Failure Signals - - Identify common indicators of confusion or inefficiency: - - Vague or ambiguous prompts - - Missing context clues - - Circular reasoning patterns - - Repeated failed attempts - - Tool unavailability - - **Example Analysis**: - ``` - FAILURE INDICATORS: - - Prompts with "just fix it": 45% success rate - - Missing file paths: 40% success rate - - Tasks requiring >5 iterations: 30% success rate - ``` - - #### 3.3 Prompt Quality Indicators - - Analyze what makes prompts effective: - - Specific vs. general instructions - - Context richness - - Clear acceptance criteria - - File/code references - - Expected behavior descriptions - - **Categorize Prompts**: - - **High Quality**: Specific, contextual, clear outcomes - - **Medium Quality**: Some clarity but missing details - - **Low Quality**: Vague, ambiguous, lacking context - - #### 3.4 Recommendations - - Based on the analysis, generate actionable recommendations: - - Prompt improvement templates - - Best practice guidelines - - Tool usage suggestions - - Context provision tips - - Error handling strategies - - **Format Recommendations**: - 1. **For Users**: How to write better task descriptions - 2. **For System**: Potential improvements to agent behavior - 3. 
**For Tools**: Missing capabilities or integrations + Aggregate observations across all analyzed sessions using the synthesis patterns from the `session-analysis-strategies` module: + - Identify success factors + - Identify failure signals + - Analyze prompt quality indicators + - Generate actionable recommendations ### Phase 4: Cache Memory Management - #### 4.1 Update Historical Data - - Update cache memory with today's analysis: - - ```bash - mkdir -p /tmp/gh-aw/cache-memory/session-analysis/ - - # Update history.json - cat > /tmp/gh-aw/cache-memory/session-analysis/history.json << 'EOF' - { - "analyses": [ - { - "date": "YYYY-MM-DD", - "sessions_analyzed": 50, - "completion_rate": 0.72, - "average_duration_minutes": 8.5, - "experimental_strategy": "semantic_clustering", - "key_insights": ["insight 1", "insight 2"] - } - ], - "last_updated": "YYYY-MM-DD" - } - EOF - ``` - - #### 4.2 Store Discovered Strategies - - If this was an experimental run, save the new strategy: - - ```bash - # Update strategies.json - # Add strategy name, description, results, effectiveness - ``` - - #### 4.3 Update Pattern Database - - Add newly discovered patterns: - - ```bash - # Update patterns.json - # Include pattern type, frequency, correlation with success/failure - ``` - - #### 4.4 Maintain Cache Size - - Keep cache manageable: - - Retain last 90 days of analysis history - - Keep top 20 most effective strategies - - Maintain comprehensive pattern database + Update cache memory with today's analysis following the cache management patterns in the `session-analysis-strategies` shared module. ### Phase 5: Create Analysis Discussion @@ -1590,15 +1403,6 @@ jobs: 3. 
**[Recommendation 3]**: [Specific guidance] - Example: [Before/After example] - PROMPT_EOF - - name: Append prompt (part 3) - env: - GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt - GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} - GH_AW_GITHUB_RUN_ID: ${{ github.run_id }} - GH_AW_GITHUB_WORKFLOW: ${{ github.workflow }} - run: | - cat << 'PROMPT_EOF' >> "$GH_AW_PROMPT" ### For System Improvements diff --git a/.github/workflows/copilot-session-insights.md b/.github/workflows/copilot-session-insights.md index 4aa69b3e58a..2283beeea96 100644 --- a/.github/workflows/copilot-session-insights.md +++ b/.github/workflows/copilot-session-insights.md @@ -48,8 +48,9 @@ tools: imports: - shared/copilot-session-data-fetch.md + - shared/session-analysis-charts.md + - shared/session-analysis-strategies.md - shared/reporting.md - - shared/trends.md timeout-minutes: 20 @@ -59,106 +60,6 @@ timeout-minutes: 20 You are an AI analytics agent specializing in analyzing Copilot agent sessions to extract insights, identify behavioral patterns, and recommend improvements. -## ๐Ÿ“Š Trend Charts Requirement - -**IMPORTANT**: Generate exactly 2 trend charts that showcase Copilot agent session patterns over time. - -### Chart Generation Process - -**Phase 1: Data Collection** - -Collect data for the past 30 days (or available data) from cache memory and session logs: - -1. **Session Completion Data**: - - Count of sessions completed successfully per day - - Count of sessions failed/abandoned per day - - Completion rate percentage per day - -2. **Session Duration Data**: - - Average session duration per day (in minutes) - - Median session duration per day - - Number of sessions with loops/retries - -**Phase 2: Data Preparation** - -1. Create CSV files in `/tmp/gh-aw/python/data/` with the collected data: - - `session_completion.csv` - Daily completion counts and rates - - `session_duration.csv` - Daily duration statistics - -2. 
Each CSV should have a date column and metric columns with appropriate headers - -**Phase 3: Chart Generation** - -Generate exactly **2 high-quality trend charts**: - -**Chart 1: Session Completion Trends** -- Multi-line chart showing: - - Successful completions (line, green) - - Failed/abandoned sessions (line, red) - - Completion rate percentage (line with secondary y-axis) -- X-axis: Date (last 30 days) -- Y-axis: Count (left), Percentage (right) -- Save as: `/tmp/gh-aw/python/charts/session_completion_trends.png` - -**Chart 2: Session Duration & Efficiency** -- Dual visualization showing: - - Average session duration (line) - - Median session duration (line) - - Sessions with loops (bar chart overlay) -- X-axis: Date (last 30 days) -- Y-axis: Duration in minutes -- Save as: `/tmp/gh-aw/python/charts/session_duration_trends.png` - -**Chart Quality Requirements**: -- DPI: 300 minimum -- Figure size: 12x7 inches for better readability -- Use seaborn styling with a professional color palette -- Include grid lines for easier reading -- Clear, large labels and legend -- Title with context (e.g., "Session Completion Rates - Last 30 Days") -- Annotations for significant changes or anomalies - -**Phase 4: Upload Charts** - -1. Upload both charts using the `upload asset` tool -2. 
Collect the returned URLs for embedding in the discussion - -**Phase 5: Embed Charts in Discussion** - -Include the charts in your analysis report with this structure: - -```markdown -## ๐Ÿ“ˆ Session Trends Analysis - -### Completion Patterns -![Session Completion Trends](URL_FROM_UPLOAD_ASSET_CHART_1) - -[Brief 2-3 sentence analysis of completion trends, highlighting improvements in success rates or concerning patterns] - -### Duration & Efficiency -![Session Duration Trends](URL_FROM_UPLOAD_ASSET_CHART_2) - -[Brief 2-3 sentence analysis of session duration patterns, noting efficiency improvements or areas needing attention] -``` - -### Python Implementation Notes - -- Use pandas for data manipulation and date handling -- Use matplotlib.pyplot and seaborn for visualization -- Set appropriate date formatters for x-axis labels -- Use `plt.xticks(rotation=45)` for readable date labels -- Apply `plt.tight_layout()` before saving -- Handle cases where data might be sparse or missing - -### Error Handling - -If insufficient data is available (less than 7 days): -- Generate the charts with available data -- Add a note in the analysis mentioning the limited data range -- Consider using a bar chart instead of line chart for very sparse data - ---- - ## Mission Analyze approximately 50 Copilot agent sessions to identify: @@ -174,12 +75,13 @@ Create a comprehensive report and publish it as a GitHub Discussion for team rev - **Repository**: ${{ github.repository }} - **Analysis Period**: Most recent ~50 agent sessions - **Cache Memory**: `/tmp/gh-aw/cache-memory/` +- **Pre-fetched Data**: Available at `/tmp/gh-aw/session-data/` ## Task Overview ### Phase 0: Setup and Prerequisites -**Pre-fetched Data Available**: This workflow includes a shared component (`copilot-session-data-fetch.md`) that fetches Copilot agent session data. 
The data should be available at: +**Pre-fetched Data Available**: Session data has been fetched by the `copilot-session-data-fetch` shared module: - `/tmp/gh-aw/session-data/sessions-list.json` - List of sessions with metadata - `/tmp/gh-aw/session-data/logs/` - Individual session log files @@ -188,282 +90,35 @@ Create a comprehensive report and publish it as a GitHub Discussion for team rev 2. Initialize or restore cache-memory from `/tmp/gh-aw/cache-memory/` 3. Load historical analysis data if available -**Cache Memory Structure**: -``` -/tmp/gh-aw/cache-memory/ -โ”œโ”€โ”€ session-analysis/ -โ”‚ โ”œโ”€โ”€ history.json # Historical analysis results -โ”‚ โ”œโ”€โ”€ strategies.json # Discovered analytical strategies -โ”‚ โ””โ”€โ”€ patterns.json # Known behavioral patterns -``` - -### Phase 1: Data Acquisition - -The session data has already been fetched in the preparation step. You should: - -1. **Verify Downloaded Data**: - ```bash - # Check sessions list - jq '.' /tmp/gh-aw/session-data/sessions-list.json - - # Count sessions - jq 'length' /tmp/gh-aw/session-data/sessions-list.json - - # List log files - find /tmp/gh-aw/session-data/logs/ -maxdepth 1 -ls - ``` - -2. **Extract Session Metadata**: - - Session IDs - - Creation timestamps - - Task titles and descriptions - - Current state (open, completed, failed) - - Pull request numbers (if available) - -3. 
**Sample Strategy**: - - Use all available sessions (up to 50) - - If more than 50 sessions exist, they were already limited in the fetch step - - Record which sessions are being analyzed - -### Phase 2: Session Analysis +### Phase 1: Session Analysis For each downloaded session log in `/tmp/gh-aw/session-data/logs/`: -#### 2.1 Load Historical Context - -Check cache memory for: -- Previous analysis results (`/tmp/gh-aw/cache-memory/session-analysis/history.json`) -- Known strategies (`/tmp/gh-aw/cache-memory/session-analysis/strategies.json`) -- Identified patterns (`/tmp/gh-aw/cache-memory/session-analysis/patterns.json`) - -If cache files don't exist, create them with initial structure: -```json -{ - "analyses": [], - "last_updated": "YYYY-MM-DD", - "version": "1.0" -} -``` - -#### 2.2 Apply Analysis Strategies - -**Standard Analysis Strategies** (Always Apply): - -1. **Completion Analysis**: - - Did the session complete successfully? - - Was the task abandoned or aborted? - - Look for error messages or failure indicators - - Track completion rate - -2. **Loop Detection**: - - Identify repetitive agent responses - - Detect circular reasoning or stuck patterns - - Count iteration loops without progress - - Flag sessions with excessive retries - -3. **Prompt Structure Analysis**: - - Analyze task description clarity - - Identify effective prompt patterns - - Cluster prompts by keywords or structure - - Correlate prompt quality with success - -4. **Context Confusion Detection**: - - Look for signs of missing context - - Identify requests for clarification - - Track contextual misunderstandings - - Note when agent asks for more information - -5. **Error Recovery Analysis**: - - How does the agent handle errors? - - Track error types and recovery strategies - - Measure time to recover from failures - - Identify successful vs. failed recoveries - -6. **Tool Usage Patterns**: - - Which tools are used most frequently? - - Are tools used effectively? 
- - Identify missing or unavailable tools - - Track tool execution success rates - -#### 2.3 Experimental Strategies (30% of runs) - -**Determine if this is an experimental run**: -```bash -# Generate random number between 0-100 -RANDOM_VALUE=$((RANDOM % 100)) -# If value < 30, this is an experimental run -``` - -**Novel Analysis Methods to Try** (rotate through these): - -1. **Semantic Clustering**: - - Group prompts by semantic similarity - - Identify common themes across sessions - - Find outlier prompts that perform differently - - Use keyword extraction and comparison - -2. **Temporal Analysis**: - - Analyze session duration patterns - - Identify time-of-day effects - - Track performance trends over time - - Correlate timing with success rates - -3. **Code Quality Metrics**: - - If sessions produce code, analyze quality - - Check for test coverage mentions - - Look for documentation updates - - Track code review feedback - -4. **User Interaction Patterns**: - - Analyze back-and-forth exchanges - - Measure clarification request frequency - - Track user guidance effectiveness - - Identify optimal interaction patterns - -5. **Cross-Session Learning**: - - Compare similar tasks across sessions - - Identify improvement over time - - Track recurring issues - - Find evolving solution strategies - -**Record Experimental Results**: -- Store strategy name and description -- Record what was measured -- Note insights discovered -- Save to cache for future reference - -#### 2.4 Data Collection - -For each session, collect: -- **Session ID**: Unique identifier -- **Timestamp**: When the session occurred -- **Task Type**: Category of task (bug fix, feature, refactor, etc.) 
-- **Duration**: Time from start to completion -- **Status**: Success, failure, abandoned, in-progress -- **Loop Count**: Number of repetitive cycles detected -- **Tool Usage**: List of tools used and their success -- **Error Count**: Number of errors encountered -- **Prompt Quality Score**: Assessed quality (1-10) -- **Context Issues**: Boolean flag for confusion detected -- **Notes**: Any notable observations +1. **Load Historical Context**: Check cache memory for previous analysis results, known strategies, and identified patterns (see `session-analysis-strategies` shared module) -### Phase 3: Insight Synthesis +2. **Apply Analysis Strategies**: Use the standard and experimental strategies defined in the imported `session-analysis-strategies` module -Aggregate observations across all analyzed sessions: +3. **Collect Session Data**: Gather metrics for each session as defined in the shared module -#### 3.1 Success Factors +### Phase 2: Generate Trend Charts -Identify patterns associated with successful completions: -- Common prompt characteristics -- Effective tool combinations -- Optimal context provision -- Successful error recovery strategies -- Clear task descriptions +Follow the chart generation process defined in the `session-analysis-charts` shared module to create: +- Session completion trends chart +- Session duration & efficiency chart -**Example Analysis**: -``` -SUCCESS PATTERNS: -- Sessions with specific file references: 85% success rate -- Prompts including expected outcomes: 78% success rate -- Tasks under 100 lines of change: 90% success rate -``` +Upload charts and collect URLs for embedding in the report. 
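Before either chart can be drawn, the per-session records have to be rolled up into the daily CSVs the charts read. A stdlib-only sketch of that roll-up follows; the `timestamp`/`status` field names are assumptions about the fetched data, not a documented schema:

```python
import csv
import os
from collections import defaultdict

# Stand-in records; real ones are parsed from /tmp/gh-aw/session-data/.
sessions = [
    {"timestamp": "2026-01-02T09:14:00Z", "status": "success"},
    {"timestamp": "2026-01-02T11:30:00Z", "status": "failure"},
    {"timestamp": "2026-01-03T08:05:00Z", "status": "success"},
]

daily = defaultdict(lambda: {"succeeded": 0, "failed": 0})
for s in sessions:
    day = s["timestamp"][:10]  # YYYY-MM-DD prefix of the ISO timestamp
    key = "succeeded" if s["status"] == "success" else "failed"
    daily[day][key] += 1

os.makedirs("/tmp/gh-aw/python/data", exist_ok=True)
with open("/tmp/gh-aw/python/data/session_completion.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["date", "succeeded", "failed", "completion_rate"])
    for day in sorted(daily):
        d = daily[day]
        total = d["succeeded"] + d["failed"]
        w.writerow([day, d["succeeded"], d["failed"],
                    round(100.0 * d["succeeded"] / total, 1)])
```

The same pattern extends to `session_duration.csv` by accumulating durations per day instead of status counts.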
-#### 3.2 Failure Signals - -Identify common indicators of confusion or inefficiency: -- Vague or ambiguous prompts -- Missing context clues -- Circular reasoning patterns -- Repeated failed attempts -- Tool unavailability - -**Example Analysis**: -``` -FAILURE INDICATORS: -- Prompts with "just fix it": 45% success rate -- Missing file paths: 40% success rate -- Tasks requiring >5 iterations: 30% success rate -``` - -#### 3.3 Prompt Quality Indicators - -Analyze what makes prompts effective: -- Specific vs. general instructions -- Context richness -- Clear acceptance criteria -- File/code references -- Expected behavior descriptions - -**Categorize Prompts**: -- **High Quality**: Specific, contextual, clear outcomes -- **Medium Quality**: Some clarity but missing details -- **Low Quality**: Vague, ambiguous, lacking context - -#### 3.4 Recommendations - -Based on the analysis, generate actionable recommendations: -- Prompt improvement templates -- Best practice guidelines -- Tool usage suggestions -- Context provision tips -- Error handling strategies +### Phase 3: Insight Synthesis -**Format Recommendations**: -1. **For Users**: How to write better task descriptions -2. **For System**: Potential improvements to agent behavior -3. 
**For Tools**: Missing capabilities or integrations +Aggregate observations across all analyzed sessions using the synthesis patterns from the `session-analysis-strategies` module: +- Identify success factors +- Identify failure signals +- Analyze prompt quality indicators +- Generate actionable recommendations ### Phase 4: Cache Memory Management -#### 4.1 Update Historical Data - -Update cache memory with today's analysis: - -```bash -mkdir -p /tmp/gh-aw/cache-memory/session-analysis/ - -# Update history.json -cat > /tmp/gh-aw/cache-memory/session-analysis/history.json << 'EOF' -{ - "analyses": [ - { - "date": "YYYY-MM-DD", - "sessions_analyzed": 50, - "completion_rate": 0.72, - "average_duration_minutes": 8.5, - "experimental_strategy": "semantic_clustering", - "key_insights": ["insight 1", "insight 2"] - } - ], - "last_updated": "YYYY-MM-DD" -} -EOF -``` - -#### 4.2 Store Discovered Strategies - -If this was an experimental run, save the new strategy: - -```bash -# Update strategies.json -# Add strategy name, description, results, effectiveness -``` - -#### 4.3 Update Pattern Database - -Add newly discovered patterns: - -```bash -# Update patterns.json -# Include pattern type, frequency, correlation with success/failure -``` - -#### 4.4 Maintain Cache Size - -Keep cache manageable: -- Retain last 90 days of analysis history -- Keep top 20 most effective strategies -- Maintain comprehensive pattern database +Update cache memory with today's analysis following the cache management patterns in the `session-analysis-strategies` shared module. 
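The cache-update step above could be realized with the stdlib alone. This sketch appends today's summary to `history.json` and prunes entries beyond the 90-day retention window the shared module describes; the metric values are placeholders:

```python
import json
import os
from datetime import date, timedelta

cache = "/tmp/gh-aw/cache-memory/session-analysis"
os.makedirs(cache, exist_ok=True)
path = os.path.join(cache, "history.json")

# Load existing history, or fall back to the initial structure.
try:
    with open(path) as f:
        history = json.load(f)
except (FileNotFoundError, json.JSONDecodeError):
    history = {"analyses": [], "last_updated": "", "version": "1.0"}

today = date.today()
history["analyses"].append({
    "date": today.isoformat(),
    "sessions_analyzed": 50,           # placeholder metrics
    "completion_rate": 0.72,
    "average_duration_minutes": 8.5,
    "key_insights": ["insight 1", "insight 2"],
})

# Retain only the last 90 days (ISO dates compare correctly as strings).
cutoff = (today - timedelta(days=90)).isoformat()
history["analyses"] = [a for a in history["analyses"] if a["date"] >= cutoff]
history["last_updated"] = today.isoformat()

with open(path, "w") as f:
    json.dump(history, f, indent=2)
```

`strategies.json` and `patterns.json` would be maintained the same way, with their own pruning rules (top 20 strategies, full pattern database).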
### Phase 5: Create Analysis Discussion diff --git a/.github/workflows/shared/session-analysis-charts.md b/.github/workflows/shared/session-analysis-charts.md new file mode 100644 index 00000000000..334e92978ba --- /dev/null +++ b/.github/workflows/shared/session-analysis-charts.md @@ -0,0 +1,118 @@ +--- +# Session Analysis Charts +# Reusable chart generation for Copilot session analysis workflows +# +# Usage: +# imports: +# - shared/session-analysis-charts.md +# +# This import provides: +# - Python environment for chart generation +# - Instructions for creating session analysis charts +# - Best practices for session trend visualization + +imports: + - shared/python-dataviz.md +--- + +# Session Analysis Chart Generation + +You are an expert at creating session analysis trend charts that reveal insights about Copilot agent session patterns over time. + +## ๐Ÿ“Š Chart Generation Requirements + +**IMPORTANT**: Generate exactly 2 trend charts that showcase Copilot agent session patterns over time. + +### Chart Generation Process + +**Phase 1: Data Collection** + +Collect data for the past 30 days (or available data) from cache memory and session logs: + +1. **Session Completion Data**: + - Count of sessions completed successfully per day + - Count of sessions failed/abandoned per day + - Completion rate percentage per day + +2. **Session Duration Data**: + - Average session duration per day (in minutes) + - Median session duration per day + - Number of sessions with loops/retries + +**Phase 2: Data Preparation** + +1. Create CSV files in `/tmp/gh-aw/python/data/` with the collected data: + - `session_completion.csv` - Daily completion counts and rates + - `session_duration.csv` - Daily duration statistics + +2. 
Each CSV should have a date column and metric columns with appropriate headers + +**Phase 3: Chart Generation** + +Generate exactly **2 high-quality trend charts**: + +**Chart 1: Session Completion Trends** +- Multi-line chart showing: + - Successful completions (line, green) + - Failed/abandoned sessions (line, red) + - Completion rate percentage (line with secondary y-axis) +- X-axis: Date (last 30 days) +- Y-axis: Count (left), Percentage (right) +- Save as: `/tmp/gh-aw/python/charts/session_completion_trends.png` + +**Chart 2: Session Duration & Efficiency** +- Dual visualization showing: + - Average session duration (line) + - Median session duration (line) + - Sessions with loops (bar chart overlay) +- X-axis: Date (last 30 days) +- Y-axis: Duration in minutes +- Save as: `/tmp/gh-aw/python/charts/session_duration_trends.png` + +**Chart Quality Requirements**: +- DPI: 300 minimum +- Figure size: 12x7 inches for better readability +- Use seaborn styling with a professional color palette +- Include grid lines for easier reading +- Clear, large labels and legend +- Title with context (e.g., "Session Completion Rates - Last 30 Days") +- Annotations for significant changes or anomalies + +**Phase 4: Upload Charts** + +1. Upload both charts using the `upload asset` tool +2. 
Collect the returned URLs for embedding in the discussion + +**Phase 5: Embed Charts in Discussion** + +Include the charts in your analysis report with this structure: + +```markdown +## ๐Ÿ“ˆ Session Trends Analysis + +### Completion Patterns +![Session Completion Trends](URL_FROM_UPLOAD_ASSET_CHART_1) + +[Brief 2-3 sentence analysis of completion trends, highlighting improvements in success rates or concerning patterns] + +### Duration & Efficiency +![Session Duration Trends](URL_FROM_UPLOAD_ASSET_CHART_2) + +[Brief 2-3 sentence analysis of session duration patterns, noting efficiency improvements or areas needing attention] +``` + +### Python Implementation Notes + +- Use pandas for data manipulation and date handling +- Use matplotlib.pyplot and seaborn for visualization +- Set appropriate date formatters for x-axis labels +- Use `plt.xticks(rotation=45)` for readable date labels +- Apply `plt.tight_layout()` before saving +- Handle cases where data might be sparse or missing + +### Error Handling + +If insufficient data is available (less than 7 days): +- Generate the charts with available data +- Add a note in the analysis mentioning the limited data range +- Consider using a bar chart instead of line chart for very sparse data diff --git a/.github/workflows/shared/session-analysis-strategies.md b/.github/workflows/shared/session-analysis-strategies.md new file mode 100644 index 00000000000..84ad07e0771 --- /dev/null +++ b/.github/workflows/shared/session-analysis-strategies.md @@ -0,0 +1,241 @@ +--- +# Session Analysis Strategies +# Reusable analysis patterns for Copilot session analysis +# +# Usage: +# imports: +# - shared/session-analysis-strategies.md +# +# This import provides: +# - Standard and experimental analysis strategies +# - Cache memory management patterns +# - Pattern detection methodologies +--- + +# Session Analysis Strategies + +Comprehensive strategies for analyzing Copilot agent sessions to extract insights, identify patterns, and recommend 
improvements. + +## Standard Analysis Strategies + +These strategies should be applied to every session analysis: + +### 1. Completion Analysis +- Did the session complete successfully? +- Was the task abandoned or aborted? +- Look for error messages or failure indicators +- Track completion rate + +### 2. Loop Detection +- Identify repetitive agent responses +- Detect circular reasoning or stuck patterns +- Count iteration loops without progress +- Flag sessions with excessive retries + +### 3. Prompt Structure Analysis +- Analyze task description clarity +- Identify effective prompt patterns +- Cluster prompts by keywords or structure +- Correlate prompt quality with success + +### 4. Context Confusion Detection +- Look for signs of missing context +- Identify requests for clarification +- Track contextual misunderstandings +- Note when agent asks for more information + +### 5. Error Recovery Analysis +- How does the agent handle errors? +- Track error types and recovery strategies +- Measure time to recover from failures +- Identify successful vs. failed recoveries + +### 6. Tool Usage Patterns +- Which tools are used most frequently? +- Are tools used effectively? +- Identify missing or unavailable tools +- Track tool execution success rates + +## Experimental Strategies (30% of runs) + +**Determine if this is an experimental run**: +```bash +# Generate random number between 0-100 +RANDOM_VALUE=$((RANDOM % 100)) +# If value < 30, this is an experimental run +``` + +**Novel Analysis Methods to Try** (rotate through these): + +### 1. Semantic Clustering +- Group prompts by semantic similarity +- Identify common themes across sessions +- Find outlier prompts that perform differently +- Use keyword extraction and comparison + +### 2. Temporal Analysis +- Analyze session duration patterns +- Identify time-of-day effects +- Track performance trends over time +- Correlate timing with success rates + +### 3. 
Code Quality Metrics +- If sessions produce code, analyze quality +- Check for test coverage mentions +- Look for documentation updates +- Track code review feedback + +### 4. User Interaction Patterns +- Analyze back-and-forth exchanges +- Measure clarification request frequency +- Track user guidance effectiveness +- Identify optimal interaction patterns + +### 5. Cross-Session Learning +- Compare similar tasks across sessions +- Identify improvement over time +- Track recurring issues +- Find evolving solution strategies + +**Record Experimental Results**: +- Store strategy name and description +- Record what was measured +- Note insights discovered +- Save to cache for future reference + +## Data Collection per Session + +For each session, collect: +- **Session ID**: Unique identifier +- **Timestamp**: When the session occurred +- **Task Type**: Category of task (bug fix, feature, refactor, etc.) +- **Duration**: Time from start to completion +- **Status**: Success, failure, abandoned, in-progress +- **Loop Count**: Number of repetitive cycles detected +- **Tool Usage**: List of tools used and their success +- **Error Count**: Number of errors encountered +- **Prompt Quality Score**: Assessed quality (1-10) +- **Context Issues**: Boolean flag for confusion detected +- **Notes**: Any notable observations + +## Cache Memory Management + +### Cache Structure +``` +/tmp/gh-aw/cache-memory/session-analysis/ +โ”œโ”€โ”€ history.json # Historical analysis results +โ”œโ”€โ”€ strategies.json # Discovered analytical strategies +โ””โ”€โ”€ patterns.json # Known behavioral patterns +``` + +### Initialize Cache + +If cache files don't exist, create them with initial structure: +```bash +mkdir -p /tmp/gh-aw/cache-memory/session-analysis/ + +cat > /tmp/gh-aw/cache-memory/session-analysis/history.json << 'EOF' +{ + "analyses": [], + "last_updated": "YYYY-MM-DD", + "version": "1.0" +} +EOF +``` + +### Update Historical Data + +Update cache memory with today's analysis: +```bash 
+# Update history.json with today's results +# Include: date, sessions_analyzed, completion_rate, average_duration_minutes +# Include: experimental_strategy (if applicable), key_insights array +``` + +### Store Discovered Strategies + +If this was an experimental run, save the new strategy: +- Strategy name and description +- Results and effectiveness metrics +- Save to strategies.json + +### Update Pattern Database + +Add newly discovered patterns: +- Pattern type and frequency +- Correlation with success/failure +- Save to patterns.json + +### Maintain Cache Size + +Keep cache manageable: +- Retain last 90 days of analysis history +- Keep top 20 most effective strategies +- Maintain comprehensive pattern database + +## Insight Synthesis + +Aggregate observations across all analyzed sessions: + +### Success Factors + +Identify patterns associated with successful completions: +- Common prompt characteristics +- Effective tool combinations +- Optimal context provision +- Successful error recovery strategies +- Clear task descriptions + +**Example Analysis**: +``` +SUCCESS PATTERNS: +- Sessions with specific file references: 85% success rate +- Prompts including expected outcomes: 78% success rate +- Tasks under 100 lines of change: 90% success rate +``` + +### Failure Signals + +Identify common indicators of confusion or inefficiency: +- Vague or ambiguous prompts +- Missing context clues +- Circular reasoning patterns +- Repeated failed attempts +- Tool unavailability + +**Example Analysis**: +``` +FAILURE INDICATORS: +- Prompts with "just fix it": 45% success rate +- Missing file paths: 40% success rate +- Tasks requiring >5 iterations: 30% success rate +``` + +### Prompt Quality Indicators + +Analyze what makes prompts effective: +- Specific vs. 
general instructions +- Context richness +- Clear acceptance criteria +- File/code references +- Expected behavior descriptions + +**Categorize Prompts**: +- **High Quality**: Specific, contextual, clear outcomes +- **Medium Quality**: Some clarity but missing details +- **Low Quality**: Vague, ambiguous, lacking context + +## Recommendations Format + +Based on the analysis, generate actionable recommendations: + +1. **For Users**: How to write better task descriptions +2. **For System**: Potential improvements to agent behavior +3. **For Tools**: Missing capabilities or integrations + +Include: +- Prompt improvement templates +- Best practice guidelines +- Tool usage suggestions +- Context provision tips +- Error handling strategies From 4c1b279ffd95a6ca7478a89062cec273660ecc80 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sun, 4 Jan 2026 06:09:44 +0000 Subject: [PATCH 3/5] =?UTF-8?q?Refactor=20ci-coach=20workflow=20(725?= =?UTF-8?q?=E2=86=92280=20lines)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Extract CI data analysis to shared/ci-data-analysis.md - Extract optimization strategies to shared/ci-optimization-strategies.md - Remove 445 lines of redundant content (61% reduction) - Workflow compiles successfully Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com> --- .github/workflows/ci-coach.lock.yml | 764 ++++++++---------- .github/workflows/ci-coach.md | 552 ++----------- .github/workflows/shared/ci-data-analysis.md | 173 ++++ .../shared/ci-optimization-strategies.md | 192 +++++ 4 files changed, 739 insertions(+), 942 deletions(-) create mode 100644 .github/workflows/shared/ci-data-analysis.md create mode 100644 .github/workflows/shared/ci-optimization-strategies.md diff --git a/.github/workflows/ci-coach.lock.yml b/.github/workflows/ci-coach.lock.yml index 2a942452205..c335b7c3a55 100644 --- a/.github/workflows/ci-coach.lock.yml +++ 
b/.github/workflows/ci-coach.lock.yml @@ -23,8 +23,10 @@ # # Resolved workflow manifest: # Imports: -# - shared/jqschema.md +# - shared/ci-data-analysis.md +# - shared/ci-optimization-strategies.md # - shared/reporting.md +# - shared/jqschema.md name: "CI Optimization Coach" "on": @@ -117,8 +119,6 @@ jobs: package-manager-cache: false - name: Create gh-aw temp directory run: bash /tmp/gh-aw/actions/create_gh_aw_tmp_dir.sh - - name: Set up jq utilities directory - run: "mkdir -p /tmp/gh-aw\ncat > /tmp/gh-aw/jqschema.sh << 'EOF'\n#!/usr/bin/env bash\n# jqschema.sh\njq -c '\ndef walk(f):\n . as $in |\n if type == \"object\" then\n reduce keys[] as $k ({}; . + {($k): ($in[$k] | walk(f))})\n elif type == \"array\" then\n if length == 0 then [] else [.[0] | walk(f)] end\n else\n type\n end;\nwalk(.)\n'\nEOF\nchmod +x /tmp/gh-aw/jqschema.sh" - env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} name: Download CI workflow runs from last 7 days @@ -148,6 +148,8 @@ jobs: run: |- mkdir -p /tmp/gh-aw go test -v -json -count=1 -timeout=3m -tags '!integration' -run='^Test' ./... | tee /tmp/gh-aw/test-results.json + - name: Set up jq utilities directory + run: "mkdir -p /tmp/gh-aw\ncat > /tmp/gh-aw/jqschema.sh << 'EOF'\n#!/usr/bin/env bash\n# jqschema.sh\njq -c '\ndef walk(f):\n . as $in |\n if type == \"object\" then\n reduce keys[] as $k ({}; . 
+ {($k): ($in[$k] | walk(f))})\n elif type == \"array\" then\n if length == 0 then [] else [.[0] | walk(f)] end\n else\n type\n end;\nwalk(.)\n'\nEOF\nchmod +x /tmp/gh-aw/jqschema.sh" # Cache memory file share configuration from frontmatter processed below - name: Create cache-memory directory @@ -488,145 +490,44 @@ jobs: env: GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt GH_AW_SAFE_OUTPUTS: ${{ env.GH_AW_SAFE_OUTPUTS }} - GH_AW_GITHUB_EVENT_AFTER: ${{ github.event.after }} - GH_AW_GITHUB_EVENT_BEFORE: ${{ github.event.before }} GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} GH_AW_GITHUB_RUN_NUMBER: ${{ github.run_number }} run: | bash /tmp/gh-aw/actions/create_prompt_first.sh cat << 'PROMPT_EOF' > "$GH_AW_PROMPT" - ## jqschema - JSON Schema Discovery - - A utility script is available at `/tmp/gh-aw/jqschema.sh` to help you discover the structure of complex JSON responses. - - ### Purpose - - Generate a compact structural schema (keys + types) from JSON input. This is particularly useful when: - - Analyzing tool outputs from GitHub search (search_code, search_issues, search_repositories) - - Exploring API responses with large payloads - - Understanding the structure of unfamiliar data without verbose output - - Planning queries before fetching full data - - ### Usage - - ```bash - # Analyze a file - cat data.json | /tmp/gh-aw/jqschema.sh - - # Analyze command output - echo '{"name": "test", "count": 42, "items": [{"id": 1}]}' | /tmp/gh-aw/jqschema.sh - - # Analyze GitHub search results - gh api search/repositories?q=language:go | /tmp/gh-aw/jqschema.sh - ``` - - ### How It Works - - The script transforms JSON data by: - 1. Replacing object values with their type names ("string", "number", "boolean", "null") - 2. Reducing arrays to their first element's structure (or empty array if empty) - 3. Recursively processing nested structures - 4. 
Outputting compact (minified) JSON - - ### Example - - **Input:** - ```json - { - "total_count": 1000, - "items": [ - {"login": "user1", "id": 123, "verified": true}, - {"login": "user2", "id": 456, "verified": false} - ] - } - ``` - - **Output:** - ```json - {"total_count":"number","items":[{"login":"string","id":"number","verified":"boolean"}]} - ``` - - ### Best Practices - - **Use this script when:** - - You need to understand the structure of tool outputs before requesting full data - - GitHub search tools return large datasets (use `perPage: 1` and pipe through schema minifier first) - - Exploring unfamiliar APIs or data structures - - Planning data extraction strategies - - **Example workflow for GitHub search tools:** - ```bash - # Step 1: Get schema with minimal data (fetch just 1 result) - # This helps understand the structure before requesting large datasets - echo '{}' | gh api search/repositories -f q="language:go" -f per_page=1 | /tmp/gh-aw/jqschema.sh - - # Output shows the schema: - # {"incomplete_results":"boolean","items":[{...}],"total_count":"number"} - - # Step 2: Review schema to understand available fields - - # Step 3: Request full data with confidence about structure - # Now you know what fields are available and can query efficiently - ``` - - **Using with GitHub MCP tools:** - When using tools like `search_code`, `search_issues`, or `search_repositories`, pipe the output through jqschema to discover available fields: - ```bash - # Save a minimal search result to a file - gh api search/code -f q="jq in:file language:bash" -f per_page=1 > /tmp/sample.json - - # Generate schema to understand structure - cat /tmp/sample.json | /tmp/gh-aw/jqschema.sh - - # Now you know which fields exist and can use them in your analysis - ``` - - ## Report Structure - - 1. **Overview**: 1-2 paragraphs summarizing key findings - 2. **Details**: Use `
<details><summary>Full Report</summary>` for expanded content - - ## Workflow Run References - - - Format run IDs as links: `[§12345](https://github.com/owner/repo/actions/runs/12345)` - - Include up to 3 most relevant run URLs at end under `**References:**` - - Do NOT add footer attribution (system adds automatically) - - # CI Optimization Coach + # CI Data Analysis - You are the CI Optimization Coach, an expert system that analyzes CI workflow performance to identify opportunities for optimization, efficiency improvements, and cost reduction. + Pre-downloaded CI run data and artifacts are available for analysis: - ## Mission + ## Available Data - Analyze the CI workflow daily to identify concrete optimization opportunities that can make the test suite more efficient while minimizing costs. The workflow has already built the project, run linters, and run tests, so you can validate any proposed changes before creating a pull request. + 1. **CI Runs**: `/tmp/ci-runs.json` + - Last 100 workflow runs with status, timing, and metadata + + 2. **Artifacts**: `/tmp/ci-artifacts/` + - Coverage reports and benchmark results from recent successful runs + + 3. **CI Configuration**: `.github/workflows/ci.yml` + - Current CI workflow configuration + + 4. **Cache Memory**: `/tmp/cache-memory/` + - Historical analysis data from previous runs + + 5. 
**Test Results**: `/tmp/gh-aw/test-results.json` + - JSON output from Go unit tests with performance and timing data - ## Current Context + ## Test Case Locations - - **Repository**: __GH_AW_GITHUB_REPOSITORY__ - - **Run Number**: #__GH_AW_GITHUB_RUN_NUMBER__ - - **Target Workflow**: `.github/workflows/ci.yml` + Go test cases are located throughout the repository: + - **Command tests**: `./cmd/gh-aw/*_test.go` + - **Workflow tests**: `./pkg/workflow/*_test.go` + - **CLI tests**: `./pkg/cli/*_test.go` + - **Parser tests**: `./pkg/parser/*_test.go` + - **Campaign tests**: `./pkg/campaign/*_test.go` + - **Other package tests**: Various `./pkg/*/*_test.go` files - ## Data Available + ## Environment Setup - ### Pre-downloaded Data - 1. **CI Runs**: `/tmp/ci-runs.json` - Last 100 workflow runs with status, timing, and metadata - 2. **Artifacts**: `/tmp/ci-artifacts/` - Coverage reports and benchmark results from recent successful runs - 3. **CI Configuration**: `.github/workflows/ci.yml` - Current CI workflow configuration - 4. **Cache Memory**: `/tmp/cache-memory/` - Historical analysis data from previous runs - 5. 
**Test Results**: `/tmp/gh-aw/test-results.json` - JSON output from Go unit tests with performance and timing data - - ### Test Case Information - The Go test cases are located throughout the repository: - - **Command tests**: `./cmd/gh-aw/*_test.go` - CLI command and main entry point tests - - **Workflow tests**: `./pkg/workflow/*_test.go` - Workflow compilation, validation, and execution tests - - **CLI tests**: `./pkg/cli/*_test.go` - Command implementation tests - - **Parser tests**: `./pkg/parser/*_test.go` - Frontmatter and schema parsing tests - - **Campaign tests**: `./pkg/campaign/*_test.go` - Campaign specification tests - - **Other package tests**: Various `./pkg/*/test.go` files throughout the codebase - - The `/tmp/gh-aw/test-results.json` file contains detailed timing and performance data for each test case in JSON format, allowing you to identify slow tests, flaky tests, and optimization opportunities. - - ### Environment Setup The workflow has already completed: - ✅ **Linting**: Dev dependencies installed, linters run successfully - ✅ **Building**: Code built with `make build`, lock files compiled with `make recompile` @@ -637,39 +538,7 @@ jobs: - Validate changes immediately by running `make lint`, `make build`, or `make test-unit` - Ensure proposed optimizations don't break functionality before creating a PR - ## Analysis Framework - - ### Phase 1: Study CI Configuration (5 minutes) - - Read and understand the current CI workflow structure: - - ```bash - # Read the CI workflow configuration - cat .github/workflows/ci.yml - - # Understand the job structure - # - lint (runs first) - # - test (depends on lint) - # - integration (depends on test, matrix strategy) - # - build (depends on lint) - # - js (depends on lint) - # - bench (depends on test) - # - fuzz (depends on test) - # - security (depends on test) - # - security-scan (depends on test, matrix strategy) - # - actions-build (depends on lint) - # - logs-token-check (depends on test) - ``` 
- - **Key aspects to analyze:** - - Job dependencies and parallelization opportunities - - Cache usage patterns (Go cache, Node cache) - - Matrix strategy effectiveness - - Timeout configurations - - Concurrency groups - - Artifact retention policies - - ### Phase 2: Analyze Run Data (5 minutes) + ## Analyzing Run Data Parse the downloaded CI runs data: @@ -683,10 +552,6 @@ jobs: by_branch: group_by(.headBranch) | map({branch: .[0].headBranch, count: length}), by_event: group_by(.event) | map({event: .[0].event, count: length}) }' - - # Calculate average duration (if available in run details) - # Check for patterns in failures - # Identify flaky tests or jobs ``` **Metrics to extract:** @@ -696,7 +561,7 @@ jobs: - Cache hit rates from step summaries - Resource usage patterns - ### Phase 3: Review Artifacts (3 minutes) + ## Review Artifacts Examine downloaded artifacts for insights: @@ -708,7 +573,7 @@ jobs: # Check benchmark results for performance trends ``` - ### Phase 4: Load Historical Context (2 minutes) + ## Historical Context Check cache memory for previous analyses: @@ -722,33 +587,41 @@ jobs: # Compare current metrics with historical baselines ``` - ### Phase 5: Identify Optimization Opportunities (10 minutes) + # CI Optimization Analysis Strategies - Look for concrete improvements in these categories: + Comprehensive strategies for analyzing CI workflows to identify optimization opportunities. - #### 1. **Job Parallelization** - - Are there jobs that could run in parallel but currently don't? - - Can dependencies be restructured to reduce critical path? - - Example: Could some test jobs start earlier? + ## Phase 1: CI Configuration Study + + Read and understand the current CI workflow structure: - #### 2. **Cache Optimization** - - Are cache hit rates optimal? - - Could we cache more aggressively (e.g., dependencies, build artifacts)? - - Are cache keys properly scoped? - - Example: Cache npm dependencies globally vs. 
per-job + ```bash + # Read the CI workflow configuration + cat .github/workflows/ci.yml - #### 3. **Test Suite Restructuring** + # Understand the job structure + # - lint (runs first) + # - test (depends on lint) + # - integration (depends on test, matrix strategy) + # - build (depends on lint) + # etc. + ``` - Analyze the current test suite structure and suggest optimizations for execution time: + **Key aspects to analyze:** + - Job dependencies and parallelization opportunities + - Cache usage patterns (Go cache, Node cache) + - Matrix strategy effectiveness + - Timeout configurations + - Concurrency groups + - Artifact retention policies - **A. Test Coverage Analysis** ⚠️ **CRITICAL** + ## Phase 2: Test Coverage Analysis - Before analyzing test performance, ensure ALL tests are actually being executed: + ### Critical: Ensure ALL Tests are Executed **Step 1: Get complete list of all tests** ```bash # List all test functions in the repository - cd /home/runner/work/gh-aw/gh-aw go test -list='^Test' ./... 2>&1 | grep -E '^Test' > /tmp/all-tests.txt # Count total tests @@ -766,7 +639,6 @@ jobs: grep -r "//go:build integration" --include="*_test.go" . | cut -d: -f1 | sort -u > /tmp/integration-test-files.txt # Estimate number of integration tests - # (This is approximate - we'll validate coverage in next step) echo "Files with integration tests:" wc -l < /tmp/integration-test-files.txt ``` @@ -775,277 +647,290 @@ jobs: ```bash # The integration job has a matrix with specific patterns # Each matrix entry targets specific packages and test patterns - # Example: pattern: "TestCompile|TestPoutine" in ./pkg/cli # CRITICAL CHECK: Are there tests that don't match ANY pattern? 
# Extract all integration test patterns from ci.yml cat .github/workflows/ci.yml | grep -A 2 'pattern:' | grep 'pattern:' > /tmp/matrix-patterns.txt - # For each matrix group with empty pattern, those run ALL remaining tests in that package - # Groups with pattern="" are catch-all groups for their package - # Check for catch-all groups cat .github/workflows/ci.yml | grep -B 2 'pattern: ""' | grep 'name:' > /tmp/catchall-groups.txt - - echo "Matrix groups with catch-all patterns (pattern: ''):" - cat /tmp/catchall-groups.txt ``` **Step 4: Identify coverage gaps** ```bash - # Check if each package in the repository is covered by at least one matrix group - # List all packages with integration tests - find . -path ./vendor -prune -o -name "*_test.go" -print | grep -E "integration" | sed 's|/[^/]*$||' | sort -u > /tmp/integration-packages.txt - - # List packages covered in matrix - cat .github/workflows/ci.yml | grep 'packages:' | awk '{print $2}' | tr -d '"' | sort -u > /tmp/covered-packages.txt - - # Compare and find gaps - echo "Packages with integration tests:" - cat /tmp/integration-packages.txt - - echo "Packages covered in CI matrix:" - cat /tmp/covered-packages.txt - - # Check for packages not covered - comm -23 /tmp/integration-packages.txt /tmp/covered-packages.txt > /tmp/uncovered-packages.txt - - if [ -s /tmp/uncovered-packages.txt ]; then - echo "⚠️ WARNING: Packages with tests but NOT in CI matrix:" - cat /tmp/uncovered-packages.txt - echo "These tests are NOT being executed!" 
- fi - ``` - - **Step 5: Validate catch-all coverage** - ```bash - # For packages that have BOTH specific patterns AND a catch-all group, verify the catch-all exists - # For packages with ONLY specific patterns, check if all tests are covered - - # Example for ./pkg/cli: - # - Has many matrix entries with specific patterns - # - Should have a catch-all entry (pattern: "") to ensure all remaining tests run - - # Check each package - for pkg in ./pkg/cli ./pkg/workflow ./pkg/parser ./cmd/gh-aw; do - echo "Checking package: $pkg" - - # Count matrix entries for this package - SPECIFIC_PATTERNS=$(cat .github/workflows/ci.yml | grep -A 1 "packages: \"$pkg\"" | grep 'pattern:' | grep -v 'pattern: ""' | wc -l) - HAS_CATCHALL=$(cat .github/workflows/ci.yml | grep -A 1 "packages: \"$pkg\"" | grep 'pattern: ""' | wc -l) - - echo " - Specific pattern groups: $SPECIFIC_PATTERNS" - echo " - Has catch-all group: $HAS_CATCHALL" - - if [ "$SPECIFIC_PATTERNS" -gt 0 ] && [ "$HAS_CATCHALL" -eq 0 ]; then - echo " ⚠️ WARNING: $pkg has specific patterns but NO catch-all group!" - echo " Tests not matching any specific pattern will NOT run!" - fi - done + # Check if each package with tests is covered by at least one matrix group + # Compare packages with tests vs. packages in CI matrix + # Identify any "orphaned" tests not executed by any job ``` **Required Action if Gaps Found:** - - If any tests are not covered by the CI matrix, you MUST propose adding: + If any tests are not covered by the CI matrix, propose adding: 1. **Catch-all matrix groups** for packages with specific patterns but no catch-all - - Example: Add a "CLI Other" group with `pattern: ""` for ./pkg/cli - - Example: Add a "Workflow Misc" group with `pattern: ""` for ./pkg/workflow - 2. 
**New matrix entries** for packages not in the matrix at all - - Add matrix entry with package path and empty pattern - Example fix for missing catch-all: + Example fix: ```yaml - name: "CLI Other" # Catch-all for tests not matched by specific patterns packages: "./pkg/cli" pattern: "" # Empty pattern runs all remaining tests ``` - **Expected Outcome:** - - ✅ All tests in repository are covered by at least one CI job - - ✅ Each package with integration tests has either: - - A single matrix entry (with or without pattern), OR - - Multiple specific pattern entries PLUS a catch-all entry (pattern: "") - - ❌ No tests should be "orphaned" (not executed by any job) - - **B. Test Splitting Analysis** - - Review the current test matrix configuration (integration tests split into groups) - - Analyze if test groups are balanced in terms of execution time - - Check if any test group consistently takes much longer than others - - Suggest rebalancing test groups to minimize the longest-running group + ## Phase 3: Test Performance Optimization - **Example Analysis:** - ```bash - # Extract test durations from downloaded run data - # Identify if certain matrix jobs are bottlenecks - cat /tmp/ci-runs.json | jq '.[] | select(.conclusion=="success") | .jobs[] | select(.name | contains("Integration")) | {name, duration}' - - # Look for imbalanced matrix groups - # If "Integration: Workflow" takes 8 minutes while others take 3 minutes, suggest splitting it - ``` + ### A. 
Test Splitting Analysis + - Review current test matrix configuration + - Analyze if test groups are balanced in execution time + - Suggest rebalancing to minimize longest-running group - **Restructuring Suggestions:** - - If unit tests take >5 minutes, suggest splitting by package (e.g., `./pkg/cli`, `./pkg/workflow`, `./pkg/parser`) - - If integration matrix is imbalanced, suggest redistributing tests: - - Move slow tests from overloaded groups to faster groups - - Split large test groups (like "Workflow" with no pattern filter) into more specific groups - - Example: Split "CLI Logs & Firewall" if TestLogs and TestFirewall are both slow - - **C. Test Parallelization Within Jobs** - - Check if tests are running sequentially when they could run in parallel + ### B. Test Parallelization Within Jobs + - Check if tests run sequentially when they could run in parallel - Suggest using `go test -parallel=N` to increase parallelism - - Analyze if `-count=1` (disables test caching) is necessary for all tests - - Example: Unit tests could run with `-parallel=4` to utilize multiple cores + - Analyze if `-count=1` is necessary for all tests - **D. Test Selection Optimization** + ### C. Test Selection Optimization - Suggest path-based test filtering to skip irrelevant tests - Recommend running only affected tests for non-main branch pushes - - Example configuration: - ```yaml - - name: Check for code changes - id: code-changes - run: | - if git diff --name-only __GH_AW_GITHUB_EVENT_BEFORE__..__GH_AW_GITHUB_EVENT_AFTER__ | grep -E '\.(go|js|cjs)$'; then - echo "has_code_changes=true" >> $GITHUB_OUTPUT - fi - - - name: Run tests - if: steps.code-changes.outputs.has_code_changes == 'true' - run: go test ./... - ``` - - **E. Test Timeout Optimization** - - Review current timeout settings (currently 3 minutes for tests) - - Check if timeouts are too conservative or too tight based on actual run times + + ### D. 
Test Timeout Optimization + - Review current timeout settings + - Check if timeouts are too conservative or too tight - Suggest adjusting per-job timeouts based on historical data - - Example: If unit tests consistently complete in 1.5 minutes, timeout could be 2 minutes instead of 3 - **F. Test Dependencies Analysis** - - Examine test job dependencies (test → integration → bench/fuzz/security) + ### E. Test Dependencies Analysis + - Examine test job dependencies - Suggest removing unnecessary dependencies to enable more parallelism - - Example: Could `integration`, `bench`, `fuzz`, and `security` all depend on `lint` instead of `test`? - - This allows integration tests to run while unit tests are still running - - Only makes sense if they don't need unit test artifacts - PROMPT_EOF - - name: Substitute placeholders - uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0 - env: - GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt - GH_AW_GITHUB_EVENT_AFTER: ${{ github.event.after }} - GH_AW_GITHUB_EVENT_BEFORE: ${{ github.event.before }} - GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} - GH_AW_GITHUB_RUN_NUMBER: ${{ github.run_number }} - with: - script: | - const substitutePlaceholders = require('/tmp/gh-aw/actions/substitute_placeholders.cjs'); - - // Call the substitution function - return await substitutePlaceholders({ - file: process.env.GH_AW_PROMPT, - substitutions: { - GH_AW_GITHUB_EVENT_AFTER: process.env.GH_AW_GITHUB_EVENT_AFTER, - GH_AW_GITHUB_EVENT_BEFORE: process.env.GH_AW_GITHUB_EVENT_BEFORE, - GH_AW_GITHUB_REPOSITORY: process.env.GH_AW_GITHUB_REPOSITORY, - GH_AW_GITHUB_RUN_NUMBER: process.env.GH_AW_GITHUB_RUN_NUMBER - } - }); - - name: Append prompt (part 2) - env: - GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt - GH_AW_GITHUB_EVENT_AFTER: ${{ github.event.after }} - GH_AW_GITHUB_EVENT_BEFORE: ${{ github.event.before }} - GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} - GH_AW_GITHUB_RUN_NUMBER: ${{ github.run_number }} 
- run: | - cat << 'PROMPT_EOF' >> "$GH_AW_PROMPT" - **G. Selective Test Execution** - - Suggest running expensive tests (benchmarks, fuzz tests) only on main branch or on-demand - - Recommend running security scans only on main or for security-related file changes - - Example: - ```yaml - if: github.ref == 'refs/heads/main' || github.event_name == 'workflow_dispatch' - ``` + ### F. Selective Test Execution + - Suggest running expensive tests only on main branch or on-demand + - Recommend running security scans conditionally - **H. Test Caching Improvements** - - Check if test results could be cached (with appropriate cache keys) - - Suggest caching test binaries to speed up reruns - - Example: Cache compiled test binaries keyed by go.sum + source files - - **I. Matrix Strategy Optimization** + ### G. Matrix Strategy Optimization - Analyze if all integration test matrix jobs are necessary - Check if some matrix jobs could be combined or run conditionally - Suggest reducing matrix size for PR builds vs. main branch builds - - Example: Run full matrix on main, reduced matrix on PRs - **J. Test Infrastructure** - - Check if tests could benefit from faster runners (e.g., ubuntu-latest-4-core) - - Analyze if test containers could be used to improve isolation and speed - - Suggest pre-warming test environments with cached dependencies + ## Phase 4: Resource Optimization + + ### Job Parallelization + - Identify jobs that could run in parallel but currently don't + - Restructure dependencies to reduce critical path + - Example: Could some test jobs start earlier? 
+ + ### Cache Optimization + - Analyze cache hit rates + - Suggest caching more aggressively (dependencies, build artifacts) + - Check if cache keys are properly scoped + + ### Resource Right-Sizing + - Check if timeouts are set appropriately + - Evaluate if jobs could run on faster runners + - Review concurrency groups + + ### Artifact Management + - Check if retention days are optimal + - Identify unnecessary artifacts + - Example: Coverage reports only need 7 days retention + + ### Dependency Installation + - Check for redundant dependency installations + - Suggest using dependency caching more effectively + - Example: Sharing `node_modules` between jobs + + ## Phase 5: Cost-Benefit Analysis + + For each potential optimization: + - **Impact**: How much time/cost savings? + - **Effort**: How difficult to implement? + - **Risk**: Could it break the build or miss issues? + - **Priority**: High/Medium/Low + + ## Optimization Categories + + 1. **Job Parallelization** - Reduce critical path + 2. **Cache Optimization** - Improve cache hit rates + 3. **Test Suite Restructuring** - Balance test execution + 4. **Resource Right-Sizing** - Optimize timeouts and runners + 5. **Artifact Management** - Reduce unnecessary uploads + 6. **Matrix Strategy** - Balance breadth vs. speed + 7. **Conditional Execution** - Skip unnecessary jobs + 8. **Dependency Installation** - Reduce redundant work + + ## Expected Metrics + + Track these metrics before and after optimization: + - Total CI duration (wall clock time) + - Critical path duration + - Cache hit rates + - Test execution time + - Resource utilization + - Cost per CI run + + ## Report Structure + + 1. **Overview**: 1-2 paragraphs summarizing key findings + 2. **Details**: Use `
<details><summary>Full Report</summary>` for expanded content + + ## Workflow Run References + + - Format run IDs as links: `[§12345](https://github.com/owner/repo/actions/runs/12345)` + - Include up to 3 most relevant run URLs at end under `**References:**` + - Do NOT add footer attribution (system adds automatically) + + ## jqschema - JSON Schema Discovery + + A utility script is available at `/tmp/gh-aw/jqschema.sh` to help you discover the structure of complex JSON responses. + + ### Purpose + + Generate a compact structural schema (keys + types) from JSON input. This is particularly useful when: + - Analyzing tool outputs from GitHub search (search_code, search_issues, search_repositories) + - Exploring API responses with large payloads + - Understanding the structure of unfamiliar data without verbose output + - Planning queries before fetching full data + + ### Usage - **Concrete Restructuring Example:** + ```bash + # Analyze a file + cat data.json | /tmp/gh-aw/jqschema.sh - Current structure: + # Analyze command output + echo '{"name": "test", "count": 42, "items": [{"id": 1}]}' | /tmp/gh-aw/jqschema.sh + + # Analyze GitHub search results + gh api search/repositories?q=language:go | /tmp/gh-aw/jqschema.sh ``` - lint (2 min) → test (unit, 2.5 min) → integration (6 parallel groups, longest: 8 min) - → bench (3 min) - → fuzz (2 min) - → security (2 min) + + ### How It Works + + The script transforms JSON data by: + 1. Replacing object values with their type names ("string", "number", "boolean", "null") + 2. Reducing arrays to their first element's structure (or empty array if empty) + 3. Recursively processing nested structures + 4. 
Outputting compact (minified) JSON + + ### Example + + **Input:** + ```json + { + "total_count": 1000, + "items": [ + {"login": "user1", "id": 123, "verified": true}, + {"login": "user2", "id": 456, "verified": false} + ] + } ``` - Optimized structure suggestion: + **Output:** + ```json + {"total_count":"number","items":[{"login":"string","id":"number","verified":"boolean"}]} ``` - lint (2 min) → test-unit-1 (./pkg/cli, 1.5 min) ─┐ - → test-unit-2 (./pkg/workflow, 1.5 min) ├→ integration-fast (4 groups, 4 min) - → test-unit-3 (./pkg/parser, 1 min) ────┘ → integration-slow (2 groups, 4 min) - → bench (main only, 3 min) - → fuzz (main only, 2 min) + + ### Best Practices + + **Use this script when:** + - You need to understand the structure of tool outputs before requesting full data + - GitHub search tools return large datasets (use `perPage: 1` and pipe through schema minifier first) + - Exploring unfamiliar APIs or data structures + - Planning data extraction strategies + + **Example workflow for GitHub search tools:** + ```bash + # Step 1: Get schema with minimal data (fetch just 1 result) + # This helps understand the structure before requesting large datasets + echo '{}' | gh api search/repositories -f q="language:go" -f per_page=1 | /tmp/gh-aw/jqschema.sh + + # Output shows the schema: + # {"incomplete_results":"boolean","items":[{...}],"total_count":"number"} + + # Step 2: Review schema to understand available fields + + # Step 3: Request full data with confidence about structure + # Now you know what fields are available and can query efficiently ``` - Benefits: Reduces critical path from 12.5 min to ~7.5 min (40% improvement) + **Using with GitHub MCP tools:** + When using tools like `search_code`, `search_issues`, or `search_repositories`, pipe the output through jqschema to discover available fields: + ```bash + # Save a minimal search result to a file + gh api search/code -f q="jq in:file language:bash" -f per_page=1 > 
/tmp/sample.json - #### 4. **Resource Right-Sizing** - - Are timeouts set appropriately? - - Could jobs run on faster runners? - - Are concurrency groups optimal? - - Example: Reducing timeout from 30m to 10m if jobs typically complete in 5m + # Generate schema to understand structure + cat /tmp/sample.json | /tmp/gh-aw/jqschema.sh - #### 5. **Artifact Management** - - Are retention days optimal? - - Are we uploading unnecessary artifacts? - - Example: Coverage reports only need 7 days retention + # Now you know which fields exist and can use them in your analysis + ``` - #### 6. **Matrix Strategy** - - Is the matrix well-balanced? - - Could we reduce matrix combinations? - - Are all matrix configurations necessary? - - Example: Testing on fewer Node versions + # CI Optimization Coach - #### 7. **Conditional Execution** - - Can we skip jobs based on file paths? - - Should certain jobs only run on main branch? - - Example: Only run benchmarks on main branch pushes + You are the CI Optimization Coach, an expert system that analyzes CI workflow performance to identify opportunities for optimization, efficiency improvements, and cost reduction. - #### 8. **Dependency Installation** - - Are we installing dependencies multiple times unnecessarily? - - Could we use dependency caching more effectively? - - Example: Sharing `node_modules` between jobs + ## Mission - ### Phase 6: Cost-Benefit Analysis (3 minutes) + Analyze the CI workflow daily to identify concrete optimization opportunities that can make the test suite more efficient while minimizing costs. The workflow has already built the project, run linters, and run tests, so you can validate any proposed changes before creating a pull request. 
+ + ## Current Context + - **Repository**: __GH_AW_GITHUB_REPOSITORY__ + - **Run Number**: #__GH_AW_GITHUB_RUN_NUMBER__ + - **Target Workflow**: `.github/workflows/ci.yml` + + ## Data Available + + The `ci-data-analysis` shared module has pre-downloaded CI run data and built the project. Available data: + + 1. **CI Runs**: `/tmp/ci-runs.json` - Last 100 workflow runs + 2. **Artifacts**: `/tmp/ci-artifacts/` - Coverage reports and benchmarks + 3. **CI Configuration**: `.github/workflows/ci.yml` - Current workflow + 4. **Cache Memory**: `/tmp/cache-memory/` - Historical analysis data + 5. **Test Results**: `/tmp/gh-aw/test-results.json` - Test performance data + + The project has been **built, linted, and tested** so you can validate changes immediately. + + ## Analysis Framework + + Follow the optimization strategies defined in the `ci-optimization-strategies` shared module: + + ### Phase 1: Study CI Configuration (5 minutes) + - Understand job dependencies and parallelization opportunities + - Analyze cache usage, matrix strategy, timeouts, and concurrency + + ### Phase 2: Analyze Test Coverage (10 minutes) + **CRITICAL**: Ensure all tests are executed by the CI matrix + - Check for orphaned tests not covered by any CI job + - Verify catch-all matrix groups exist for packages with specific patterns + - Identify coverage gaps and propose fixes if needed + + ### Phase 3: Identify Optimization Opportunities (10 minutes) + Apply the optimization strategies from the shared module: + 1. **Job Parallelization** - Reduce critical path + 2. **Cache Optimization** - Improve cache hit rates + 3. **Test Suite Restructuring** - Balance test execution + 4. **Resource Right-Sizing** - Optimize timeouts and runners + 5. **Artifact Management** - Reduce unnecessary uploads + 6. **Matrix Strategy** - Balance breadth vs. speed + 7. 
**Conditional Execution** - Skip unnecessary jobs + 8. **Dependency Installation** - Reduce redundant work + + ### Phase 4: Cost-Benefit Analysis (3 minutes) For each potential optimization: - - **Impact**: How much time/cost savings? (estimate in minutes and/or GitHub Actions minutes) + - **Impact**: How much time/cost savings? - **Risk**: What's the risk of breaking something? - **Effort**: How hard is it to implement? - **Priority**: High/Medium/Low - **Prioritize optimizations with:** - - High impact (>10% time savings) - - Low risk - - Low to medium effort + Prioritize optimizations with high impact, low risk, and low to medium effort. - ### Phase 7: Implement and Validate Changes (if improvements found) (8 minutes) + ### Phase 5: Implement and Validate Changes (8 minutes) + + If you identify improvements worth implementing: @@ -1056,31 +941,14 @@ jobs: 2. **Validate changes immediately**: ```bash - # Validate YAML syntax and workflow logic - make lint - - # Rebuild to ensure code still builds correctly - make build - - # Run unit tests to ensure no functionality is broken - make test-unit - - # Recompile workflows if you made any changes to workflow files - make recompile + make lint && make build && make test-unit && make recompile ``` - **IMPORTANT**: Only proceed to creating a PR if all validations pass. If tests fail or build breaks, either: - - Fix the issues and re-validate - - Abandon the changes if they're too risky - - 3. **Document changes** in the PR description: - - List each optimization with expected impact - - Explain the rationale - - Note any risks or trade-offs - - Include before/after metrics if possible - - Mention that changes have been validated (linted, built, tested) - - 4. **Save analysis** to cache memory for future reference: + **IMPORTANT**: Only proceed to creating a PR if all validations pass. + + 3. 
**Document changes** in the PR description (see template below) + + 4. **Save analysis** to cache memory: ```bash mkdir -p /tmp/cache-memory/ci-coach cat > /tmp/cache-memory/ci-coach/last-analysis.json << EOF @@ -1092,31 +960,45 @@ jobs: EOF ``` - 5. **Create the pull request** using the `create_pull_request` tool with: - - **Title**: Clear description of the optimization focus (e.g., "Optimize CI test parallelization") - - **Body**: Comprehensive description including: - - Summary of optimizations proposed - - Expected impact (time/cost savings) - - Risk assessment - - List of changes made to `.github/workflows/ci.yml` - - Validation results (make lint, make build, make test-unit) - - Reference to this workflow run (#__GH_AW_GITHUB_RUN_NUMBER__) - - The title will automatically be prefixed with "[ci-coach] " as configured in safe-outputs + 5. **Create pull request** using the `create_pull_request` tool (title auto-prefixed with "[ci-coach]") - ### Phase 8: No Changes Path + ### Phase 6: No Changes Path If no improvements are found or changes are too risky: + 1. Save analysis to cache memory + 2. Exit gracefully - no pull request needed + 3. Log findings for future reference - 1. **Save analysis** to cache memory documenting that CI is already well-optimized - 2. **Exit gracefully** - no pull request needed - 3. 
**Log findings** for future reference - - ## Output Requirements - - ### Pull Request Structure (if created) + ## Pull Request Structure (if created) ```markdown ## CI Optimization Proposal + PROMPT_EOF + - name: Substitute placeholders + uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0 + env: + GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt + GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} + GH_AW_GITHUB_RUN_NUMBER: ${{ github.run_number }} + with: + script: | + const substitutePlaceholders = require('/tmp/gh-aw/actions/substitute_placeholders.cjs'); + + // Call the substitution function + return await substitutePlaceholders({ + file: process.env.GH_AW_PROMPT, + substitutions: { + GH_AW_GITHUB_REPOSITORY: process.env.GH_AW_GITHUB_REPOSITORY, + GH_AW_GITHUB_RUN_NUMBER: process.env.GH_AW_GITHUB_RUN_NUMBER + } + }); + - name: Append prompt (part 2) + env: + GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt + GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} + GH_AW_GITHUB_RUN_NUMBER: ${{ github.run_number }} + run: | + cat << 'PROMPT_EOF' >> "$GH_AW_PROMPT" ### Summary [Brief overview of proposed changes and expected benefits] @@ -1267,8 +1149,6 @@ jobs: uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0 env: GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt - GH_AW_GITHUB_EVENT_AFTER: ${{ github.event.after }} - GH_AW_GITHUB_EVENT_BEFORE: ${{ github.event.before }} GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} GH_AW_GITHUB_RUN_NUMBER: ${{ github.run_number }} with: @@ -1279,8 +1159,6 @@ jobs: return await substitutePlaceholders({ file: process.env.GH_AW_PROMPT, substitutions: { - GH_AW_GITHUB_EVENT_AFTER: process.env.GH_AW_GITHUB_EVENT_AFTER, - GH_AW_GITHUB_EVENT_BEFORE: process.env.GH_AW_GITHUB_EVENT_BEFORE, GH_AW_GITHUB_REPOSITORY: process.env.GH_AW_GITHUB_REPOSITORY, GH_AW_GITHUB_RUN_NUMBER: process.env.GH_AW_GITHUB_RUN_NUMBER } @@ -1420,8 +1298,6 @@ jobs: uses: 
actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0 env: GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt - GH_AW_GITHUB_EVENT_AFTER: ${{ github.event.after }} - GH_AW_GITHUB_EVENT_BEFORE: ${{ github.event.before }} GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} GH_AW_GITHUB_RUN_NUMBER: ${{ github.run_number }} with: diff --git a/.github/workflows/ci-coach.md b/.github/workflows/ci-coach.md index 5271e6b6230..dc47ff119f8 100644 --- a/.github/workflows/ci-coach.md +++ b/.github/workflows/ci-coach.md @@ -14,75 +14,14 @@ engine: copilot tools: github: toolsets: [default] - bash: ["*"] edit: - cache-memory: true -steps: - - name: Download CI workflow runs from last 7 days - env: - GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} - run: | - # Download workflow runs for the ci workflow - gh run list --repo ${{ github.repository }} --workflow=ci.yml --limit 100 --json databaseId,status,conclusion,createdAt,updatedAt,displayTitle,headBranch,event,url,workflowDatabaseId,number > /tmp/ci-runs.json - - # Create directory for artifacts - mkdir -p /tmp/ci-artifacts - - # Download artifacts from recent runs (last 5 successful runs) - echo "Downloading artifacts from recent CI runs..." 
- gh run list --repo ${{ github.repository }} --workflow=ci.yml --status success --limit 5 --json databaseId | jq -r '.[].databaseId' | while read -r run_id; do - echo "Processing run $run_id" - gh run download "$run_id" --repo ${{ github.repository }} --dir "/tmp/ci-artifacts/$run_id" 2>/dev/null || echo "No artifacts for run $run_id" - done - - echo "CI runs data saved to /tmp/ci-runs.json" - echo "Artifacts saved to /tmp/ci-artifacts/" - - - name: Set up Node.js - uses: actions/setup-node@v6 - with: - node-version: "24" - cache: npm - cache-dependency-path: actions/setup/js/package-lock.json - - - name: Set up Go - uses: actions/setup-go@v6 - with: - go-version-file: go.mod - cache: true - - - name: Install dev dependencies - run: make deps-dev - - - name: Run linter - run: make lint - - - name: Lint error messages - run: make lint-errors - - - name: Install npm dependencies - run: npm ci - working-directory: ./actions/setup/js - - - name: Build code - run: make build - - - name: Rebuild lock files - env: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - run: make recompile - - - name: Run unit tests - continue-on-error: true - run: | - mkdir -p /tmp/gh-aw - go test -v -json -count=1 -timeout=3m -tags '!integration' -run='^Test' ./... | tee /tmp/gh-aw/test-results.json safe-outputs: create-pull-request: title-prefix: "[ci-coach] " timeout-minutes: 30 imports: - - shared/jqschema.md + - shared/ci-data-analysis.md + - shared/ci-optimization-strategies.md - shared/reporting.md --- @@ -102,412 +41,58 @@ Analyze the CI workflow daily to identify concrete optimization opportunities th ## Data Available -### Pre-downloaded Data -1. **CI Runs**: `/tmp/ci-runs.json` - Last 100 workflow runs with status, timing, and metadata -2. **Artifacts**: `/tmp/ci-artifacts/` - Coverage reports and benchmark results from recent successful runs -3. **CI Configuration**: `.github/workflows/ci.yml` - Current CI workflow configuration -4. 
**Cache Memory**: `/tmp/cache-memory/` - Historical analysis data from previous runs -5. **Test Results**: `/tmp/gh-aw/test-results.json` - JSON output from Go unit tests with performance and timing data - -### Test Case Information -The Go test cases are located throughout the repository: -- **Command tests**: `./cmd/gh-aw/*_test.go` - CLI command and main entry point tests -- **Workflow tests**: `./pkg/workflow/*_test.go` - Workflow compilation, validation, and execution tests -- **CLI tests**: `./pkg/cli/*_test.go` - Command implementation tests -- **Parser tests**: `./pkg/parser/*_test.go` - Frontmatter and schema parsing tests -- **Campaign tests**: `./pkg/campaign/*_test.go` - Campaign specification tests -- **Other package tests**: Various `./pkg/*/test.go` files throughout the codebase - -The `/tmp/gh-aw/test-results.json` file contains detailed timing and performance data for each test case in JSON format, allowing you to identify slow tests, flaky tests, and optimization opportunities. 
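The timing data in `/tmp/gh-aw/test-results.json` can be mined directly with `jq`. A minimal sketch (an assumption, not part of the workflow itself): it treats the file as one `go test -json` event per line and uses `fromjson?` to skip any non-JSON lines that `tee` captured.

```shell
# Rank the ten slowest tests from the `go test -json` event stream.
# Guarded so the snippet is a no-op when the results file is absent.
if [ -f /tmp/gh-aw/test-results.json ]; then
  jq -sR '
    [ split("\n")[]
      | fromjson?
      | select(.Test != null and (.Action == "pass" or .Action == "fail"))
      | {test: .Test, package: .Package, elapsed: (.Elapsed // 0)} ]
    | sort_by(-.elapsed)
    | .[:10]
  ' /tmp/gh-aw/test-results.json
fi
```

Tests that dominate this list are the first candidates for splitting into their own matrix group or for rebalancing across groups.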
- -### Environment Setup -The workflow has already completed: -- โœ… **Linting**: Dev dependencies installed, linters run successfully -- โœ… **Building**: Code built with `make build`, lock files compiled with `make recompile` -- โœ… **Testing**: Unit tests run (with performance data collected in JSON format) - -This means you can: -- Make changes to code or configuration files -- Validate changes immediately by running `make lint`, `make build`, or `make test-unit` -- Ensure proposed optimizations don't break functionality before creating a PR - -## Analysis Framework - -### Phase 1: Study CI Configuration (5 minutes) - -Read and understand the current CI workflow structure: - -```bash -# Read the CI workflow configuration -cat .github/workflows/ci.yml - -# Understand the job structure -# - lint (runs first) -# - test (depends on lint) -# - integration (depends on test, matrix strategy) -# - build (depends on lint) -# - js (depends on lint) -# - bench (depends on test) -# - fuzz (depends on test) -# - security (depends on test) -# - security-scan (depends on test, matrix strategy) -# - actions-build (depends on lint) -# - logs-token-check (depends on test) -``` - -**Key aspects to analyze:** -- Job dependencies and parallelization opportunities -- Cache usage patterns (Go cache, Node cache) -- Matrix strategy effectiveness -- Timeout configurations -- Concurrency groups -- Artifact retention policies - -### Phase 2: Analyze Run Data (5 minutes) - -Parse the downloaded CI runs data: - -```bash -# Analyze run data -cat /tmp/ci-runs.json | jq ' -{ - total_runs: length, - by_status: group_by(.status) | map({status: .[0].status, count: length}), - by_conclusion: group_by(.conclusion) | map({conclusion: .[0].conclusion, count: length}), - by_branch: group_by(.headBranch) | map({branch: .[0].headBranch, count: length}), - by_event: group_by(.event) | map({event: .[0].event, count: length}) -}' - -# Calculate average duration (if available in run details) -# Check for 
patterns in failures -# Identify flaky tests or jobs -``` - -**Metrics to extract:** -- Success rate per job -- Average duration per job -- Failure patterns (which jobs fail most often) -- Cache hit rates from step summaries -- Resource usage patterns - -### Phase 3: Review Artifacts (3 minutes) - -Examine downloaded artifacts for insights: - -```bash -# List downloaded artifacts -find /tmp/ci-artifacts -type f -name "*.txt" -o -name "*.html" -o -name "*.json" - -# Analyze coverage reports if available -# Check benchmark results for performance trends -``` - -### Phase 4: Load Historical Context (2 minutes) - -Check cache memory for previous analyses: - -```bash -# Read previous optimization recommendations -if [ -f /tmp/cache-memory/ci-coach/last-analysis.json ]; then - cat /tmp/cache-memory/ci-coach/last-analysis.json -fi - -# Check if previous recommendations were implemented -# Compare current metrics with historical baselines -``` - -### Phase 5: Identify Optimization Opportunities (10 minutes) - -Look for concrete improvements in these categories: - -#### 1. **Job Parallelization** -- Are there jobs that could run in parallel but currently don't? -- Can dependencies be restructured to reduce critical path? -- Example: Could some test jobs start earlier? - -#### 2. **Cache Optimization** -- Are cache hit rates optimal? -- Could we cache more aggressively (e.g., dependencies, build artifacts)? -- Are cache keys properly scoped? -- Example: Cache npm dependencies globally vs. per-job - -#### 3. **Test Suite Restructuring** - -Analyze the current test suite structure and suggest optimizations for execution time: - -**A. Test Coverage Analysis** โš ๏ธ **CRITICAL** - -Before analyzing test performance, ensure ALL tests are actually being executed: - -**Step 1: Get complete list of all tests** -```bash -# List all test functions in the repository -cd /home/runner/work/gh-aw/gh-aw -go test -list='^Test' ./... 
2>&1 | grep -E '^Test' > /tmp/all-tests.txt - -# Count total tests -TOTAL_TESTS=$(wc -l < /tmp/all-tests.txt) -echo "Total tests found: $TOTAL_TESTS" -``` - -**Step 2: Analyze unit test coverage** -```bash -# Unit tests run all non-integration tests -# Verify the test job's command captures all non-integration tests -# Current: go test -v -parallel=8 -timeout=3m -tags '!integration' -run='^Test' ./... - -# Get list of integration tests (tests with integration build tag) -grep -r "//go:build integration" --include="*_test.go" . | cut -d: -f1 | sort -u > /tmp/integration-test-files.txt - -# Estimate number of integration tests -# (This is approximate - we'll validate coverage in next step) -echo "Files with integration tests:" -wc -l < /tmp/integration-test-files.txt -``` - -**Step 3: Analyze integration test matrix coverage** -```bash -# The integration job has a matrix with specific patterns -# Each matrix entry targets specific packages and test patterns -# Example: pattern: "TestCompile|TestPoutine" in ./pkg/cli - -# CRITICAL CHECK: Are there tests that don't match ANY pattern? - -# Extract all integration test patterns from ci.yml -cat .github/workflows/ci.yml | grep -A 2 'pattern:' | grep 'pattern:' > /tmp/matrix-patterns.txt - -# For each matrix group with empty pattern, those run ALL remaining tests in that package -# Groups with pattern="" are catch-all groups for their package - -# Check for catch-all groups -cat .github/workflows/ci.yml | grep -B 2 'pattern: ""' | grep 'name:' > /tmp/catchall-groups.txt - -echo "Matrix groups with catch-all patterns (pattern: ''):" -cat /tmp/catchall-groups.txt -``` - -**Step 4: Identify coverage gaps** -```bash -# Check if each package in the repository is covered by at least one matrix group -# List all packages with integration tests -find . 
-path ./vendor -prune -o -name "*_test.go" -print | grep -E "integration" | sed 's|/[^/]*$||' | sort -u > /tmp/integration-packages.txt - -# List packages covered in matrix -cat .github/workflows/ci.yml | grep 'packages:' | awk '{print $2}' | tr -d '"' | sort -u > /tmp/covered-packages.txt - -# Compare and find gaps -echo "Packages with integration tests:" -cat /tmp/integration-packages.txt - -echo "Packages covered in CI matrix:" -cat /tmp/covered-packages.txt - -# Check for packages not covered -comm -23 /tmp/integration-packages.txt /tmp/covered-packages.txt > /tmp/uncovered-packages.txt - -if [ -s /tmp/uncovered-packages.txt ]; then - echo "โš ๏ธ WARNING: Packages with tests but NOT in CI matrix:" - cat /tmp/uncovered-packages.txt - echo "These tests are NOT being executed!" -fi -``` - -**Step 5: Validate catch-all coverage** -```bash -# For packages that have BOTH specific patterns AND a catch-all group, verify the catch-all exists -# For packages with ONLY specific patterns, check if all tests are covered - -# Example for ./pkg/cli: -# - Has many matrix entries with specific patterns -# - Should have a catch-all entry (pattern: "") to ensure all remaining tests run - -# Check each package -for pkg in ./pkg/cli ./pkg/workflow ./pkg/parser ./cmd/gh-aw; do - echo "Checking package: $pkg" - - # Count matrix entries for this package - SPECIFIC_PATTERNS=$(cat .github/workflows/ci.yml | grep -A 1 "packages: \"$pkg\"" | grep 'pattern:' | grep -v 'pattern: ""' | wc -l) - HAS_CATCHALL=$(cat .github/workflows/ci.yml | grep -A 1 "packages: \"$pkg\"" | grep 'pattern: ""' | wc -l) - - echo " - Specific pattern groups: $SPECIFIC_PATTERNS" - echo " - Has catch-all group: $HAS_CATCHALL" - - if [ "$SPECIFIC_PATTERNS" -gt 0 ] && [ "$HAS_CATCHALL" -eq 0 ]; then - echo " โš ๏ธ WARNING: $pkg has specific patterns but NO catch-all group!" - echo " Tests not matching any specific pattern will NOT run!" 
- fi -done -``` - -**Required Action if Gaps Found:** - -If any tests are not covered by the CI matrix, you MUST propose adding: -1. **Catch-all matrix groups** for packages with specific patterns but no catch-all - - Example: Add a "CLI Other" group with `pattern: ""` for ./pkg/cli - - Example: Add a "Workflow Misc" group with `pattern: ""` for ./pkg/workflow - -2. **New matrix entries** for packages not in the matrix at all - - Add matrix entry with package path and empty pattern - -Example fix for missing catch-all: -```yaml -- name: "CLI Other" # Catch-all for tests not matched by specific patterns - packages: "./pkg/cli" - pattern: "" # Empty pattern runs all remaining tests -``` - -**Expected Outcome:** -- โœ… All tests in repository are covered by at least one CI job -- โœ… Each package with integration tests has either: - - A single matrix entry (with or without pattern), OR - - Multiple specific pattern entries PLUS a catch-all entry (pattern: "") -- โŒ No tests should be "orphaned" (not executed by any job) - -**B. 
Test Splitting Analysis** -- Review the current test matrix configuration (integration tests split into groups) -- Analyze if test groups are balanced in terms of execution time -- Check if any test group consistently takes much longer than others -- Suggest rebalancing test groups to minimize the longest-running group - -**Example Analysis:** -```bash -# Extract test durations from downloaded run data -# Identify if certain matrix jobs are bottlenecks -cat /tmp/ci-runs.json | jq '.[] | select(.conclusion=="success") | .jobs[] | select(.name | contains("Integration")) | {name, duration}' - -# Look for imbalanced matrix groups -# If "Integration: Workflow" takes 8 minutes while others take 3 minutes, suggest splitting it -``` - -**Restructuring Suggestions:** -- If unit tests take >5 minutes, suggest splitting by package (e.g., `./pkg/cli`, `./pkg/workflow`, `./pkg/parser`) -- If integration matrix is imbalanced, suggest redistributing tests: - - Move slow tests from overloaded groups to faster groups - - Split large test groups (like "Workflow" with no pattern filter) into more specific groups - - Example: Split "CLI Logs & Firewall" if TestLogs and TestFirewall are both slow - -**C. Test Parallelization Within Jobs** -- Check if tests are running sequentially when they could run in parallel -- Suggest using `go test -parallel=N` to increase parallelism -- Analyze if `-count=1` (disables test caching) is necessary for all tests -- Example: Unit tests could run with `-parallel=4` to utilize multiple cores - -**D. 
Test Selection Optimization** -- Suggest path-based test filtering to skip irrelevant tests -- Recommend running only affected tests for non-main branch pushes -- Example configuration: - ```yaml - - name: Check for code changes - id: code-changes - run: | - if git diff --name-only ${{ github.event.before }}..${{ github.event.after }} | grep -E '\.(go|js|cjs)$'; then - echo "has_code_changes=true" >> $GITHUB_OUTPUT - fi - - - name: Run tests - if: steps.code-changes.outputs.has_code_changes == 'true' - run: go test ./... - ``` - -**E. Test Timeout Optimization** -- Review current timeout settings (currently 3 minutes for tests) -- Check if timeouts are too conservative or too tight based on actual run times -- Suggest adjusting per-job timeouts based on historical data -- Example: If unit tests consistently complete in 1.5 minutes, timeout could be 2 minutes instead of 3 - -**F. Test Dependencies Analysis** -- Examine test job dependencies (test โ†’ integration โ†’ bench/fuzz/security) -- Suggest removing unnecessary dependencies to enable more parallelism -- Example: Could `integration`, `bench`, `fuzz`, and `security` all depend on `lint` instead of `test`? - - This allows integration tests to run while unit tests are still running - - Only makes sense if they don't need unit test artifacts - -**G. Selective Test Execution** -- Suggest running expensive tests (benchmarks, fuzz tests) only on main branch or on-demand -- Recommend running security scans only on main or for security-related file changes -- Example: - ```yaml - if: github.ref == 'refs/heads/main' || github.event_name == 'workflow_dispatch' - ``` - -**H. Test Caching Improvements** -- Check if test results could be cached (with appropriate cache keys) -- Suggest caching test binaries to speed up reruns -- Example: Cache compiled test binaries keyed by go.sum + source files - -**I. 
Matrix Strategy Optimization**
-- Analyze if all integration test matrix jobs are necessary
-- Check if some matrix jobs could be combined or run conditionally
-- Suggest reducing matrix size for PR builds vs. main branch builds
-- Example: Run full matrix on main, reduced matrix on PRs
-
-**J. Test Infrastructure**
-- Check if tests could benefit from faster runners (e.g., ubuntu-latest-4-core)
-- Analyze if test containers could be used to improve isolation and speed
-- Suggest pre-warming test environments with cached dependencies
-
-**Concrete Restructuring Example:**
-
-Current structure:
-```
-lint (2 min) → test (unit, 2.5 min) → integration (6 parallel groups, longest: 8 min)
-                                    → bench (3 min)
-                                    → fuzz (2 min)
-                                    → security (2 min)
-```
-
-Optimized structure suggestion:
-```
-lint (2 min) → test-unit-1 (./pkg/cli, 1.5 min) ─┐
-             → test-unit-2 (./pkg/workflow, 1.5 min) ├→ integration-fast (4 groups, 4 min)
-             → test-unit-3 (./pkg/parser, 1 min) ────┘  → integration-slow (2 groups, 4 min)
-             → bench (main only, 3 min)
-             → fuzz (main only, 2 min)
-```
-
-Benefits: Reduces critical path from 12.5 min to ~7.5 min (40% improvement)
-
-#### 4. **Resource Right-Sizing**
-- Are timeouts set appropriately?
-- Could jobs run on faster runners?
-- Are concurrency groups optimal?
-- Example: Reducing timeout from 30m to 10m if jobs typically complete in 5m
+The `ci-data-analysis` shared module has pre-downloaded CI run data and built the project. Available data:

-#### 5. **Artifact Management**
-- Are retention days optimal?
-- Are we uploading unnecessary artifacts?
-- Example: Coverage reports only need 7 days retention

-#### 6. **Matrix Strategy**
-- Is the matrix well-balanced?
-- Could we reduce matrix combinations?
-- Are all matrix configurations necessary?
-- Example: Testing on fewer Node versions
+1. 
**CI Runs**: `/tmp/ci-runs.json` - Last 100 workflow runs +2. **Artifacts**: `/tmp/ci-artifacts/` - Coverage reports and benchmarks +3. **CI Configuration**: `.github/workflows/ci.yml` - Current workflow +4. **Cache Memory**: `/tmp/cache-memory/` - Historical analysis data +5. **Test Results**: `/tmp/gh-aw/test-results.json` - Test performance data -#### 7. **Conditional Execution** -- Can we skip jobs based on file paths? -- Should certain jobs only run on main branch? -- Example: Only run benchmarks on main branch pushes +The project has been **built, linted, and tested** so you can validate changes immediately. -#### 8. **Dependency Installation** -- Are we installing dependencies multiple times unnecessarily? -- Could we use dependency caching more effectively? -- Example: Sharing `node_modules` between jobs +## Analysis Framework -### Phase 6: Cost-Benefit Analysis (3 minutes) +Follow the optimization strategies defined in the `ci-optimization-strategies` shared module: +### Phase 1: Study CI Configuration (5 minutes) +- Understand job dependencies and parallelization opportunities +- Analyze cache usage, matrix strategy, timeouts, and concurrency + +### Phase 2: Analyze Test Coverage (10 minutes) +**CRITICAL**: Ensure all tests are executed by the CI matrix +- Check for orphaned tests not covered by any CI job +- Verify catch-all matrix groups exist for packages with specific patterns +- Identify coverage gaps and propose fixes if needed + +### Phase 3: Identify Optimization Opportunities (10 minutes) +Apply the optimization strategies from the shared module: +1. **Job Parallelization** - Reduce critical path +2. **Cache Optimization** - Improve cache hit rates +3. **Test Suite Restructuring** - Balance test execution +4. **Resource Right-Sizing** - Optimize timeouts and runners +5. **Artifact Management** - Reduce unnecessary uploads +6. **Matrix Strategy** - Balance breadth vs. speed +7. **Conditional Execution** - Skip unnecessary jobs +8. 
**Dependency Installation** - Reduce redundant work
+
+### Phase 4: Cost-Benefit Analysis (3 minutes)

For each potential optimization:
-- **Impact**: How much time/cost savings? (estimate in minutes and/or GitHub Actions minutes)
+- **Impact**: How much time/cost savings?
- **Risk**: What's the risk of breaking something?
- **Effort**: How hard is it to implement?
- **Priority**: High/Medium/Low

-**Prioritize optimizations with:**
-- High impact (>10% time savings)
-- Low risk
-- Low to medium effort
+Prioritize optimizations with high impact, low risk, and low to medium effort.

-### Phase 7: Implement and Validate Changes (if improvements found) (8 minutes)
+### Phase 5: Implement and Validate Changes (8 minutes)

If you identify improvements worth implementing:

@@ -518,31 +103,14 @@ If you identify improvements worth implementing:
2. **Validate changes immediately**:

```bash
- # Validate YAML syntax and workflow logic
- make lint
-
- # Rebuild to ensure code still builds correctly
- make build
-
- # Run unit tests to ensure no functionality is broken
- make test-unit
-
- # Recompile workflows if you made any changes to workflow files
- make recompile
+ make lint && make build && make test-unit && make recompile
```

-**IMPORTANT**: Only proceed to creating a PR if all validations pass. If tests fail or build breaks, either:
- - Fix the issues and re-validate
- - Abandon the changes if they're too risky
-
-3. **Document changes** in the PR description:
- - List each optimization with expected impact
- - Explain the rationale
- - Note any risks or trade-offs
- - Include before/after metrics if possible
- - Mention that changes have been validated (linted, built, tested)
-
-4. **Save analysis** to cache memory for future reference:
+ **IMPORTANT**: Only proceed to creating a PR if all validations pass.
+
+3. 
**Document changes** in the PR description (see template below) + +4. **Save analysis** to cache memory: ```bash mkdir -p /tmp/cache-memory/ci-coach cat > /tmp/cache-memory/ci-coach/last-analysis.json << EOF @@ -554,28 +122,16 @@ If you identify improvements worth implementing: EOF ``` -5. **Create the pull request** using the `create_pull_request` tool with: - - **Title**: Clear description of the optimization focus (e.g., "Optimize CI test parallelization") - - **Body**: Comprehensive description including: - - Summary of optimizations proposed - - Expected impact (time/cost savings) - - Risk assessment - - List of changes made to `.github/workflows/ci.yml` - - Validation results (make lint, make build, make test-unit) - - Reference to this workflow run (#${{ github.run_number }}) - - The title will automatically be prefixed with "[ci-coach] " as configured in safe-outputs +5. **Create pull request** using the `create_pull_request` tool (title auto-prefixed with "[ci-coach]") -### Phase 8: No Changes Path +### Phase 6: No Changes Path If no improvements are found or changes are too risky: +1. Save analysis to cache memory +2. Exit gracefully - no pull request needed +3. Log findings for future reference -1. **Save analysis** to cache memory documenting that CI is already well-optimized -2. **Exit gracefully** - no pull request needed -3. 
**Log findings** for future reference - -## Output Requirements - -### Pull Request Structure (if created) +## Pull Request Structure (if created) ```markdown ## CI Optimization Proposal diff --git a/.github/workflows/shared/ci-data-analysis.md b/.github/workflows/shared/ci-data-analysis.md new file mode 100644 index 00000000000..0e7a0a0d6bb --- /dev/null +++ b/.github/workflows/shared/ci-data-analysis.md @@ -0,0 +1,173 @@ +--- +# CI Data Analysis +# Shared module for analyzing CI run data +# +# Usage: +# imports: +# - shared/ci-data-analysis.md +# +# This import provides: +# - Pre-download CI runs and artifacts +# - Build and test the project +# - Collect performance metrics + +imports: + - shared/jqschema.md + +tools: + cache-memory: true + bash: ["*"] + +steps: + - name: Download CI workflow runs from last 7 days + env: + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + run: | + # Download workflow runs for the ci workflow + gh run list --repo ${{ github.repository }} --workflow=ci.yml --limit 100 --json databaseId,status,conclusion,createdAt,updatedAt,displayTitle,headBranch,event,url,workflowDatabaseId,number > /tmp/ci-runs.json + + # Create directory for artifacts + mkdir -p /tmp/ci-artifacts + + # Download artifacts from recent runs (last 5 successful runs) + echo "Downloading artifacts from recent CI runs..." 
+ gh run list --repo ${{ github.repository }} --workflow=ci.yml --status success --limit 5 --json databaseId | jq -r '.[].databaseId' | while read -r run_id; do + echo "Processing run $run_id" + gh run download "$run_id" --repo ${{ github.repository }} --dir "/tmp/ci-artifacts/$run_id" 2>/dev/null || echo "No artifacts for run $run_id" + done + + echo "CI runs data saved to /tmp/ci-runs.json" + echo "Artifacts saved to /tmp/ci-artifacts/" + + - name: Set up Node.js + uses: actions/setup-node@v6 + with: + node-version: "24" + cache: npm + cache-dependency-path: actions/setup/js/package-lock.json + + - name: Set up Go + uses: actions/setup-go@v6 + with: + go-version-file: go.mod + cache: true + + - name: Install dev dependencies + run: make deps-dev + + - name: Run linter + run: make lint + + - name: Lint error messages + run: make lint-errors + + - name: Install npm dependencies + run: npm ci + working-directory: ./actions/setup/js + + - name: Build code + run: make build + + - name: Rebuild lock files + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + run: make recompile + + - name: Run unit tests + continue-on-error: true + run: | + mkdir -p /tmp/gh-aw + go test -v -json -count=1 -timeout=3m -tags '!integration' -run='^Test' ./... | tee /tmp/gh-aw/test-results.json +--- + +# CI Data Analysis + +Pre-downloaded CI run data and artifacts are available for analysis: + +## Available Data + +1. **CI Runs**: `/tmp/ci-runs.json` + - Last 100 workflow runs with status, timing, and metadata + +2. **Artifacts**: `/tmp/ci-artifacts/` + - Coverage reports and benchmark results from recent successful runs + +3. **CI Configuration**: `.github/workflows/ci.yml` + - Current CI workflow configuration + +4. **Cache Memory**: `/tmp/cache-memory/` + - Historical analysis data from previous runs + +5. 
**Test Results**: `/tmp/gh-aw/test-results.json`
+   - JSON output from Go unit tests with performance and timing data
+
+## Test Case Locations
+
+Go test cases are located throughout the repository:
+- **Command tests**: `./cmd/gh-aw/*_test.go`
+- **Workflow tests**: `./pkg/workflow/*_test.go`
+- **CLI tests**: `./pkg/cli/*_test.go`
+- **Parser tests**: `./pkg/parser/*_test.go`
+- **Campaign tests**: `./pkg/campaign/*_test.go`
+- **Other package tests**: Various `./pkg/*/*_test.go` files
+
+## Environment Setup
+
+The workflow has already completed:
+- ✅ **Linting**: Dev dependencies installed, linters run successfully
+- ✅ **Building**: Code built with `make build`, lock files compiled with `make recompile`
+- ✅ **Testing**: Unit tests run (with performance data collected in JSON format)
+
+This means you can:
+- Make changes to code or configuration files
+- Validate changes immediately by running `make lint`, `make build`, or `make test-unit`
+- Ensure proposed optimizations don't break functionality before creating a PR
+
+## Analyzing Run Data
+
+Parse the downloaded CI runs data:
+
+```bash
+# Analyze run data
+cat /tmp/ci-runs.json | jq '
+{
+  total_runs: length,
+  by_status: group_by(.status) | map({status: .[0].status, count: length}),
+  by_conclusion: group_by(.conclusion) | map({conclusion: .[0].conclusion, count: length}),
+  by_branch: group_by(.headBranch) | map({branch: .[0].headBranch, count: length}),
+  by_event: group_by(.event) | map({event: .[0].event, count: length})
+}'
+```
+
+**Metrics to extract:**
+- Success rate per job
+- Average duration per job
+- Failure patterns (which jobs fail most often)
+- Cache hit rates from step summaries
+- Resource usage patterns
+
+## Review Artifacts
+
+Examine downloaded artifacts for insights:
+
+```bash
+# List downloaded artifacts (group -name tests so -type f applies to all of them)
+find /tmp/ci-artifacts -type f \( -name "*.txt" -o -name "*.html" -o -name "*.json" \)
+
+# Analyze coverage reports if available
+# Check benchmark results for performance 
trends +``` + +## Historical Context + +Check cache memory for previous analyses: + +```bash +# Read previous optimization recommendations +if [ -f /tmp/cache-memory/ci-coach/last-analysis.json ]; then + cat /tmp/cache-memory/ci-coach/last-analysis.json +fi + +# Check if previous recommendations were implemented +# Compare current metrics with historical baselines +``` diff --git a/.github/workflows/shared/ci-optimization-strategies.md b/.github/workflows/shared/ci-optimization-strategies.md new file mode 100644 index 00000000000..447929568b5 --- /dev/null +++ b/.github/workflows/shared/ci-optimization-strategies.md @@ -0,0 +1,192 @@ +--- +# CI Optimization Analysis Strategies +# Reusable analysis patterns for CI optimization workflows +# +# Usage: +# imports: +# - shared/ci-optimization-strategies.md +# +# This import provides: +# - Test coverage analysis patterns +# - Performance bottleneck identification +# - Matrix strategy optimization techniques +--- + +# CI Optimization Analysis Strategies + +Comprehensive strategies for analyzing CI workflows to identify optimization opportunities. + +## Phase 1: CI Configuration Study + +Read and understand the current CI workflow structure: + +```bash +# Read the CI workflow configuration +cat .github/workflows/ci.yml + +# Understand the job structure +# - lint (runs first) +# - test (depends on lint) +# - integration (depends on test, matrix strategy) +# - build (depends on lint) +# etc. +``` + +**Key aspects to analyze:** +- Job dependencies and parallelization opportunities +- Cache usage patterns (Go cache, Node cache) +- Matrix strategy effectiveness +- Timeout configurations +- Concurrency groups +- Artifact retention policies + +## Phase 2: Test Coverage Analysis + +### Critical: Ensure ALL Tests are Executed + +**Step 1: Get complete list of all tests** +```bash +# List all test functions in the repository +go test -list='^Test' ./... 
2>&1 | grep -E '^Test' > /tmp/all-tests.txt + +# Count total tests +TOTAL_TESTS=$(wc -l < /tmp/all-tests.txt) +echo "Total tests found: $TOTAL_TESTS" +``` + +**Step 2: Analyze unit test coverage** +```bash +# Unit tests run all non-integration tests +# Verify the test job's command captures all non-integration tests +# Current: go test -v -parallel=8 -timeout=3m -tags '!integration' -run='^Test' ./... + +# Get list of integration tests (tests with integration build tag) +grep -r "//go:build integration" --include="*_test.go" . | cut -d: -f1 | sort -u > /tmp/integration-test-files.txt + +# Estimate number of integration tests +echo "Files with integration tests:" +wc -l < /tmp/integration-test-files.txt +``` + +**Step 3: Analyze integration test matrix coverage** +```bash +# The integration job has a matrix with specific patterns +# Each matrix entry targets specific packages and test patterns + +# CRITICAL CHECK: Are there tests that don't match ANY pattern? + +# Extract all integration test patterns from ci.yml +cat .github/workflows/ci.yml | grep -A 2 'pattern:' | grep 'pattern:' > /tmp/matrix-patterns.txt + +# Check for catch-all groups +cat .github/workflows/ci.yml | grep -B 2 'pattern: ""' | grep 'name:' > /tmp/catchall-groups.txt +``` + +**Step 4: Identify coverage gaps** +```bash +# Check if each package with tests is covered by at least one matrix group +# Compare packages with tests vs. packages in CI matrix +# Identify any "orphaned" tests not executed by any job +``` + +**Required Action if Gaps Found:** +If any tests are not covered by the CI matrix, propose adding: +1. **Catch-all matrix groups** for packages with specific patterns but no catch-all +2. **New matrix entries** for packages not in the matrix at all + +Example fix: +```yaml +- name: "CLI Other" # Catch-all for tests not matched by specific patterns + packages: "./pkg/cli" + pattern: "" # Empty pattern runs all remaining tests +``` + +## Phase 3: Test Performance Optimization + +### A. 
Test Splitting Analysis +- Review current test matrix configuration +- Analyze if test groups are balanced in execution time +- Suggest rebalancing to minimize longest-running group + +### B. Test Parallelization Within Jobs +- Check if tests run sequentially when they could run in parallel +- Suggest using `go test -parallel=N` to increase parallelism +- Analyze if `-count=1` is necessary for all tests + +### C. Test Selection Optimization +- Suggest path-based test filtering to skip irrelevant tests +- Recommend running only affected tests for non-main branch pushes + +### D. Test Timeout Optimization +- Review current timeout settings +- Check if timeouts are too conservative or too tight +- Suggest adjusting per-job timeouts based on historical data + +### E. Test Dependencies Analysis +- Examine test job dependencies +- Suggest removing unnecessary dependencies to enable more parallelism + +### F. Selective Test Execution +- Suggest running expensive tests only on main branch or on-demand +- Recommend running security scans conditionally + +### G. Matrix Strategy Optimization +- Analyze if all integration test matrix jobs are necessary +- Check if some matrix jobs could be combined or run conditionally +- Suggest reducing matrix size for PR builds vs. main branch builds + +## Phase 4: Resource Optimization + +### Job Parallelization +- Identify jobs that could run in parallel but currently don't +- Restructure dependencies to reduce critical path +- Example: Could some test jobs start earlier? 
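The critical-path framing above can be made concrete with a small longest-path computation over the job dependency graph. This is a hypothetical sketch: the job names, `needs` edges, and durations are made up for illustration, not read from the real `ci.yml`.

```python
from functools import lru_cache

# Hypothetical CI job graph: job -> (needs, duration in minutes).
# These values are illustrative, not the repository's actual jobs.
JOBS = {
    "lint": ([], 3),
    "build": (["lint"], 5),
    "test": (["lint"], 8),
    "integration": (["test"], 12),
}

@lru_cache(maxsize=None)
def finish(job: str) -> int:
    """Earliest finish time of a job: its own duration plus the slowest prerequisite."""
    needs, minutes = JOBS[job]
    return minutes + max((finish(n) for n in needs), default=0)

critical_path = max(finish(job) for job in JOBS)
print(critical_path)  # 23 (lint -> test -> integration)
```

Removing or loosening a dependency edge and re-running the computation shows how much wall-clock time a proposed restructuring would actually save.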
+ +### Cache Optimization +- Analyze cache hit rates +- Suggest caching more aggressively (dependencies, build artifacts) +- Check if cache keys are properly scoped + +### Resource Right-Sizing +- Check if timeouts are set appropriately +- Evaluate if jobs could run on faster runners +- Review concurrency groups + +### Artifact Management +- Check if retention days are optimal +- Identify unnecessary artifacts +- Example: Coverage reports only need 7 days retention + +### Dependency Installation +- Check for redundant dependency installations +- Suggest using dependency caching more effectively +- Example: Sharing `node_modules` between jobs + +## Phase 5: Cost-Benefit Analysis + +For each potential optimization: +- **Impact**: How much time/cost savings? +- **Effort**: How difficult to implement? +- **Risk**: Could it break the build or miss issues? +- **Priority**: High/Medium/Low + +## Optimization Categories + +1. **Job Parallelization** - Reduce critical path +2. **Cache Optimization** - Improve cache hit rates +3. **Test Suite Restructuring** - Balance test execution +4. **Resource Right-Sizing** - Optimize timeouts and runners +5. **Artifact Management** - Reduce unnecessary uploads +6. **Matrix Strategy** - Balance breadth vs. speed +7. **Conditional Execution** - Skip unnecessary jobs +8. 
**Dependency Installation** - Reduce redundant work + +## Expected Metrics + +Track these metrics before and after optimization: +- Total CI duration (wall clock time) +- Critical path duration +- Cache hit rates +- Test execution time +- Resource utilization +- Cost per CI run From f4abc7de9a05653d2a872f92d5460a0464e7f05c Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sun, 4 Jan 2026 06:14:45 +0000 Subject: [PATCH 4/5] Add workflow refactoring patterns documentation - Document refactoring principles and patterns - Provide examples from copilot-session-insights and ci-coach - Define shared module structure patterns - Include anti-patterns to avoid - Create refactoring checklist Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com> --- .../workflows/shared/token-cost-analysis.md | 317 +++++++++++++++++ specs/workflow-refactoring-patterns.md | 323 ++++++++++++++++++ 2 files changed, 640 insertions(+) create mode 100644 .github/workflows/shared/token-cost-analysis.md create mode 100644 specs/workflow-refactoring-patterns.md diff --git a/.github/workflows/shared/token-cost-analysis.md b/.github/workflows/shared/token-cost-analysis.md new file mode 100644 index 00000000000..42aab875599 --- /dev/null +++ b/.github/workflows/shared/token-cost-analysis.md @@ -0,0 +1,317 @@ +--- +# Token Cost Analysis +# Shared module for analyzing token consumption and costs +# +# Usage: +# imports: +# - shared/token-cost-analysis.md +# +# This import provides: +# - Python environment for data analysis +# - Token aggregation patterns +# - Cost calculation methods +# - Historical tracking patterns + +imports: + - shared/python-dataviz.md +--- + +# Token Cost Analysis Patterns + +Patterns for processing and analyzing token consumption data from Copilot workflows. 
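The patterns below assume each run record exposes workflow, token, cost, turn, and duration fields. A minimal hedged sketch of one such record (the field names follow the scripts in this module; the values are invented for illustration):

```python
# One hypothetical run record in the shape the aggregation scripts expect.
# All values are illustrative, not real log data.
sample_run = {
    "WorkflowName": "daily-copilot-token-report",
    "TokenUsage": 125_000,
    "EstimatedCost": 0.42,    # USD
    "Turns": 9,
    "Duration": 312 * 10**9,  # nanoseconds, as in the raw logs
    "CreatedAt": "2026-01-04T05:56:15Z",
    "DatabaseID": 123456789,
}

# A simple derived metric: estimated cost per thousand tokens.
cost_per_1k = sample_run["EstimatedCost"] / (sample_run["TokenUsage"] / 1000)
print(round(cost_per_1k, 5))  # 0.00336
```

Records missing any of these keys fall back to zero in the aggregation scripts (`run.get(..., 0)`), so partial logs do not break processing.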
+ +## Data Processing Patterns + +### Extract Per-Workflow Metrics + +Create aggregated statistics by workflow: + +```python +#!/usr/bin/env python3 +"""Process Copilot workflow logs and calculate per-workflow statistics""" +import json +import os +from collections import defaultdict + +# Load the logs +with open('/tmp/gh-aw/copilot-logs.json', 'r') as f: + runs = json.load(f) + +print(f"Processing {len(runs)} workflow runs...") + +# Aggregate by workflow +workflow_stats = defaultdict(lambda: { + 'total_tokens': 0, + 'total_cost': 0.0, + 'total_turns': 0, + 'run_count': 0, + 'total_duration_seconds': 0, + 'runs': [] +}) + +for run in runs: + workflow_name = run.get('WorkflowName', 'unknown') + tokens = run.get('TokenUsage', 0) + cost = run.get('EstimatedCost', 0.0) + turns = run.get('Turns', 0) + duration = run.get('Duration', 0) # in nanoseconds + created_at = run.get('CreatedAt', '') + + workflow_stats[workflow_name]['total_tokens'] += tokens + workflow_stats[workflow_name]['total_cost'] += cost + workflow_stats[workflow_name]['total_turns'] += turns + workflow_stats[workflow_name]['run_count'] += 1 + workflow_stats[workflow_name]['total_duration_seconds'] += duration / 1e9 + + workflow_stats[workflow_name]['runs'].append({ + 'date': created_at[:10], + 'tokens': tokens, + 'cost': cost, + 'turns': turns, + 'run_id': run.get('DatabaseID', run.get('Number', 0)) + }) + +# Calculate averages and save +output = [] +for workflow, stats in workflow_stats.items(): + count = stats['run_count'] + output.append({ + 'workflow': workflow, + 'total_tokens': stats['total_tokens'], + 'total_cost': stats['total_cost'], + 'total_turns': stats['total_turns'], + 'run_count': count, + 'avg_tokens': stats['total_tokens'] / count if count > 0 else 0, + 'avg_cost': stats['total_cost'] / count if count > 0 else 0, + 'avg_turns': stats['total_turns'] / count if count > 0 else 0, + 'avg_duration_seconds': stats['total_duration_seconds'] / count if count > 0 else 0, + 'runs': stats['runs'] + 
})
+
+# Sort by total cost (highest first)
+output.sort(key=lambda x: x['total_cost'], reverse=True)
+
+# Save processed data
+os.makedirs('/tmp/gh-aw/python/data', exist_ok=True)
+with open('/tmp/gh-aw/python/data/workflow_stats.json', 'w') as f:
+    json.dump(output, f, indent=2)
+
+print(f"✅ Processed {len(output)} unique workflows")
+```
+
+### Store Historical Data
+
+Append today's metrics to persistent cache for trend tracking:
+
+```python
+#!/usr/bin/env python3
+"""Store today's metrics in cache memory for historical tracking"""
+import json
+import os
+from datetime import datetime
+
+# Load processed workflow stats
+with open('/tmp/gh-aw/python/data/workflow_stats.json', 'r') as f:
+    workflow_stats = json.load(f)
+
+# Prepare today's summary
+today = datetime.now().strftime('%Y-%m-%d')
+today_summary = {
+    'date': today,
+    'timestamp': datetime.now().isoformat(),
+    'workflows': {}
+}
+
+# Aggregate totals
+total_tokens = 0
+total_cost = 0.0
+total_runs = 0
+
+for workflow in workflow_stats:
+    workflow_name = workflow['workflow']
+    today_summary['workflows'][workflow_name] = {
+        'tokens': workflow['total_tokens'],
+        'cost': workflow['total_cost'],
+        'runs': workflow['run_count'],
+        'avg_tokens': workflow['avg_tokens'],
+        'avg_cost': workflow['avg_cost']
+    }
+    total_tokens += workflow['total_tokens']
+    total_cost += workflow['total_cost']
+    total_runs += workflow['run_count']
+
+today_summary['totals'] = {
+    'tokens': total_tokens,
+    'cost': total_cost,
+    'runs': total_runs
+}
+
+# Ensure memory directory exists
+memory_dir = '/tmp/gh-aw/repo-memory-default/memory/default'
+os.makedirs(memory_dir, exist_ok=True)
+
+# Append to history (JSON Lines format)
+history_file = f'{memory_dir}/history.jsonl'
+with open(history_file, 'a') as f:
+    f.write(json.dumps(today_summary) + '\n')
+
+print(f"✅ Stored metrics for {today}")
+print(f"📈 Total tokens: {total_tokens:,}")
+print(f"💰 Total cost: ${total_cost:.2f}")
+print(f"🔄 Total runs: {total_runs}")
+```
+
+### Prepare Data for Visualization
+
+Create CSV files for trend chart generation:
+
+```python
+#!/usr/bin/env python3
+"""Prepare CSV data for trend charts"""
+import json
+import os
+import pandas as pd
+from datetime import datetime
+
+# Load historical data from repo memory
+memory_dir = '/tmp/gh-aw/repo-memory-default/memory/default'
+history_file = f'{memory_dir}/history.jsonl'
+
+historical_data = []
+if os.path.exists(history_file):
+    with open(history_file, 'r') as f:
+        for line in f:
+            if line.strip():
+                historical_data.append(json.loads(line))
+
+# Load today's data if needed
+if not historical_data:
+    with open('/tmp/gh-aw/python/data/workflow_stats.json', 'r') as f:
+        workflow_stats = json.load(f)
+
+    today = datetime.now().strftime('%Y-%m-%d')
+    historical_data = [{
+        'date': today,
+        'totals': {
+            'tokens': sum(w['total_tokens'] for w in workflow_stats),
+            'cost': sum(w['total_cost'] for w in workflow_stats),
+            'runs': sum(w['run_count'] for w in workflow_stats)
+        }
+    }]
+
+# Create daily aggregates DataFrame
+daily_data = []
+for entry in historical_data:
+    daily_data.append({
+        'date': entry['date'],
+        'tokens': entry['totals']['tokens'],
+        'cost': entry['totals']['cost'],
+        'runs': entry['totals']['runs']
+    })
+
+df = pd.DataFrame(daily_data)
+df.to_csv('/tmp/gh-aw/python/data/daily_trends.csv', index=False)
+
+print(f"✅ Prepared trend data: {len(df)} days")
+```
+
+## Chart Generation Patterns
+
+### Token Usage Trends Chart
+
+```python
+import matplotlib.pyplot as plt
+import seaborn as sns
+import pandas as pd
+
+# Load data
+df = pd.read_csv('/tmp/gh-aw/python/data/daily_trends.csv')
+df['date'] = pd.to_datetime(df['date'])
+
+# Create chart
+fig, ax1 = plt.subplots(figsize=(12, 7), dpi=300)
+sns.set_style("whitegrid")
+
+# Plot tokens
+ax1.plot(df['date'], df['tokens'], marker='o', color='#4ECDC4',
+         linewidth=2.5, label='Token Usage')
+ax1.set_xlabel('Date', fontsize=12, fontweight='bold')
+ax1.set_ylabel('Tokens', fontsize=12, fontweight='bold')
+ax1.tick_params(axis='y')
+
+# Create secondary axis for cost
+ax2 = ax1.twinx()
+ax2.plot(df['date'], df['cost'], marker='s', color='#FF6B6B',
+         linewidth=2.5, label='Cost (USD)')
+ax2.set_ylabel('Cost (USD)', fontsize=12, fontweight='bold')
+
+# Add title and legend
+plt.title('Copilot Token Usage and Cost Trends', fontsize=16, fontweight='bold')
+lines1, labels1 = ax1.get_legend_handles_labels()
+lines2, labels2 = ax2.get_legend_handles_labels()
+ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper left')
+
+# Format x-axis
+plt.xticks(rotation=45)
+plt.tight_layout()
+
+# Save
+plt.savefig('/tmp/gh-aw/python/charts/token_trends.png',
+            dpi=300, bbox_inches='tight', facecolor='white')
+```
+
+### Top Workflows Bar Chart
+
+```python
+import json
+
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+# Load workflow stats
+with open('/tmp/gh-aw/python/data/workflow_stats.json', 'r') as f:
+    workflows = json.load(f)
+
+# Get top 10 workflows by cost
+top_workflows = workflows[:10]
+names = [w['workflow'][:30] for w in top_workflows]  # Truncate long names
+costs = [w['total_cost'] for w in top_workflows]
+
+# Create bar chart
+fig, ax = plt.subplots(figsize=(12, 8), dpi=300)
+bars = ax.barh(names, costs, color=sns.color_palette("husl", len(names)))
+
+# Customize
+ax.set_xlabel('Total Cost (USD)', fontsize=12, fontweight='bold')
+ax.set_title('Top 10 Workflows by Token Cost', fontsize=16, fontweight='bold')
+ax.grid(True, alpha=0.3, axis='x')
+
+# Add value labels
+for bar in bars:
+    width = bar.get_width()
+    ax.text(width, bar.get_y() + bar.get_height()/2,
+            f'${width:.2f}', ha='left', va='center', fontweight='bold')
+
+plt.tight_layout()
+plt.savefig('/tmp/gh-aw/python/charts/top_workflows.png',
+            dpi=300, bbox_inches='tight', facecolor='white')
+```
+
+## Usage Workflow
+
+1. **Process logs**: Run per-workflow metrics extraction
+2. **Store history**: Append today's data to cache
+3. **Prepare visualization data**: Create CSV files
+4. **Generate charts**: Create trend and comparison charts
+5. 
**Upload assets**: Use upload-asset tool to publish charts +6. **Create report**: Include charts in discussion/issue + +## Key Metrics + +- **Token Usage**: Total and per-workflow token consumption +- **Cost**: Estimated costs based on token usage +- **Turns**: Number of agent turns (conversation rounds) +- **Duration**: Time spent in workflow execution +- **Runs**: Number of workflow executions + +## Historical Tracking + +Store data in JSON Lines format for efficient append operations: +- **Location**: `/tmp/gh-aw/repo-memory-default/memory/default/history.jsonl` +- **Format**: One JSON object per line, one entry per day +- **Retention**: Keep full history (managed by repo-memory tool) diff --git a/specs/workflow-refactoring-patterns.md b/specs/workflow-refactoring-patterns.md new file mode 100644 index 00000000000..77126dae56e --- /dev/null +++ b/specs/workflow-refactoring-patterns.md @@ -0,0 +1,323 @@ +# Workflow Refactoring Patterns + +This document describes the patterns and practices used for refactoring large agentic workflows into smaller, maintainable modules. + +## Overview + +The workflow complexity reduction initiative addresses workflows that have grown to excessive size (600+ lines), making them difficult to maintain, debug, and test. This document captures the refactoring patterns used to modularize these workflows. + +## Refactoring Principles + +### 1. Extract Common Functionality + +Move reusable components to `.github/workflows/shared/` directory: +- **Data collection modules**: Pre-fetch and prepare data (e.g., `copilot-session-data-fetch.md`) +- **Analysis strategies**: Reusable analytical patterns (e.g., `session-analysis-strategies.md`) +- **Visualization modules**: Chart generation and data visualization (e.g., `session-analysis-charts.md`) +- **Utility modules**: Common utilities like `reporting.md`, `python-dataviz.md`, `trends.md` + +### 2. 
Split by Concern + +Separate workflows into distinct phases: +- **Data collection**: Fetch and prepare input data +- **Analysis**: Process and analyze data +- **Visualization**: Generate charts and visualizations +- **Reporting**: Create discussions, issues, or PRs + +### 3. Use Imports for Composition + +Compose workflows from shared modules using `imports:`: + +```yaml +imports: + - shared/copilot-session-data-fetch.md + - shared/session-analysis-charts.md + - shared/session-analysis-strategies.md + - shared/reporting.md +``` + +## Size Guidelines + +- **Target**: 400-500 lines maximum per workflow +- **Ideal**: 200-300 lines for most workflows +- **Hard limit**: 600 lines (refactor above this) + +## Refactoring Pattern Examples + +### Example 1: Session Analysis Workflow + +**Before** (748 lines): +```markdown +--- +imports: + - shared/copilot-session-data-fetch.md + - shared/reporting.md + - shared/trends.md +--- + +# Copilot Agent Session Analysis + +[... 748 lines of mixed concerns: chart generation, analysis strategies, reporting templates ...] +``` + +**After** (403 lines): + +**Main workflow** (`copilot-session-insights.md`): +```markdown +--- +imports: + - shared/copilot-session-data-fetch.md + - shared/session-analysis-charts.md + - shared/session-analysis-strategies.md + - shared/reporting.md +--- + +# Copilot Agent Session Analysis + +## Mission +[High-level mission and context] + +## Task Overview +[Reference shared modules for implementation details] +``` + +**Extracted modules**: +- `shared/session-analysis-charts.md` (117 lines): Chart generation patterns and requirements +- `shared/session-analysis-strategies.md` (201 lines): Analysis strategies and patterns + +### Example 2: CI Optimization Workflow + +**Before** (725 lines): +```markdown +--- +[Long steps section with data download, build, test setup] +--- + +# CI Optimization Coach + +[... 725 lines of mixed concerns: data collection, test coverage analysis, optimization strategies ...] 
+```
+
+**After** (280 lines):
+
+**Main workflow** (`ci-coach.md`):
+```markdown
+---
+imports:
+  - shared/ci-data-analysis.md
+  - shared/ci-optimization-strategies.md
+  - shared/reporting.md
+---
+
+# CI Optimization Coach
+
+## Analysis Framework
+[Reference shared modules for strategies]
+```
+
+**Extracted modules**:
+- `shared/ci-data-analysis.md` (154 lines): Data collection, build, and test execution
+- `shared/ci-optimization-strategies.md` (186 lines): Optimization analysis patterns
+
+## Shared Module Structure
+
+### Data Collection Modules
+
+Pattern for modules that fetch and prepare data:
+
+````markdown
+---
+# Module name and description
+#
+# Usage:
+#   imports:
+#     - shared/module-name.md
+#
+# This import provides:
+# - List of capabilities
+
+imports:
+  - shared/dependency.md  # If needed
+
+tools:
+  cache-memory: true
+  bash: ["*"]
+
+steps:
+  - name: Fetch data
+    run: |
+      # Data collection logic
+---
+
+# Module Documentation
+
+Available data:
+- Location 1: Description
+- Location 2: Description
+
+Usage examples:
+```bash
+# How to use the collected data
+```
+````
+
+### Analysis Strategies Modules
+
+Pattern for modules that define analytical approaches:
+
+```markdown
+---
+# Module name and description
+#
+# Usage:
+#   imports:
+#     - shared/module-name.md
+---
+
+# Strategy Name
+
+## Standard Strategies
+
+### Strategy 1: Name
+- Description
+- When to use
+- Expected output
+
+### Strategy 2: Name
+- Description
+- When to use
+- Expected output
+
+## Advanced Strategies
+
+[More complex or experimental strategies]
+```
+
+### Visualization Modules
+
+Pattern for modules that generate charts and visualizations:
+
+```markdown
+---
+# Module name and description
+#
+# Usage:
+#   imports:
+#     - shared/module-name.md
+
+imports:
+  - shared/python-dataviz.md  # For Python-based charts
+---
+
+# Chart Generation
+
+## Chart 1: Name
+- Description
+- Data requirements
+- Output location
+- Implementation pattern
+
+## Chart 2: Name
+- Description
+- 
Data requirements
+- Output location
+- Implementation pattern
+```
+
+## Refactoring Checklist
+
+When refactoring a large workflow:
+
+- [ ] Identify distinct concerns in the workflow
+- [ ] Extract data collection steps to shared module
+- [ ] Extract analysis strategies to shared module
+- [ ] Extract visualization logic to shared module (if applicable)
+- [ ] Update main workflow to use imports
+- [ ] Verify workflow compiles successfully
+- [ ] Check that line count is < 500 (ideally 200-400)
+- [ ] Test workflow functionality
+- [ ] Document extracted modules with clear usage examples
+
+## Benefits
+
+### Maintainability (+20 points)
+- Easier to understand focused modules
+- Changes to shared logic benefit all workflows
+- Clear separation of concerns
+
+### Testability (+15 points)
+- Smaller units are easier to test
+- Can test shared modules independently
+- Reduced cognitive load for reviewers
+
+### Reusability (+25 points)
+- Shared modules benefit multiple workflows
+- Common patterns defined once
+- Easier to create new workflows
+
+### Debugging (+30 points)
+- Easier to isolate issues
+- Clear module boundaries
+- Better error messages with specific module context
+
+## Common Patterns
+
+### Pattern: Data Fetch + Analysis + Visualization
+
+```
+Main Workflow (300 lines)
+├── Import: data-fetch.md (150 lines)
+├── Import: analysis-strategies.md (200 lines)
+├── Import: visualization.md (120 lines)
+└── Import: reporting.md (15 lines)
+```
+
+### Pattern: Build + Analyze + Propose Changes
+
+```
+Main Workflow (280 lines)
+├── Import: build-and-test.md (180 lines)
+├── Import: optimization-strategies.md (190 lines)
+└── Import: reporting.md (15 lines)
+```
+
+## Anti-Patterns to Avoid
+
+❌ **Don't over-extract**: Keep related logic together. Not every 50-line section needs to be a separate module.
+
+❌ **Don't create circular dependencies**: Shared modules should not import each other in circular ways.
+
+❌ **Don't duplicate shared logic**: If two modules need the same setup, extract it to a common base module.
+
+❌ **Don't make modules too generic**: Modules should be focused and purposeful, not catch-all utilities.
+
+## Success Metrics
+
+A successful refactoring achieves:
+- ✅ Main workflow < 500 lines (ideally 200-400)
+- ✅ No more than 3 distinct concerns per workflow
+- ✅ Reusable shared modules with clear purpose
+- ✅ Workflow compiles without errors
+- ✅ Functionality preserved (verified by testing)
+
+## References
+
+- **Refactored Workflows**:
+  - `copilot-session-insights.md`: 748 → 403 lines (46% reduction)
+  - `ci-coach.md`: 725 → 280 lines (61% reduction)
+
+- **Created Shared Modules**:
+  - `shared/session-analysis-charts.md`
+  - `shared/session-analysis-strategies.md`
+  - `shared/ci-data-analysis.md`
+  - `shared/ci-optimization-strategies.md`
+  - `shared/token-cost-analysis.md`
+
+## Future Work
+
+Additional workflows identified for refactoring:
+- `daily-copilot-token-report.md` (680 lines)
+- `prompt-clustering-analysis.md` (639 lines)
+- `developer-docs-consolidator.md` (623 lines)
+
+These workflows can follow the same patterns established in this document. 
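The size guidelines and Future Work list above can be operationalized with a small scan for oversized workflows. A minimal sketch: the helper names and the threshold are assumptions based on this document's conventions, and the example line counts are the ones quoted in this document.

```python
from pathlib import Path

# Refactoring threshold from the size guidelines above.
THRESHOLD = 500

def count_lines(path: Path) -> int:
    """Line count of one workflow file, e.g. count_lines(Path('.github/workflows/ci-coach.md'))."""
    return sum(1 for _ in path.open(encoding="utf-8"))

def oversized(counts: dict[str, int], threshold: int = THRESHOLD) -> list[tuple[str, int]]:
    """Workflows above the refactoring threshold, largest first."""
    over = [(name, n) for name, n in counts.items() if n > threshold]
    return sorted(over, key=lambda item: -item[1])

# Line counts quoted in this document:
counts = {
    "copilot-session-insights.md": 403,
    "ci-coach.md": 280,
    "daily-copilot-token-report.md": 680,
    "prompt-clustering-analysis.md": 639,
    "developer-docs-consolidator.md": 623,
}
print(oversized(counts))
# [('daily-copilot-token-report.md', 680), ('prompt-clustering-analysis.md', 639), ('developer-docs-consolidator.md', 623)]
```

Running such a scan periodically keeps the refactoring backlog current without relying on manual audits.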
From f62fa1b73902215c53133813a96ac9a5d9c2cb0f Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sun, 4 Jan 2026 06:17:16 +0000 Subject: [PATCH 5/5] Address code review feedback - Add concrete examples to anti-patterns section - Document shell requirement for RANDOM variable - Clarify YAML context in CI optimization example Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com> --- .github/workflows/shared/ci-optimization-strategies.md | 3 ++- .github/workflows/shared/session-analysis-strategies.md | 3 ++- specs/workflow-refactoring-patterns.md | 8 ++++++++ 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/.github/workflows/shared/ci-optimization-strategies.md b/.github/workflows/shared/ci-optimization-strategies.md index 447929568b5..77f57501d3f 100644 --- a/.github/workflows/shared/ci-optimization-strategies.md +++ b/.github/workflows/shared/ci-optimization-strategies.md @@ -94,8 +94,9 @@ If any tests are not covered by the CI matrix, propose adding: 1. **Catch-all matrix groups** for packages with specific patterns but no catch-all 2. 
**New matrix entries** for packages not in the matrix at all -Example fix: +Example fix for missing catch-all (add to `.github/workflows/ci.yml`): ```yaml +# Add to the integration job's matrix.include section: - name: "CLI Other" # Catch-all for tests not matched by specific patterns packages: "./pkg/cli" pattern: "" # Empty pattern runs all remaining tests diff --git a/.github/workflows/shared/session-analysis-strategies.md b/.github/workflows/shared/session-analysis-strategies.md index 84ad07e0771..4164c111770 100644 --- a/.github/workflows/shared/session-analysis-strategies.md +++ b/.github/workflows/shared/session-analysis-strategies.md @@ -60,7 +60,8 @@ These strategies should be applied to every session analysis: **Determine if this is an experimental run**: ```bash -# Generate random number between 0-100 +# Generate random number between 0-100 using shell's RANDOM variable +# Note: Requires bash shell. On systems without bash, use: $(od -An -N1 -tu1 /dev/urandom | awk '{print $1}') RANDOM_VALUE=$((RANDOM % 100)) # If value < 30, this is an experimental run ``` diff --git a/specs/workflow-refactoring-patterns.md b/specs/workflow-refactoring-patterns.md index 77126dae56e..984b523fce0 100644 --- a/specs/workflow-refactoring-patterns.md +++ b/specs/workflow-refactoring-patterns.md @@ -284,12 +284,20 @@ Main Workflow (280 lines) ## Anti-Patterns to Avoid โŒ **Don't over-extract**: Keep related logic together. Not every 50-line section needs to be a separate module. + - **Bad example**: Extracting a 30-line section just because it's slightly different + - **Good example**: Extracting a 150-line section that's used by 3+ workflows โŒ **Don't create circular dependencies**: Shared modules should not import each other in circular ways. 
+   - **Bad example**: Module A imports Module B, which imports Module C, which imports Module A
+   - **Good example**: Linear dependency chain: Main → Module A → Module B
 
 ❌ **Don't duplicate shared logic**: If two modules need the same setup, extract it to a common base module.
+   - **Bad example**: Both `data-analysis-a.md` and `data-analysis-b.md` have identical data fetch code
+   - **Good example**: Extract common data fetch to `data-fetch.md`, both modules import it
 
 ❌ **Don't make modules too generic**: Modules should be focused and purposeful, not catch-all utilities.
+   - **Bad example**: `shared/utilities.md` with 500 lines of unrelated functions
+   - **Good example**: `shared/python-dataviz.md` focused on data visualization setup
 
 ## Success Metrics