diff --git a/.github/aw/data-science.md b/.github/aw/data-science.md
new file mode 100644
index 0000000000..7b30db5b2e
--- /dev/null
+++ b/.github/aw/data-science.md
@@ -0,0 +1,125 @@
+---
+description: Guidelines for creating agentic workflows that generate charts and trend visualizations using Python scientific computing libraries with persistent historical data.
+---
+
+# Data Science & Chart Generation
+
+Use when creating a workflow that generates charts, trend visualizations, dashboards, or any Python-based metric output.
+
+## Frontmatter Template
+
+```yaml
+---
+network:
+  allowed:
+    - defaults
+    - python
+safe-outputs:
+  upload-asset:
+  create-issue: # or create-discussion
+    title-prefix: "📊 [Report Name]:"
+    labels: [report]
+    close-older-issues: true
+    expires: 30
+steps:
+  - name: setup
+    run: |
+      mkdir -p /tmp/gh-aw/python/{data,charts}
+      mkdir -p /tmp/gh-aw/cache-memory/trending
+      pip install --user --quiet numpy pandas matplotlib seaborn scipy
+---
+```
+
+## Agent Prompt Structure
+
+Write the agent prompt as five ordered steps:
+
+1. **Load history**: read `/tmp/gh-aw/cache-memory/trending//history.jsonl` into a DataFrame if it exists; otherwise start empty.
+2. **Collect data**: fetch metrics from the GitHub API (or generate with NumPy). Save to `/tmp/gh-aw/python/data/.csv`. **Never inline data in Python code.**
+3. **Append & prune**: append a JSON Lines record `{"timestamp": "", "metric": "...", "value": ...}` to `history.jsonl`; drop records older than 90 days.
+4. **Chart**: if ≥ 2 history points exist, generate a time-series line chart with a 7-day moving average; otherwise use a bar/distribution chart. Save to `/tmp/gh-aw/python/charts/` at 300 DPI.
+5. **Report**: upload each chart with `upload asset`, then create an issue/discussion embedding the URLs. Call `noop` if there is nothing to report.
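
The pruning rule in step 3 and the chart choice in step 4 can be sketched with the standard library alone. This is a minimal illustration, not part of the guideline itself: `prune` and `choose_chart` are hypothetical helper names, the sample records are made-up values, and treating an empty history as the `noop` case is an assumption, since the steps do not define "nothing to report".

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90    # retention window from step 3
MIN_TREND_POINTS = 2   # threshold from step 4


def prune(records, now, retention_days=RETENTION_DAYS):
    """Keep only records whose timestamp falls inside the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [r for r in records if datetime.fromisoformat(r["timestamp"]) >= cutoff]


def choose_chart(records):
    """Hypothetical helper: map the history size to the chart kind in step 4."""
    if not records:
        return "noop"          # assumption: empty history means nothing to report
    if len(records) >= MIN_TREND_POINTS:
        return "trend"         # time-series line chart with moving average
    return "distribution"      # bar/distribution chart


# Made-up sample history; the first record is older than 90 days.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
history = [
    {"timestamp": "2024-01-01T00:00:00+00:00", "metric": "issue_count", "value": 30},
    {"timestamp": "2024-05-20T00:00:00+00:00", "metric": "issue_count", "value": 41},
    {"timestamp": "2024-06-01T00:00:00+00:00", "metric": "issue_count", "value": 42},
]
kept = prune(history, now)
print(len(kept), choose_chart(kept))
```

Keeping the two thresholds as named constants makes it easier to adjust retention per workflow without touching the logic.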
+
+## Python Patterns
+
+### History: load → append → prune
+
+```python
+import os, pandas as pd
+from datetime import timedelta
+
+HISTORY = '/tmp/gh-aw/cache-memory/trending/issues/history.jsonl'
+# The setup step only creates trending/, so create the metric subdirectory too
+os.makedirs(os.path.dirname(HISTORY), exist_ok=True)
+
+# Load
+df = pd.read_json(HISTORY, lines=True) if os.path.exists(HISTORY) else pd.DataFrame()
+if not df.empty:
+    df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True)
+
+# Append in memory so the rewrite below keeps the new record
+now = pd.Timestamp.now(tz='UTC')
+df = pd.concat([df, pd.DataFrame([{"timestamp": now, "metric": "issue_count", "value": 42}])], ignore_index=True)
+
+# Prune to 90 days, then rewrite the file with ISO timestamps
+df = df[df['timestamp'] >= now - timedelta(days=90)].sort_values('timestamp')
+df.to_json(HISTORY, orient='records', lines=True, date_format='iso')
+```
+
+### Chart: trend with moving average
+
+```python
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+sns.set_style("whitegrid"); sns.set_palette("husl")
+fig, ax = plt.subplots(figsize=(12, 7), dpi=300)
+
+df['rolling'] = df['value'].rolling(window=7, min_periods=1).mean()  # last 7 points; equals 7 days only for daily data
+ax.plot(df['timestamp'], df['value'], label='Actual', alpha=0.5, marker='o')
+ax.plot(df['timestamp'], df['rolling'], label='7-day avg', linewidth=2.5)
+ax.fill_between(df['timestamp'], df['value'], df['rolling'], alpha=0.2)
+ax.set_title('Metric Trend', fontsize=16, fontweight='bold')
+ax.set_xlabel('Date', fontsize=12); ax.set_ylabel('Value', fontsize=12)
+ax.legend(); ax.grid(True, alpha=0.3); plt.xticks(rotation=45); plt.tight_layout()
+plt.savefig('/tmp/gh-aw/python/charts/trend.png', dpi=300, bbox_inches='tight', facecolor='white')
+```
+
+**Chart standards**: 300 DPI · 12×7 in · labeled axes and title · legend for multi-series · `husl` palette
+
+### Multiple metrics
+
+```python
+for metric in df['metric'].unique():  # one line per metric, drawn on the `ax` created above
+    sub = df[df['metric'] == metric]
+    ax.plot(sub['timestamp'], sub['value'], marker='o', label=metric, linewidth=2)
+```
+
+## Report Template
+
+```markdown
+# 📊 [Title] - [Date]
+
+## Summary
+[2–3 sentences on trends and key findings]
+
+### [Metric] Trend
+![chart](https://github.com/OWNER/REPO/blob/assets/WORKFLOW/chart.png?raw=true)
+[direction, moving average, notable events]
+
+## Data Details
+- **Source**: … | **Points**: … | **Range**: … | **Period**: …N days
+- **Cache**: `/tmp/gh-aw/cache-memory/trending/`
+```
+
+Use `###` and deeper for all headers inside the report body.
+
+## Use Cases
+
+| Intent | Trigger | Notes |
+|---|---|---|
+| Weekly GitHub activity chart | `schedule` weekly | track issues, PRs, commits |
+| Test coverage trends | `push`/`pull_request` | append per-run |
+| Workflow run durations | `schedule` daily | GitHub Actions API |
+| Stale repo aging distribution | `workflow_dispatch` | no cache needed |
+| Contributor growth | `schedule` monthly | 365-day retention |
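
One per-metric section of the report template can be assembled mechanically once a chart has a URL. The following is a minimal sketch under stated assumptions: `chart_section` is a hypothetical helper, the URL is the placeholder from the template rather than a real asset, and the note text is illustrative only.

```python
def chart_section(metric, url, notes):
    """Hypothetical helper: render one '### [Metric] Trend' block of the report body."""
    return "\n".join([
        f"### {metric} Trend",   # stays at '###', as the template requires
        f"![chart]({url})",      # embeds the uploaded chart image
        notes,                   # direction, moving average, notable events
    ])


# Placeholder URL copied from the template; a real run would use the URL
# returned by the upload step instead.
url = "https://github.com/OWNER/REPO/blob/assets/WORKFLOW/chart.png?raw=true"
section = chart_section("Issue Count", url, "Illustrative note: upward over the period.")
print(section)
```

Calling the helper once per metric and joining the results with blank lines would yield the chart portion of the issue or discussion body.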