
Commit 231a9f8

docs: reduce bloat in data-ops.md by 24% (#20437)
1 parent 5af0fee commit 231a9f8

File tree: 1 file changed (+8 −69 lines)


docs/src/content/docs/patterns/data-ops.md

Lines changed: 8 additions & 69 deletions
`````diff
@@ -5,27 +5,10 @@ sidebar:
   badge: { text: 'Hybrid', variant: 'caution' }
 ---
-
-DataOps combines deterministic data extraction with agentic analysis. Shell commands in `steps:` collect and prepare data, then the AI agent in the markdown body analyzes results and produces safe outputs like discussions or comments.
-
-## When to Use DataOps
-
-- **Data aggregation** - Collect metrics from APIs, logs, or repositories
-- **Report generation** - Analyze data and produce human-readable summaries
-- **Trend analysis** - Process historical data and identify patterns
-- **Auditing** - Gather evidence and generate audit reports
+
+DataOps combines deterministic data extraction with agentic analysis: shell commands in `steps:` reliably collect and prepare data (fast, cacheable, reproducible), then the AI agent reads the results and generates insights. Use this pattern for data aggregation, report generation, trend analysis, and auditing.
 
 ## The DataOps Pattern
 
-### Separation of Concerns
-
-DataOps separates two distinct phases:
-
-1. **Deterministic extraction** (`steps:`) - Shell commands that reliably fetch, filter, and structure data. These run before the agent and produce predictable, reproducible results.
-
-2. **Agentic analysis** (markdown body) - The AI agent reads the prepared data, interprets patterns, and generates insights. The agent has access to the data files created by the steps.
-
-This separation ensures data collection is fast, reliable, and cacheable, while the AI focuses on interpretation and communication.
-
 ### Basic Structure
 
 ```aw wrap
@@ -126,51 +109,11 @@ timeout-minutes: 10
 
 # Weekly Pull Request Summary
 
-Generate a summary of pull request activity for the repository.
-
-## Available Data
-
-The following data has been prepared for your analysis:
-
+Analyze the prepared data:
 
 - `/tmp/gh-aw/pr-data/recent-prs.json` - Last 100 PRs with full metadata
 - `/tmp/gh-aw/pr-data/stats.json` - Pre-computed statistics
 
-## Your Task
-
-1. **Read the prepared data** from the files above
-2. **Analyze the statistics** to identify:
-   - Overall activity levels
-   - Merge rate and velocity
-   - Most active contributors
-   - Code churn (additions vs deletions)
-3. **Generate a summary report** as a GitHub discussion with:
-   - Key metrics in a clear format
-   - Notable trends or observations
-   - Top contributors acknowledgment
-
-## Report Format
-
-Create a discussion with this structure:
-
-```markdown
-# Weekly PR Summary - [Date Range]
-
-## Key Metrics
-- **Total PRs**: X
-- **Merged**: X (Y%)
-- **Open**: X
-- **Code Changes**: +X / -Y lines across Z files
-
-## Top Contributors
-1. @author1 - X PRs
-2. @author2 - Y PRs
-...
-
-## Observations
-[Brief insights about activity patterns]
-```
-
-Keep the report concise and factual. Focus on the numbers and let them tell the story.
+Create a discussion summarizing: total PRs, merge rate, code changes (+/- lines), top contributors, and any notable trends. Keep it concise and factual.
 ````
 
 ## Data Caching
@@ -231,15 +174,11 @@ Analyze the combined data at `/tmp/gh-aw/combined.json` covering:
 
 ## Best Practices
 
-**Keep steps deterministic** - Avoid randomness or time-dependent logic in steps. The same inputs should produce the same outputs.
-
-**Pre-compute aggregations** - Use `jq`, `awk`, or Python in steps to compute statistics. This reduces agent token usage and improves reliability.
-
-**Structure data clearly** - Output JSON with clear field names. Include a summary file alongside raw data.
-
-**Document data locations** - Tell the agent exactly where to find the prepared data and what format to expect.
-
-**Use safe outputs** - Always use `safe-outputs` for agent actions. Discussions are ideal for reports since they support threading and reactions.
+- **Keep steps deterministic** - Same inputs should produce the same outputs; avoid randomness or time-dependent logic.
+- **Pre-compute aggregations** - Use `jq`, `awk`, or Python to compute statistics upfront, reducing agent token usage.
+- **Structure data clearly** - Output JSON with clear field names; include a summary file alongside raw data.
+- **Document data locations** - Tell the agent where to find the data and what format to expect.
+- **Use safe outputs** - Discussions are ideal for reports (support threading and reactions).
 
 ## Additional Resources
`````