docs/src/content/docs/patterns/data-ops.md (+8 −69 lines)
@@ -5,27 +5,10 @@ sidebar:
  badge: { text: 'Hybrid', variant: 'caution' }
---

DataOps combines deterministic data extraction with agentic analysis: shell commands in `steps:` reliably collect and prepare data (fast, cacheable, reproducible), then the AI agent reads the results and generates insights. Use this pattern for data aggregation, report generation, trend analysis, and auditing.

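As a minimal sketch of the deterministic half of this split, a `steps:` command can pre-aggregate pull request data with `jq` before the agent ever reads it. The paths and the inline sample data below are illustrative assumptions; a real workflow would fetch the JSON with the GitHub CLI instead of embedding it.

```shell
# Hypothetical extraction step: in a real workflow this file would come from
# something like `gh pr list --json ...`; a small inline sample stands in here.
mkdir -p /tmp/gh-aw/pr-data
cat > /tmp/gh-aw/pr-data/recent-prs.json <<'EOF'
[
  {"number": 1, "state": "MERGED", "additions": 120, "deletions": 30},
  {"number": 2, "state": "OPEN",   "additions": 10,  "deletions": 2},
  {"number": 3, "state": "MERGED", "additions": 55,  "deletions": 70}
]
EOF

# Deterministic aggregation: the same input file always yields the same summary,
# so the agent reads a few precomputed numbers instead of the raw dataset.
jq '{total: length,
     merged: ([.[] | select(.state == "MERGED")] | length),
     additions: ([.[].additions] | add),
     deletions: ([.[].deletions] | add)}' \
  /tmp/gh-aw/pr-data/recent-prs.json > /tmp/gh-aw/pr-data/summary.json

cat /tmp/gh-aw/pr-data/summary.json
```

The agent then only needs the small `summary.json`, which keeps token usage low and makes the extraction phase reproducible.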
## The DataOps Pattern

### Basic Structure

````aw wrap
@@ -126,51 +109,11 @@ timeout-minutes: 10
# Weekly Pull Request Summary

Analyze the prepared data:

- `/tmp/gh-aw/pr-data/recent-prs.json` - Last 100 PRs with full metadata

Create a discussion summarizing: total PRs, merge rate, code changes (+/- lines), top contributors, and any notable trends. Keep it concise and factual.
````
## Data Caching
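The caching details of this project are elided from this diff, but the general idea can be sketched generically: derive a cache key deterministically from the extraction inputs, so that unchanged inputs can reuse previously prepared data. Everything below (file names, key format) is a hypothetical illustration, not this project's actual mechanism.

```shell
# Hypothetical sketch: hash the extraction inputs to get a stable cache key.
# If the inputs are unchanged, the key is unchanged, and cached data can be reused.
printf 'repo=octo/demo window=7d' > /tmp/cache-inputs.txt
KEY=$(sha256sum /tmp/cache-inputs.txt | cut -c1-16)
echo "cache-key: pr-data-$KEY"
```

Because the key is a pure function of the inputs, this stays consistent with the "keep steps deterministic" guidance below.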
@@ -231,15 +174,11 @@ Analyze the combined data at `/tmp/gh-aw/combined.json` covering:
## Best Practices

- **Keep steps deterministic** - Same inputs should produce the same outputs; avoid randomness or time-dependent logic.
- **Pre-compute aggregations** - Use `jq`, `awk`, or Python to compute statistics upfront, reducing agent token usage.
- **Structure data clearly** - Output JSON with clear field names; include a summary file alongside raw data.
- **Document data locations** - Tell the agent where to find the data and what format to expect.
- **Use safe outputs** - Discussions are ideal for reports (support threading and reactions).
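The pre-computation and clear-structure advice above can be sketched together in one step. The file names and sample data are hypothetical; the point is that the step emits a small, explicitly named summary rather than handing the agent the raw dataset.

```shell
# Hypothetical illustration of "pre-compute aggregations" and "structure data
# clearly": derive a top-contributors file from raw PR data in a step, so the
# agent reads a compact, self-describing summary.
cat > /tmp/raw-prs.json <<'EOF'
[
  {"number": 1, "author": "alice"},
  {"number": 2, "author": "bob"},
  {"number": 3, "author": "alice"}
]
EOF

# Group by author, count PRs per author, and sort descending. The output field
# names (author, pr_count) are explicit so the agent need not guess their meaning.
jq '[group_by(.author)[] | {author: .[0].author, pr_count: length}]
    | sort_by(-.pr_count)' /tmp/raw-prs.json > /tmp/top-contributors.json

cat /tmp/top-contributors.json
```

A summary like this is also a natural thing to document for the agent: one sentence in the markdown body naming the file and its fields is usually enough.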