Skip to content

Commit 5c52288

Browse files
fix(feature-ideation): address CodeRabbit re-review on PR #85 (15 fixes + 5 new tests)
Critical/major: - collect-signals.sh: validate ISSUE_LIMIT/PR_LIMIT/DISCUSSION_LIMIT are positive integers; tighten REPO validation with strict ^[^/]+/[^/]+$ regex - compose-signals.sh: enforce array type (jq 'type == "array"') not just valid JSON so objects/strings don't silently produce wrong counts - date-utils.sh: guard $# before reading $1 to prevent set -u abort on zero-arg calls - filter-bots.sh: replace unquoted array expansion with IFS=',' read -r -a to prevent pathname-globbing against filesystem entries - gh-safe.sh: bounds-check args[i+1] before --jq dereference; add $# guard to gh_safe_graphql_input() to prevent nounset abort - lint-prompt.sh: recognise YAML chomping modifiers (|-,|+,>-,>+) in prompt_marker regex; replace [^}]* GH-expression stripper with a stateful scanner that handles nested braces; preserve exit-2 over exit-1 in main() - match-discussions.sh: wrap json.load calls in try/except for structured error exit-2 instead of Python traceback; skip discussions without an id; switch from greedy per-proposal to similarity-sorted global optimal matching - validate-signals.py: catch OSError on read_text() to preserve exit-2 contract; add -> bool return type annotation to _check_date_time Docs: - README.md: update lint command to mention both direct_prompt: and prompt:; fix Mary's prompt pointer to feature-ideation-reusable.yml Tests (+5 new, 109 → 114 total): - lint-prompt.bats: missing-file-before-lint-failing-file exits 2; YAML chomping modifiers detected; nested GH expressions don't false-positive - match-discussions.bats: malformed signals JSON exits non-zero; malformed proposals JSON exits non-zero - signals-schema.bats: truncated/malformed JSON exits 2 not 1 - date-utils.bats: use date_today helper instead of raw date -u - stubs/gh: prefer TT_TMP/BATS_TEST_TMPDIR for counter file isolation Co-authored-by: don-petry <don-petry@users.noreply.github.com>
1 parent 67c6037 commit 5c52288

14 files changed

Lines changed: 275 additions & 47 deletions

File tree

.github/scripts/feature-ideation/README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -51,7 +51,7 @@ bats test/workflows/feature-ideation/gh-safe.bats
5151
shellcheck -x collect-signals.sh lint-prompt.sh match-discussions.sh \
5252
discussion-mutations.sh lib/*.sh)
5353

54-
# Lint the workflow's direct_prompt block
54+
# Lint workflow prompt blocks (direct_prompt: and prompt:)
5555
bash .github/scripts/feature-ideation/lint-prompt.sh
5656
```
5757

@@ -76,7 +76,7 @@ signals.json is a breaking change and must:
7676

7777
1. Bump `SCHEMA_VERSION` in `collect-signals.sh`.
7878
2. Update fixtures under `test/workflows/feature-ideation/fixtures/expected/`.
79-
3. Update Mary's prompt in `feature-ideation.yml` if any field references move.
79+
3. Update Mary's prompt in `.github/workflows/feature-ideation-reusable.yml` if any field references move.
8080

8181
CI validates every fixture against the schema, and the workflow validates
8282
the runtime output before handing it to Mary.

.github/scripts/feature-ideation/collect-signals.sh

Lines changed: 24 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -58,13 +58,33 @@ main() {
5858
local discussion_limit="${DISCUSSION_LIMIT:-100}"
5959
local output_path="${SIGNALS_OUTPUT:-./signals.json}"
6060

61-
local owner repo_name
62-
owner="${REPO%%/*}"
63-
repo_name="${REPO##*/}"
64-
if [ "$owner" = "$REPO" ] || [ -z "$repo_name" ]; then
61+
# Validate that limit overrides are positive integers before forwarding to
62+
# GraphQL — a value like `ISSUE_LIMIT=foo` would cause an opaque downstream
63+
# failure instead of a clean usage error. Caught by CodeRabbit review on
64+
# PR petry-projects/.github#85.
65+
local _lim_name _lim_val
66+
for _lim_name in ISSUE_LIMIT PR_LIMIT DISCUSSION_LIMIT; do
67+
case "$_lim_name" in
68+
ISSUE_LIMIT) _lim_val="$issue_limit" ;;
69+
PR_LIMIT) _lim_val="$pr_limit" ;;
70+
DISCUSSION_LIMIT) _lim_val="$discussion_limit" ;;
71+
esac
72+
if [[ ! $_lim_val =~ ^[1-9][0-9]*$ ]]; then
73+
printf '[collect-signals] %s must be a positive integer, got: %s\n' "$_lim_name" "$_lim_val" >&2
74+
return 64
75+
fi
76+
done
77+
78+
# Strict owner/name format — reject leading/trailing slashes, empty segments,
79+
# and extra path parts (e.g. "org//repo", "/repo", "org/repo/extra").
80+
# Caught by CodeRabbit review on PR petry-projects/.github#85.
81+
if [[ ! $REPO =~ ^[^/]+/[^/]+$ ]]; then
6582
printf '[collect-signals] REPO must be in owner/name format, got: %s\n' "$REPO" >&2
6683
return 64
6784
fi
85+
local owner repo_name
86+
owner="${REPO%%/*}"
87+
repo_name="${REPO##*/}"
6888

6989
local thirty_days_ago
7090
thirty_days_ago=$(date_days_ago 30)

.github/scripts/feature-ideation/lib/compose-signals.sh

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -49,8 +49,11 @@ compose_signals() {
4949
for input in "$open_issues" "$closed_issues" "$ideas_discussions" "$releases" \
5050
"$merged_prs" "$feature_requests" "$bug_reports" "$truncation_warnings"; do
5151
idx=$((idx + 1))
52-
if ! printf '%s' "$input" | jq -e . >/dev/null 2>&1; then
53-
printf '[compose-signals] arg #%d is not valid JSON: %s\n' "$idx" "${input:0:120}" >&2
52+
# Require a JSON array, not just valid JSON. Objects/strings/nulls accepted
53+
# by `jq -e .` would silently produce wrong counts (key count, char count).
54+
# Caught by CodeRabbit review on PR petry-projects/.github#85.
55+
if ! printf '%s' "$input" | jq -e 'type == "array"' >/dev/null 2>&1; then
56+
printf '[compose-signals] arg #%d must be a JSON array: %s\n' "$idx" "${input:0:120}" >&2
5457
return 65 # EX_DATAERR
5558
fi
5659
done

.github/scripts/feature-ideation/lib/date-utils.sh

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,13 @@ set -euo pipefail
1010

1111
# Print an ISO date (YYYY-MM-DD) for N days ago in UTC.
1212
date_days_ago() {
13+
# Guard arg count before reading $1: under set -u a zero-arg call would abort
14+
# the shell with "unbound variable" instead of reaching the validation path.
15+
# Caught by CodeRabbit review on PR petry-projects/.github#85.
16+
if [ "$#" -ne 1 ]; then
17+
printf '[date-utils] expected 1 arg (days), got: %d\n' "$#" >&2
18+
return 64
19+
fi
1320
local days="$1"
1421
if [ -z "$days" ] || ! printf '%s' "$days" | grep -Eq '^[0-9]+$'; then
1522
printf '[date-utils] days must be a non-negative integer, got: %s\n' "$days" >&2

.github/scripts/feature-ideation/lib/filter-bots.sh

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -35,9 +35,11 @@ DEFAULT_BOT_AUTHORS=(
3535
filter_bots_build_list() {
3636
local list=("${DEFAULT_BOT_AUTHORS[@]}")
3737
if [ -n "${FEATURE_IDEATION_BOT_AUTHORS:-}" ]; then
38-
local IFS=','
39-
# shellcheck disable=SC2206
40-
local extras=($FEATURE_IDEATION_BOT_AUTHORS)
38+
# Use `IFS=',' read` (not unquoted expansion) to avoid pathname-globbing
39+
# against the filesystem if any entry contains wildcard characters.
40+
# Caught by CodeRabbit review on PR petry-projects/.github#85.
41+
local extras=()
42+
IFS=',' read -r -a extras <<<"${FEATURE_IDEATION_BOT_AUTHORS}"
4143
# Trim leading/trailing whitespace from each comma-separated entry so
4244
# `"bot1, bot2"` resolves to `bot1` and `bot2`, not `bot1` and ` bot2`.
4345
# Caught by CodeRabbit review on PR petry-projects/.github#85.

.github/scripts/feature-ideation/lib/gh-safe.sh

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,13 @@ gh_safe_graphql() {
9393
local i=0
9494
while [ "$i" -lt "${#args[@]}" ]; do
9595
if [ "${args[$i]}" = "--jq" ]; then
96+
# Guard bounds before dereferencing args[i+1]: under set -u an out-of-
97+
# bounds access aborts the shell. Caught by CodeRabbit review on PR
98+
# petry-projects/.github#85.
99+
if [ $((i + 1)) -ge "${#args[@]}" ]; then
100+
_gh_safe_err "graphql-bad-args" "--jq requires a jq filter argument"
101+
return 64
102+
fi
96103
has_jq=1
97104
jq_filter="${args[$((i + 1))]}"
98105
break
@@ -187,6 +194,13 @@ gh_safe_graphql() {
187194
# Same defensive contract as `gh_safe_graphql`: any auth/network/schema
188195
# failure exits non-zero with a structured stderr message.
189196
gh_safe_graphql_input() {
197+
# Guard arg count before reading $1: under set -u a zero-arg call aborts the
198+
# shell instead of reaching the JSON validation. Caught by CodeRabbit review
199+
# on PR petry-projects/.github#85.
200+
if [ "$#" -ne 1 ]; then
201+
_gh_safe_err "graphql-bad-input" "expected 1 arg: JSON request body, got $#"
202+
return 64
203+
fi
190204
local body="$1"
191205
if ! gh_safe_is_json "$body"; then
192206
_gh_safe_err "graphql-bad-input" "request body is not valid JSON"

.github/scripts/feature-ideation/lint-prompt.sh

Lines changed: 53 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,34 @@ scan_file() {
3131
import re
3232
import sys
3333
34+
35+
def _strip_github_expressions(s: str) -> str:
36+
"""Remove ${{ ... }} GitHub Actions expressions from s.
37+
38+
Uses a stateful scanner instead of `[^}]*` regex so that expressions
39+
containing `}` inside string literals (e.g. format() calls) are fully
40+
consumed rather than prematurely terminated. This prevents false-positive
41+
shell-expansion matches on content that is actually inside a GH expression.
42+
Caught by CodeRabbit review on PR petry-projects/.github#85.
43+
"""
44+
result: list[str] = []
45+
i = 0
46+
while i < len(s):
47+
if s[i : i + 3] == "${{":
48+
# Consume until we find the matching "}}"
49+
j = i + 3
50+
while j < len(s):
51+
if s[j : j + 2] == "}}":
52+
j += 2
53+
break
54+
j += 1
55+
i = j # skip the whole ${{ ... }} expression
56+
else:
57+
result.append(s[i])
58+
i += 1
59+
return "".join(result)
60+
61+
3462
path = sys.argv[1]
3563
try:
3664
with open(path, "r", encoding="utf-8") as f:
@@ -55,8 +83,10 @@ findings = []
5583
shell_expansion = re.compile(r'(?<![\\$])\$\([^)]*\)|(?<![\\$])\$\{[A-Za-z_][A-Za-z0-9_]*\}')
5684
5785
# Recognise both `direct_prompt:` (v0) and `prompt:` (v1) markers, with
58-
# optional `|` or `>` block scalar indicators.
59-
prompt_marker = re.compile(r'(?:direct_prompt|prompt):\s*[|>]?\s*$')
86+
# optional `|` or `>` block scalar indicators plus YAML chomping modifiers
87+
# (`-` or `+`) so `prompt: |-`, `prompt: |+`, `prompt: >-`, `prompt: >+`
88+
# are all recognised. Caught by CodeRabbit review on PR petry-projects/.github#85.
89+
prompt_marker = re.compile(r'(?:direct_prompt|prompt):\s*[|>]?[-+]?\s*$')
6090
6191
for lineno, raw in enumerate(lines, start=1):
6292
stripped = raw.lstrip(" ")
@@ -78,8 +108,13 @@ for lineno, raw in enumerate(lines, start=1):
78108
continue
79109
80110
# We're inside the prompt body. Scan for shell expansions.
81-
# First, strip out any GitHub Actions expressions so they don't trip us.
82-
no_gh = re.sub(r'\$\{\{[^}]*\}\}', '', raw)
111+
# First, strip out GitHub Actions ${{ ... }} expressions.
112+
# The naive `[^}]*` regex stops at the first `}`, so expressions that
113+
# contain `}` internally (e.g. format() calls or string literals) are
114+
# not fully removed and leave false-positive shell expansion matches.
115+
# Use a small stateful scanner instead.
116+
# Caught by CodeRabbit review on PR petry-projects/.github#85.
117+
no_gh = _strip_github_expressions(raw)
83118
for match in shell_expansion.finditer(no_gh):
84119
findings.append((lineno, match.group(0), raw.rstrip()))
85120
@@ -107,15 +142,27 @@ main() {
107142
fi
108143

109144
local exit=0
145+
local file_rc=0
110146
for file in "$@"; do
111147
if [ ! -f "$file" ]; then
112148
printf '[lint-prompt] not found: %s\n' "$file" >&2
113149
exit=2
114150
continue
115151
fi
116-
if ! scan_file "$file"; then
117-
exit=1
152+
# Capture the actual exit code so we preserve exit-2 (file error) over
153+
# exit-1 (lint finding). A later lint failure must not overwrite an earlier
154+
# file error. Caught by CodeRabbit review on PR petry-projects/.github#85.
155+
if scan_file "$file"; then
156+
file_rc=0
157+
else
158+
file_rc=$?
118159
fi
160+
case "$file_rc" in
161+
0) ;;
162+
1) if [ "$exit" -eq 0 ]; then exit=1; fi ;;
163+
2) exit=2 ;;
164+
*) return "$file_rc" ;;
165+
esac
119166
done
120167
return "$exit"
121168
}

.github/scripts/feature-ideation/match-discussions.sh

Lines changed: 67 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -97,49 +97,91 @@ def jaccard(a: set[str], b: set[str]) -> float:
9797
return len(a & b) / len(a | b)
9898
9999
100-
with open(signals_path) as f:
101-
signals = json.load(f)
102-
with open(proposals_path) as f:
103-
proposals = json.load(f)
100+
def _load_json(path: str, label: str):
101+
"""Load JSON from path, exiting with code 2 on any read or parse error."""
102+
try:
103+
with open(path, encoding="utf-8") as f:
104+
return json.load(f)
105+
except OSError as exc:
106+
sys.stderr.write(f"[match-discussions] cannot read {label}: {exc}\n")
107+
sys.exit(2)
108+
except json.JSONDecodeError as exc:
109+
sys.stderr.write(f"[match-discussions] invalid JSON in {label}: {exc}\n")
110+
sys.exit(2)
111+
112+
113+
signals = _load_json(signals_path, "signals")
114+
proposals = _load_json(proposals_path, "proposals")
104115
105116
if not isinstance(proposals, list):
106117
sys.stderr.write("[match-discussions] proposals must be a JSON array\n")
107118
sys.exit(65)
108119
109120
discussions = signals.get("ideas_discussions", {}).get("items", []) or []
110-
disc_norm = [(d, normalize(d.get("title", ""))) for d in discussions]
111-
112-
matched = []
113-
new_candidates = []
114-
seen_disc_ids = set()
115-
116-
for proposal in proposals:
121+
# Skip discussions without an id to avoid all id-less entries collapsing into
122+
# a single `None` key in seen_disc_ids. Caught by CodeRabbit on PR #85.
123+
disc_norm = [
124+
(d, normalize(d.get("title", "")))
125+
for d in discussions
126+
if d.get("id") is not None
127+
]
128+
129+
# --- Optimal (similarity-sorted) matching ------------------------------------
130+
# The original greedy per-proposal loop consumed discussions in proposal order,
131+
# so an early lower-value match could block a later higher-value match.
132+
# Instead we enumerate all (proposal, discussion) pairs, sort by similarity
133+
# descending (ties broken by original proposal index for stability), then
134+
# assign greedily. This guarantees globally higher-value matches are honoured
135+
# first. Caught by CodeRabbit review on PR petry-projects/.github#85.
136+
137+
# Collect valid proposals with their original index (for tie-breaking + new_candidates).
138+
proposals_indexed: list[tuple[int, dict]] = []
139+
for p_idx, proposal in enumerate(proposals):
117140
if not isinstance(proposal, dict) or "title" not in proposal:
118141
sys.stderr.write(f"[match-discussions] skipping malformed proposal: {proposal!r}\n")
119142
continue
120-
p_norm = normalize(proposal["title"])
143+
proposals_indexed.append((p_idx, proposal))
121144
122-
best = None
123-
best_sim = 0.0
145+
# Build all (similarity, proposal_idx, disc_id, proposal, disc) tuples.
146+
all_pairs: list[tuple[float, int, str, dict, dict]] = []
147+
for p_idx, proposal in proposals_indexed:
148+
p_norm = normalize(proposal["title"])
124149
for disc, d_norm in disc_norm:
125-
if disc.get("id") in seen_disc_ids:
126-
continue
127150
sim = jaccard(p_norm, d_norm)
128-
if sim > best_sim:
129-
best_sim = sim
130-
best = disc
151+
all_pairs.append((sim, p_idx, disc["id"], proposal, disc))
131152
132-
if best is not None and best_sim >= threshold:
153+
# Sort descending by similarity; stable tie-break by proposal index ascending.
154+
all_pairs.sort(key=lambda x: (-x[0], x[1]))
155+
156+
matched = []
157+
seen_disc_ids: set[str] = set()
158+
seen_proposal_idxs: set[int] = set()
159+
160+
for sim, p_idx, disc_id, proposal, disc in all_pairs:
161+
if p_idx in seen_proposal_idxs or disc_id in seen_disc_ids:
162+
continue
163+
if sim >= threshold:
133164
matched.append(
134165
{
135166
"proposal": proposal,
136-
"discussion": best,
137-
"similarity": round(best_sim, 4),
167+
"discussion": disc,
168+
"similarity": round(sim, 4),
138169
}
139170
)
140-
seen_disc_ids.add(best.get("id"))
141-
else:
142-
new_candidates.append({"proposal": proposal, "best_similarity": round(best_sim, 4)})
171+
seen_disc_ids.add(disc_id)
172+
seen_proposal_idxs.add(p_idx)
173+
174+
# Unmatched proposals become new candidates.
175+
new_candidates = []
176+
for p_idx, proposal in proposals_indexed:
177+
if p_idx in seen_proposal_idxs:
178+
continue
179+
p_norm = normalize(proposal["title"])
180+
best_sim = max(
181+
(jaccard(p_norm, d_norm) for _, d_norm in disc_norm),
182+
default=0.0,
183+
)
184+
new_candidates.append({"proposal": proposal, "best_similarity": round(best_sim, 4)})
143185
144186
result = {
145187
"matched": matched,

.github/scripts/feature-ideation/validate-signals.py

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -53,7 +53,12 @@ def main(argv: list[str]) -> int:
5353
return 2
5454

5555
try:
56-
signals = json.loads(signals_path.read_text())
56+
signals = json.loads(signals_path.read_text(encoding="utf-8"))
57+
except OSError as exc:
58+
# File read errors (permissions, I/O) must also exit 2. Caught by
59+
# CodeRabbit review on PR petry-projects/.github#85.
60+
sys.stderr.write(f"[validate-signals] cannot read {signals_path}: {exc}\n")
61+
return 2
5762
except json.JSONDecodeError as exc:
5863
# Per the docstring contract, exit 2 means usage / file error and
5964
# exit 1 means schema validation error. A malformed signals file
@@ -63,7 +68,10 @@ def main(argv: list[str]) -> int:
6368
return 2
6469

6570
try:
66-
schema = json.loads(schema_path.read_text())
71+
schema = json.loads(schema_path.read_text(encoding="utf-8"))
72+
except OSError as exc:
73+
sys.stderr.write(f"[validate-signals] cannot read schema {schema_path}: {exc}\n")
74+
return 2
6775
except json.JSONDecodeError as exc:
6876
sys.stderr.write(f"[validate-signals] invalid schema JSON: {exc}\n")
6977
return 2
@@ -79,7 +87,7 @@ def main(argv: list[str]) -> int:
7987
format_checker = FormatChecker()
8088

8189
@format_checker.checks("date-time", raises=(ValueError, TypeError))
82-
def _check_date_time(instance): # noqa: ANN001 — jsonschema callback signature
90+
def _check_date_time(instance) -> bool: # noqa: ANN001 — jsonschema callback signature
8391
if not isinstance(instance, str):
8492
return True # non-strings handled by `type` keyword, not format
8593
# Must look like a date-time, not just any string. Require at least

test/workflows/feature-ideation/date-utils.bats

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,10 @@ setup() {
1515
}
1616

1717
@test "date_days_ago: 0 returns today" {
18-
today=$(date -u +%Y-%m-%d)
18+
# Use the helper function rather than the raw system `date` call so the test
19+
# validates behaviour consistently on both GNU and BSD systems.
20+
# Caught by CodeRabbit review on PR petry-projects/.github#85.
21+
today=$(date_today)
1922
run date_days_ago 0
2023
[ "$status" -eq 0 ]
2124
[ "$output" = "$today" ]

0 commit comments

Comments
 (0)