
Deprecate code-generated titles — AI agent must produce all titles, descriptions, and SEO metadata#1610

Merged
pethers merged 5 commits into main from copilot/improve-agentic-workflows-analysis on Apr 8, 2026



Copilot AI commented Apr 8, 2026

Article titles, meta descriptions, and Schema.org structured data were generated by TypeScript code (regex extraction from <strong> tags, keyword matching, document count interpolation) rather than by AI analysis of actual political content. This produced generic, repetitive titles like "Committee Reports: Defense in Focus" instead of newsworthy headlines derived from synthesis findings.

Code changes

  • generateDynamicTitle() — gutted to a stub returning base title + AI-attribution marker. The AI agent overwrites this during agentic workflows after reading synthesis-summary.md
  • extractHighlights(), extractDominantTheme() — removed entirely. Regex scanning HTML for <strong> tags is not political intelligence
  • All 11 generators (propositions, committee-reports, motions, interpellations, evening, week-ahead, deep-inspection, weekly-review, monthly-review, month-ahead, breaking-news) — subtitle templates replaced: removed ${docs.length} documents interpolation, added static AI-attribution stubs
  • generateSeoDescription() — simplified to SERP length enforcement only
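The last bullet reduces generateSeoDescription() to length enforcement. A minimal sketch of that kind of truncation is shown here; the function name truncateForSerp and the 155-character limit are assumptions for illustration, not the values used in the repository.

```typescript
// Hypothetical sketch: cap a meta description at an assumed SERP length.
const SERP_MAX_LENGTH = 155; // assumed limit, not taken from the PR

function truncateForSerp(description: string, maxLen: number = SERP_MAX_LENGTH): string {
  // Collapse whitespace so template newlines do not count toward the limit.
  const text = description.replace(/\s+/g, ' ').trim();
  if (text.length <= maxLen) {
    return text;
  }
  // Reserve one character for the ellipsis, then back up to the last word boundary.
  const hardCut = text.slice(0, maxLen - 1);
  const lastSpace = hardCut.lastIndexOf(' ');
  const cut = lastSpace > 0 ? hardCut.slice(0, lastSpace) : hardCut;
  // Drop separators stranded at the end of the cut before appending the ellipsis.
  return cut.replace(/[\s,;:\u2013\u2014-]+$/u, '') + '\u2026';
}
```

Cutting at a word boundary avoids half-words in search snippets; the AI agent is still expected to write a description that fits without truncation.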

Methodology & workflow prompt updates

  • ai-driven-analysis-guide.md — added v5.0 "Analysis-Driven Article Decision Protocol" with explicit ban on code-generated titles and required synthesis→title flow
  • political-style-guide.md — added "Article Title & SEO Standards" section with title formula: [Active Verb] + [Actor/Institution] + [Policy Action]
  • synthesis-summary.md template — added "AI-Recommended Article Metadata" fields (title, meta description, key highlights, editorial decision)
  • SHARED_PROMPT_PATTERNS.md — added "Analysis→Title Pipeline" protocol
  • 5 workflow prompts (propositions, committee-reports, motions, interpellations, evening) — Step 3b (Step 3c for interpellations) rewritten to read synthesis before generating titles for all 14 languages
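The new synthesis-summary.md fields and the all-language Step 3b requirement imply a completeness check before an article is published. This sketch is hypothetical: the AiArticleMetadata field names and the 14 language codes are assumptions inferred from the attribution markers used in the tests, not the repository's actual types.

```typescript
// Hypothetical shape of the "AI-Recommended Article Metadata" fields.
interface AiArticleMetadata {
  title: string;
  metaDescription: string;
  keyHighlights: string[];
  editorialDecision: string;
}

// Assumed codes for the 14 supported languages.
const LANGUAGES = ['en', 'sv', 'da', 'no', 'fi', 'de', 'fr', 'es', 'nl', 'ar', 'he', 'ja', 'ko', 'zh'] as const;
type Lang = (typeof LANGUAGES)[number];

// Lists languages that still lack an AI-written title or meta description,
// so a workflow step can refuse to publish a partial metadata update.
function missingMetadata(byLang: Partial<Record<Lang, AiArticleMetadata>>): Lang[] {
  return LANGUAGES.filter((lang) => {
    const meta = byLang[lang];
    return meta === undefined || meta.title.trim() === '' || meta.metaDescription.trim() === '';
  });
}
```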

Tests

  • dynamic-title.test.ts rewritten for stub behavior — validates no .length interpolation in subtitles across all discovered generators, verifies AI-attribution markers present
  • 14 dynamic-title tests pass, 4055 total tests pass, 0 CodeQL alerts
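The first test bullet can be illustrated with a small, self-contained check. This is a sketch, assuming generator source is passed in as a string (the real test reads each discovered generator file from disk); findLengthInterpolations is a hypothetical name.

```typescript
// Hypothetical helper: return subtitle template literals that still interpolate a
// `.length` value, e.g. `${docs.length} documents`.
function findLengthInterpolations(source: string): string[] {
  const subtitleTemplates: string[] = source.match(/subtitle:\s*`[^`]+`/g) ?? [];
  // Flag any ${...} placeholder whose expression ends in .length.
  return subtitleTemplates.filter((tpl) => /\$\{[^}]*\.length\s*\}/.test(tpl));
}
```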

Copilot AI and others added 5 commits April 8, 2026 12:49
…banned title/description patterns

- Fix extractHighlights() to reject metadata field labels (Committee:, Filed by:, Published:)
- Remove BANNED ": {Topic} in Focus" suffix from generateDynamicTitle()
- Add sanitizeAlternativeHeadline() to strip boilerplate from Schema.org alternativeHeadline
- Add generateSeoDescription() for proper meta description without banned patterns
- Add countWords() for accurate Schema.org wordCount (was using length/5)
- Add speakable property to Schema.org NewsArticle structured data
- Add dateCreated to Schema.org structured data
- Improve all 14-language title templates in generators.ts - remove generic "This Week"/"Battle Lines" phrases
- Add 2 new banned patterns to shared.ts BANNED_PATTERNS array
- Update SHARED_PROMPT_PATTERNS.md with new banned patterns for AI workflows
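The Schema.org additions in this commit (dateCreated, speakable, an accurate wordCount, alternativeHeadline) can be sketched as a small JSON-LD builder. The input shape, the buildNewsArticleJsonLd name, and the CSS selectors are hypothetical, and setting dateCreated equal to datePublished is an assumption. Escaping `<` is an extra precaution for embedding the JSON in a script tag, not something the commit message describes.

```typescript
// Hypothetical input shape for the JSON-LD builder.
interface NewsArticleInput {
  headline: string;
  alternativeHeadline: string;
  datePublished: string; // ISO 8601 date
  wordCount: number;
  inLanguage: string;
}

function buildNewsArticleJsonLd(input: NewsArticleInput): string {
  const data = {
    '@context': 'https://schema.org',
    '@type': 'NewsArticle',
    headline: input.headline,
    alternativeHeadline: input.alternativeHeadline,
    datePublished: input.datePublished,
    // Assumption: generated articles are created and published at the same time.
    dateCreated: input.datePublished,
    wordCount: input.wordCount,
    inLanguage: input.inLanguage,
    // SpeakableSpecification marks content suited to text-to-speech; selectors are hypothetical.
    speakable: {
      '@type': 'SpeakableSpecification',
      cssSelector: ['h1', '.article-lede'],
    },
  };
  // Escape '<' so a value containing "</script>" cannot close the surrounding script tag.
  return JSON.stringify(data).replace(/</g, '\\u003c');
}
```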

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/568e92a9-076f-4d3e-ad26-025340de26d0

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…ractHighlights/extractDominantTheme, replace all subtitle templates with AI stubs

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/8e049658-aedb-4cdc-9e35-b48c1a80e47a

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
github-actions bot added the documentation, workflow, ci-cd, testing, refactor, news, agentic-workflow, and size-l labels on Apr 8, 2026

github-actions bot commented Apr 8, 2026

🏷️ Automatic Labeling Summary

This PR has been automatically labeled based on the files changed and PR metadata.

Applied Labels: documentation, workflow, ci-cd, testing, refactor, size-l, news, agentic-workflow

Label Categories

  • 🗳️ Content: news, dashboard, visualization, intelligence
  • 💻 Technology: html-css, javascript, workflow, security
  • 📊 Data: cia-data, riksdag-data, data-pipeline, schema
  • 🌍 I18n: i18n, translation, rtl
  • 🔒 ISMS: isms, iso-27001, nist-csf, cis-controls
  • 🏗️ Infrastructure: ci-cd, deployment, performance, monitoring
  • 🔄 Quality: testing, accessibility, documentation, refactor
  • 🤖 AI: agent, skill, agentic-workflow

For more information, see .github/labeler.yml.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

🔍 Lighthouse Performance Audit

Category Score Status
Performance 85/100 🟡
Accessibility 95/100 🟢
Best Practices 90/100 🟢
SEO 95/100 🟢


Budget Compliance: Performance budgets enforced via budget.json

Copilot AI requested a review from pethers April 8, 2026 14:08
pethers marked this pull request as ready for review April 8, 2026 14:45
Copilot AI review requested due to automatic review settings April 8, 2026 14:45
pethers merged commit 8906c20 into main on Apr 8, 2026
16 checks passed
pethers deleted the copilot/improve-agentic-workflows-analysis branch April 8, 2026 14:45

Copilot AI left a comment


Pull request overview

This PR deprecates code-generated article titles/descriptions and shifts responsibility for all title/SEO metadata generation to the AI agent, using synthesis-driven workflow guidance and placeholder stubs in generators.

Changes:

  • Replaces dynamic/heuristic title generation (regex highlights + theme extraction) with a v5.0 stub and updates generator subtitle templates to AI-attribution stubs (removing doc-count interpolation).
  • Updates the article HTML template to enforce SERP-length descriptions and improve Schema.org metadata fields.
  • Updates methodology docs and workflow prompts to mandate “analysis → synthesis → title/SEO” sequencing, and adjusts tests to validate the new stub behavior.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.

File Description
tests/dynamic-title.test.ts Rewrites unit/integration tests to validate stubbed title generation and to scan generators for disallowed subtitle interpolation.
scripts/news-types/weekly-review/generator.ts Replaces doc-count subtitle templates with AI-attribution stub subtitles; keeps API compatibility.
scripts/news-types/monthly-review.ts Replaces doc-count subtitle templates with AI-attribution stub subtitles; keeps API compatibility.
scripts/news-types/month-ahead.ts Replaces event-count subtitle templates with AI-attribution stub subtitles; keeps API compatibility.
scripts/news-types/breaking-news.ts Replaces breaking-news subtitles with AI-attribution stub subtitles across languages.
scripts/generate-news-enhanced/helpers.ts Removes regex-based highlight/theme extraction and converts generateDynamicTitle() into a deprecated v5.0 stub.
scripts/generate-news-enhanced/generators.ts Replaces generator subtitle templates with AI-attribution stubs (removing ${*.length} patterns) and simplifies titles.
scripts/data-transformers/content-generators/shared.ts Extends banned-pattern detection to catch additional template artifacts (e.g., “in Focus” suffixes).
scripts/article-template/template.ts Adds SEO/structured-data helpers (altHeadline sanitization, meta description truncation, wordCount calculation, speakable) and wires them into HTML + JSON-LD.
analysis/templates/synthesis-summary.md Adds mandatory “AI-Recommended Article Metadata” fields to make synthesis the single source of truth for titles/descriptions.
analysis/methodologies/political-style-guide.md Adds v5.0 title/SEO standards and banned title patterns; bumps methodology version metadata.
analysis/methodologies/ai-driven-analysis-guide.md Adds v5.0 “absolute ban” on code-generated titles and a mandatory analysis-driven decision protocol.
.github/workflows/SHARED_PROMPT_PATTERNS.md Adds an explicit “Analysis→Title Pipeline” protocol and expands banned title/description examples.
.github/workflows/news-propositions.md Updates Step 3b to require synthesis-first title/SEO generation and all-language metadata updates.
.github/workflows/news-motions.md Updates Step 3b to require synthesis-first title/SEO generation and all-language metadata updates.
.github/workflows/news-interpellations.md Updates Step 3c to require synthesis-first title/SEO generation and all-language metadata updates.
.github/workflows/news-evening-analysis.md Updates Step 3b to require cross-type synthesis-first title/SEO generation and all-language metadata updates.
.github/workflows/news-committee-reports.md Updates Step 3b to require synthesis-first title/SEO generation and all-language metadata updates.

Comment on lines +51 to +53
const HEADLINE_BANNED_PATTERNS: readonly RegExp[] = [
  /Political intelligence briefing on [A-Za-z:]+\s+and\s+[A-Za-z:]+\s*[—–-]\s*\d+ parliamentary documents analyzed/i,
  /In-depth analysis of [A-Za-z:]+\s+based on \d+ parliamentary documents/i,

Copilot AI Apr 8, 2026


HEADLINE_BANNED_PATTERNS (and the generateSeoDescription safety-net checks below) are too restrictive to catch the real leaked boilerplate seen in existing pages, e.g. "Political intelligence briefing on Filed by: and Published: — 10 parliamentary documents analyzed". The current patterns assume single-word field labels (A-Za-z / \w) and therefore won’t match "Filed by:" / "Committee:" cases; broaden these regexes to allow multi-word field labels (and ideally Unicode letters) so the sanitizer actually strips the known template artifacts.

Suggested change

Replace:
const HEADLINE_BANNED_PATTERNS: readonly RegExp[] = [
  /Political intelligence briefing on [A-Za-z:]+\s+and\s+[A-Za-z:]+\s*[—–-]\s*\d+ parliamentary documents analyzed/i,
  /In-depth analysis of [A-Za-z:]+\s+based on \d+ parliamentary documents/i,

With:
const HEADLINE_FIELD_LABEL_WORD = String.raw`[\p{L}\p{M}]+`;
const HEADLINE_FIELD_LABEL = String.raw`${HEADLINE_FIELD_LABEL_WORD}(?:[\s-]+${HEADLINE_FIELD_LABEL_WORD})*\s*:`;
const HEADLINE_BANNED_PATTERNS: readonly RegExp[] = [
  new RegExp(
    String.raw`Political intelligence briefing on ${HEADLINE_FIELD_LABEL}\s+and\s+${HEADLINE_FIELD_LABEL}\s*[—–-]\s*\d+ parliamentary documents analyzed`,
    'iu',
  ),
  new RegExp(
    String.raw`In-depth analysis of ${HEADLINE_FIELD_LABEL}\s+based on \d+ parliamentary documents`,
    'iu',
  ),
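To check the suggested field-label pattern against the leaked string quoted in the comment, this sketch builds the same regex and compares it with the single-word original. Dash characters are written as \u escapes; otherwise the construction mirrors the suggestion.

```typescript
// Same construction as the suggestion; dashes are written as \u escapes.
const FIELD_LABEL_WORD = String.raw`[\p{L}\p{M}]+`;
const FIELD_LABEL = String.raw`${FIELD_LABEL_WORD}(?:[\s-]+${FIELD_LABEL_WORD})*\s*:`;

const multiWordPattern = new RegExp(
  String.raw`Political intelligence briefing on ${FIELD_LABEL}\s+and\s+${FIELD_LABEL}\s*[\u2014\u2013-]\s*\d+ parliamentary documents analyzed`,
  'iu',
);

// The current single-word pattern, for comparison.
const singleWordPattern =
  /Political intelligence briefing on [A-Za-z:]+\s+and\s+[A-Za-z:]+\s*[\u2014\u2013-]\s*\d+ parliamentary documents analyzed/i;

const leaked = 'Political intelligence briefing on Filed by: and Published: \u2014 10 parliamentary documents analyzed';
console.log(singleWordPattern.test(leaked)); // false: "Filed by:" is two words
console.log(multiWordPattern.test(leaked)); // true
```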

Comment on lines +61 to +72
function sanitizeAlternativeHeadline(subtitle: string, maxLen: number = 110): string {
  let clean = subtitle;
  for (const bp of HEADLINE_BANNED_PATTERNS) {
    if (bp.test(clean)) {
      clean = clean.replace(bp, '').trim();
    }
  }
  // If cleaning emptied the string, return a safe fallback
  if (clean.length < MIN_HEADLINE_LENGTH) {
    clean = subtitle.substring(0, maxLen);
  }
  return clean.substring(0, maxLen);

Copilot AI Apr 8, 2026


sanitizeAlternativeHeadline() falls back to the original subtitle when the banned-pattern removal leaves < MIN_HEADLINE_LENGTH. That re-introduces the exact boilerplate you’re trying to strip (so Schema.org alternativeHeadline can still contain banned template text). Prefer a fallback that cannot contain the banned patterns (e.g., use the article title, or a known-safe AI stub), and consider trimming leftover punctuation after replacements.
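One way to follow this advice is to let the caller supply a known-safe fallback, such as the article title. The sketch is hypothetical: sanitizeWithSafeFallback, its parameters, and the MIN_HEADLINE_LENGTH value of 20 are assumptions. It also trims separator punctuation (commas, colons, semicolons, dashes) left behind by the removals.

```typescript
// Hypothetical threshold; the repository's MIN_HEADLINE_LENGTH value is not shown here.
const MIN_HEADLINE_LENGTH = 20;

// Separator punctuation that removals tend to leave dangling at either end.
const EDGE_SEPARATORS = /^[\s,;:\u2013\u2014-]+|[\s,;:\u2013\u2014-]+$/gu;

function sanitizeWithSafeFallback(
  subtitle: string,
  safeFallback: string,
  bannedPatterns: readonly RegExp[],
  maxLen: number = 110,
): string {
  let clean = subtitle;
  for (const bp of bannedPatterns) {
    // Calling replace() directly (no test() pre-check) avoids stale lastIndex
    // state on patterns that carry the g flag.
    clean = clean.replace(bp, ' ');
  }
  clean = clean.replace(/\s+/g, ' ').replace(EDGE_SEPARATORS, '');
  // Never fall back to the raw subtitle: it may still contain the boilerplate.
  if (clean.length < MIN_HEADLINE_LENGTH) {
    clean = safeFallback;
  }
  return clean.slice(0, maxLen);
}
```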

* Calculate accurate word count from HTML content by stripping tags first.
*/
function countWords(html: string): number {
const text = html.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();

Copilot AI Apr 8, 2026


countWords() will return misleadingly low counts for languages that don’t use spaces between words (notably ja/zh/ko in this repo). Since wordCount is emitted into Schema.org, consider using Intl.Segmenter (Node 25 supports it) for per-language word segmentation, or add a fallback heuristic (e.g., character-based estimate) when the stripped text contains few/no spaces.

Suggested change

Replace:
const text = html.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();

With:
const text = html.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
if (text.length === 0) {
  return 0;
}
const segmenter = new Intl.Segmenter('und', { granularity: 'word' });
const wordLikeCount = Array.from(segmenter.segment(text)).filter(segment => segment.isWordLike).length;
if (wordLikeCount > 0) {
  return wordLikeCount;
}
const hasWhitespace = /\s/u.test(text);
const cjkCharacters = text.match(/[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]/gu);
if (!hasWhitespace && cjkCharacters !== null) {
  return cjkCharacters.length;
}
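A self-contained version of the suggested countWords() is sketched here, assuming a runtime with full ICU data so that Intl.Segmenter is available. It omits the CJK character-count fallback on the assumption that Intl.Segmenter already reports CJK segments as word-like; treat that as an assumption to verify rather than the repository's behavior.

```typescript
// Sketch: count words after stripping tags, using Intl.Segmenter so that
// languages written without spaces (ja, zh) are segmented too.
function countWords(html: string): number {
  const text = html.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
  if (text.length === 0) {
    return 0;
  }
  const segmenter = new Intl.Segmenter('und', { granularity: 'word' });
  let wordLike = 0;
  for (const segment of segmenter.segment(text)) {
    // isWordLike is false for whitespace and punctuation segments.
    if (segment.isWordLike) {
      wordLike += 1;
    }
  }
  return wordLike;
}
```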

{ label: 'analysisOfNDocuments: "Analysis of N documents covering…"', pattern: /Analysis of \d+ documents covering/i },
{ label: 'policySignificanceGeneric: "Requires committee review and chamber debate…"', pattern: /Requires committee review and chamber debate/i },
{ label: 'topicInFocusSuffix: "…: {Topic} in Focus"', pattern: /:\s+\w[\w\s]*\bin Focus\b/i },
{ label: 'briefingOnFieldLabels: "Political intelligence briefing on {Field}: and {Field}:"', pattern: /Political intelligence briefing on \w+:\s+and\s+\w+:/i },

Copilot AI Apr 8, 2026


The new banned-pattern detector for "Political intelligence briefing on {Field}: and {Field}:" won’t match the actual leaked boilerplate (e.g. "Filed by:" / "Committee:"), because the regex uses \w+ which stops at spaces. Update the pattern to allow multi-word field labels (and ideally Unicode letters) so detectBannedPatterns() reliably flags these known template artifacts.

Suggested change

Replace:
{ label: 'briefingOnFieldLabels: "Political intelligence briefing on {Field}: and {Field}:"', pattern: /Political intelligence briefing on \w+:\s+and\s+\w+:/i },

With:
{ label: 'briefingOnFieldLabels: "Political intelligence briefing on {Field}: and {Field}:"', pattern: /Political intelligence briefing on [\p{L}\p{N}][\p{L}\p{N}\s,&/()-]*:\s+and\s+[\p{L}\p{N}][\p{L}\p{N}\s,&/()-]*:/iu },
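The difference between the current and suggested detector patterns can be demonstrated with a minimal detector in the style of BANNED_PATTERNS. detectLabels and the single-entry list are hypothetical; only the two regexes come from the review comment (the `/` inside the character class is escaped here for clarity).

```typescript
interface BannedPattern {
  label: string;
  pattern: RegExp;
}

// Current pattern from shared.ts and the suggested replacement from the comment.
const CURRENT_BRIEFING = /Political intelligence briefing on \w+:\s+and\s+\w+:/i;
const SUGGESTED_BRIEFING =
  /Political intelligence briefing on [\p{L}\p{N}][\p{L}\p{N}\s,&\/()-]*:\s+and\s+[\p{L}\p{N}][\p{L}\p{N}\s,&\/()-]*:/iu;

// Hypothetical single-entry list in the style of BANNED_PATTERNS.
const BANNED: readonly BannedPattern[] = [
  { label: 'briefingOnFieldLabels', pattern: SUGGESTED_BRIEFING },
];

// Non-global patterns keep no lastIndex state, so reusing test() is safe.
function detectLabels(text: string): string[] {
  return BANNED.filter(({ pattern }) => pattern.test(text)).map(({ label }) => label);
}
```

Because the suggested class includes Unicode letters, Swedish labels such as "Inlämnad av:" are caught as well.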

Comment on lines +155 to +161
for (const file of generatorFiles) {
  const content = fs.readFileSync(file, 'utf-8');
  const subtitleLines = content.match(/subtitle:\s*`[^`]+`/g) ?? [];
  const aiStubs = subtitleLines.filter(s => s.includes('AI-generat') || s.includes('AI-genererad') || s.includes('AI-genereret') || s.includes('AI-generert') || s.includes('tekoäly') || s.includes('KI-generierte') || s.includes('AI-gegenereerde') || s.includes('الذكاء الاصطناعي') || s.includes('בינה מלאכותית') || s.includes('AI生成') || s.includes('AI 생성') || s.includes('générée par IA') || s.includes('generado por IA'));
  // At least some subtitles should have the AI attribution marker
  if (subtitleLines.length > 0) {
    expect(aiStubs.length, `${file} should have AI attribution in subtitle stubs`).toBeGreaterThan(0);

Copilot AI Apr 8, 2026


The "subtitle templates contain AI attribution stub" assertion only checks that at least one subtitle template in a generator file contains an AI marker. The PR description/intent says subtitles are stubs specifically so the agent can reliably detect & overwrite them, so this test should enforce that every discovered subtitle: template literal in these generators includes the attribution marker (or otherwise explicitly whitelist exceptions).

Suggested change

Replace:
for (const file of generatorFiles) {
  const content = fs.readFileSync(file, 'utf-8');
  const subtitleLines = content.match(/subtitle:\s*`[^`]+`/g) ?? [];
  const aiStubs = subtitleLines.filter(s => s.includes('AI-generat') || s.includes('AI-genererad') || s.includes('AI-genereret') || s.includes('AI-generert') || s.includes('tekoäly') || s.includes('KI-generierte') || s.includes('AI-gegenereerde') || s.includes('الذكاء الاصطناعي') || s.includes('בינה מלאכותית') || s.includes('AI生成') || s.includes('AI 생성') || s.includes('générée par IA') || s.includes('generado por IA'));
  // At least some subtitles should have the AI attribution marker
  if (subtitleLines.length > 0) {
    expect(aiStubs.length, `${file} should have AI attribution in subtitle stubs`).toBeGreaterThan(0);

With:
const aiAttributionMarkers = [
  'AI-generat',
  'AI-genererad',
  'AI-genereret',
  'AI-generert',
  'tekoäly',
  'KI-generierte',
  'AI-gegenereerde',
  'الذكاء الاصطناعي',
  'בינה מלאכותית',
  'AI生成',
  'AI 생성',
  'générée par IA',
  'generado por IA',
];
for (const file of generatorFiles) {
  const content = fs.readFileSync(file, 'utf-8');
  const subtitleLines = content.match(/subtitle:\s*`[^`]+`/g) ?? [];
  const aiStubs = subtitleLines.filter(subtitleLine =>
    aiAttributionMarkers.some(marker => subtitleLine.includes(marker)),
  );
  // Every discovered subtitle template must be an AI-attribution stub so the
  // agent can reliably detect and replace it. Files without subtitle templates
  // are ignored by this assertion.
  if (subtitleLines.length > 0) {
    expect(
      aiStubs.length,
      `${file} should include AI attribution in every subtitle stub`,
    ).toBe(subtitleLines.length);
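The stricter assertion is equivalent to requiring that no subtitle template lacks a marker. Returning the offending templates, as in this hypothetical helper, gives a more useful failure message than comparing counts; the marker list is copied from the suggestion above, and subtitlesMissingAttribution is an assumed name.

```typescript
// Markers copied from the suggested test.
const AI_ATTRIBUTION_MARKERS: readonly string[] = [
  'AI-generat',
  'AI-genererad',
  'AI-genereret',
  'AI-generert',
  'tekoäly',
  'KI-generierte',
  'AI-gegenereerde',
  'الذكاء الاصطناعي',
  'בינה מלאכותית',
  'AI生成',
  'AI 생성',
  'générée par IA',
  'generado por IA',
];

// Returns every subtitle template literal that carries no AI-attribution marker.
function subtitlesMissingAttribution(source: string): string[] {
  const templates: string[] = source.match(/subtitle:\s*`[^`]+`/g) ?? [];
  return templates.filter((tpl) => !AI_ATTRIBUTION_MARKERS.some((marker) => tpl.includes(marker)));
}
```

In the test file, asserting that the result equals an empty array per file would print the exact offending subtitle on failure.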

subtitle: `Ugentlig gennemgang — AI-genereret politisk analyse fra det svenske parlament`
},
no: {
title: `Ukentlig gjennomgang: Stortinget i retrospekt`,

Copilot AI Apr 8, 2026


Norwegian (no) subtitle now explicitly says the analysis is from the Swedish parliament, but the corresponding title still says "Stortinget" (Norway’s parliament). This is internally inconsistent and misleading in the rendered metadata; adjust the no title (or subtitle) to refer to Sweden’s Riksdag consistently.

Suggested change

Replace:
title: `Ukentlig gjennomgang: Stortinget i retrospekt`,

With:
title: `Ukentlig gjennomgang: Riksdagen i retrospekt`,

subtitle: `Månedsgennemgang — AI-genereret politisk analyse fra det svenske parlament`
},
no: {
title: `Månedsgjennomgang: Stortinget i perspektiv`,

Copilot AI Apr 8, 2026


Norwegian (no) subtitle now explicitly says the analysis is from the Swedish parliament, but the corresponding title still says "Stortinget" (Norway’s parliament). This mismatch was introduced by the subtitle change and will surface in page titles/SEO; update the no title (or subtitle) to consistently reference Sweden’s Riksdag.

Suggested change

Replace:
title: `Månedsgjennomgang: Stortinget i perspektiv`,

With:
title: `Månedsgjennomgang: Riksdagen i perspektiv`,
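Both Norwegian findings are the same class of error, so a check over the per-language title and subtitle entries could catch future cases. The sketch is hypothetical: findForeignParliamentMentions is an assumed name, and the list of other Nordic parliament names (Stortinget, Folketinget, Eduskunta) is illustrative, not exhaustive.

```typescript
interface LocalizedHeading {
  title: string;
  subtitle?: string;
}

// Illustrative, non-exhaustive list of parliaments other than Sweden's Riksdag.
const OTHER_PARLIAMENTS: readonly string[] = ['Stortinget', 'Folketinget', 'Eduskunta'];

// Returns the language codes whose title or subtitle names another country's parliament.
function findForeignParliamentMentions(entries: Record<string, LocalizedHeading>): string[] {
  return Object.entries(entries)
    .filter(([, heading]) => {
      const text = `${heading.title} ${heading.subtitle ?? ''}`;
      return OTHER_PARLIAMENTS.some((name) => text.includes(name));
    })
    .map(([lang]) => lang);
}
```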



3 participants