Add working link checker workflow using lychee by LukasWallrich · Pull Request #705 · forrtproject/forrtproject.github.io

LukasWallrich · 2026-03-18T16:17:30Z

Summary

Replaces the broken filiph/linkcheck workflow (currently disabled as link-check.yaml_OLD) with lychee, a fast Rust-based link checker
Crawls the live https://forrt.org site weekly (Mondays 01:30 UTC) and on manual dispatch
Creates a GitHub issue with label link-check listing all broken links found
Adds .lychee.toml config with exclusions for common false positives (LinkedIn, Twitter/X, doi.org, web.archive.org, etc.)

What changed

.github/workflows/link-check.yaml: new workflow using lycheeverse/lychee-action@v2
.lychee.toml: link checker config (reusable locally with lychee --config .lychee.toml https://forrt.org)

Key design decisions

Crawls the live site (not source files) since many pages are dynamically generated by Hugo
Does not fail the workflow — only creates an issue when broken links are found
Uses default GITHUB_TOKEN, no custom secrets needed
Limits concurrency to 8 requests to avoid overwhelming the server

Test plan

Trigger manually via Actions tab → "Link Checker" → "Run workflow"
Verify issue is created with broken link report (if any)
Confirm excluded domains (LinkedIn, doi.org, etc.) don't appear as false positives

🤖 Generated with Claude Code

Replace the broken filiph/linkcheck workflow with lychee, which crawls the live forrt.org site weekly and creates a GitHub issue listing any broken links found. Includes .lychee.toml config with exclusions for common false positives (LinkedIn, Twitter/X, doi.org, web.archive.org). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-18T16:17:58Z

⚠️ Image files/references in png/jpg format detected

Note that we generally rely on webp format for this webpage, so please consider converting these images to WebP format and updating references accordingly.

References to image files:

content/educators-corner/022-repro-metrics-forrt-irise/index.md: app](reprometrics_app.png

github-actions · 2026-03-18T16:21:25Z

📝 Spell Check Results

Found 1 potential spelling issue(s) when checking 24 changed file(s):

📄 `content/educators-corner/004-Teaching-why-how-replication/index.md`

| Line | Issue |
| --- | --- |
| 94 | pre-selected ==> preselected |

ℹ️ How to address these issues:

Fix the typo: If it's a genuine typo, please correct it.
Add to whitelist: If it's a valid word (e.g., a name, technical term), add it to .codespell-ignore.txt
False positive: If this is a false positive, please report it in the PR comments.

_🤖 This check was performed by codespell_

Lychee doesn't support recursive crawling, so fetch all page URLs from forrt.org/sitemap.xml and check links on every page. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
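For illustration, the sitemap step described in this commit could be sketched in Python as below. This is not the workflow's actual code, which is not shown on this page; the function name `extract_sitemap_urls` and the sample sitemap content are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Illustrative sample only; the real sitemap lives at forrt.org/sitemap.xml.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://forrt.org/</loc></url>
  <url><loc>https://forrt.org/about/</loc></url>
</urlset>"""

for page_url in extract_sitemap_urls(SAMPLE):
    print(page_url)
```

Each extracted page URL would then be passed to the link checker as a separate input, since lychee only checks the links on the pages it is given.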

LukasWallrich · 2026-03-18T16:26:50Z

✅ Staging Deployment Status

This PR has been successfully deployed to staging as part of an aggregated deployment.

Deployed at: 2026-03-19 14:22:07 UTC
Staging URL: https://staging.forrt.org

The staging site shows the combined state of all compatible open PRs.

Replace grep -oP (Perl regex) with grep -o + sed for broader shell compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Download the latest deploy artifact instead of crawling the live site. Lychee scans the local HTML files and checks every link it finds, both internal and external. This catches broken outbound links that the sitemap-only approach missed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove email addresses from author fields in educators-corner posts (Sarah von Grebmer, Rachel Heyard)
- Fix YAML syntax in Berit Barthelmes author profile (stray 'Name' prefix)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Internal links resolve to remote fetches via --base-url, causing thousands of false 404s for assets. Exclude forrt.org since those are already local files. Also exclude Sage, T&F, APA which block automated requests with 403s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Academic publishers (Sage, T&F, APA, etc.) return 403 for all automated requests — valid and invalid URLs alike. Accept 403 as non-broken so these links are still checked but don't produce false positives. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Convert 488 publisher-specific DOI URLs to canonical https://doi.org/ format across 11 content files (glossary excluded as auto-generated)
- Strip session-specific casa_token query params from all URLs
- Remove doi.org from lychee exclusion list (it returns proper 404s for invalid DOIs, unlike publishers that block all bot requests)
- Add workflow step to flag remaining publisher DOI URLs in the link checker issue report

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
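As a rough sketch of the URL clean-up this commit describes, the two transformations could look like the Python below. The conversion script used in the PR is not shown on this page, so the function names, the DOI regular expression, and the example URLs (which use a made-up DOI) are all assumptions.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Heuristic DOI matcher (assumption): "10.", a 4-9 digit registrant code, "/", then a suffix.
# It can over-capture when the path continues after the DOI (e.g. ".../full").
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s?#]+")


def strip_casa_token(url: str) -> str:
    """Drop the casa_token query parameter and keep every other parameter."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "casa_token"
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def to_canonical_doi(url: str) -> str:
    """Rewrite a publisher URL that embeds a DOI in its path to https://doi.org/<doi>."""
    parts = urlsplit(url)
    if parts.hostname == "doi.org":
        return url  # already canonical
    match = DOI_PATTERN.search(parts.path)
    if match is None:
        # No DOI visible in the path: only remove the session token.
        return strip_casa_token(url)
    return "https://doi.org/" + match.group(0)


# Hypothetical publisher URL with a made-up DOI.
print(to_canonical_doi(
    "https://journals.sagepub.com/doi/full/10.1234/example.5678?casa_token=abc123"
))
```

URLs without a visible DOI are left for manual lookup, which matches the later decision to flag remaining publisher URLs in the report rather than rewrite them automatically.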

Flag any direct publisher URL (not just those with visible DOIs) so contributors know to look up and use the doi.org format. Added ScienceDirect, JSTOR, LWW, and Royal Society to the pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
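A minimal sketch of the flagging idea, written in Python for illustration. The workflow's real pattern is not shown on this page; the host list below is an assumption that maps the publishers named in this PR to their commonly used domains, and `is_publisher_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Assumed domains for the publishers mentioned in the PR (Sage, T&F, APA,
# ScienceDirect, JSTOR, LWW, Royal Society); not taken from the workflow itself.
PUBLISHER_HOSTS = (
    "sagepub.com",
    "tandfonline.com",
    "apa.org",
    "sciencedirect.com",
    "jstor.org",
    "lww.com",
    "royalsocietypublishing.org",
)


def is_publisher_url(url: str) -> bool:
    """Return True when the URL's host is a listed publisher domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in PUBLISHER_HOSTS)


print(is_publisher_url("https://journals.sagepub.com/doi/10.1234/example"))  # made-up DOI
print(is_publisher_url("https://doi.org/10.1234/example"))
```

Matching on the host rather than on a visible DOI is what lets this catch publisher links whose URL carries no DOI at all.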

Remove 403 from accepted status codes so they appear in lychee output, then post-process to move them into a collapsed <details> block. This keeps the main report focused on actionable errors while still surfacing bot-blocked URLs for reference. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
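The post-processing described here can be sketched as follows. This Python version assumes the lychee output has already been parsed into `(status, url)` pairs; that parsing step, the function name `render_report`, and the section headings are all hypothetical and not taken from the workflow.

```python
def render_report(results: list[tuple[int, str]]) -> str:
    """Build a Markdown report that keeps 403 results in a collapsed <details> block."""
    actionable = [(status, url) for status, url in results if status != 403]
    blocked = [url for status, url in results if status == 403]

    lines = ["## Broken links", ""]
    lines += [f"- [{status}] {url}" for status, url in actionable]

    if blocked:
        # Collapsed by default, so the main list stays focused on actionable errors.
        lines += [
            "",
            "<details>",
            f"<summary>Bot-blocked (403): {len(blocked)} URL(s)</summary>",
            "",
        ]
        lines += [f"- {url}" for url in blocked]
        lines += ["", "</details>"]
    return "\n".join(lines)


print(render_report([
    (404, "https://example.org/missing"),
    (403, "https://example.com/paywalled"),
]))
```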

LukasWallrich · 2026-03-18T17:02:56Z

@richarddushime Could you review and merge this when you get a chance? The merge is needed to make publisher URLs more checkable — we've converted ~490 publisher-specific DOI URLs to doi.org format, and the link checker now flags any remaining ones in the weekly report. Until this is merged, the workflow checks the old build artifact which still has the publisher URLs.

Lychee reports the same broken URL once per page it appears on, making the issue body exceed GitHub's 65KB limit. Post-process to show each broken URL only once, with shortened output. Also drops per-page headers in favour of a flat deduplicated list. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

GitHub limits issue bodies to 65KB. Cap 403 and publisher URL lists at 100 entries each with a count of remaining items. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
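The capping rule is simple enough to sketch directly. The helper below is a hypothetical Python illustration, not the workflow's code; only the limit of 100 entries and the "count of remaining items" come from the commit message.

```python
def cap_list(items: list[str], limit: int = 100) -> list[str]:
    """Keep at most `limit` entries and append a note with the number left out."""
    if len(items) <= limit:
        return list(items)
    remaining = len(items) - limit
    return items[:limit] + [f"... and {remaining} more"]


# 130 placeholder entries are cut to 100 plus one summary line.
capped = cap_list([f"https://example.org/page-{i}" for i in range(130)])
print(len(capped), capped[-1])
```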

- Track which page(s) each broken URL appears on so they can be found
- Keep publisher URL section open (not collapsed) as last section
- 403s still collapsed and capped at 100

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
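One way to report each broken URL once while still recording where it occurs is to group findings by URL. The Python sketch below assumes the findings are available as `(page, broken_url)` pairs; that input shape and the name `dedupe_broken_links` are assumptions for illustration.

```python
from collections import defaultdict


def dedupe_broken_links(findings: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each broken URL to the sorted, unique list of pages it appears on."""
    pages_by_url: dict[str, set[str]] = defaultdict(set)
    for page, url in findings:
        pages_by_url[url].add(page)
    # Dicts preserve insertion order, so URLs stay in first-seen order.
    return {url: sorted(pages) for url, pages in pages_by_url.items()}


# Placeholder pages and URLs.
findings = [
    ("a.html", "https://example.org/x"),
    ("b.html", "https://example.org/x"),
    ("a.html", "https://example.org/y"),
    ("a.html", "https://example.org/x"),
]
for url, pages in dedupe_broken_links(findings).items():
    print(url, "found on:", ", ".join(pages))
```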

The full grep line content from reversals.md made the issue body exceed 65KB. Use grep -o to extract just the URL, with file:line prefix, and deduplicate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
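For illustration, the same compaction can be sketched in Python. The input mimics `grep -n` style records (`file:line:content`); the URL regular expression, the function name `compact_matches`, and the sample record (with a made-up DOI) are assumptions, not the workflow's actual commands.

```python
import re

# Assumed URL matcher: stops at whitespace, angle brackets, parentheses and square brackets,
# so Markdown links like [text](url) yield only the URL.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def compact_matches(grep_lines: list[str]) -> list[str]:
    """Turn 'file:line:content' records into unique 'file:line URL' entries."""
    seen: set[str] = set()
    entries: list[str] = []
    for record in grep_lines:
        parts = record.split(":", 2)
        if len(parts) != 3:
            continue  # skip records that are not in file:line:content form
        path, lineno, content = parts
        for url in URL_PATTERN.findall(content):
            entry = f"{path}:{lineno} {url}"
            if entry not in seen:
                seen.add(entry)
                entries.append(entry)
    return entries


# Hypothetical record; the long surrounding text is dropped from the output.
sample_lines = [
    "reversals.md:12:See [paper](https://journals.sagepub.com/doi/10.1234/example) and a long paragraph",
    "reversals.md:12:See [paper](https://journals.sagepub.com/doi/10.1234/example) and a long paragraph",
]
print(compact_matches(sample_lines))
```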

richarddushime

LGTM 👍
This will create conflicts with PR #699, though.
I'll merge this now and fix the conflicts later.

Richard Dushime and others added 4 commits March 12, 2026 01:13

fix:compatibility

525fda0

Merge branch 'master' into compat

58bda57

Merge branch 'master' into compat

2ecf31a

LukasWallrich requested a review from a team as a code owner March 18, 2026 16:17

LukasWallrich and others added 2 commits March 18, 2026 16:18

Merge branch 'master' into fix/link-checker-workflow

0a0f3a8

Fix lychee action path: lycheeverse/lychee-action

7efb9eb

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

LukasWallrich and others added 2 commits March 18, 2026 16:22

Crawl full site via sitemap instead of single URL

d221f1d

Merge remote-tracking branch 'origin/compat' into fix/link-checker-wo…

96f8f47

…rkflow

LukasWallrich and others added 6 commits March 18, 2026 16:27

Use portable grep/sed for sitemap URL extraction

e947b0e

Replace deprecated --base with --base-url

17f3995

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

LukasWallrich mentioned this pull request Mar 18, 2026

Fix link checker & broken (external) URLs #522

Open

LukasWallrich and others added 3 commits March 18, 2026 16:54

LukasWallrich and others added 5 commits March 18, 2026 17:12

Truncate 403 and publisher URL lists to fit GitHub issue body limit

8a3043d

Compact publisher URL output: show file:line + URL only

861d868

Merge branch 'master' into fix/link-checker-workflow

38e1f2e

richarddushime approved these changes Mar 19, 2026

richarddushime merged commit b47e64c into master Mar 19, 2026
5 checks passed

richarddushime deleted the fix/link-checker-workflow branch March 19, 2026 14:31

richarddushime merged 22 commits into master from fix/link-checker-workflow
