@ncarazon ncarazon commented Feb 3, 2026

Closes #4175

This PR improves HTML sanitization for metadata descriptions.
It adds a shared stripHtmlTags helper and replaces the ad-hoc regex stripping in the tournament, community, and OG metadata.

  • add stripHtmlTags utility that repeatedly removes tags until clean
  • use stripHtmlTags in tournament, community, and OG metadata description parsing
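
A minimal sketch of what such a helper might look like, reconstructed from the loop quoted in the inline review thread on this PR (the `previous`/`current` variables and the `/<[^>]*>/g` pattern). The actual implementation lives in front_end/src/utils/formatters/string.ts and may differ in detail; the signature here is an assumption.

```typescript
// Sketch only: assumed shape of the shared helper, not the merged source.
function stripHtmlTags(html: string): string {
  let previous = "";
  let current = html;
  // Re-run the replacement until a pass changes nothing, so the
  // returned string contains no remaining match of the pattern.
  while (previous !== current) {
    previous = current;
    current = current.replace(/<[^>]*>/g, "");
  }
  return current;
}

// Example: tags are removed and the text content is kept.
console.log(stripHtmlTags("<p>Hello <b>world</b></p>")); // "Hello world"
```

Looping to a fixed point means the result never depends on how many passes a particular input happens to need, which is the property the code-scanning alert asks for.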

Summary by CodeRabbit

  • Refactor
    • Consolidated HTML sanitization logic into a centralized utility for consistent content processing across the platform. Applied standardized tag stripping to tournament metadata, community page descriptions, and social media sharing content. Enhanced HTML tag removal to properly handle nested, malformed, and complex tag structures for more reliable content display.

coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

A new stripHtmlTags utility function is introduced to centralize HTML tag removal logic, replacing three instances of inline regex implementations across different page files. The utility applies iterative regex replacements to ensure complete sanitization of nested or malformed tags.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Utility Function: front_end/src/utils/formatters/string.ts | Added stripHtmlTags(html: string) function that iteratively removes HTML tags until none remain, handling nested and malformed tags more robustly than single-pass regex. |
| Metadata Generation Pages: front_end/src/app/(main)/(tournaments)/tournament/[slug]/page.tsx, front_end/src/app/(main)/c/[slug]/page.tsx, front_end/src/app/og/tournament/[slug]/page.tsx | Updated generateMetadata functions to use the new stripHtmlTags utility instead of inline regex patterns for HTML sanitization in description fields. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A helper born to cleanse the tags,
Three pages smile, no longer dragging.
HTML stripped with steady care,
Centralized logic—fair and square! ✨

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fix multi-character sanitization' accurately summarizes the main change: improving HTML tag stripping by replacing ad-hoc regex logic with a shared utility function that handles nested/multi-character tags. |
| Linked Issues check | ✅ Passed | The PR directly addresses the linked issue #4175 by implementing the stripHtmlTags utility function to fix code-scanning security alerts related to HTML sanitization in metadata descriptions. |
| Out of Scope Changes check | ✅ Passed | All changes are focused on HTML sanitization improvements: introducing stripHtmlTags utility and replacing ad-hoc regex stripping in three metadata files, with no unrelated modifications. |

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6d979a and 73acc46.

📒 Files selected for processing (4)
  • front_end/src/app/(main)/(tournaments)/tournament/[slug]/page.tsx
  • front_end/src/app/(main)/c/[slug]/page.tsx
  • front_end/src/app/og/tournament/[slug]/page.tsx
  • front_end/src/utils/formatters/string.ts
🧰 Additional context used
🧬 Code graph analysis (2)
front_end/src/app/(main)/(tournaments)/tournament/[slug]/page.tsx (1)
front_end/src/utils/formatters/string.ts (1)
  • stripHtmlTags (26-36)
front_end/src/app/og/tournament/[slug]/page.tsx (1)
front_end/src/utils/formatters/string.ts (1)
  • stripHtmlTags (26-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build Docker Image / Build Docker Image
  • GitHub Check: Backend Checks
  • GitHub Check: Frontend Checks
  • GitHub Check: integration-tests
🔇 Additional comments (4)
front_end/src/utils/formatters/string.ts (1)

21-35: Centralized tag stripping looks good.

The iterative cleanup is straightforward and keeps sanitization consistent across callers.

front_end/src/app/(main)/(tournaments)/tournament/[slug]/page.tsx (1)

24-25: LGTM — metadata now uses the shared sanitizer.

This reduces regex duplication and aligns with the new utility.

Also applies to: 46-48

front_end/src/app/og/tournament/[slug]/page.tsx (1)

7-7: stripFirstLine now reuses the shared sanitizer — nice.

Keeps OG text stripping consistent with other metadata flows.

Also applies to: 98-101

front_end/src/app/(main)/c/[slug]/page.tsx (1)

15-15: LGTM — shared sanitizer applied in community metadata.

Consistent with the new utility and keeps metadata parsing clean.

Also applies to: 35-35

github-actions bot commented Feb 3, 2026

🚀 Preview Environment

Your preview environment is ready!

| Resource | Details |
|---|---|
| 🌐 Preview URL | https://metaculus-pr-4223-feat-multi-char-sanitization-preview.mtcl.cc |
| 📦 Docker Image | ghcr.io/metaculus/metaculus:feat-multi-char-sanitization-73acc46 |
| 🗄️ PostgreSQL | NeonDB branch preview/pr-4223-feat-multi-char-sanitization |
| Redis | Fly Redis mtc-redis-pr-4223-feat-multi-char-sanitization |

Details

  • Commit: b3e396a970a0a112c9d94dff892a7c7f57ff51fe
  • Branch: feat/multi-char-sanitization
  • Fly App: metaculus-pr-4223-feat-multi-char-sanitization

ℹ️ Preview Environment Info

Isolation:

  • PostgreSQL and Redis are fully isolated from production
  • Each PR gets its own database branch and Redis instance
  • Changes pushed to this PR will trigger a new deployment

Limitations:

  • Background workers and cron jobs are not deployed in preview environments
  • If you need to test background jobs, use Heroku staging environments

Cleanup:

  • This preview will be automatically destroyed when the PR is closed

@elisescu elisescu left a comment

LGTM

@elisescu elisescu left a comment

LGTM but left one inline comment


while (previous !== current) {
  previous = current;
  current = current.replace(/<[^>]*>/g, "");

Does this not fail with examples like:

? Is that not a concern?
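
As an illustration of the kind of edge case this question points at (not a reconstruction of the reviewer's own example), the sketch below runs a few inputs through the quoted loop. The wrapper name `stripWithLoop` is hypothetical and exists only to make the snippet self-contained.

```typescript
// Hypothetical wrapper around the loop quoted in this thread.
function stripWithLoop(input: string): string {
  let previous = "";
  let current = input;
  while (previous !== current) {
    previous = current;
    current = current.replace(/<[^>]*>/g, "");
  }
  return current;
}

// A ">" inside a quoted attribute ends the match early, leaving debris.
const attr = stripWithLoop('<img alt="a>b">'); // 'b">'

// Comparison operators in plain text are consumed as if they were one tag.
const math = stripWithLoop("a < b and c > d"); // "a  d"

// Entity-encoded markup contains no "<" or ">" and is left untouched.
const entities = stripWithLoop("&lt;b&gt;bold&lt;/b&gt;");

// An unterminated "<" has no closing ">" and stays in the output.
const open = stripWithLoop("<b unclosed");

console.log({ attr, math, entities, open });
```

Whether any of these cases matter depends on how the stripped string is rendered downstream.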

@cemreinanc cemreinanc left a comment

LGTM

Development

Successfully merging this pull request may close these issues.

12, 13, 14

4 participants