feat: add ban list hotfix #47

Merged
aturret merged 1 commit into main from general-banlist-hotfix
Feb 1, 2026

@aturret (Owner) commented Feb 1, 2026

Summary by CodeRabbit

  • New Features

    • Added filtering to block processing of ChatGPT, Gemini, and Telegram share links.
  • Refactor

    • Simplified text content extraction from web scraping results by standardizing to a single HTML content truncation method.

@aturret aturret merged commit 23b8f8b into main Feb 1, 2026
1 of 2 checks passed
@coderabbitai bot (Contributor) commented Feb 1, 2026

Caution: Review failed. The pull request is closed.

Walkthrough

The pull request modifies Firecrawl scraper text processing to use HTML content truncation instead of description fields, and adds URL pattern-based banning functionality by introducing a BANNED_PATTERNS configuration that prevents processing of specific share link URLs.
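
As a rough illustration of the text-processing change described above, the sketch below derives text by truncating HTML content to a fixed limit. Only the name FIRECRAWL_TEXT_LIMIT comes from this PR; its value, the helper name `truncate_html_content`, and the use of a plain character slice are assumptions, not the project's actual code.

```python
# Assumed value for illustration; the real limit is defined in the project's config.
FIRECRAWL_TEXT_LIMIT = 800


def truncate_html_content(html_content: str, limit: int = FIRECRAWL_TEXT_LIMIT) -> str:
    """Return at most `limit` characters of the scraped HTML content.

    Hypothetical helper: it only slices the string and does not strip tags.
    """
    return html_content[:limit]
```

With a single source for the text field, there is no fallback branch to a description value, which is the simplification the walkthrough refers to.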

Changes

| Cohort / File(s) | Summary |
|---|---|
| Firecrawl Text Processing: app/services/scrapers/firecrawl_client/scraper.py | Removed description field usage; text content now consistently derives from HTML content truncated to FIRECRAWL_TEXT_LIMIT, simplifying the text extraction logic. |
| URL Banning System: app/utils/config.py, app/utils/parse.py | Added BANNED_PATTERNS configuration containing regex patterns for chatgpt/share, gemini/share, and t.me links. Updated parse.py to import and apply these patterns as an additional ban check in get_url_metadata when ban_list parameter doesn't flag the URL. |
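
The ban check summarized in the table can be sketched roughly as follows. The name BANNED_PATTERNS comes from this PR, but the exact regexes, the helper `is_banned`, and the substring matching used for `ban_list` are assumptions made for illustration; the real logic lives inside get_url_metadata in app/utils/parse.py.

```python
import re
from typing import List, Optional

# Assumed patterns: the PR only states that they cover chatgpt/share,
# gemini/share, and t.me links. The exact regexes here are hypothetical.
BANNED_PATTERNS = [
    r"https?://(?:www\.)?chatgpt\.com/share/",
    r"https?://gemini\.google\.com/share/",
    r"https?://(?:www\.)?t\.me/",
]


def is_banned(url: str, ban_list: Optional[List[str]] = None) -> bool:
    """Hypothetical helper mirroring the two-stage check described above."""
    # Stage 1: the caller-supplied ban list (substring match is an assumption).
    if ban_list and any(entry in url for entry in ban_list):
        return True
    # Stage 2: the global patterns, applied only if the ban list did not flag the URL.
    return any(re.search(pattern, url) for pattern in BANNED_PATTERNS)
```

Anchoring each pattern on the scheme and host keeps a pattern such as the t.me one from matching unrelated domains that merely end in the same characters.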

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • aturret/FastFetchBot#45: Introduces Firecrawl scraper integration with the same scraper.py and config/parse modules that this PR modifies for text processing and URL filtering.

Poem

🐰 Hop along through the web, we now say nay,
To bothersome shares that get in the way!
HTML content flows fresh and clean,
No descriptions to muddy the scene,
Banned patterns caught—swift and serene!
