
feat: refactor codebase to microservices monorepo #56

Merged
aturret merged 9 commits into main from massive-refactor
Feb 18, 2026
Conversation

@aturret
Owner

@aturret aturret commented Feb 18, 2026

Summary by CodeRabbit

  • New Features

    • Restructured application into separate API and Telegram bot services for independent deployment and scaling.
    • Added support for additional content sources including Bluesky scraper.
    • Improved media handling and file export capabilities.
  • Infrastructure

    • Implemented Docker containerization for both services via GitHub Container Registry.
    • Added GitHub Actions CI/CD pipeline with matrix builds and automated deployment.
    • Refactored codebase into monorepo structure with shared library.
  • Documentation

    • Significantly expanded deployment and configuration guides.
    • Documented environment variables and service communication setup.

@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

Caution

Review failed

Failed to post review comments

📝 Walkthrough

This pull request restructures the project from a monolithic application into a distributed monorepo architecture with three packages: a shared utilities package (fastfetchbot-shared) containing models and common functions, an API server (apps/api) for web-based content scraping, and a Telegram bot service (apps/telegram-bot) that communicates with the API. The CI/CD workflow is updated to build and publish container images for both services to GitHub Container Registry.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| CI/CD Workflow | .github/workflows/ci.yml | Replaced single Docker job with matrix-based build for api and telegram-bot services; updated to GitHub Container Registry; added separate deploy job triggering Watchtower webhook. |
| Documentation & Configuration | README.md, template.env, pyproject.toml, docker-compose.template.yml | Comprehensive README rewrite documenting monorepo structure, deployment via Docker, and new environment configuration; template environment additions for bot modes and API server URL; workspace configuration and docker-compose restructuring with separate api and telegram-bot services. |
| Shared Package Core | packages/shared/fastfetchbot_shared/config.py, packages/shared/fastfetchbot_shared/models/*, packages/shared/pyproject.toml | New shared package foundation with configuration, data models (MetadataItem, MediaFile, UrlMetadata, TelegraphItem), and utilities (image processing, HTML parsing, network requests, logging). |
| Shared Utilities | packages/shared/fastfetchbot_shared/utils/* | Utilities library extracted from app: image handling, logger initialization, network requests, URL/HTML parsing with pattern-based content classification. |
| Legacy App Refactoring | app/config.py, app/models/*, app/utils/* | Re-export shared implementations; consolidate configuration and data models into shared package dependencies; remove local implementations in favor of unified shared interface. |
| Telegram Bot Integration (Legacy) | app/services/telegram_bot/__init__.py, app/services/telegram_bot/handlers.py, app/services/telegram_bot/message_sender.py, app/services/inoreader/telegram_process.py | Extract Telegram-specific logic into submodules; introduce message callbacks for flexible message dispatch; add comprehensive URL processing and button handlers with inline keyboards and channel selection. |
| API Service Setup | apps/api/Dockerfile, apps/api/pyproject.toml, apps/api/src/config.py, apps/api/src/auth.py, apps/api/src/main.py, apps/api/src/database.py | New API service with FastAPI application, API key authentication, MongoDB integration via Beanie, Sentry error tracking, and centralized configuration for all external services. |
| API Core Services | apps/api/src/routers/*, apps/api/src/services/amazon/s3.py, apps/api/src/services/telegraph/__init__.py, apps/api/src/services/file_export/*, apps/api/src/models/database_model.py | RESTful endpoints for content scraping (Inoreader, WeChat, general URLs); AWS S3 integration; Telegraph publishing; file export (PDF, audio transcription, video downloading). |
| Scraper Implementations (Social Media) | apps/api/src/services/scrapers/twitter/__init__.py, apps/api/src/services/scrapers/weibo/*, apps/api/src/services/scrapers/bluesky/*, apps/api/src/services/scrapers/instagram/__init__.py, apps/api/src/services/scrapers/reddit/__init__.py, apps/api/src/services/scrapers/threads/__init__.py, apps/api/src/services/scrapers/wechat/__init__.py, apps/api/src/services/scrapers/douban/__init__.py, apps/api/src/services/scrapers/xiaohongshu/*, apps/api/src/services/scrapers/zhihu/* | Comprehensive scraper implementations for major social media platforms with HTML parsing, API integration, media extraction, and Jinja2-based content rendering; includes Xiaohongshu crawler framework with Playwright automation and proxy management. |
| Scraper Infrastructure | apps/api/src/services/scrapers/common.py, apps/api/src/services/scrapers/scraper.py, apps/api/src/services/scrapers/scraper_manager.py, apps/api/src/services/scrapers/general/* | Abstract scraper interfaces; InfoExtractService orchestrator for dispatcher-based item retrieval; ScraperManager for lazy initialization; general webpage scrapers (Firecrawl, Zyte) with LLM-assisted content extraction and client singletons. |
| Inoreader Integration | apps/api/src/services/inoreader/__init__.py, apps/api/src/services/inoreader/process.py | Inoreader API client and data processing pipeline; supports multiple stream types; optional inter-service callback dispatch to Telegram bot. |
| API Templates | apps/api/src/templates/*.jinja2 | Jinja2 HTML templates for rendering scraped content across platforms (Bluesky, Douban, Reddit, Weibo, Xiaohongshu, Zhihu, video metadata) with platform-specific formatting. |
| Telegram Bot Service | apps/telegram-bot/Dockerfile, apps/telegram-bot/pyproject.toml, apps/telegram-bot/core/config.py, apps/telegram-bot/core/main.py, apps/telegram-bot/core/database.py | New Telegram bot service with FastAPI/Starlette webhook and polling modes, MongoDB persistence, centralized configuration, and API client for communicating with the API server. |
| Telegram Bot Handlers | apps/telegram-bot/core/handlers/*, apps/telegram-bot/core/services/*, apps/telegram-bot/core/webhook/server.py | URL extraction and processing (with metadata fetching); button callback handling (channel selection, action routing); message persistence; error reporting; media packaging and sending with size/limit handling. |
| Telegram Bot Models & Templates | apps/telegram-bot/core/models/*, apps/telegram-bot/core/templates/*, apps/telegram-bot/core/services/constants.py | Telegram data models (TelegramUser, TelegramChat, TelegramMessage) for database persistence; message formatting templates; constants and translations (EN/ZH_CN/ZH_TW). |
| Tests | tests/routers/test_scraper.py, apps/telegram-bot/tests/* | Minor formatting changes to existing router tests; new Telegram webhook tests covering authentication, valid updates, and async processing behavior. |
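
The "pattern-based content classification" mentioned for the shared utilities could look roughly like the sketch below. This is an illustration only, not the code from fastfetchbot_shared: the `SOURCE_PATTERNS` table, the `classify_url` function, the category names, and the `"unknown"` fallback are all hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical host patterns; the real package may classify URLs differently.
SOURCE_PATTERNS = {
    "twitter": re.compile(r"(^|\.)(twitter|x)\.com$"),
    "bluesky": re.compile(r"(^|\.)bsky\.app$"),
    "reddit": re.compile(r"(^|\.)reddit\.com$"),
    "zhihu": re.compile(r"(^|\.)zhihu\.com$"),
}


def classify_url(url: str) -> str:
    """Return the first category whose pattern matches the URL's hostname."""
    host = (urlparse(url).hostname or "").lower()
    for category, pattern in SOURCE_PATTERNS.items():
        if pattern.search(host):
            return category
    # Assumed fallback label for hosts that match no pattern.
    return "unknown"
```

Anchoring each pattern at a dot or the start of the hostname keeps a host such as `notx.com` from being mistaken for `x.com`.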

Sequence Diagram(s)

sequenceDiagram
    actor User as User/<br/>Telegram
    participant TBot as Telegram Bot<br/>Service
    participant API as API Server<br/>Service
    participant Scraper as Scraper<br/>Module
    participant Ext as External<br/>Service
    participant DB as MongoDB

    User->>TBot: /start or message<br/>with URL
    activate TBot
    TBot->>TBot: Parse message,<br/>extract URLs
    
    alt URL Detected
        TBot->>API: POST /scraper/getItem<br/>(url, api_key)
        activate API
        API->>Scraper: Dispatch to<br/>appropriate scraper
        activate Scraper
        Scraper->>Ext: Fetch page data<br/>(HTTP, API calls)
        activate Ext
        Ext-->>Scraper: Raw data
        deactivate Ext
        Scraper->>Scraper: Parse & transform<br/>into MetadataItem
        Scraper-->>API: Structured item
        deactivate Scraper
        
        API->>DB: Save item<br/>(if enabled)
        activate DB
        DB-->>API: Confirmation
        deactivate DB
        
        API-->>TBot: MetadataItem JSON
        deactivate API
        
        TBot->>TBot: Format message<br/>via template
        TBot->>TBot: Package media<br/>(download, resize)
        TBot->>User: Send message<br/>+ media
    else Error/Unknown
        TBot->>User: Error or<br/>help message
    end
    deactivate TBot
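
The bot-side half of the diagram above (extract URLs, then POST to `/scraper/getItem`) can be sketched with the standard library alone. The helper names, the JSON body shape, and the `X-API-Key` header are assumptions made for illustration; the diagram only says that a URL and an API key are sent, not how.

```python
import json
import re
from urllib.request import Request

# Simple http(s) matcher; the real bot may rely on Telegram message entities instead.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text: str) -> list[str]:
    """Return every http(s) URL in a message, without trailing punctuation."""
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)]


def build_get_item_request(api_base: str, url: str, api_key: str) -> Request:
    """Build (but do not send) the POST request to the API server."""
    body = json.dumps({"url": url}).encode("utf-8")  # assumed payload shape
    return Request(
        f"{api_base.rstrip('/')}/scraper/getItem",
        data=body,
        # Assumption: the key travels in a header; it could equally be a query parameter.
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; an async bot would more likely use an async HTTP client for the same request.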
sequenceDiagram
    participant Client as External Client
    participant API as API Server
    participant InfoSvc as InfoExtract<br/>Service
    participant MGR as ScraperManager
    participant Scraper as Specific<br/>Scraper
    participant Cache as Cache/DB

    Client->>API: POST /scraper/getItem<br/>(url, params)
    activate API
    API->>InfoSvc: new InfoExtractService<br/>(UrlMetadata)
    activate InfoSvc
    
    InfoSvc->>InfoSvc: Detect source<br/>category
    
    alt Known Source
        InfoSvc->>Scraper: Dispatch via<br/>service_classes registry
        activate Scraper
    else Legacy Source
        InfoSvc->>MGR: init_scraper(category)
        activate MGR
        MGR->>Scraper: Create processor
        MGR-->>InfoSvc: Processor
        deactivate MGR
        activate Scraper
    end
    
    Scraper->>Scraper: Fetch & parse<br/>content
    Scraper-->>InfoSvc: Item dict
    deactivate Scraper
    
    InfoSvc->>InfoSvc: Post-process item<br/>(telegraph, PDF, DB)
    InfoSvc->>Cache: Optionally persist
    activate Cache
    Cache-->>InfoSvc: OK
    deactivate Cache
    
    InfoSvc-->>API: Final item
    deactivate InfoSvc
    API-->>Client: JSON response
    deactivate API
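
The two dispatch branches in this diagram (a registry lookup for known sources, lazy creation through a manager otherwise) can be sketched as follows. Only the names `service_classes` and `init_scraper` come from the diagram; the scraper functions, the `LazyScraperManager` class, and its `created` counter are hypothetical stand-ins and do not reflect the actual `InfoExtractService` or `ScraperManager` code.

```python
import asyncio
from typing import Awaitable, Callable

Scraper = Callable[[str], Awaitable[dict]]


async def scrape_bluesky(url: str) -> dict:
    # Stand-in for a real scraper: returns an item dict.
    return {"category": "bluesky", "url": url}


async def scrape_legacy(url: str) -> dict:
    return {"category": "legacy", "url": url}


# Hypothetical registry of scrapers that are available up front.
SERVICE_CLASSES: dict[str, Scraper] = {"bluesky": scrape_bluesky}


class LazyScraperManager:
    """Creates each scraper on first use and reuses it afterwards."""

    def __init__(self, factories: dict[str, Callable[[], Scraper]]) -> None:
        self._factories = factories
        self._instances: dict[str, Scraper] = {}
        self.created = 0  # how many scrapers were actually constructed

    def init_scraper(self, category: str) -> Scraper:
        if category not in self._instances:
            # Raises KeyError for categories without a factory.
            self._instances[category] = self._factories[category]()
            self.created += 1
        return self._instances[category]


async def get_item(category: str, url: str, manager: LazyScraperManager) -> dict:
    """Try the registry first, then fall back to the lazy manager."""
    scraper = SERVICE_CLASSES.get(category)
    if scraper is None:
        scraper = manager.init_scraper(category)
    return await scraper(url)
```

Deferring construction this way means a scraper with expensive setup, such as a browser session, is only created once a request for its category actually arrives.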

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

This is a substantial monorepo restructuring involving 150+ new files, significant module consolidation (extracting shared utilities into a new package), two new microservices (API and Telegram bot), comprehensive scraper implementations for 10+ social platforms, database model definitions, template systems, and CI/CD pipeline updates. The heterogeneity of changes (infrastructure, services, scrapers, templates, tests) combined with complex multi-component interactions and high logic density across modules demands careful verification of API contracts, service communication flows, database persistence logic, and error handling pathways.

Poem

🐰 Whiskers twitching with delight,
A monorepo, shining bright!
Shared packages, services split,
API and bot, a perfect fit!
Scrapers bloom across the land,
Ten platforms, coordinated hand in hand! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.99% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat: refactor codebase to microservices monorepo' accurately summarizes the main architectural change: converting a single application into a microservices-based monorepo structure with separate API and Telegram bot services. |


@aturret aturret merged commit 2475ab1 into main Feb 18, 2026
1 of 2 checks passed
@aturret aturret deleted the massive-refactor branch February 19, 2026 04:33