A modern news aggregator with AI support for automatic collection, processing, and distribution of news in multiple languages.
Official website: https://firefeed.net
- Project Overview
- Key Features
- Technology Stack
- Architecture
- Installation and Setup
- Configuration
- API Documentation
- Development
- License
FireFeed is a high-performance system for automatic collection, processing, and distribution of news content. The project uses modern machine learning technologies for intelligent text processing and provides multilingual support for international audiences.
- Automatic news translation to 4 languages (Russian, German, French, English) using modern machine learning models (Helsinki-NLP OPUS-MT, M2M100) - optional via TRANSLATION_ENABLED
- Duplicate detection using semantic analysis and vector embeddings (Sentence Transformers) - optional via DUPLICATE_DETECTOR_ENABLED
- Intelligent image processing with automatic extraction and optimization
- Fully localized Telegram bot with support for 4 languages
- REST API with multilingual interface
- Adaptive translation system with terminology consideration
- Automatic parsing of over 50 RSS feeds from various sources
- News categorization by topics (world news, technology, sports, economy, etc.)
- Personalized user subscriptions to categories and sources
- Custom RSS feeds - ability to add personal sources
- JWT authentication for API
- Password encryption using bcrypt
- Email validation with confirmation codes
- Secure secret storage through environment variables
- Asynchronous architecture based on asyncio
- PostgreSQL connection pool for efficient database operations
- Task queues for parallel translation processing
- ML model caching for memory optimization
- Python 3.8+ with asyncio
- FastAPI for REST API
- PostgreSQL with pgvector for semantic search
- Redis for storing API key usage data
- aiopg for asynchronous database queries
- Transformers (Hugging Face)
- Sentence Transformers for embeddings
- SpaCy for text processing
- Torch for computations
- Telegram Bot API
- SMTP for email notifications
- Webhook support
- Docker containerization
- systemd for service management
- nginx for proxying
The project consists of several key components:
- Telegram Bot (apps/telegram_bot.py) - main user interaction interface
- RSS Parser Service (apps/rss_parser.py) - background service for RSS feed parsing
- REST API (apps/api.py) - web API for external integrations
- Translation Services (services/translation/) - translation system with caching
- Duplicate Detector (services/text_analysis/duplicate_detector.py) - ML-based duplicate detection
- User Management (services/user/user_manager.py) - user and subscription management
The Telegram bot serves as the primary interface for users to interact with the FireFeed system. It provides personalized news delivery, subscription management, and multilingual support.
- Personalized News Delivery: Users receive news based on their category subscriptions in their preferred language
- Multilingual Interface: Full localization support for English, Russian, German, and French
- Subscription Management: Easy category-based subscription configuration through inline keyboards
- Automatic Publishing: News items are automatically published to configured Telegram channels
To prevent spam and ensure fair usage, the bot applies per-feed rate limiting to news publications:
Each RSS feed has configurable limits:
- cooldown_minutes: Minimum time between publications from this feed (default: 60 minutes)
- max_news_per_hour: Maximum number of news items per hour from this feed (default: 10)
Before publishing any news item to Telegram channels, the system performs two types of checks:
- Count-based Limiting:
  - Counts publications from the same feed within the last 60 minutes
  - If count >= max_news_per_hour, skips publication
  - Uses data from the rss_items_telegram_bot_published table
- Time-based Limiting:
  - Checks time since last publication from the same feed
  - If elapsed time < cooldown_minutes, skips publication
# Example: feed with cooldown_minutes=120, max_news_per_hour=1
# - Maximum 1 publication per 120 minutes
# - Minimum 120 minutes between publications
# Before each publication attempt:
recent_count = get_recent_telegram_publications_count(feed_id, 60)
if recent_count >= 1:
    skip_publication()
last_time = get_last_telegram_publication_time(feed_id)
if last_time:
    elapsed = now - last_time
    if elapsed < timedelta(minutes=120):
        skip_publication()
This ensures that even if multiple news items are processed simultaneously from the same feed, only the allowed number will be published to Telegram, preventing rate limit violations and maintaining quality user experience.
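The two checks can also be expressed as one pure function. This is a minimal sketch rather than FireFeed's actual implementation: `should_publish` and its parameter names are hypothetical, and the publication history is passed in as a list instead of being read from the rss_items_telegram_bot_published table. The defaults mirror the documented per-feed defaults.

```python
from datetime import datetime, timedelta
from typing import Sequence


def should_publish(
    published_at: Sequence[datetime],
    now: datetime,
    cooldown_minutes: int = 60,
    max_news_per_hour: int = 10,
) -> bool:
    """Return True if a new item from the feed may be published at `now`."""
    # Count-based check: publications within the last 60 minutes.
    window_start = now - timedelta(minutes=60)
    recent_count = sum(1 for t in published_at if t > window_start)
    if recent_count >= max_news_per_hour:
        return False

    # Time-based check: time elapsed since the most recent publication.
    if published_at:
        elapsed = now - max(published_at)
        if elapsed < timedelta(minutes=cooldown_minutes):
            return False
    return True


now = datetime(2024, 1, 1, 12, 0)
history = [now - timedelta(minutes=90)]  # one publication, 90 minutes ago
print(should_publish(history, now, cooldown_minutes=120, max_news_per_hour=1))  # False
print(should_publish(history, now, cooldown_minutes=60, max_news_per_hour=1))   # True
```

Passing `now` explicitly keeps the function deterministic, which makes both branches easy to cover in unit tests.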
- Horizontal scaling through microservice architecture
- Fault tolerance with automatic restarts and logging
- Performance monitoring with detailed telemetry
- Graceful shutdown for proper service termination
The project uses a service-oriented architecture with dependency injection to keep components testable and maintainable.
Service for fetching and parsing RSS feeds.
Key Features:
- Asynchronous RSS feed fetching with semaphore support for concurrency control
- XML structure parsing with extraction of titles, content, and metadata
- Duplicate detection through built-in detector
- Media content extraction (images, videos)
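Semaphore-based concurrency control can be illustrated with the standard library alone. The sketch below is an assumption about the general technique, not the project's fetcher: `fetch_all`, `fetch_one`, and `fake_fetch` are hypothetical names, and the fake fetch stands in for a real HTTP request. The `max_concurrent` argument plays the role of RSS_MAX_CONCURRENT_FEEDS.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fetch_all(
    urls: List[str],
    fetch_one: Callable[[str], Awaitable[str]],
    max_concurrent: int = 10,
) -> List[str]:
    """Fetch every URL, running at most `max_concurrent` fetches at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(url: str) -> str:
        # Each task waits here until one of the semaphore slots is free.
        async with semaphore:
            return await fetch_one(url)

    # gather() preserves the order of the input URLs in its result.
    return await asyncio.gather(*(guarded(u) for u in urls))


async def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP request (hypothetical).
    await asyncio.sleep(0.01)
    return f"<rss>{url}</rss>"


feed_urls = [f"https://example.com/{i}.xml" for i in range(5)]
results = asyncio.run(fetch_all(feed_urls, fake_fetch, max_concurrent=2))
print(len(results))  # 5
```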
Configuration:
RSS_MAX_CONCURRENT_FEEDS=10
RSS_MAX_ENTRIES_PER_FEED=50
RSS_PARSER_MIN_ITEM_TITLE_WORDS_LENGTH=0
RSS_PARSER_MIN_ITEM_CONTENT_WORDS_LENGTH=0
Service for RSS feed validation.
Key Features:
- URL availability checking with timeouts
- Validation result caching
- RSS structure correctness determination
Configuration:
RSS_VALIDATION_CACHE_TTL=300
RSS_REQUEST_TIMEOUT=15
Service for database operations on RSS data.
Key Features:
- Saving RSS items to database
- News translation management
- RSS feed settings retrieval (cooldowns, limits)
Service for extracting media content from RSS items.
Key Features:
- Image URL extraction from various RSS formats (media:thumbnail, enclosure)
- Video URL extraction with size checking
- Atom and RSS format support
ML model manager for translations.
Key Features:
- Lazy loading of translation models
- In-memory model caching with automatic cleanup
- GPU/CPU memory management
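Lazy loading with a bounded in-memory cache might look like the following. This is a sketch under stated assumptions: `LazyModelCache` and `load_model` are hypothetical names, and least-recently-used eviction is used here as a simple stand-in for the interval-based cleanup the service performs. The `max_cached` argument corresponds to TRANSLATION_MAX_CACHED_MODELS.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class LazyModelCache:
    """Load models on first use and keep only the most recently used ones."""

    def __init__(self, loader: Callable[[str], Any], max_cached: int = 15) -> None:
        self._loader = loader
        self._max_cached = max_cached
        self._models: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, name: str) -> Any:
        if name in self._models:
            self._models.move_to_end(name)  # mark as most recently used
            return self._models[name]
        model = self._loader(name)  # lazy load on first request
        self._models[name] = model
        if len(self._models) > self._max_cached:
            self._models.popitem(last=False)  # evict the least recently used model
        return model


loaded: List[str] = []


def load_model(name: str) -> str:
    # Stand-in for an expensive model load (hypothetical).
    loaded.append(name)
    return f"model:{name}"


cache = LazyModelCache(load_model, max_cached=2)
cache.get("en-ru")
cache.get("en-de")
cache.get("en-ru")  # served from cache, no reload
cache.get("en-fr")  # evicts "en-de", the least recently used entry
print(loaded)  # ['en-ru', 'en-de', 'en-fr']
```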
Configuration:
TRANSLATION_MAX_CACHED_MODELS=15
TRANSLATION_MODEL_CLEANUP_INTERVAL=1800
TRANSLATION_DEVICE=cpu
Main service for performing translations.
Key Features:
- Batch translation processing for performance optimization
- Text preprocessing and postprocessing
- Translation concurrency management
Configuration:
TRANSLATION_MAX_CONCURRENT=3
Translation result caching.
Key Features:
- Translation caching with TTL
- Cache size limitation
- Automatic cleanup of expired entries
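A cache with per-entry TTL, a size limit, and expired-entry cleanup can be sketched as follows. The class name `TTLCache`, the injected `clock`, and the eviction policy (drop the entry that expires soonest) are assumptions made for illustration; the project's cache may behave differently. The constructor defaults mirror CACHE_DEFAULT_TTL and CACHE_MAX_SIZE.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    def __init__(
        self,
        default_ttl: float = 3600,
        max_size: int = 10000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = default_ttl
        self._max_size = max_size
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def set(self, key: str, value: str) -> None:
        self.cleanup()
        if key not in self._data and len(self._data) >= self._max_size:
            # Cache is full: drop the entry that would expire soonest.
            soonest = min(self._data, key=lambda k: self._data[k][0])
            del self._data[soonest]
        self._data[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            self._data.pop(key, None)  # remove the entry if it has expired
            return None
        return entry[1]

    def cleanup(self) -> int:
        """Remove expired entries and return how many were removed."""
        now = self._clock()
        expired = [k for k, (expires_at, _) in self._data.items() if expires_at <= now]
        for k in expired:
            del self._data[k]
        return len(expired)


fake_now = [0.0]  # controllable clock for the demonstration
translations = TTLCache(default_ttl=10, max_size=2, clock=lambda: fake_now[0])
translations.set("hello|en|de", "hallo")
print(translations.get("hello|en|de"))  # hallo
fake_now[0] = 11.0
print(translations.get("hello|en|de"))  # None
```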
Configuration:
CACHE_DEFAULT_TTL=3600
CACHE_MAX_SIZE=10000
Service for managing Telegram bot users and their preferences.
Key Features:
- User settings management (subscriptions, language)
- Category-based subscriber retrieval
- User language preferences
- Database operations for Telegram bot users
Interface: ITelegramUserService
Service for managing web users and Telegram account linking.
Key Features:
- Telegram link code generation and validation
- Web user to Telegram user association
- Secure linking process with expiration
- Database operations for web user management
Interface: IWebUserService
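The linking flow described above (generate a code, validate it once, reject it after expiration) can be sketched with an in-memory store. Everything here is hypothetical: `LinkCodeStore`, the 10-minute lifetime, and the use of a dictionary in place of the database are assumptions for illustration only.

```python
import secrets
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple


class LinkCodeStore:
    def __init__(self, ttl: timedelta = timedelta(minutes=10)) -> None:
        self._ttl = ttl
        # code -> (web_user_id, expires_at)
        self._codes: Dict[str, Tuple[int, datetime]] = {}

    def generate(self, web_user_id: int, now: datetime) -> str:
        """Create an unguessable one-time code for the given web user."""
        code = secrets.token_urlsafe(16)
        self._codes[code] = (web_user_id, now + self._ttl)
        return code

    def redeem(self, code: str, now: datetime) -> Optional[int]:
        """Return the web user ID for a valid code; the code is consumed either way."""
        entry = self._codes.pop(code, None)
        if entry is None:
            return None
        web_user_id, expires_at = entry
        return web_user_id if now < expires_at else None


store = LinkCodeStore()
issued_at = datetime(2024, 1, 1, 12, 0)
code = store.generate(web_user_id=42, now=issued_at)
print(store.redeem(code, now=issued_at + timedelta(minutes=5)))  # 42
print(store.redeem(code, now=issued_at + timedelta(minutes=6)))  # None (already used)
```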
Backward compatibility wrapper that delegates to specialized services.
Key Features:
- Unified interface for both Telegram and web users
- Automatic delegation to appropriate service
- Maintains existing API compatibility
Interface: IUserManager
Dependency injection container for service management.
Key Features:
- Service and factory registration
- Automatic dependency resolution
- Service lifecycle management
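A container with these responsibilities can be quite small. The sketch below is not the contents of di_container.py: `Container`, `register_instance`, `register_factory`, `resolve`, and the example `Config` and `Translator` classes are hypothetical names. Dependencies are resolved through factories that receive the container, and each service is created once and then reused.

```python
from typing import Any, Callable, Dict


class Container:
    def __init__(self) -> None:
        self._factories: Dict[type, Callable[["Container"], Any]] = {}
        self._instances: Dict[type, Any] = {}

    def register_instance(self, interface: type, instance: Any) -> None:
        self._instances[interface] = instance

    def register_factory(
        self, interface: type, factory: Callable[["Container"], Any]
    ) -> None:
        self._factories[interface] = factory

    def resolve(self, interface: type) -> Any:
        if interface not in self._instances:
            if interface not in self._factories:
                raise KeyError(f"No registration for {interface.__name__}")
            # The factory receives the container so it can resolve its own dependencies.
            self._instances[interface] = self._factories[interface](self)
        return self._instances[interface]


class Config:
    def __init__(self, max_concurrent: int) -> None:
        self.max_concurrent = max_concurrent


class Translator:
    def __init__(self, config: Config) -> None:
        self.config = config


container = Container()
container.register_instance(Config, Config(max_concurrent=3))
container.register_factory(Translator, lambda c: Translator(c.resolve(Config)))
translator = container.resolve(Translator)
print(translator.config.max_concurrent)  # 3
print(container.resolve(Translator) is translator)  # True
```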
Centralized configuration of all services through environment variables.
Configuration Example:
# RSS services
RSS_MAX_CONCURRENT_FEEDS=10
RSS_MAX_ENTRIES_PER_FEED=50
RSS_VALIDATION_CACHE_TTL=300
RSS_REQUEST_TIMEOUT=15
# Translation services
TRANSLATION_MAX_CONCURRENT=3
TRANSLATION_MAX_CACHED_MODELS=15
TRANSLATION_MODEL_CLEANUP_INTERVAL=1800
TRANSLATION_DEVICE=cpu
# Caching
CACHE_DEFAULT_TTL=3600
CACHE_MAX_SIZE=10000
CACHE_CLEANUP_INTERVAL=300
# Task queues
QUEUE_MAX_SIZE=30
QUEUE_DEFAULT_WORKERS=1
QUEUE_TASK_TIMEOUT=300
Abstract interfaces for all services, providing:
- Dependency Inversion Principle
- Easy testing through mock objects
- Implementation replacement flexibility
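To illustrate how an abstract interface enables testing with a substitute implementation, here is a small sketch. `ITranslator`, `HeadlineBuilder`, and `FakeTranslator` are hypothetical names invented for this example and do not correspond to the project's actual interfaces.

```python
from abc import ABC, abstractmethod


class ITranslator(ABC):
    """Hypothetical interface: consumers depend on this, not on a concrete model."""

    @abstractmethod
    def translate(self, text: str, target_lang: str) -> str:
        ...


class HeadlineBuilder:
    def __init__(self, translator: ITranslator) -> None:
        self._translator = translator

    def build(self, title: str, lang: str) -> str:
        return self._translator.translate(title.strip(), lang)


class FakeTranslator(ITranslator):
    """Test double that needs no ML model."""

    def translate(self, text: str, target_lang: str) -> str:
        return f"[{target_lang}] {text}"


builder = HeadlineBuilder(FakeTranslator())
print(builder.build("  Breaking news ", "de"))  # [de] Breaking news
```

Because `HeadlineBuilder` only knows about the interface, a real translation service can be swapped in without changing it.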
Hierarchy of custom exceptions for different error types:
- RSSException - RSS processing errors
- TranslationException - translation errors
- DatabaseException - database errors
- CacheException - caching errors
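A hierarchy like this lets callers catch one specific error type and degrade gracefully. In the sketch below the four exception names come from the list above, while the base class `FireFeedException` and the helper `translate_or_original` are hypothetical and shown only to illustrate the pattern.

```python
from typing import Callable


class FireFeedException(Exception):
    """Hypothetical common base class for all project errors."""


class RSSException(FireFeedException):
    """RSS processing errors."""


class TranslationException(FireFeedException):
    """Translation errors."""


class DatabaseException(FireFeedException):
    """Database errors."""


class CacheException(FireFeedException):
    """Caching errors."""


def translate_or_original(text: str, translate: Callable[[str], str]) -> str:
    """Fall back to the original text if translation fails."""
    try:
        return translate(text)
    except TranslationException:
        return text


def broken_translate(text: str) -> str:
    raise TranslationException("model unavailable")


print(translate_or_original("hello", broken_translate))  # hello
print(translate_or_original("hello", str.upper))         # HELLO
```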
- High testability - each service is tested in isolation
- Configuration flexibility - all parameters configurable via environment variables
- Easy maintenance - clear separation of responsibilities
- Scalability - services can be easily replaced or extended
- Reliability - specific error handling and graceful degradation
- Python 3.8 or higher
- PostgreSQL 12+ with pgvector extension
- Telegram Bot API token
pip install -r requirements.txt
- Copy .env.example to .env
- Configure real values for variables in .env file
# Create virtual environment
python -m venv venv
source venv/bin/activate # for Windows: venv\Scripts\activate
# Run Telegram bot
python bot.py
# Make scripts executable
chmod +x ./scripts/run_telegram_bot.sh
chmod +x ./scripts/run_rss_parser.sh
chmod +x ./scripts/run_api.sh
# Run Telegram bot
./scripts/run_telegram_bot.sh
# Run RSS parser
./scripts/run_rss_parser.sh
# Run API
./scripts/run_api.sh
Create a .env file in the project root directory:
# Logging level (DEBUG, INFO, WARNING, ERROR)
LOG_LEVEL=INFO
# Database configuration
DB_HOST=localhost
DB_USER=your_db_user
DB_PASSWORD=your_db_password
DB_NAME=firefeed
DB_PORT=5432
DB_MINSIZE=5
DB_MAXSIZE=20
# Telegram bot API configuration
API_BASE_URL=http://127.0.0.1:8000/api/v1
# SMTP configuration for email notifications
SMTP_SERVER=smtp.yourdomain.com
SMTP_PORT=465
SMTP_EMAIL=your_email@yourdomain.com
SMTP_PASSWORD=your_smtp_password
SMTP_USE_TLS=True
# Webhook configuration for Telegram bot
WEBHOOK_LISTEN=127.0.0.1
WEBHOOK_PORT=5000
WEBHOOK_URL_PATH=webhook
WEBHOOK_URL=https://yourdomain.com/webhook
# Telegram Bot Token (get from @BotFather)
BOT_TOKEN=your_telegram_bot_token
# Telegram bot channel IDs
CHANNEL_ID_RU=-1000000000000
CHANNEL_ID_DE=-1000000000001
CHANNEL_ID_FR=-1000000000002
CHANNEL_ID_EN=-1000000000003
# Telegram bot channel categories
CHANNEL_CATEGORIES=world,technology,lifestyle,politics,economy,autos,sports
# TTL for cleaning expired user data (24 hours)
USER_DATA_TTL_SECONDS=86400
# JWT configuration for API authentication
JWT_SECRET_KEY=your_jwt_secret_key
JWT_ALGORITHM=HS256
JWT_ACCESS_TOKEN_EXPIRE_MINUTES=30
# Service configuration
# RSS services
RSS_MAX_CONCURRENT_FEEDS=10
RSS_MAX_ENTRIES_PER_FEED=50
RSS_VALIDATION_CACHE_TTL=300
RSS_REQUEST_TIMEOUT=15
RSS_MAX_TOTAL_ITEMS=1000
RSS_PARSER_MEDIA_TYPE_PRIORITY=image
# RSS parser content filtering
RSS_PARSER_MIN_ITEM_TITLE_WORDS_LENGTH=0
RSS_PARSER_MIN_ITEM_CONTENT_WORDS_LENGTH=0
# Default User-Agent for HTTP requests
# Using Chrome-like User-Agent to avoid blocking while remaining minimally identifiable
DEFAULT_USER_AGENT="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 FireFeed/1.0"
# Absolute path to images directory on server
IMAGES_ROOT_DIR=/path/to/data/images/
# Absolute path to videos directory on server
VIDEOS_ROOT_DIR=/path/to/data/videos/
# Base URL of the videos directory on the website
HTTP_VIDEOS_ROOT_DIR=https://yourdomain.com/data/videos/
# Base URL of the images directory on the website
HTTP_IMAGES_ROOT_DIR=https://yourdomain.com/data/images/
# Translation services
TRANSLATION_MAX_CONCURRENT=3
TRANSLATION_MAX_CACHED_MODELS=15
TRANSLATION_CLEANUP_INTERVAL=1800
TRANSLATION_DEVICE=cpu
TRANSLATION_MAX_WORKERS=4
# Translation models
TRANSLATION_MODEL=facebook/m2m100_418M
TRANSLATION_ENABLED=true
# Cache services
CACHE_DEFAULT_TTL=3600
CACHE_MAX_SIZE=10000
CACHE_CLEANUP_INTERVAL=300
# Queue services
QUEUE_MAX_SIZE=30
QUEUE_DEFAULT_WORKERS=1
QUEUE_TASK_TIMEOUT=300
# Deduplication services
DUPLICATE_DETECTOR_ENABLED=true
# Embedding models
EMBEDDING_SENTENCE_TRANSFORMER_MODEL=paraphrase-multilingual-MiniLM-L12-v2
# spaCy models
SPACY_EN_MODEL=en_core_web_sm
SPACY_RU_MODEL=ru_core_news_sm
SPACY_DE_MODEL=de_core_news_sm
SPACY_FR_MODEL=fr_core_news_sm
FireFeed provides optional AI-powered features that can be enabled or disabled based on your needs:
TRANSLATION_ENABLED
- Default: true
- Description: Controls automatic translation of news articles to multiple languages
- Impact: When disabled, news items will only be available in their original language
- Use case: Disable to reduce computational load or when translations are not needed
DUPLICATE_DETECTOR_ENABLED
- Default: true
- Description: Controls ML-based duplicate detection using semantic analysis
- Impact: When disabled, all news items will be processed without duplicate checking
- Use case: Disable for faster processing or when duplicate detection is handled externally
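Semantic duplicate detection generally comes down to comparing embedding vectors. The following sketch shows the comparison step only, using plain Python lists in place of real Sentence Transformers embeddings. The function names and the 0.9 threshold are assumptions, not values taken from the project.

```python
import math
from typing import Iterable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def is_duplicate(
    candidate: Sequence[float],
    known: Iterable[Sequence[float]],
    threshold: float = 0.9,  # hypothetical similarity cut-off
) -> bool:
    """Return True if the candidate is close enough to any known embedding."""
    return any(cosine_similarity(candidate, e) >= threshold for e in known)


known_embeddings = [[1.0, 0.0], [0.0, 1.0]]
print(is_duplicate([0.99, 0.05], known_embeddings))  # True
print(is_duplicate([0.7, 0.7], known_embeddings))    # False
```

In a deployment with pgvector, the nearest-neighbour search would normally run inside PostgreSQL rather than in a Python loop.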
RSS_PARSER_MIN_ITEM_TITLE_WORDS_LENGTH
- Default: 0
- Description: Minimum number of words required in RSS item title
- Impact: RSS items with titles containing fewer words than this threshold will be skipped
- Use case: Filter out low-quality or incomplete news items with very short titles
RSS_PARSER_MIN_ITEM_CONTENT_WORDS_LENGTH
- Default: 0
- Description: Minimum number of words required in RSS item content/description
- Impact: RSS items with content containing fewer words than this threshold will be skipped
- Use case: Filter out low-quality or incomplete news items with very short content
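The effect of the two word-count thresholds can be illustrated with a short filter. This is a sketch: `passes_length_filter` is a hypothetical name, and counting words by splitting on whitespace is an assumption about how the parser measures length.

```python
def passes_length_filter(
    title: str,
    content: str,
    min_title_words: int = 0,
    min_content_words: int = 0,
) -> bool:
    """Return True if both title and content meet their minimum word counts."""
    # str.split() with no arguments splits on any run of whitespace.
    return (
        len(title.split()) >= min_title_words
        and len(content.split()) >= min_content_words
    )


print(passes_length_filter("Update", "Short text"))                     # True (defaults are 0)
print(passes_length_filter("Update", "Short text", min_title_words=3))  # False
print(passes_length_filter("Markets rally again", "Stocks rose today", 3, 3))  # True
```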
FireFeed allows customization of the AI models used for translation, embeddings, and text processing:
TRANSLATION_MODEL
- Default: facebook/m2m100_418M
- Description: Specifies the translation model from Hugging Face Transformers
- Supported models: M2M100, Helsinki-NLP OPUS-MT, MarianMT, MBart, etc.
- Example: Helsinki-NLP/opus-mt-en-ru for Helsinki-NLP models
EMBEDDING_SENTENCE_TRANSFORMER_MODEL
- Default: paraphrase-multilingual-MiniLM-L12-v2
- Description: Sentence transformer model for generating text embeddings
- Supported models: Any SentenceTransformer-compatible model from Hugging Face
- Example: all-MiniLM-L6-v2 for a faster, smaller model
SPACY_EN_MODEL, SPACY_RU_MODEL, SPACY_DE_MODEL, SPACY_FR_MODEL
- Default: en_core_web_sm, ru_core_news_sm, de_core_news_sm, fr_core_news_sm
- Description: spaCy language models for text processing and linguistic analysis
- Supported models: Any spaCy model compatible with the language
- Example: en_core_web_trf for a transformer-based English model
For production environments, systemd services are recommended.
Telegram Bot Service (/etc/systemd/system/firefeed-bot.service):
[Unit]
Description=FireFeed Telegram Bot Service
After=network.target
[Service]
Type=simple
User=firefeed
Group=firefeed
WorkingDirectory=/path/to/firefeed/
ExecStart=/path/to/firefeed/scripts/run_telegram_bot.sh
Restart=on-failure
RestartSec=10
TimeoutStopSec=30
KillMode=mixed
KillSignal=SIGTERM
SendSIGKILL=yes
[Install]
WantedBy=multi-user.target
API Service (/etc/systemd/system/firefeed-api.service):
[Unit]
Description=Firefeed News API (FastAPI)
After=network.target
After=postgresql@17-main.service
Wants=postgresql@17-main.service
[Service]
Type=simple
User=firefeed
Group=firefeed
WorkingDirectory=/path/to/firefeed/
ExecStart=/path/to/firefeed/scripts/run_api.sh
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Example configuration for webhook and FastAPI operation:
upstream fastapi_app {
    server 127.0.0.1:8000;
}
server {
    listen 80;
    server_name your_domain.com;
    location /webhook {
        proxy_pass http://127.0.0.1:5000/webhook;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
    location /api/ {
        proxy_pass http://fastapi_app;
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
After starting the API server, documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Main endpoints:
- GET /api/v1/news - get news list
- POST /api/v1/users/register - user registration
- GET /api/v1/subscriptions - subscription management
# Clone repository from GitHub
git clone https://github.com/yuremweiland/firefeed.git
# or GitVerse
git clone https://gitverse.ru/yuryweiland/firefeed.git
cd firefeed
# Install dependencies
pip install -r requirements.txt
All tests:
pytest tests/
Specific module:
pytest tests/test_models.py
Stop on first failure:
pytest tests/ -x
Short output:
pytest tests/ --tb=short
firefeed/
├── apps/ # Application entry points
│ ├── __init__.py
│ ├── rss_parser/ # RSS parser application
│ │ └── __main__.py # RSS parser entry point
│ ├── telegram_bot/ # Telegram bot application
│ │ ├── __init__.py
│ │ ├── __main__.py # Telegram bot entry point
│ │ ├── bot.py # Main bot logic
│ │ ├── config.py # Bot configuration
│ │ ├── translations.py # Bot translations
│ │ ├── handlers/ # Telegram bot handlers
│ │ │ ├── __init__.py
│ │ │ ├── callback_handlers.py # Callback query handlers
│ │ │ ├── command_handlers.py # Command handlers
│ │ │ ├── error_handlers.py # Error handlers
│ │ │ └── message_handlers.py # Message handlers
│ │ ├── models/ # Telegram bot models
│ │ │ ├── __init__.py
│ │ │ ├── rss_item.py # RSS item models
│ │ │ ├── telegram_models.py # Telegram models
│ │ │ └── user_state.py # User state models
│ │ ├── services/ # Telegram bot services
│ │ │ ├── __init__.py
│ │ │ ├── api_service.py # API communication service
│ │ │ ├── database_service.py # Database service
│ │ │ ├── rss_service.py # RSS service
│ │ │ ├── telegram_service.py # Telegram messaging service
│ │ │ └── user_state_service.py # User state service
│ │ └── utils/ # Telegram bot utilities
│ │   ├── __init__.py
│ │   ├── cleanup_utils.py # Cleanup utilities
│ │   ├── formatting_utils.py # Message formatting
│ │   ├── keyboard_utils.py # Keyboard utilities
│ │   └── validation_utils.py # Validation utilities
│ └── api/ # FastAPI REST API application
│   ├── __init__.py
│   ├── __main__.py # FastAPI entry point
│   ├── app.py # FastAPI application
│   ├── database.py # Database connection
│   ├── deps.py # Dependencies
│   ├── middleware.py # Custom middleware
│   ├── models.py # Pydantic models
│   ├── websocket.py # WebSocket support
│   ├── email_service/ # Email service
│   │ ├── __init__.py
│   │ ├── sender.py # Email sending
│   │ └── templates/ # Email templates
│   │   ├── password_reset_email_de.html
│   │   ├── password_reset_email_en.html
│   │   ├── password_reset_email_ru.html
│   │   ├── registration_success_email_de.html
│   │   ├── registration_success_email_en.html
│   │   ├── registration_success_email_ru.html
│   │   ├── verification_email_de.html
│   │   ├── verification_email_en.html
│   │   └── verification_email_ru.html
│   └── routers/ # API endpoints
│     ├── __init__.py
│     ├── api_keys.py # API key management
│     ├── auth.py # Authentication
│     ├── categories.py # News categories
│     ├── rss_feeds.py # RSS feed management
│     ├── rss_items.py # RSS items
│     ├── rss.py # RSS operations
│     ├── telegram.py # Telegram integration
│     └── users.py # User management
├── config/ # Configuration modules
│ ├── logging_config.py # Logging configuration
│ └── services_config.py # Service configuration
├── database/ # Database related files
│ └── migrations.sql # Database migrations
├── exceptions/ # Custom exceptions
│ ├── __init__.py
│ ├── base_exceptions.py # Base exception classes
│ ├── cache_exceptions.py # Cache related exceptions
│ ├── database_exceptions.py # Database exceptions
│ ├── rss_exceptions.py # RSS processing exceptions
│ ├── service_exceptions.py # Service exceptions
│ └── translation_exceptions.py # Translation exceptions
├── interfaces/ # Service interfaces
│ ├── __init__.py
│ ├── core_interfaces.py # Core interfaces
│ ├── repository_interfaces.py # Repository interfaces
│ ├── rss_interfaces.py # RSS interfaces
│ ├── translation_interfaces.py # Translation interfaces
│ └── user_interfaces.py # User interfaces
├── repositories/ # Data access layer
│ ├── __init__.py
│ ├── api_key_repository.py # API key repository
│ ├── category_repository.py # Category repository
│ ├── rss_feed_repository.py # RSS feed repository
│ ├── rss_item_repository.py # RSS item repository
│ ├── source_repository.py # Source repository
│ ├── telegram_repository.py # Telegram repository
│ └── user_repository.py # User repository
├── services/ # Service-oriented architecture
│ ├── database_pool_adapter.py # Database connection pool
│ ├── maintenance_service.py # System maintenance
│ ├── rss/ # RSS processing services
│ │ ├── __init__.py
│ │ ├── media_extractor.py # Media content extraction
│ │ ├── rss_fetcher.py # RSS feed fetching
│ │ ├── rss_manager.py # RSS processing orchestration
│ │ ├── rss_parser.py # RSS parsing logic
│ │ ├── rss_storage.py # RSS data storage
│ │ └── rss_validator.py # RSS feed validation
│ ├── text_analysis/ # Text analysis and ML services
│ │ ├── __init__.py
│ │ ├── duplicate_detector.py # ML-based duplicate detection
│ │ └── embeddings_processor.py # Text embeddings and processing
│ ├── translation/ # Translation services
│ │ ├── __init__.py
│ │ ├── model_manager.py # ML model management
│ │ ├── task_queue.py # Translation task queue
│ │ ├── terminology_dict.py # Translation terminology
│ │ ├── translation_cache.py # Translation caching
│ │ ├── translation_service.py # Translation processing
│ │ └── translations.py # Translation messages
│ └── user/ # User management services
│   ├── __init__.py
│   ├── telegram_user_service.py # Telegram bot user management
│   ├── web_user_service.py # Web user management and Telegram linking
│   └── user_manager.py # Backward compatibility wrapper
├── tests/ # Unit and integration tests
│ ├── __init__.py
│ ├── test_api_keys.py # API key tests
│ ├── test_bot.py # Telegram bot tests
│ ├── test_database.py # Database tests
│ ├── test_di_integration.py # Dependency injection tests
│ ├── test_email.py # Email service tests
│ ├── test_models.py # Model tests
│ ├── test_registration_success_email.py # Email template tests
│ ├── test_rss_manager.py # RSS manager tests
│ ├── test_services.py # Service tests
│ ├── test_user_services.py # User services tests
│ ├── test_user_state_service.py # User state service tests
│ └── test_utils.py # Utility tests
├── utils/ # Utility functions
│ ├── __init__.py
│ ├── api.py # API utilities
│ ├── cache.py # Caching utilities
│ ├── cleanup.py # Cleanup utilities
│ ├── database.py # Database utilities
│ ├── image.py # Image processing
│ ├── media_extractors.py # Media extraction
│ ├── retry.py # Retry mechanisms
│ ├── text.py # Text processing
│ └── video.py # Video processing
├── scripts/ # Startup scripts
│ ├── run_api.sh # API startup script
│ ├── run_rss_parser.sh # RSS parser startup script
│ └── run_telegram_bot.sh # Telegram bot startup script
├── di_container.py # Dependency injection container
├── requirements.txt # Python dependencies
├── .dockerignore # Docker ignore file
├── .env.example # Environment variables example
├── .gitignore # Git ignore file
├── CODE_OF_CONDUCT.md # Code of conduct
├── CONTRIBUTING.md # Contribution guidelines
├── docker-compose.yml # Docker compose configuration
├── Dockerfile # Docker image definition
├── LICENSE # License file
└── README.md # This file
This project is distributed under the MIT license. See LICENSE file for details.