Skip to content

feat: add Async worker to decouple API and core logic#69

Merged
aturret merged 5 commits intomainfrom
apq-refactor
Mar 22, 2026
Merged

feat: add Async worker to decouple API and core logic#69
aturret merged 5 commits intomainfrom
apq-refactor

Conversation

@aturret
Copy link
Owner

@aturret aturret commented Mar 22, 2026

Summary by CodeRabbit

  • New Features

    • Added an async worker service for background scraping/enrichment with optional Telegraph publishing and PDF export.
    • Added queue-based scrape mode in the Telegram bot using ARQ/Redis and a Redis-backed outbox for async result delivery.
  • Chores

    • Moved file export (PDF, audio transcription, video download) into a shared async/Celery-backed library and standardized usage across services.
    • Added packaging, container, and environment configs to support the async worker and queue mode; updated documentation.
  • Tests

    • Added extensive tests for the async worker, queue client, outbox consumer, and shared file-export services.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 22, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 49efddd9-d0dd-4d49-b995-fc4d1b862b7c

📥 Commits

Reviewing files that changed from the base of the PR and between fc0cc0a and 1a6c670.

📒 Files selected for processing (1)
  • packages/shared/fastfetchbot_shared/services/scrapers/weibo/scraper.py

📝 Walkthrough

Walkthrough

Adds an ARQ-based async worker service, refactors file-export logic into shared Celery-backed wrappers, and integrates queue mode into the Telegram bot with Redis-backed outbox and ARQ enqueueing; API app classes now delegate to shared implementations.

Changes

Cohort / File(s) Summary
Async worker app
apps/async-worker/Dockerfile, apps/async-worker/pyproject.toml, apps/async-worker/async_worker/*
New ARQ worker service: Dockerfile, pyproject, ARQ WorkerSettings, Redis parsing, Celery client, tasks (scrape_and_enrich), enrichment/outbox services, and config constants.
Shared file-export library
packages/shared/fastfetchbot_shared/services/file_export/*
New shared Celery-backed wrappers: AudioTranscribe, PdfExport, VideoDownloader, and HTML wrapping utilities; designed for dependency injection of celery_app and timeout.
API app thin adapters
apps/api/src/services/file_export/...
API-level file-export classes converted into thin subclasses that delegate to shared implementations and inject the API celery_app and timeout; S3 upload left in API layer.
Telegram bot queue integration
apps/telegram-bot/core/config.py, apps/telegram-bot/core/queue_client.py, apps/telegram-bot/core/services/outbox_consumer.py, apps/telegram-bot/core/services/bot_app.py, apps/telegram-bot/core/handlers/buttons.py, apps/telegram-bot/core/handlers/url_process.py, apps/telegram-bot/pyproject.toml
Adds SCRAPE_MODE (queue/api), ARQ Redis config, queue_client (enqueue/init/close), outbox consumer (BRPOP loop), lifecycle wiring in bot startup/shutdown, and handler changes to enqueue or process synchronously.
Docker / compose / env / IDE
docker-compose.template.yml, template.env, .idea/FastFetchBot.iml, .idea/runConfigurations/fullstack_polling_api.xml
Adds async-worker service to docker-compose, sets SCRAPE_MODE=queue defaults and Redis URLs in env template; adds async-worker to IDE module and renames run config.
Shared scrapers & InfoExtractService
packages/shared/fastfetchbot_shared/services/scrapers/common.py
InfoExtractService now lazy-resolves video scrapers (youtube/bilibili) and passes category when instantiating scrapers to support VideoDownloader.
Tests — async worker, outbox, queue, file-export, scrapers
tests/unit/async_worker/*, tests/unit/telegram_bot/*, tests/unit/file_export/*, tests/unit/scrapers/*, tests/unit/*/conftest.py
Extensive new unit tests for enrichment, outbox, scrape task, queue client, outbox consumer, shared file-export modules, and small test updates for scraper instantiation; test fixtures to expose new app packages.
Project dependency
pyproject.toml, apps/telegram-bot/pyproject.toml
Adds arq and redis[hiredis] dependencies at root and to telegram-bot project.
Docs
CLAUDE.md
Documentation updated to describe new async-worker component and redefined file-export module boundaries.

Sequence Diagrams

sequenceDiagram
    participant User as User (Telegram)
    participant Bot as Telegram Bot
    participant ARQ as ARQ Queue (Redis)
    participant Worker as Async Worker
    participant Celery as Celery Worker
    participant Outbox as Outbox (Redis)
    participant Consumer as Outbox Consumer

    User->>Bot: Send URL
    Bot->>ARQ: enqueue_scrape(url, chat_id)
    ARQ-->>Bot: job_id
    Bot-->>User: "Item queued for processing..."
    
    Worker->>ARQ: dequeue scrape_and_enrich
    Worker->>Celery: send_task(video_download)
    Celery-->>Worker: video metadata / file_path
    Worker->>Celery: send_task(pdf_export)
    Celery-->>Worker: pdf_url
    Worker->>Outbox: push(job_id, metadata_item)
    
    Consumer->>Outbox: BRPOP outbox_queue
    Outbox-->>Consumer: metadata_item payload
    Consumer->>Bot: send_item_message(metadata_item, chat_id)
    Bot-->>User: Deliver message to chat
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰
I hopped a queue down Redis lanes,
Enqueued some scrapes, then checked the chains,
Workers hummed and PDFs grew bright,
Outbox whispered back into the night,
A carrot-coded job—now everything’s alight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.70% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main architectural change: introducing an async worker component to separate API responsibilities from core scraping logic, which is the primary focus across the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch apq-refactor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link

codecov-commenter commented Mar 22, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 83.00221% with 77 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
apps/telegram-bot/core/handlers/url_process.py 8.33% 22 Missing ⚠️
...hbot_shared/services/file_export/video_download.py 84.78% 21 Missing ⚠️
apps/telegram-bot/core/handlers/buttons.py 0.00% 14 Missing ⚠️
apps/telegram-bot/core/services/bot_app.py 0.00% 12 Missing ⚠️
...ed/fastfetchbot_shared/services/scrapers/common.py 58.33% 5 Missing ⚠️
apps/telegram-bot/core/services/outbox_consumer.py 95.23% 3 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Impacted file tree graph

@@           Coverage Diff           @@
##             main      #69   +/-   ##
=======================================
  Coverage        ?   71.74%           
=======================================
  Files           ?       62           
  Lines           ?     3072           
  Branches        ?        0           
=======================================
  Hits            ?     2204           
  Misses          ?      868           
  Partials        ?        0           
Files with missing lines Coverage Δ
apps/async-worker/async_worker/celery_client.py 100.00% <100.00%> (ø)
apps/async-worker/async_worker/config.py 100.00% <100.00%> (ø)
apps/async-worker/async_worker/main.py 100.00% <100.00%> (ø)
...s/async-worker/async_worker/services/enrichment.py 100.00% <100.00%> (ø)
apps/async-worker/async_worker/services/outbox.py 100.00% <100.00%> (ø)
apps/async-worker/async_worker/tasks/scrape.py 100.00% <100.00%> (ø)
apps/telegram-bot/core/config.py 90.41% <100.00%> (ø)
apps/telegram-bot/core/queue_client.py 100.00% <100.00%> (ø)
...ot_shared/services/file_export/audio_transcribe.py 100.00% <100.00%> (ø)
...fetchbot_shared/services/file_export/pdf_export.py 100.00% <100.00%> (ø)
... and 7 more
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
CLAUDE.md (1)

127-128: ⚠️ Potential issue | 🟡 Minor

Update uv version documentation to include the new async-worker Dockerfile.

The documentation mentions updating uv version in three Dockerfiles, but the new apps/async-worker/Dockerfile also pins uv to 0.10.4 (line 23). This should be included to keep all services in sync.

📝 Suggested documentation update
-> **uv version in Docker**: All three Dockerfiles pin uv to `0.10.4` via `COPY --from=ghcr.io/astral-sh/uv:0.10.4`.
-> To upgrade, update that tag in `apps/api/Dockerfile`, `apps/telegram-bot/Dockerfile`, and `apps/worker/Dockerfile`.
+> **uv version in Docker**: All four Dockerfiles pin uv to `0.10.4` via `COPY --from=ghcr.io/astral-sh/uv:0.10.4`.
+> To upgrade, update that tag in `apps/api/Dockerfile`, `apps/telegram-bot/Dockerfile`, `apps/worker/Dockerfile`, and `apps/async-worker/Dockerfile`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` around lines 127 - 128, Update the documentation note about pinned
uv versions to include the missing Dockerfile that also pins uv to 0.10.4:
search for the COPY line "COPY --from=ghcr.io/astral-sh/uv:0.10.4" in the
async-worker Dockerfile and add it to the list of files mentioned (so the doc
references the async-worker along with the api, telegram-bot, and worker
Dockerfiles); ensure the doc text explicitly instructs updating that tag in the
async-worker Dockerfile when bumping uv.
apps/telegram-bot/core/handlers/url_process.py (1)

100-120: ⚠️ Potential issue | 🟡 Minor

Missing message parameter may cause issues in API mode.

When SCRAPE_MODE is not "queue", the _fetch_and_send call at line 111 doesn't pass a message object, but send_item_message at line 79 expects it for reply context. This could cause the bot to lose the reply threading in API mode for this code path.

Suggested fix
             await _fetch_and_send(
                 url=url_metadata["url"],
                 chat_id=message.chat_id,
+                message=message,
                 source=url_metadata.get("source", ""),
                 content_type=url_metadata.get("content_type", ""),
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/telegram-bot/core/handlers/url_process.py` around lines 100 - 120, The
code path calling _fetch_and_send in url_process.py doesn't pass the message
object, which breaks reply threading because send_item_message expects message
when SCRAPE_MODE != "queue"; update the _fetch_and_send(...) invocation inside
the "unknown" source branch to include the original message (pass
message=message) or conditionally forward message when SCRAPE_MODE != "queue",
ensuring _fetch_and_send and downstream send_item_message receive the message
for correct reply context.
🧹 Nitpick comments (9)
apps/async-worker/Dockerfile (2)

1-1: Minor: Remove leading blank line.

The Dockerfile starts with a blank line before the comment, which is unconventional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/async-worker/Dockerfile` at line 1, Remove the leading blank line at the
top of the Dockerfile so the first line begins immediately with the existing
comment/instruction; open the Dockerfile and delete the initial empty line
before the first token to ensure the file starts with the intended content.

44-55: Consider adding a non-root user for improved security posture.

The container runs as root by default. While this may not be actively exploitable, running as a non-root user is a security best practice that reduces the blast radius if the container is compromised. No other services in the project use non-root users currently.

🔒 Optional: Add non-root user
 FROM python-base AS production
 ENV PYTHONPATH=/app/apps/async-worker:$PYTHONPATH
 RUN apt-get update \
     && apt-get install --no-install-recommends -y \
         ca-certificates \
         libffi-dev \
-    && rm -rf /var/lib/apt/lists/*
+    && rm -rf /var/lib/apt/lists/* \
+    && useradd --create-home --shell /bin/bash appuser
 COPY --from=builder-base $PYSETUP_PATH $PYSETUP_PATH
 COPY packages/ /app/packages/
 COPY apps/async-worker/ /app/apps/async-worker/
 WORKDIR /app/apps/async-worker
+USER appuser
 CMD ["arq", "async_worker.main.WorkerSettings"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/async-worker/Dockerfile` around lines 44 - 55, Add a non-root user and
switch the container to run as that user to improve security: in the Dockerfile
create a group and user (e.g., appuser) with a fixed non-root UID/GID, set a
HOME, chown the application files under WORKDIR (/app/apps/async-worker) and any
copied paths ($PYSETUP_PATH, /app/packages/) to that user, and then add a USER
appuser line before the CMD ["arq", "async_worker.main.WorkerSettings"]; ensure
PYTHONPATH and WORKDIR remain correct and that file ownership is updated so the
non-root user can run the process.
tests/unit/file_export/test_pdf_export.py (2)

128-128: Unused variable from unpacking.

The kwargs variable is unpacked but never used. Prefix with underscore to indicate intentional discard.

♻️ Fix unused variable
-        args, kwargs = mock_celery.send_task.call_args
+        args, _kwargs = mock_celery.send_task.call_args
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/file_export/test_pdf_export.py` at line 128, The test unpacks
call_args into args and kwargs but never uses kwargs; change the unpack to
discard the unused dict by renaming it with a leading underscore (e.g., args,
_kwargs = mock_celery.send_task.call_args or args, _ =
mock_celery.send_task.call_args) so intent is clear and linter warnings are
avoided; update the unpack at the mock_celery.send_task.call_args usage
accordingly.

3-3: Remove unused import.

asyncio is imported but never used in this test file.

♻️ Remove unused import
-import asyncio
 from unittest.mock import MagicMock, patch
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/file_export/test_pdf_export.py` at line 3, Remove the unused
top-level import of asyncio from the test file; simply delete the import
statement referencing asyncio so the module no longer contains an unused import
(look for the import asyncio line at the top of tests' PDF export test).
apps/telegram-bot/core/handlers/buttons.py (1)

67-67: Remove extraneous f-string prefixes.

These strings have no placeholders, so the f prefix is unnecessary.

♻️ Fix f-strings
-            text=f"Item queued for processing...",
+            text="Item queued for processing...",
-            text=f"Item processing...",
+            text="Item processing...",
-            text=f"Item processed. Sending to the target...",
+            text="Item processed. Sending to the target...",
-                text=f"Item sent to the channel.",
+                text="Item sent to the channel.",

Also applies to: 79-79, 85-85, 92-92

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/telegram-bot/core/handlers/buttons.py` at line 67, Several message
strings in apps/telegram-bot/core/handlers/buttons.py are using unnecessary
f-string prefixes (e.g., text=f"Item queued for processing...") even though they
contain no placeholders; remove the leading "f" from those string literals so
they become plain strings. Locate the occurrences where the parameter name text
is set (the calls using text=f"...") — specifically the instances around the
current lines 67, 79, 85, and 92 — and change text=f"..." to text="..." for each
occurrence (no other logic changes required).
packages/shared/fastfetchbot_shared/services/scrapers/common.py (1)

72-78: Keep video scraper registration in the canonical registry.

This introduces a third resolution path outside service_classes/ScraperManager, which makes video support harder to discover and extend consistently. Prefer wiring the lazy video loader through the existing registry mechanism instead of special-casing categories here.

Based on learnings Register new platform scrapers in InfoExtractService.service_classes (in packages/shared/fastfetchbot_shared/services/scrapers/common.py) or ScraperManager (for scrapers needing lazy initialization).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/fastfetchbot_shared/services/scrapers/common.py` around lines
72 - 78, The _resolve_scraper_class method currently special-cases "youtube" and
"bilibili" by calling _get_video_downloader, which bypasses the canonical
registry (service_classes/ScraperManager); remove that special-case and make
_resolve_scraper_class only consult self.service_classes and raise KeyError if
missing. Instead, register the lazy video downloader via the existing registry:
add entries for "youtube" and "bilibili" to InfoExtractService.service_classes
(or register them with ScraperManager if lazy init is required) mapping to the
_get_video_downloader factory/lazy wrapper so the registry, not
_resolve_scraper_class, controls video scraper resolution.
tests/unit/telegram_bot/test_outbox_consumer.py (1)

36-53: Cover message_id forwarding in the success-path test.

The outbox payload shape includes message_id, but this test only proves chat_id reaches send_item_message. Without a regression here, queue mode can silently stop replying in-thread while the suite still passes.

Also applies to: 92-126

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/telegram_bot/test_outbox_consumer.py` around lines 36 - 53, The
success-path test in tests/unit/telegram_bot/test_outbox_consumer.py doesn't
assert that message_id from the outbox payload is forwarded, only chat_id;
update the test(s) that use _make_payload (and the helper itself if needed) to
include a non-None message_id and add an assertion that send_item_message (the
mocked function) is called with that message_id (or that the consumer passed it
through to the API call), ensuring the payload's "message_id" field produced by
_make_payload is included in the forwarded arguments in the success path.
packages/shared/fastfetchbot_shared/services/file_export/video_download.py (1)

163-170: Use explicit | None type hints for optional parameters.

PEP 484 prohibits implicit Optional. Parameters with default None should use explicit union syntax.

Suggested fix
     async def get_video_info(
         self,
-        url: str = None,
-        download: bool = None,
-        extractor: str = None,
-        audio_only: bool = None,
-        hd: bool = None,
+        url: str | None = None,
+        download: bool | None = None,
+        extractor: str | None = None,
+        audio_only: bool | None = None,
+        hd: bool | None = None,
     ) -> dict:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/fastfetchbot_shared/services/file_export/video_download.py`
around lines 163 - 170, The function get_video_info currently uses default None
for optional parameters without explicit union type hints; update its signature
to use explicit "X | None" annotations for each optional parameter (url: str |
None, download: bool | None, extractor: str | None, audio_only: bool | None, hd:
bool | None) so the type hints are PEP 484/PEP 604 compliant while keeping the
same default values and behavior.
apps/async-worker/async_worker/services/enrichment.py (1)

40-42: Simplify redundant condition.

The condition store_document or (not store_document and metadata_item.get("telegraph_url") == "") can be simplified. The second branch only triggers when store_document is False, so not store_document is always true there.

Suggested simplification
-    if store_document or (
-        not store_document and metadata_item.get("telegraph_url") == ""
-    ):
+    if store_document or metadata_item.get("telegraph_url") == "":
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/async-worker/async_worker/services/enrichment.py` around lines 40 - 42,
The if-condition in enrichment.py is redundant: replace the current condition
`if store_document or (not store_document and metadata_item.get("telegraph_url")
== "")` with the simplified equivalent `if store_document or
metadata_item.get("telegraph_url") == ""` so the branch triggers when either
storing is enabled or the telegraph_url is empty; update the conditional used in
the relevant function where `metadata_item` and `store_document` are evaluated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/api/src/services/file_export/document_export/pdf_export.py`:
- Line 25: The constructor signature in PDF export uses a nullable parameter
annotated as `html_string: str = None`, which triggers RUF013; change the
parameter annotation in the __init__ method of the class (the PDF export class
in pdf_export.py) to use PEP 604 union syntax: `html_string: str | None = None`
so the type is explicit and conforms to Python 3.12/Black formatting.

In `@apps/async-worker/async_worker/main.py`:
- Around line 22-32: The parse_redis_url function currently converts
parsed.path.lstrip("/") to int directly which will raise a ValueError for
non-numeric DB paths (e.g., redis://host/abc); update parse_redis_url to
validate or guard that value before constructing RedisSettings: extract db_str =
parsed.path.lstrip("/") or "0", attempt to convert to int inside a try/except,
and on ValueError either default to 0 or raise a clearer error with context
(include the original url/db_str in the message); ensure the change is applied
in parse_redis_url and the returned RedisSettings construction uses the
validated integer.

In `@apps/async-worker/async_worker/services/enrichment.py`:
- Around line 62-65: The code unconditionally accesses metadata_item["title"]
which can raise KeyError or TypeError; update the logic around metadata_item (in
enrichment.py where metadata_item["title"] is set) to safely handle missing or
non-string titles by checking "title" in metadata_item or using
metadata_item.get("title", "") and only calling .strip() when the value is a
string (fallback to an empty string or None as appropriate), ensuring no
exception is raised if the key is absent or value is not a string.

In `@apps/async-worker/async_worker/tasks/scrape.py`:
- Line 41: Replace the direct logging of the untrusted `url` in the logger.info
call with a sanitized version: parse the `url` (e.g., using
urllib.parse.urlparse), drop or mask the query and fragment components (and any
bearer/token-like values), then log `job_id`, `source`, and the sanitized URL;
update the logger.info at the site where `logger.info(f"[{job_id}] Starting
scrape: url={url}, source={source}")` is called so it uses the sanitized
variable instead of the raw `url`.
- Around line 43-75: The current try block mixes scraping/enrichment with
delivery so failures in outbox.push() can mark the whole job as a scrape
failure; split delivery into its own error handling: keep the
scraping/enrichment sequence (UrlMetadata, InfoExtractService.get_item,
enrichment.enrich) in the original try and return success once metadata_item is
produced, then perform outbox.push(job_id, chat_id, message_id, metadata_item)
in a separate try/except that logs delivery failures and attempts the fallback
outbox.push(error=...) without changing the returned scrape status; apply the
same separation to the other delivery call around the block that spans the code
referenced at lines 77-89 so outbox.push failures never override a successful
scrape (refer to outbox.push, service.get_item, enrichment.enrich, and job_id to
locate the spots).

In `@apps/telegram-bot/core/handlers/buttons.py`:
- Around line 66-76: Wrap the enqueue call in a try/except around
queue_client.enqueue_scrape in the handler so failures are caught and the user
is notified instead of silently deleting the "Item queued for processing..."
message; use the existing replying_message (created via
query.message.reply_text) and, on exception, update or replace it with an error
message that includes brief context, log the exception, and only delete the
success message when enqueue_scrape completes successfully. Ensure you catch
broad exceptions but log the stack/exception for debugging.

In `@apps/telegram-bot/core/services/outbox_consumer.py`:
- Around line 61-71: The _send_error_to_chat function lacks a guard for a None
chat_id; update _send_error_to_chat to check if chat_id is None (or empty) and
return early (or log a warning) instead of calling application.bot.send_message
with None, ensuring you still catch and log other exceptions; reference the
_send_error_to_chat function and the caller that extracts chat_id via .get() so
reviewers can locate the calling site and validate the guard.

In `@packages/shared/fastfetchbot_shared/services/file_export/pdf_export.py`:
- Around line 40-48: The filename is built from raw self.title (output_filename
= f"{self.title}-{uuid.uuid4()}.pdf") which can include path separators or
unsafe characters; sanitize self.title before constructing output_filename by
stripping or replacing path separators (/, \), control characters and other
disallowed filename chars, collapsing whitespace (e.g., to underscores), and
truncating to a safe length, then use the sanitized value in output_filename;
update the code around output_filename creation in pdf_export.py (the place that
references self.title and output_filename) to perform this sanitization before
sending the Celery task.

In `@packages/shared/fastfetchbot_shared/services/file_export/video_download.py`:
- Around line 70-78: get_video currently indexes video_info_funcs with
self.extractor which can KeyError for unsupported values; update get_video to
validate self.extractor against the allowed keys (e.g., "youtube", "bilibili")
before lookup, and either provide a clear fallback handler or raise a
descriptive error. Specifically, check self.extractor exists in
video_info_funcs, select the parser via video_info_funcs[self.extractor] only
after validation, and keep calling the chosen parser (_youtube_info_parse or
_bilibili_info_parse) and then _video_info_formatting; ensure the failure path
returns or raises a meaningful exception instead of allowing a raw KeyError.
- Around line 96-104: The nested helper _get_redirected_url inside _parse_url
returns mixed types (httpx.URL vs str); change it to always return a string by
converting resp.url to str (e.g., str(resp.url)) when status is 200 and use
resp.headers.get("Location") (or cast that to str) for 302, and ensure the
function returns a string in all branches so callers of _get_redirected_url and
_parse_url receive a consistent type.

In `@packages/shared/fastfetchbot_shared/services/scrapers/common.py`:
- Around line 88-90: The instantiation of scraper classes (scraper_cls =
self._resolve_scraper_class(...); scraper_item = scraper_cls(url=self.url,
category=self.category, data=self.data, **self.kwargs)) passes data and **kwargs
to all scrapers, which breaks dataclass-based scrapers like Weibo and Bluesky
(subclasses of MetadataItem) that only accept declared fields; change the
instantiation logic in the calling site to detect whether scraper_cls accepts
extra kwargs (e.g., use inspect.signature to see if **kwargs or a data parameter
is present) or whether scraper_cls is a subclass of MetadataItem, and if so call
it with only the allowed fields (url, category and other declared MetadataItem
fields) instead of passing data/**kwargs; alternatively, attempt instantiation
and on TypeError retry by creating the object with the minimal allowed
parameters for MetadataItem (url, category, telegraph_url, content, text,
media_files, author, title, author_url, message_type).

In `@pyproject.toml`:
- Line 47: The dependency entry "arq>=0.27.0" in pyproject.toml lacks an
explicit upper bound; update that entry to follow the project's dependency
policy by pinning an upper bound (e.g., change "arq>=0.27.0" to
"arq>=0.27.0,<0.28.0") so it matches the other runtime dependencies and prevents
unintended upgrades.

In `@template.env`:
- Around line 167-176: The SCRAPE_MODE default is inconsistent: template.env
sets SCRAPE_MODE=api but the application default in
apps/telegram-bot/core/config.py falls back to "queue" (e.g., where SCRAPE_MODE
is loaded via get_env or Config.SCRAPE_MODE with default "queue"); pick a single
safe default (recommend "api") and change the fallback in config.py to "api" (or
change template.env to "queue" if you prefer the other behavior), and update any
associated comments/docs so both the env template and the code use the same
SCRAPE_MODE default.

In `@tests/unit/async_worker/test_enrichment.py`:
- Around line 187-197: In test_uses_config_defaults_when_none move the
mock_telegraph unpacking before calling enrich so the fixture/patch is active
during the call and drop the unused instance variable; e.g. unpack
mock_telegraph into (MockTg, _) or (MockTg, instance) at the top of the test,
then call await enrich(base_metadata_item) and assert
MockTg.from_dict.assert_called_once(); ensure you still patch
STORE_TELEGRAPH/STORE_DOCUMENT as currently done.

In `@tests/unit/file_export/test_pdf_export.py`:
- Around line 110-116: The test currently asserts mock_celery.send_task against
values extracted from the same call (mock_celery.send_task.call_args), which is
circular and always passes; update the assertion in the test_pdf_export test to
compare the send_task call to concrete expected values (or to variables computed
earlier in the test) instead of pulling values from call_args (i.e., assert
mock_celery.send_task.assert_called_once_with("file_export.pdf_export",
kwargs={"html_string": expected_html_string, "output_filename":
expected_output_filename})), referencing the mock_celery.send_task call and the
expected HTML/output filename variables used in the test.

---

Outside diff comments:
In `@apps/telegram-bot/core/handlers/url_process.py`:
- Around line 100-120: The code path calling _fetch_and_send in url_process.py
doesn't pass the message object, which breaks reply threading because
send_item_message expects message when SCRAPE_MODE != "queue"; update the
_fetch_and_send(...) invocation inside the "unknown" source branch to include
the original message (pass message=message) or conditionally forward message
when SCRAPE_MODE != "queue", ensuring _fetch_and_send and downstream
send_item_message receive the message for correct reply context.

In `@CLAUDE.md`:
- Around line 127-128: Update the documentation note about pinned uv versions to
include the missing Dockerfile that also pins uv to 0.10.4: search for the COPY
line "COPY --from=ghcr.io/astral-sh/uv:0.10.4" in the async-worker Dockerfile
and add it to the list of files mentioned (so the doc references the
async-worker along with the api, telegram-bot, and worker Dockerfiles); ensure
the doc text explicitly instructs updating that tag in the async-worker
Dockerfile when bumping uv.

---

Nitpick comments:
In `@apps/async-worker/async_worker/services/enrichment.py`:
- Around line 40-42: The if-condition in enrichment.py is redundant: replace the
current condition `if store_document or (not store_document and
metadata_item.get("telegraph_url") == "")` with the simplified equivalent `if
store_document or metadata_item.get("telegraph_url") == ""` so the branch
triggers when either storing is enabled or the telegraph_url is empty; update
the conditional used in the relevant function where `metadata_item` and
`store_document` are evaluated.

In `@apps/async-worker/Dockerfile`:
- Line 1: Remove the leading blank line at the top of the Dockerfile so the
first line begins immediately with the existing comment/instruction; open the
Dockerfile and delete the initial empty line before the first token to ensure
the file starts with the intended content.
- Around line 44-55: Add a non-root user and switch the container to run as that
user to improve security: in the Dockerfile create a group and user (e.g.,
appuser) with a fixed non-root UID/GID, set a HOME, chown the application files
under WORKDIR (/app/apps/async-worker) and any copied paths ($PYSETUP_PATH,
/app/packages/) to that user, and then add a USER appuser line before the CMD
["arq", "async_worker.main.WorkerSettings"]; ensure PYTHONPATH and WORKDIR
remain correct and that file ownership is updated so the non-root user can run
the process.

In `@apps/telegram-bot/core/handlers/buttons.py`:
- Line 67: Several message strings in apps/telegram-bot/core/handlers/buttons.py
are using unnecessary f-string prefixes (e.g., text=f"Item queued for
processing...") even though they contain no placeholders; remove the leading "f"
from those string literals so they become plain strings. Locate the occurrences
where the parameter name text is set (the calls using text=f"...") —
specifically the instances around the current lines 67, 79, 85, and 92 — and
change text=f"..." to text="..." for each occurrence (no other logic changes
required).

In `@packages/shared/fastfetchbot_shared/services/file_export/video_download.py`:
- Around line 163-170: The function get_video_info currently uses default None
for optional parameters without explicit union type hints; update its signature
to use explicit "X | None" annotations for each optional parameter (url: str |
None, download: bool | None, extractor: str | None, audio_only: bool | None, hd:
bool | None) so the type hints are PEP 484/PEP 604 compliant while keeping the
same default values and behavior.

In `@packages/shared/fastfetchbot_shared/services/scrapers/common.py`:
- Around line 72-78: The _resolve_scraper_class method currently special-cases
"youtube" and "bilibili" by calling _get_video_downloader, which bypasses the
canonical registry (service_classes/ScraperManager); remove that special-case
and make _resolve_scraper_class only consult self.service_classes and raise
KeyError if missing. Instead, register the lazy video downloader via the
existing registry: add entries for "youtube" and "bilibili" to
InfoExtractService.service_classes (or register them with ScraperManager if lazy
init is required) mapping to the _get_video_downloader factory/lazy wrapper so
the registry, not _resolve_scraper_class, controls video scraper resolution.

In `@tests/unit/file_export/test_pdf_export.py`:
- Line 128: The test unpacks call_args into args and kwargs but never uses
kwargs; change the unpack to discard the unused dict by renaming it with a
leading underscore (e.g., args, _kwargs = mock_celery.send_task.call_args or
args, _ = mock_celery.send_task.call_args) so intent is clear and linter
warnings are avoided; update the unpack at the mock_celery.send_task.call_args
usage accordingly.
- Line 3: Remove the unused top-level import of asyncio from the test file;
simply delete the import statement referencing asyncio so the module no longer
contains an unused import (look for the import asyncio line at the top of tests'
PDF export test).

In `@tests/unit/telegram_bot/test_outbox_consumer.py`:
- Around line 36-53: The success-path test in
tests/unit/telegram_bot/test_outbox_consumer.py doesn't assert that message_id
from the outbox payload is forwarded, only chat_id; update the test(s) that use
_make_payload (and the helper itself if needed) to include a non-None message_id
and add an assertion that send_item_message (the mocked function) is called with
that message_id (or that the consumer passed it through to the API call),
ensuring the payload's "message_id" field produced by _make_payload is included
in the forwarded arguments in the success path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c1a7bdb6-3c96-4a8e-a338-3df323e0c7c6

📥 Commits

Reviewing files that changed from the base of the PR and between a0e2228 and fc0cc0a.

⛔ Files ignored due to path filters (2)
  • apps/worker/celerybeat-schedule.db is excluded by !**/*.db
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (47)
  • .idea/FastFetchBot.iml
  • .idea/runConfigurations/fullstack_polling_api.xml
  • CLAUDE.md
  • apps/api/src/services/file_export/audio_transcribe/__init__.py
  • apps/api/src/services/file_export/document_export/pdf_export.py
  • apps/api/src/services/file_export/video_download/__init__.py
  • apps/async-worker/Dockerfile
  • apps/async-worker/async_worker/__init__.py
  • apps/async-worker/async_worker/celery_client.py
  • apps/async-worker/async_worker/config.py
  • apps/async-worker/async_worker/main.py
  • apps/async-worker/async_worker/services/__init__.py
  • apps/async-worker/async_worker/services/enrichment.py
  • apps/async-worker/async_worker/services/outbox.py
  • apps/async-worker/async_worker/tasks/__init__.py
  • apps/async-worker/async_worker/tasks/scrape.py
  • apps/async-worker/pyproject.toml
  • apps/telegram-bot/core/config.py
  • apps/telegram-bot/core/handlers/buttons.py
  • apps/telegram-bot/core/handlers/url_process.py
  • apps/telegram-bot/core/queue_client.py
  • apps/telegram-bot/core/services/bot_app.py
  • apps/telegram-bot/core/services/outbox_consumer.py
  • apps/telegram-bot/pyproject.toml
  • docker-compose.template.yml
  • packages/shared/fastfetchbot_shared/services/file_export/__init__.py
  • packages/shared/fastfetchbot_shared/services/file_export/audio_transcribe.py
  • packages/shared/fastfetchbot_shared/services/file_export/pdf_export.py
  • packages/shared/fastfetchbot_shared/services/file_export/video_download.py
  • packages/shared/fastfetchbot_shared/services/scrapers/common.py
  • pyproject.toml
  • template.env
  • tests/unit/async_worker/__init__.py
  • tests/unit/async_worker/conftest.py
  • tests/unit/async_worker/test_enrichment.py
  • tests/unit/async_worker/test_main.py
  • tests/unit/async_worker/test_outbox.py
  • tests/unit/async_worker/test_scrape_task.py
  • tests/unit/file_export/__init__.py
  • tests/unit/file_export/test_audio_transcribe.py
  • tests/unit/file_export/test_pdf_export.py
  • tests/unit/file_export/test_video_download.py
  • tests/unit/scrapers/test_common.py
  • tests/unit/telegram_bot/__init__.py
  • tests/unit/telegram_bot/conftest.py
  • tests/unit/telegram_bot/test_outbox_consumer.py
  • tests/unit/telegram_bot/test_queue_client.py

class PdfExport(BasePdfExport):
"""API PDF export that adds optional S3 upload after Celery PDF generation."""

def __init__(self, title: str, html_string: str = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if file exists and examine the context around line 25
if [ -f "apps/api/src/services/file_export/document_export/pdf_export.py" ]; then
  echo "=== File found. Examining lines 20-30 ==="
  cat -n apps/api/src/services/file_export/document_export/pdf_export.py | sed -n '20,30p'
else
  echo "File not found at expected path"
  find . -name "pdf_export.py" -type f
fi

Repository: aturret/FastFetchBot

Length of output: 528


🏁 Script executed:

#!/bin/bash
# Check Python version requirement to understand if str | None is supported
if [ -f "setup.py" ] || [ -f "pyproject.toml" ] || [ -f ".python-version" ]; then
  echo "=== Checking Python version requirements ==="
  [ -f "setup.py" ] && head -30 setup.py | grep -i python
  [ -f "pyproject.toml" ] && cat pyproject.toml | grep -A 5 "python"
  [ -f ".python-version" ] && cat .python-version
else
  echo "Checking for version hints in common config files"
  ls -la | grep -E "(setup|requirements|pyproject|\.python)"
fi

Repository: aturret/FastFetchBot

Length of output: 851


🏁 Script executed:

#!/bin/bash
# Check imports and overall structure of the file
if [ -f "apps/api/src/services/file_export/document_export/pdf_export.py" ]; then
  echo "=== First 50 lines of file ==="
  head -50 apps/api/src/services/file_export/document_export/pdf_export.py
fi

Repository: aturret/FastFetchBot

Length of output: 1420


🏁 Script executed:

#!/bin/bash
# Check the parent class BasePdfExport to ensure signature consistency
find . -name "pdf_export.py" -type f | xargs grep -l "class.*BasePdfExport\|class PdfExport" | grep -v "__pycache__"

Repository: aturret/FastFetchBot

Length of output: 202


🏁 Script executed:

#!/bin/bash
# Search for other potential implicit Optional issues in this file
if [ -f "apps/api/src/services/file_export/document_export/pdf_export.py" ]; then
  echo "=== Full file content ==="
  cat apps/api/src/services/file_export/document_export/pdf_export.py
fi

Repository: aturret/FastFetchBot

Length of output: 1415


🏁 Script executed:

#!/bin/bash
# Check the parent class BasePdfExport signature
if [ -f "packages/shared/fastfetchbot_shared/services/file_export/pdf_export.py" ]; then
  echo "=== BasePdfExport class definition ==="
  cat packages/shared/fastfetchbot_shared/services/file_export/pdf_export.py
fi

Repository: aturret/FastFetchBot

Length of output: 2745


Use explicit Optional type annotation for html_string.

Line 25 has html_string: str = None which triggers Ruff RUF013. Since the project targets Python 3.12, use str | None instead to follow PEP 604 union syntax and satisfy black formatting standards.

✏️ Suggested change
-    def __init__(self, title: str, html_string: str = None):
+    def __init__(self, title: str, html_string: str | None = None):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __init__(self, title: str, html_string: str = None):
def __init__(self, title: str, html_string: str | None = None):
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 25-25: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/services/file_export/document_export/pdf_export.py` at line 25,
The constructor signature in PDF export uses a nullable parameter annotated as
`html_string: str = None`, which triggers RUF013; change the parameter
annotation in the __init__ method of the class (the PDF export class in
pdf_export.py) to use PEP 604 union syntax: `html_string: str | None = None` so
the type is explicit and conforms to Python 3.12/Black formatting.

Comment on lines +22 to +32
def parse_redis_url(url: str) -> RedisSettings:
"""Parse a redis:// URL into ARQ RedisSettings."""
from urllib.parse import urlparse

parsed = urlparse(url)
return RedisSettings(
host=parsed.hostname or "localhost",
port=parsed.port or 6379,
database=int(parsed.path.lstrip("/") or 0),
password=parsed.password,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential ValueError on malformed database path.

If the Redis URL contains a non-numeric database path (e.g., redis://host/abc), int(parsed.path.lstrip("/") or 0) will raise a ValueError. Consider adding error handling or validation.

Suggested defensive fix
 def parse_redis_url(url: str) -> RedisSettings:
     """Parse a redis:// URL into ARQ RedisSettings."""
     from urllib.parse import urlparse

     parsed = urlparse(url)
+    db_str = parsed.path.lstrip("/")
+    try:
+        database = int(db_str) if db_str else 0
+    except ValueError:
+        database = 0
     return RedisSettings(
         host=parsed.hostname or "localhost",
         port=parsed.port or 6379,
-        database=int(parsed.path.lstrip("/") or 0),
+        database=database,
         password=parsed.password,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/async-worker/async_worker/main.py` around lines 22 - 32, The
parse_redis_url function currently converts parsed.path.lstrip("/") to int
directly which will raise a ValueError for non-numeric DB paths (e.g.,
redis://host/abc); update parse_redis_url to validate or guard that value before
constructing RedisSettings: extract db_str = parsed.path.lstrip("/") or "0",
attempt to convert to int inside a try/except, and on ValueError either default
to 0 or raise a clearer error with context (include the original url/db_str in
the message); ensure the change is applied in parse_redis_url and the returned
RedisSettings construction uses the validated integer.

Comment on lines +62 to +65
except Exception as e:
logger.error(f"Error exporting PDF: {e}")

metadata_item["title"] = metadata_item["title"].strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Guard against missing title key.

Line 65 accesses metadata_item["title"] directly, which will raise KeyError if the key is missing. Consider using .get() with a default or checking for presence first.

Suggested fix
-    metadata_item["title"] = metadata_item["title"].strip()
+    if "title" in metadata_item and metadata_item["title"]:
+        metadata_item["title"] = metadata_item["title"].strip()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except Exception as e:
logger.error(f"Error exporting PDF: {e}")
metadata_item["title"] = metadata_item["title"].strip()
except Exception as e:
logger.error(f"Error exporting PDF: {e}")
if "title" in metadata_item and metadata_item["title"]:
metadata_item["title"] = metadata_item["title"].strip()
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 62-62: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/async-worker/async_worker/services/enrichment.py` around lines 62 - 65,
The code unconditionally accesses metadata_item["title"] which can raise
KeyError or TypeError; update the logic around metadata_item (in enrichment.py
where metadata_item["title"] is set) to safely handle missing or non-string
titles by checking "title" in metadata_item or using metadata_item.get("title",
"") and only calling .strip() when the value is a string (fallback to an empty
string or None as appropriate), ensuring no exception is raised if the key is
absent or value is not a string.

if job_id is None:
job_id = str(uuid.uuid4())

logger.info(f"[{job_id}] Starting scrape: url={url}, source={source}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Redact the user-submitted URL before logging it.

url is untrusted input, so this info log can persist signed query params or bearer-style tokens from private links. Log job_id/source plus a sanitized URL instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/async-worker/async_worker/tasks/scrape.py` at line 41, Replace the
direct logging of the untrusted `url` in the logger.info call with a sanitized
version: parse the `url` (e.g., using urllib.parse.urlparse), drop or mask the
query and fragment components (and any bearer/token-like values), then log
`job_id`, `source`, and the sanitized URL; update the logger.info at the site
where `logger.info(f"[{job_id}] Starting scrape: url={url}, source={source}")`
is called so it uses the sanitized variable instead of the raw `url`.

Comment on lines +43 to +75
try:
# Build UrlMetadata and scrape
url_metadata = UrlMetadata(
url=url, source=source, content_type=content_type
)
service = InfoExtractService(
url_metadata=url_metadata,
store_telegraph=False, # We handle enrichment separately
store_document=False,
celery_app=celery_app,
timeout=DOWNLOAD_VIDEO_TIMEOUT,
**kwargs,
)
metadata_item = await service.get_item()

# Enrich: Telegraph, PDF
metadata_item = await enrichment.enrich(
metadata_item,
store_telegraph=store_telegraph,
store_document=store_document,
)

logger.info(f"[{job_id}] Scrape completed successfully")

# Push to outbox
await outbox.push(
job_id=job_id,
chat_id=chat_id,
message_id=message_id,
metadata_item=metadata_item,
)

return {"job_id": job_id, "status": "success"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not treat outbox delivery failures as scrape failures.

outbox.push() is inside the same try as scraping and enrichment. If Redis flakes after a successful scrape, this path reports the job as a scrape error and the fallback outbox.push(error=...) can mask the original result entirely. Split delivery into its own error handling so completed work is not misclassified. Worth adding a unit test that makes outbox.push() fail after get_item() succeeds.

Also applies to: 77-89

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/async-worker/async_worker/tasks/scrape.py` around lines 43 - 75, The
current try block mixes scraping/enrichment with delivery so failures in
outbox.push() can mark the whole job as a scrape failure; split delivery into
its own error handling: keep the scraping/enrichment sequence (UrlMetadata,
InfoExtractService.get_item, enrichment.enrich) in the original try and return
success once metadata_item is produced, then perform outbox.push(job_id,
chat_id, message_id, metadata_item) in a separate try/except that logs delivery
failures and attempts the fallback outbox.push(error=...) without changing the
returned scrape status; apply the same separation to the other delivery call
around the block that spans the code referenced at lines 77-89 so outbox.push
failures never override a successful scrape (refer to outbox.push,
service.get_item, enrichment.enrich, and job_id to locate the spots).

Comment on lines +88 to +90
scraper_cls = self._resolve_scraper_class(self.category)
scraper_item = scraper_cls(
url=self.url, category=self.category, data=self.data, **self.kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import ast
from pathlib import Path

targets = {
    "twitter.py": "Twitter",
    "threads.py": "Threads",
    "reddit.py": "Reddit",
    "weibo.py": "Weibo",
    "wechat.py": "Wechat",
    "instagram.py": "Instagram",
    "douban.py": "Douban",
    "zhihu.py": "Zhihu",
    "xiaohongshu.py": "Xiaohongshu",
}

root = Path("packages/shared/fastfetchbot_shared/services/scrapers")
problems = []

for filename, class_name in targets.items():
    path = root / filename
    if not path.exists():
        problems.append(f"{filename}: file not found")
        continue

    tree = ast.parse(path.read_text())
    cls = next(
        (node for node in tree.body if isinstance(node, ast.ClassDef) and node.name == class_name),
        None,
    )
    if cls is None:
        problems.append(f"{path}: class {class_name} not found")
        continue

    init = next(
        (node for node in cls.body if isinstance(node, ast.FunctionDef) and node.name == "__init__"),
        None,
    )
    if init is None:
        print(f"{path}: {class_name} inherits __init__; verify its base class accepts category=...")
        continue

    args = [arg.arg for arg in init.args.args]
    has_category = "category" in args
    has_kwargs = init.args.kwarg is not None

    print(f"{path}: {class_name}.__init__ args={args} kwargs={has_kwargs}")
    if not (has_category or has_kwargs):
        problems.append(f"{path}: {class_name}.__init__ accepts neither category nor **kwargs")

if problems:
    print("\nPotential incompatibilities:")
    for problem in problems:
        print(f" - {problem}")
PY

Repository: aturret/FastFetchBot

Length of output: 362


🏁 Script executed:

#!/bin/bash
# Find the actual scraper files in the repository
find packages/shared/fastfetchbot_shared/services/scrapers -type f -name "*.py" | head -20

Repository: aturret/FastFetchBot

Length of output: 1510


🏁 Script executed:

#!/bin/bash
# Also check the structure of the scrapers directory
ls -la packages/shared/fastfetchbot_shared/services/scrapers/ 2>/dev/null || echo "Directory not found at expected path"

Repository: aturret/FastFetchBot

Length of output: 1186


🏁 Script executed:

#!/bin/bash
# Search for scraper class definitions in the entire codebase
rg "class.*\(.*Scraper" --type py -A 3 packages/shared/fastfetchbot_shared/services/scrapers/

Repository: aturret/FastFetchBot

Length of output: 2599


🏁 Script executed:

#!/bin/bash
# Read the base Scraper class
cat -n packages/shared/fastfetchbot_shared/services/scrapers/scraper.py

Repository: aturret/FastFetchBot

Length of output: 506


🏁 Script executed:

#!/bin/bash
# Check the call site in common.py around lines 88-90
cat -n packages/shared/fastfetchbot_shared/services/scrapers/common.py | sed -n '80,100p'

Repository: aturret/FastFetchBot

Length of output: 1295


🏁 Script executed:

#!/bin/bash
# Extract all scraper __init__ signatures using ast-grep
ast-grep --pattern 'class $_ {
  $$$
  def __init__($$$) {
    $$$
  }
  $$$
}'

Repository: aturret/FastFetchBot

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Use Python to inspect all scraper __init__ signatures
python3 << 'EOF'
import ast
from pathlib import Path

scrapers_dir = Path("packages/shared/fastfetchbot_shared/services/scrapers")
scrapers = []

# Check subdirectories for scraper.py files
for subdir in scrapers_dir.iterdir():
    if subdir.is_dir() and subdir.name not in ['__pycache__', 'templates']:
        scraper_file = subdir / 'scraper.py'
        if scraper_file.exists():
            try:
                tree = ast.parse(scraper_file.read_text())
                for node in tree.body:
                    if isinstance(node, ast.ClassDef):
                        init_method = next(
                            (m for m in node.body if isinstance(m, ast.FunctionDef) and m.name == '__init__'),
                            None
                        )
                        args = [arg.arg for arg in node.args.args] if init_method else None
                        if init_method:
                            args = [arg.arg for arg in init_method.args.args]
                            has_kwargs = init_method.args.kwarg is not None
                            scrapers.append({
                                'file': str(scraper_file),
                                'class': node.name,
                                'args': args,
                                'has_kwargs': has_kwargs,
                                'has_category': 'category' in args
                            })
                        else:
                            scrapers.append({
                                'file': str(scraper_file),
                                'class': node.name,
                                'args': 'inherits __init__',
                                'has_kwargs': None,
                                'has_category': None
                            })
            except Exception as e:
                print(f"Error parsing {scraper_file}: {e}")

# Print results
for s in scrapers:
    print(f"{s['file']}: {s['class']}")
    if isinstance(s['args'], list):
        print(f"  args: {s['args']}")
        print(f"  has_category: {s['has_category']}, has_kwargs: {s['has_kwargs']}")
    else:
        print(f"  {s['args']}")
    print()
EOF

Repository: aturret/FastFetchBot

Length of output: 451


🏁 Script executed:

#!/bin/bash
# Corrected Python script to inspect all scraper __init__ signatures
python3 << 'EOF'
import ast
from pathlib import Path

scrapers_dir = Path("packages/shared/fastfetchbot_shared/services/scrapers")
scrapers = []

# Check subdirectories for scraper.py files
for subdir in sorted(scrapers_dir.iterdir()):
    if subdir.is_dir() and subdir.name not in ['__pycache__', 'templates', 'general']:
        scraper_file = subdir / 'scraper.py'
        if scraper_file.exists():
            try:
                tree = ast.parse(scraper_file.read_text())
                for node in tree.body:
                    if isinstance(node, ast.ClassDef):
                        init_method = next(
                            (m for m in node.body if isinstance(m, ast.FunctionDef) and m.name == '__init__'),
                            None
                        )
                        if init_method:
                            args = [arg.arg for arg in init_method.args.args]
                            has_kwargs = init_method.args.kwarg is not None
                            has_category = 'category' in args
                            print(f"{subdir.name}/{node.name}")
                            print(f"  args: {args}")
                            print(f"  has_category: {has_category}, has_kwargs: {has_kwargs}")
                            if not (has_category or has_kwargs):
                                print(f"  ⚠️  INCOMPATIBLE: neither category nor **kwargs")
                        else:
                            print(f"{subdir.name}/{node.name}")
                            print(f"  inherits __init__ (check base class)")
                        print()
            except Exception as e:
                print(f"Error parsing {scraper_file}: {e}\n")

# Also check general scrapers
general_dir = scrapers_dir / 'general'
for py_file in sorted(general_dir.glob('*.py')):
    if py_file.name in ['scraper.py', 'base.py', 'firecrawl.py', 'zyte.py']:
        try:
            tree = ast.parse(py_file.read_text())
            for node in tree.body:
                if isinstance(node, ast.ClassDef):
                    init_method = next(
                        (m for m in node.body if isinstance(m, ast.FunctionDef) and m.name == '__init__'),
                        None
                    )
                    if init_method:
                        args = [arg.arg for arg in init_method.args.args]
                        has_kwargs = init_method.args.kwarg is not None
                        has_category = 'category' in args
                        print(f"general/{py_file.name}: {node.name}")
                        print(f"  args: {args}")
                        print(f"  has_category: {has_category}, has_kwargs: {has_kwargs}")
                        if not (has_category or has_kwargs):
                            print(f"  ⚠️  INCOMPATIBLE: neither category nor **kwargs")
                    else:
                        print(f"general/{py_file.name}: {node.name}")
                        print(f"  inherits __init__ (check base class)")
                    print()
        except Exception as e:
            print(f"Error parsing {py_file}: {e}\n")
EOF

Repository: aturret/FastFetchBot

Length of output: 1643


🏁 Script executed:

#!/bin/bash
# Check what base classes the scrapers inherit from
rg "class.*Scraper\(" packages/shared/fastfetchbot_shared/services/scrapers/ --type py -A 1 | head -60

Repository: aturret/FastFetchBot

Length of output: 1248


🏁 Script executed:

#!/bin/bash
# Let's look at a few specific scrapers to understand their structure
for scraper in bluesky twitter threads reddit; do
  echo "=== $scraper ===="
  if [ -f "packages/shared/fastfetchbot_shared/services/scrapers/$scraper/scraper.py" ]; then
    head -30 "packages/shared/fastfetchbot_shared/services/scrapers/$scraper/scraper.py"
  fi
  echo ""
done

Repository: aturret/FastFetchBot

Length of output: 1529


🏁 Script executed:

#!/bin/bash
# Check all remaining scrapers' __init__ signatures
python3 << 'EOF'
import ast
from pathlib import Path

scrapers_dir = Path("packages/shared/fastfetchbot_shared/services/scrapers")
scrapers_to_check = ['twitter', 'threads', 'reddit', 'instagram', 'douban', 'wechat', 'xiaohongshu', 'zhihu']

print("Checking scraper __init__ signatures:\n")

for scraper_name in scrapers_to_check:
    scraper_file = scrapers_dir / scraper_name / 'scraper.py'
    if scraper_file.exists():
        try:
            tree = ast.parse(scraper_file.read_text())
            for node in tree.body:
                if isinstance(node, ast.ClassDef):
                    init_method = next(
                        (m for m in node.body if isinstance(m, ast.FunctionDef) and m.name == '__init__'),
                        None
                    )
                    if init_method:
                        args = [arg.arg for arg in init_method.args.args]
                        has_kwargs = init_method.args.kwarg is not None
                        has_category = 'category' in args
                        print(f"{scraper_name}: {node.name}")
                        print(f"  args: {args}, kwargs: {has_kwargs}")
                        if not (has_category or has_kwargs):
                            print(f"  ❌ INCOMPATIBLE")
                    else:
                        print(f"{scraper_name}: {node.name}")
                        print(f"  inherits __init__ (base: Scraper - no __init__)")
                        print(f"  ❌ INCOMPATIBLE")
                    print()
        except Exception as e:
            print(f"Error: {scraper_file}: {e}\n")
    else:
        print(f"{scraper_name}: scraper.py not found\n")
EOF

Repository: aturret/FastFetchBot

Length of output: 351


🏁 Script executed:

#!/bin/bash
# Let's also check what _resolve_scraper_class returns to understand the full scope
grep -n "_resolve_scraper_class" packages/shared/fastfetchbot_shared/services/scrapers/common.py -A 10

Repository: aturret/FastFetchBot

Length of output: 1237


🏁 Script executed:

#!/bin/bash
# Check what files are in the scraper directories
for dir in twitter threads reddit instagram douban wechat xiaohongshu zhihu; do
  echo "=== $dir ==="
  ls -la packages/shared/fastfetchbot_shared/services/scrapers/$dir/
  echo ""
done

Repository: aturret/FastFetchBot

Length of output: 1936


🏁 Script executed:

#!/bin/bash
# Check service_classes registration in common.py
cat -n packages/shared/fastfetchbot_shared/services/scrapers/common.py | head -75

Repository: aturret/FastFetchBot

Length of output: 3177


🏁 Script executed:

#!/bin/bash
# Find where service_classes is defined
grep -rn "service_classes" packages/shared/fastfetchbot_shared/services/scrapers/ --type py

Repository: aturret/FastFetchBot

Length of output: 181


🏁 Script executed:

#!/bin/bash
# Check the __init__ signatures of the actual scraper classes defined in __init__.py files
python3 << 'EOF'
import ast
from pathlib import Path

scrapers_dir = Path("packages/shared/fastfetchbot_shared/services/scrapers")
scrapers_to_check = ['twitter', 'threads', 'reddit', 'instagram', 'douban', 'wechat', 'xiaohongshu', 'zhihu', 'weibo', 'bluesky']

print("Scraper classes that will be instantiated with category= argument:\n")

for scraper_name in scrapers_to_check:
    init_file = scrapers_dir / scraper_name / '__init__.py'
    if init_file.exists():
        try:
            tree = ast.parse(init_file.read_text())
            # Find the main scraper class (usually capitalized version of scraper_name)
            for node in tree.body:
                if isinstance(node, ast.ClassDef):
                    # Look for __init__ method
                    init_method = next(
                        (m for m in node.body if isinstance(m, ast.FunctionDef) and m.name == '__init__'),
                        None
                    )
                    if init_method:
                        args = [arg.arg for arg in init_method.args.args]
                        has_kwargs = init_method.args.kwarg is not None
                        has_category = 'category' in args
                        print(f"{scraper_name}: {node.name}")
                        print(f"  args: {args}")
                        print(f"  has_kwargs: {has_kwargs}, has_category: {has_category}")
                        if not (has_category or has_kwargs):
                            print(f"  ❌ WILL FAIL: does not accept category= or **kwargs")
                        print()
                    else:
                        print(f"{scraper_name}: {node.name}")
                        print(f"  inherits __init__")
                        print()
        except Exception as e:
            print(f"Error parsing {init_file}: {e}\n")
EOF

Repository: aturret/FastFetchBot

Length of output: 991


🏁 Script executed:

#!/bin/bash
# Let's check a couple of these __init__.py files directly to understand their structure
echo "=== twitter/__init__.py (first 50 lines) ==="
head -50 packages/shared/fastfetchbot_shared/services/scrapers/twitter/__init__.py

Repository: aturret/FastFetchBot

Length of output: 1618


🏁 Script executed:

#!/bin/bash
# Check threads/__init__.py
echo "=== threads/__init__.py (first 50 lines) ==="
head -50 packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py

Repository: aturret/FastFetchBot

Length of output: 1772


🏁 Script executed:

#!/bin/bash
# Check what Weibo and Bluesky inherit from and if their base classes accept category
python3 << 'EOF'
import ast
from pathlib import Path

# Check Weibo
weibo_file = Path("packages/shared/fastfetchbot_shared/services/scrapers/weibo/__init__.py")
bluesky_file = Path("packages/shared/fastfetchbot_shared/services/scrapers/bluesky/__init__.py")

print("=== Weibo ===")
if weibo_file.exists():
    tree = ast.parse(weibo_file.read_text())
    for node in tree.body:
        if isinstance(node, ast.ClassDef) and node.name == "Weibo":
            print(f"Class: {node.name}")
            print(f"Bases: {[base.id if hasattr(base, 'id') else str(base) for base in node.bases]}")
            # Check if __init__ is defined
            init_method = next(
                (m for m in node.body if isinstance(m, ast.FunctionDef) and m.name == '__init__'),
                None
            )
            if init_method:
                print("Has custom __init__")
            else:
                print("Inherits __init__ from base class")

print("\n=== Bluesky ===")
if bluesky_file.exists():
    tree = ast.parse(bluesky_file.read_text())
    for node in tree.body:
        if isinstance(node, ast.ClassDef) and node.name == "Bluesky":
            print(f"Class: {node.name}")
            print(f"Bases: {[base.id if hasattr(base, 'id') else str(base) for base in node.bases]}")
            # Check if __init__ is defined
            init_method = next(
                (m for m in node.body if isinstance(m, ast.FunctionDef) and m.name == '__init__'),
                None
            )
            if init_method:
                print("Has custom __init__")
            else:
                print("Inherits __init__ from base class")
EOF

Repository: aturret/FastFetchBot

Length of output: 238


🏁 Script executed:

#!/bin/bash
# Check weibo/__init__.py to see the Weibo class definition
head -80 packages/shared/fastfetchbot_shared/services/scrapers/weibo/__init__.py | tail -50

Repository: aturret/FastFetchBot

Length of output: 1616


🏁 Script executed:

#!/bin/bash
# Check bluesky/__init__.py to see the Bluesky class definition
head -50 packages/shared/fastfetchbot_shared/services/scrapers/bluesky/__init__.py | tail -30

Repository: aturret/FastFetchBot

Length of output: 1167


🏁 Script executed:

#!/bin/bash
# Find and read the MetadataItem class
find packages/shared -name "metadata_item.py" -type f

Repository: aturret/FastFetchBot

Length of output: 123


🏁 Script executed:

#!/bin/bash
# Check the MetadataItem __init__ signature
python3 << 'EOF'
import ast
from pathlib import Path

metadata_file = Path("packages/shared/fastfetchbot_shared/models/metadata_item.py")
if metadata_file.exists():
    tree = ast.parse(metadata_file.read_text())
    for node in tree.body:
        if isinstance(node, ast.ClassDef) and node.name == "MetadataItem":
            print(f"Class: {node.name}")
            init_method = next(
                (m for m in node.body if isinstance(m, ast.FunctionDef) and m.name == '__init__'),
                None
            )
            if init_method:
                args = [arg.arg for arg in init_method.args.args]
                has_kwargs = init_method.args.kwarg is not None
                has_category = 'category' in args
                print(f"__init__ args: {args}")
                print(f"has_category: {has_category}, has_kwargs: {has_kwargs}")
            else:
                print("No custom __init__ (inherits from parent)")
EOF

Repository: aturret/FastFetchBot

Length of output: 125


🏁 Script executed:

#!/bin/bash
# Let's check the actual file content
wc -l packages/shared/fastfetchbot_shared/models/metadata_item.py
head -100 packages/shared/fastfetchbot_shared/models/metadata_item.py

Repository: aturret/FastFetchBot

Length of output: 2885


Weibo and Bluesky scrapers are incompatible with the new constructor signature.

The call at lines 88-90 passes data= and **self.kwargs, but Weibo and Bluesky inherit from the dataclass MetadataItem, whose auto-generated __init__ only accepts the declared fields (url, telegraph_url, content, text, media_files, author, title, author_url, category, message_type). They do not accept **kwargs, so this will fail at runtime with TypeError: __init__() got an unexpected keyword argument 'data' whenever data or additional kwargs are provided. The other scrapers (Twitter, Threads, Reddit, Instagram, etc.) all have **kwargs in their signatures and will work correctly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/fastfetchbot_shared/services/scrapers/common.py` around lines
88 - 90, The instantiation of scraper classes (scraper_cls =
self._resolve_scraper_class(...); scraper_item = scraper_cls(url=self.url,
category=self.category, data=self.data, **self.kwargs)) passes data and **kwargs
to all scrapers, which breaks dataclass-based scrapers like Weibo and Bluesky
(subclasses of MetadataItem) that only accept declared fields; change the
instantiation logic in the calling site to detect whether scraper_cls accepts
extra kwargs (e.g., use inspect.signature to see if **kwargs or a data parameter
is present) or whether scraper_cls is a subclass of MetadataItem, and if so call
it with only the allowed fields (url, category and other declared MetadataItem
fields) instead of passing data/**kwargs; alternatively, attempt instantiation
and on TypeError retry by creating the object with the minimal allowed
parameters for MetadataItem (url, category, telegraph_url, content, text,
media_files, author, title, author_url, message_type).

"celery[redis]>=5.4.0,<6.0.0",
"fastfetchbot-shared[postgres]",
"fastfetchbot-file-export",
"arq>=0.27.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

For ARQ, what version pinning strategy is recommended for production when using 0.27.x? Are breaking changes expected in 0.28.x?

💡 Result:

For ARQ (python-arq job queue library) version 0.27.x in production, pin to a specific patch version (e.g. arq==0.27.0) rather than the loose 0.27.x range, as the project is in maintenance-only mode with limited updates and follows semantic versioning where minor releases (0.28.0) may introduce breaking changes. No 0.28.x release exists as of 2026-03-22 (latest is v0.27.0 released 2026-02-02), so breaking changes are possible but not yet released; treat 0.28.x upgrade as potentially breaking per semver conventions used in releases like v0.26.0b1 and earlier.

Citations:


🏁 Script executed:

cat -n pyproject.toml | head -60

Repository: aturret/FastFetchBot

Length of output: 2217


Pin arq with an explicit upper bound to match the project's dependency strategy.

Line 47 is the only external runtime dependency without an upper bound; all 30+ other dependencies use explicit upper bounds (e.g., >=X.Y.Z,<UPPER.0.0). ARQ is also in maintenance-only mode with no 0.28.x release planned, making strict version bounds prudent.

Proposed diff
-    "arq>=0.27.0",
+    "arq>=0.27.0,<0.28.0",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"arq>=0.27.0",
"arq>=0.27.0,<0.28.0",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 47, The dependency entry "arq>=0.27.0" in
pyproject.toml lacks an explicit upper bound; update that entry to follow the
project's dependency policy by pinning an upper bound (e.g., change
"arq>=0.27.0" to "arq>=0.27.0,<0.28.0") so it matches the other runtime
dependencies and prevents unintended upgrades.

Comment on lines +167 to +176

# Async Scraping Worker (ARQ)
# Scrape mode: "api" (sync via API server) or "queue" (async via ARQ worker). Default: `api`
SCRAPE_MODE=api

# Redis URL for ARQ task queue. Default: `redis://localhost:6379/2`
ARQ_REDIS_URL=redis://redis:6379/2

# Redis URL for the result outbox. Default: `redis://localhost:6379/3`
OUTBOX_REDIS_URL=redis://redis:6379/3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Default value inconsistency: SCRAPE_MODE differs between template.env and config.py.

The template.env sets SCRAPE_MODE=api (Line 170), but apps/telegram-bot/core/config.py (Line 125) defaults to "queue" when the variable is unset. This mismatch could cause unexpected behavior for users who deploy without explicitly setting SCRAPE_MODE.

Consider aligning the defaults—likely api should be the safe default in both places to ensure backwards compatibility for existing deployments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@template.env` around lines 167 - 176, The SCRAPE_MODE default is
inconsistent: template.env sets SCRAPE_MODE=api but the application default in
apps/telegram-bot/core/config.py falls back to "queue" (e.g., where SCRAPE_MODE
is loaded via get_env or Config.SCRAPE_MODE with default "queue"); pick a single
safe default (recommend "api") and change the fallback in config.py to "api" (or
change template.env to "queue" if you prefer the other behavior), and update any
associated comments/docs so both the env template and the code use the same
SCRAPE_MODE default.

Comment on lines +187 to +197
class TestConfigDefaults:
@pytest.mark.asyncio
async def test_uses_config_defaults_when_none(self, base_metadata_item, mock_telegraph):
"""When store_telegraph/store_document are None, config defaults should be used."""
with patch("async_worker.services.enrichment.STORE_TELEGRAPH", True), \
patch("async_worker.services.enrichment.STORE_DOCUMENT", False):
result = await enrich(base_metadata_item)

# STORE_TELEGRAPH=True means Telegraph should be called
MockTg, instance = mock_telegraph
MockTg.from_dict.assert_called_once()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fixture mock_telegraph not applied correctly in this test.

The test uses mock_telegraph as a parameter but then unpacks it after the await enrich(...) call completes, meaning the fixture's patch context may not be active at assertion time. Additionally, the unpacked instance variable is unused.

Suggested fix
     `@pytest.mark.asyncio`
     async def test_uses_config_defaults_when_none(self, base_metadata_item, mock_telegraph):
         """When store_telegraph/store_document are None, config defaults should be used."""
+        MockTg, _ = mock_telegraph  # Unpack before calling enrich
         with patch("async_worker.services.enrichment.STORE_TELEGRAPH", True), \
              patch("async_worker.services.enrichment.STORE_DOCUMENT", False):
             result = await enrich(base_metadata_item)

         # STORE_TELEGRAPH=True means Telegraph should be called
-        MockTg, instance = mock_telegraph
         MockTg.from_dict.assert_called_once()
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 196-196: Unpacked variable instance is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/async_worker/test_enrichment.py` around lines 187 - 197, In
test_uses_config_defaults_when_none move the mock_telegraph unpacking before
calling enrich so the fixture/patch is active during the call and drop the
unused instance variable; e.g. unpack mock_telegraph into (MockTg, _) or
(MockTg, instance) at the top of the test, then call await
enrich(base_metadata_item) and assert MockTg.from_dict.assert_called_once();
ensure you still patch STORE_TELEGRAPH/STORE_DOCUMENT as currently done.

Comment on lines +110 to +116
mock_celery.send_task.assert_called_once_with(
"file_export.pdf_export",
kwargs={
"html_string": mock_celery.send_task.call_args.kwargs["kwargs"]["html_string"],
"output_filename": mock_celery.send_task.call_args.kwargs["kwargs"]["output_filename"],
},
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test assertion is circular and doesn't verify expected values.

The assertion compares send_task call args against values extracted from the same call args, which always passes. Consider asserting against expected values instead.

💚 Suggested fix
         assert output == "/tmp/final.pdf"
-        mock_celery.send_task.assert_called_once_with(
-            "file_export.pdf_export",
-            kwargs={
-                "html_string": mock_celery.send_task.call_args.kwargs["kwargs"]["html_string"],
-                "output_filename": mock_celery.send_task.call_args.kwargs["kwargs"]["output_filename"],
-            },
-        )
+        mock_celery.send_task.assert_called_once()
+        call_args = mock_celery.send_task.call_args
+        assert call_args[0][0] == "file_export.pdf_export"
+        sent_kwargs = call_args.kwargs["kwargs"]
+        assert "<html>" in sent_kwargs["html_string"]
+        assert "content" in sent_kwargs["html_string"]
+        assert sent_kwargs["output_filename"].endswith(".pdf")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/file_export/test_pdf_export.py` around lines 110 - 116, The test
currently asserts mock_celery.send_task against values extracted from the same
call (mock_celery.send_task.call_args), which is circular and always passes;
update the assertion in the test_pdf_export test to compare the send_task call
to concrete expected values (or to variables computed earlier in the test)
instead of pulling values from call_args (i.e., assert
mock_celery.send_task.assert_called_once_with("file_export.pdf_export",
kwargs={"html_string": expected_html_string, "output_filename":
expected_output_filename})), referencing the mock_celery.send_task call and the
expected HTML/output filename variables used in the test.

@aturret aturret merged commit 3e1366b into main Mar 22, 2026
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants