Conversation

⚠️ Caution: Review failed — the pull request is closed.

ℹ️ Recent review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
📝 Walkthrough

Replaces the Playwright-based Xiaohongshu crawler stack with a new HTTP adapter (XhsSinglePostAdapter), removes legacy xhs modules, adds sign-server and cookie-file configuration, updates scraper wiring and templates, and refreshes docs and env templates.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client/API
    participant Adapter as XhsSinglePostAdapter
    participant SignServer as Sign Server
    participant XhsAPI as Xiaohongshu API
    participant Parser as HTML Parser
    Client->>Adapter: fetch_post(note_url, with_comments?)
    Adapter->>Adapter: parse_xhs_note_url(note_url)
    Adapter->>SignServer: request signature for uri/data
    SignServer-->>Adapter: returns signed headers
    Adapter->>XhsAPI: _signed_post or _signed_get with signed headers
    alt API returns JSON
        XhsAPI-->>Adapter: JSON response
        Adapter->>Adapter: _parse_api_response() and _normalize_note()
    else API fails or non-JSON
        Adapter->>XhsAPI: fetch note HTML
        XhsAPI-->>Parser: HTML content
        Parser-->>Adapter: extracted fields
        Adapter->>Adapter: _normalize_note()
    end
    opt with_comments
        Adapter->>XhsAPI: _fetch_comments (paged)
        XhsAPI-->>Adapter: comment pages
        Adapter->>Adapter: _normalize_comment() (incl. sub-comments)
    end
    Adapter-->>Client: normalized note dict (media, content, metadata, comments?)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
@codex review this PR
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0d8b5c3d8d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
HTTP_REQUEST_TIMEOUT = env.get("HTTP_REQUEST_TIMEOUT", 30)

# XHS (Xiaohongshu) shared configuration
SIGN_SERVER_URL = env.get("SIGN_SERVER_URL", "http://localhost:8989")
```
Preserve localhost fallback when SIGN_SERVER_URL is empty
This reads SIGN_SERVER_URL with env.get("SIGN_SERVER_URL", "http://localhost:8989"), which does not apply the default when the key exists but is blank. In this repo’s common setup (docker-compose.template.yml uses env_file: .env, and template.env now adds SIGN_SERVER_URL=), the value becomes an empty string, so XHS signing requests are built from an invalid base URL instead of the documented localhost default and scraping fails at runtime. Treat empty values as unset (e.g., env.get(...) or "http://localhost:8989").
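The distinction can be illustrated with a small sketch; the helper name `resolve_sign_server_url` is hypothetical, while the key and default mirror the snippet above:

```python
import os

DEFAULT_SIGN_SERVER = "http://localhost:8989"

def resolve_sign_server_url(env=None):
    """Hypothetical helper illustrating the empty-value-as-unset fix."""
    env = os.environ if env is None else env
    # env.get(key, default) only falls back when the key is ABSENT;
    # an env_file line like `SIGN_SERVER_URL=` yields "" and slips through,
    # so `or` treats blank values as unset too.
    return env.get("SIGN_SERVER_URL", DEFAULT_SIGN_SERVER) or DEFAULT_SIGN_SERVER
```

With this shape, a blank `SIGN_SERVER_URL=` line in `.env` still resolves to the documented localhost default.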
```python
async with XhsSinglePostAdapter(
    cookies=XHS_COOKIE_STRING,
    sign_server_endpoint=XHS_SIGN_SERVER_URL,
) as adapter:
    result = await adapter.fetch_post(note_url=self.url)
```
Retry Xiaohongshu fetch on transient sign/API failures
The new flow performs a single fetch_post call and propagates errors directly, while the previous implementation retried up to 5 times with delay. Under transient sign-server/API/network failures (timeouts, brief 5xx, intermittent blocks), this now fails the entire scrape immediately and increases flaky user-facing errors. Reintroducing bounded retries here would restore the reliability behavior users previously had.
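One way to reintroduce bounded retries is a small generic wrapper around the fetch; this is a sketch (the helper name, attempt count, and backoff are illustrative, not the project's previous implementation):

```python
import asyncio

async def retry_async(fn, attempts=5, base_delay=1.0):
    """Retry an async callable with linear backoff, re-raising the last error."""
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return await fn()
        except Exception as exc:  # narrow this to transient error types in real code
            last_exc = exc
            if attempt < attempts:
                await asyncio.sleep(base_delay * attempt)
    raise last_exc
```

The scraper could then wrap the single `fetch_post` call, e.g. `await retry_async(lambda: self._fetch_once())`, so a brief sign-server 5xx or timeout no longer fails the whole scrape.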
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/api/src/config.py (1)
82-90: ⚠️ Potential issue | 🟡 Minor — Remove unused legacy XHS configuration variables from `config.py`.

These variables (`XIAOHONGSHU_COOKIES`, `XHS_PHONE_LIST`, `XHS_IP_PROXY_LIST`, `XHS_ENABLE_IP_PROXY`, `XHS_SAVE_LOGIN_STATE`) are no longer referenced anywhere in the codebase after the migration to `XhsSinglePostAdapter`. The new adapter only uses `XHS_COOKIE_STRING` and `XHS_SIGN_SERVER_URL`. Remove these dead config variables to keep the configuration file clean.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/api/src/config.py` around lines 82 - 90, Remove the unused legacy XHS config variables from config.py by deleting the definitions XIAOHONGSHU_COOKIES, XHS_PHONE_LIST, XHS_IP_PROXY_LIST, XHS_ENABLE_IP_PROXY, and XHS_SAVE_LOGIN_STATE; ensure you leave the current XHS settings (XHS_COOKIE_STRING and XHS_SIGN_SERVER_URL) intact, and run a quick repo-wide search for those removed symbols to confirm they are not referenced elsewhere before committing.
🧹 Nitpick comments (1)
apps/api/src/config.py (1)
93-94: Move the mid-file imports to the top of the file.
PEP 8 specifies that all import statements should be "put at the top of the file, just after any module comments and docstrings, and before module globals and constants." These two imports are inserted after ~90 lines of variable assignments, which breaks the established pattern in the file and makes dependencies harder to locate.

♻️ Proposed fix

Remove lines 93–94 from their current position and add alongside the other `fastfetchbot_shared` import at the top of the file:

```diff
 from fastfetchbot_shared.utils.parse import get_env_bool
+from fastfetchbot_shared.config import SIGN_SERVER_URL as XHS_SIGN_SERVER_URL
+from fastfetchbot_shared.config import XHS_COOKIE_PATH as _XHS_COOKIE_PATH
```

Then delete them from lines 93–94.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/api/src/config.py` around lines 93 - 94, The two mid-file imports (SIGN_SERVER_URL as XHS_SIGN_SERVER_URL and XHS_COOKIE_PATH as _XHS_COOKIE_PATH from fastfetchbot_shared.config) must be moved to the top with the other fastfetchbot_shared imports; remove the duplicate lines at lines 93–94 and add those two import names alongside the existing fastfetchbot_shared import block so XHS_SIGN_SERVER_URL and _XHS_COOKIE_PATH are declared together with the module-level imports.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.gitignore:
- Around line 275-276: Remove the duplicate gitignore entry
'apps/worker/celerybeat-schedule.db' so it only appears once (or remove both),
and rely on the existing '*.db' pattern to cover it; update the .gitignore by
deleting the repeated 'apps/worker/celerybeat-schedule.db' line to avoid
redundancy.
In `@apps/api/src/config.py`:
- Around line 104-106: Replace the print call in the exception handler with
Loguru logging: in the except block that catches (IOError, OSError) where
XHS_COOKIE_STRING is set to "", call logger.error(...) and include the exception
details (e) and context ("Error reading XHS cookie file") instead of using
print; ensure Loguru's logger is imported/available in this module (logger from
loguru) before using it so the error is recorded consistently with the project's
logging conventions.
In `@apps/api/src/services/scrapers/xiaohongshu/adaptar.py`:
- Around line 43-61: In __init__, validate that a non-empty sign server URL is
available (either the provided sign_server_endpoint or the fallback
SIGN_SERVER_URL) and raise a clear error if both are empty to fail fast; update
the constructor logic around sign_server_endpoint (and its rstrip usage) so that
after resolving (sign_server_endpoint or SIGN_SERVER_URL).rstrip("/") you check
for empty string and raise ValueError (or a configuration-specific exception)
referencing the constructor/__init__ and _sign_headers behavior so
misconfiguration is surfaced immediately instead of causing httpx runtime URL
errors.
- Around line 225-260: The regex in _fetch_note_by_html currently uses ({.*})
which is greedy and won't match across newlines; change the search to use a
non-greedy capture and DOTALL mode so the JSON blob spanning multiple lines is
matched (e.g., use a pattern like "window.__INITIAL_STATE__=({.*?})</script>"
with the re.DOTALL flag) when calling re.search(...) so match.group(1) yields
the full JSON state for json.loads.
- Around line 438-459: The _extract_video_urls method builds a hardcoded HTTP
CDN URL using origin_video_key which can cause mixed-content warnings; update
the returned URL to use HTTPS (i.e., change the scheme for the
sns-video-bd.xhscdn.com URL built from origin_video_key in _extract_video_urls)
and keep the logic that falls back to collecting master_url entries from the
h264 stream unchanged.
In `@packages/shared/fastfetchbot_shared/utils/config.py`:
- Line 55: The regex entry in BANNED_PATTERNS for GitHub uses an unescaped dot
("github.com") which allows any character at that position; update the pattern
in BANNED_PATTERNS (the string currently
r"github.com\/[A-Za-z0-9_-]+\/[A-Za-z0-9_-]+") to escape the dot as
r"github\.com\/[A-Za-z0-9_-]+\/[A-Za-z0-9_-]+" so it only matches "github.com"
exactly.
In `@README.md`:
- Around line 182-184: The fenced code block containing the cookie-file example
(the block showing "a1=xxxxxxxx; web_id=xxxxxxxx; web_session=xxxxxxxx") is
missing a language identifier and triggers markdownlint MD040; update that
fenced block to include a language specifier such as "text" (or "sh") so the
block becomes ```text ... ``` to mark it as plain text.
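The escaping point in the `BANNED_PATTERNS` item above can be demonstrated directly (the look-alike string is contrived):

```python
import re

# Unescaped dot: matches ANY character in that position
unescaped = r"github.com\/[A-Za-z0-9_-]+\/[A-Za-z0-9_-]+"
# Escaped dot: matches only a literal "."
escaped = r"github\.com\/[A-Za-z0-9_-]+\/[A-Za-z0-9_-]+"

# The unescaped pattern lets a look-alike host through
assert re.search(unescaped, "githubXcom/user/repo") is not None
assert re.search(escaped, "githubXcom/user/repo") is None
assert re.search(escaped, "github.com/user/repo") is not None
```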
---
Outside diff comments:
In `@apps/api/src/config.py`:
- Around line 82-90: Remove the unused legacy XHS config variables from
config.py by deleting the definitions XIAOHONGSHU_COOKIES, XHS_PHONE_LIST,
XHS_IP_PROXY_LIST, XHS_ENABLE_IP_PROXY, and XHS_SAVE_LOGIN_STATE; ensure you
leave the current XHS settings (XHS_COOKIE_STRING and XHS_SIGN_SERVER_URL)
intact, and run a quick repo-wide search for those removed symbols to confirm
they are not referenced elsewhere before committing.
---
Nitpick comments:
In `@apps/api/src/config.py`:
- Around line 93-94: The two mid-file imports (SIGN_SERVER_URL as
XHS_SIGN_SERVER_URL and XHS_COOKIE_PATH as _XHS_COOKIE_PATH from
fastfetchbot_shared.config) must be moved to the top with the other
fastfetchbot_shared imports; remove the duplicate lines at lines 93–94 and add
those two import names alongside the existing fastfetchbot_shared import block
so XHS_SIGN_SERVER_URL and _XHS_COOKIE_PATH are declared together with the
module-level imports.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- .gitignore
- CLAUDE.md
- README.md
- apps/api/src/config.py
- apps/api/src/services/scrapers/xiaohongshu/__init__.py
- apps/api/src/services/scrapers/xiaohongshu/adaptar.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/__init__.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/base_crawler.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/client.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/core.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/exception.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/field.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/help.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/login.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/proxy_account_pool.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/utils.py
- apps/api/src/templates/xiaohongshu_content.jinja2
- packages/shared/fastfetchbot_shared/config.py
- packages/shared/fastfetchbot_shared/utils/config.py
- template.env
💤 Files with no reviewable changes (10)
- apps/api/src/services/scrapers/xiaohongshu/xhs/login.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/exception.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/help.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/client.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/field.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/__init__.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/base_crawler.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/proxy_account_pool.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/utils.py
- apps/api/src/services/scrapers/xiaohongshu/xhs/core.py
```python
async def _fetch_note_by_html(
    self,
    note_id: str,
    xsec_token: str,
    xsec_source: str,
) -> Optional[Dict[str, Any]]:
    detail_url = f"{XHS_WEB_URL}/explore/{note_id}"
    if xsec_token:
        detail_url += f"?xsec_token={xsec_token}&xsec_source={xsec_source}"

    resp = await self._http.get(detail_url, headers=self._base_headers())
    if resp.status_code != 200:
        return None

    text = resp.text or ""
    match = re.search(r"window.__INITIAL_STATE__=({.*})</script>", text)
    if not match:
        return None

    try:
        state = json.loads(match.group(1).replace("undefined", "null"))
        note = (
            state.get("note", {})
            .get("noteDetailMap", {})
            .get(note_id, {})
            .get("note")
        )
    except json.JSONDecodeError:
        return None

    if not note:
        return None
    note["xsec_token"] = xsec_token
    note["xsec_source"] = xsec_source
    return self._normalize_note(note)
```
Make the HTML state regex DOTALL and non-greedy.
The HTML JSON blob often spans newlines, so the current regex is likely to miss and make the fallback path fail.
🔧 Suggested fix
```diff
-        match = re.search(r"window.__INITIAL_STATE__=({.*})</script>", text)
+        match = re.search(
+            r"window.__INITIAL_STATE__=(\{.*?\})</script>", text, re.S
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/services/scrapers/xiaohongshu/adaptar.py` around lines 225 -
260, The regex in _fetch_note_by_html currently uses ({.*}) which is greedy and
won't match across newlines; change the search to use a non-greedy capture and
DOTALL mode so the JSON blob spanning multiple lines is matched (e.g., use a
pattern like "window.__INITIAL_STATE__=({.*?})</script>" with the re.DOTALL
flag) when calling re.search(...) so match.group(1) yields the full JSON state
for json.loads.
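The behavioral difference can be checked with a contrived multiline page (the HTML below is a stand-in, not real Xiaohongshu markup):

```python
import json
import re

html = (
    "<script>window.__INITIAL_STATE__={\n"
    '  "note": {"noteDetailMap": {"abc": {"note": {"title": "hi"}}}}\n'
    "}</script>"
)

# Current pattern: without re.S, '.' stops at newlines, so multiline blobs never match
assert re.search(r"window.__INITIAL_STATE__=({.*})</script>", html) is None

# Suggested pattern: non-greedy capture with re.S so '.' crosses newlines
match = re.search(r"window\.__INITIAL_STATE__=(\{.*?\})</script>", html, re.S)
state = json.loads(match.group(1))
assert state["note"]["noteDetailMap"]["abc"]["note"]["title"] == "hi"
```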
```python
@staticmethod
def _extract_video_urls(note_item: Dict[str, Any]) -> List[str]:
    if note_item.get("type") != "video":
        return []
    video = note_item.get("video", {}) or {}
    consumer = video.get("consumer", {}) or {}
    origin_video_key = consumer.get("origin_video_key") or consumer.get("originVideoKey")
    if origin_video_key:
        return [f"http://sns-video-bd.xhscdn.com/{origin_video_key}"]

    urls = []
    h264_list = (
        (video.get("media", {}) or {})
        .get("stream", {})
        .get("h264", [])
        or []
    )
    for item in h264_list:
        master_url = item.get("master_url")
        if master_url:
            urls.append(master_url)
    return urls
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, check if the file exists
git ls-files | grep -i "xiaohongshu"
```

Repository: aturret/FastFetchBot
Length of output: 275

🏁 Script executed:

```shell
# Search for HTTP URLs in the codebase
rg "http://" --type py | head -20
```

Repository: aturret/FastFetchBot
Length of output: 1272

🏁 Script executed:

```shell
# Check for any configuration or environment variables related to this CDN
rg "sns-video-bd|xhscdn" --type py
```

Repository: aturret/FastFetchBot
Length of output: 191

🏁 Script executed:

```shell
# Look for similar patterns with HTTPS in other scrapers
rg "https://.*\.xhscdn\.com" --type py
```

Repository: aturret/FastFetchBot
Length of output: 46

🏁 Script executed:

```shell
# Check for comments or documentation about xiaohongshu/CDN
rg -A 5 -B 5 "xhscdn|xiaohongshu.*cdn" --type py
```

Repository: aturret/FastFetchBot
Length of output: 1099

🏁 Script executed:

```shell
# Look at the adaptar.py file to see more context around this function
cat -n apps/api/src/services/scrapers/xiaohongshu/adaptar.py | head -100
```

Repository: aturret/FastFetchBot
Length of output: 4388

🏁 Script executed:

```shell
# Check if there's any environment configuration for URL schemes
rg "HTTPS|https_only|secure" --type py apps/api/src/
```

Repository: aturret/FastFetchBot
Length of output: 46
🌐 Web query:
xiaohongshu xhscdn HTTPS support CDN
💡 Result:
- `xhscdn.com`/`xhscdn.net` are domains used by Xiaohongshu (XHS) for serving static assets (images/video) via many subdomains (it's not a "public CDN you sign up for"). XHS's own security center explicitly identifies `*.xhscdn.com` and `*.xhscdn.net` as their CDN domains used for user-uploaded content. [1]
- HTTPS support is not consistent at the apex/root domain: multiple checkers report that `xhscdn.com` itself (the bare domain) does not support HTTPS. [2]
- Many specific subdomains do support HTTPS/TLS (i.e., they present valid certificates), e.g. `magpie.xhscdn.com` is reported as having a valid SSL certificate. [3]

Implication: if you're trying to fetch an XHS asset, you typically need to use the exact subdomain hostname XHS provides (often works over `https://<subdomain>.xhscdn.com/...`), rather than assuming `https://xhscdn.com/...` will work. [2][3]
Sources
[1] Xiaohongshu Security Response Center: CDN domains include *.xhscdn.com / *.xhscdn.net
[2] Hypestat SSL checker result for xhscdn.com (reports no HTTPS on apex)
[3] ScamAdviser report for magpie.xhscdn.com (reports valid SSL certificate)
Use HTTPS for CDN video URLs to prevent mixed content warnings.
The hardcoded HTTP URL will trigger mixed content warnings when served in HTTPS contexts. While the base xhscdn.com domain has inconsistent HTTPS support, individual subdomains like sns-video-bd.xhscdn.com typically support HTTPS. This aligns with other Xiaohongshu API calls in this file which already use HTTPS (XHS_API_URL, XHS_WEB_URL).
🔧 Suggested fix

```diff
-        return [f"http://sns-video-bd.xhscdn.com/{origin_video_key}"]
+        return [f"https://sns-video-bd.xhscdn.com/{origin_video_key}"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/services/scrapers/xiaohongshu/adaptar.py` around lines 438 -
459, The _extract_video_urls method builds a hardcoded HTTP CDN URL using
origin_video_key which can cause mixed-content warnings; update the returned URL
to use HTTPS (i.e., change the scheme for the sns-video-bd.xhscdn.com URL built
from origin_video_key in _extract_video_urls) and keep the logic that falls back
to collecting master_url entries from the h264 stream unchanged.
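Assuming the field names in the diff above are accurate, a standalone sketch of the helper with the HTTPS scheme applied looks like this (an illustration, not the committed implementation):

```python
from typing import Any, Dict, List

def extract_video_urls(note_item: Dict[str, Any]) -> List[str]:
    """Sketch of _extract_video_urls returning an https:// CDN URL."""
    if note_item.get("type") != "video":
        return []
    video = note_item.get("video") or {}
    consumer = video.get("consumer") or {}
    key = consumer.get("origin_video_key") or consumer.get("originVideoKey")
    if key:
        # https avoids mixed-content warnings when the page is served over TLS
        return [f"https://sns-video-bd.xhscdn.com/{key}"]
    # Fallback: collect master_url entries from the h264 stream, unchanged
    streams = ((video.get("media") or {}).get("stream") or {}).get("h264") or []
    return [s["master_url"] for s in streams if s.get("master_url")]
```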
```text
a1=xxxxxxxx; web_id=xxxxxxxx; web_session=xxxxxxxx
```
Add a language identifier to the fenced code block.
The cookie-file example block has no language specifier, which triggers markdownlint MD040. Since it's a plain text example, use text (or sh).
🔧 Proposed fix

````diff
-```
+```text
 a1=xxxxxxxx; web_id=xxxxxxxx; web_session=xxxxxxxx
````
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 182-182: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 182 - 184, The fenced code block containing the
cookie-file example (the block showing "a1=xxxxxxxx; web_id=xxxxxxxx;
web_session=xxxxxxxx") is missing a language identifier and triggers
markdownlint MD040; update that fenced block to include a language specifier
such as "text" (or "sh") so the block becomes ```text ... ``` to mark it as
plain text.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Summary by CodeRabbit
New Features
Improvements
Documentation
Chores