
feat: revive xiaohongshu scraping feature#60

Merged
aturret merged 9 commits into main from xhs-update on Feb 24, 2026

Conversation

@aturret
Owner

@aturret aturret commented Feb 23, 2026

Summary by CodeRabbit

  • New Features

    • External signing-server integration for Xiaohongshu
    • Optional file-based Xiaohongshu cookie support
  • Improvements

    • Xiaohongshu scraper switched to a lightweight adapter/API workflow (replaces prior browser-automation flow)
    • URL filtering expanded to include GitHub repository links
    • Content rendering/template tweaks for Xiaohongshu posts
  • Documentation

    • README and docs updated with signing-server, cookie setup, examples, and config guidance
  • Chores

    • Added .gitignore entry for celerybeat schedule DB file

@coderabbitai
Contributor

coderabbitai bot commented Feb 23, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 772adab and 6bb5526.

📒 Files selected for processing (3)
  • apps/api/src/config.py
  • apps/api/src/services/scrapers/xiaohongshu/adaptar.py
  • packages/shared/fastfetchbot_shared/utils/parse.py

📝 Walkthrough

Walkthrough

Replaces the Playwright-based Xiaohongshu crawler stack with a new HTTP adapter (XhsSinglePostAdapter), removes legacy xhs modules, adds sign-server and cookie-file configuration, updates scraper wiring and templates, and refreshes docs and env templates.

Changes

Cohort / File(s) Summary
Xiaohongshu HTTP Adapter
apps/api/src/services/scrapers/xiaohongshu/adaptar.py
Adds XhsSinglePostAdapter: URL parsing/normalization, sign-server header signing, signed GET/POST flows with HTML fallback, comment/sub-comment fetching and normalization, and utilities/constants for XHS API/web endpoints.
Scraper integration
apps/api/src/services/scrapers/xiaohongshu/__init__.py, apps/api/src/templates/xiaohongshu_content.jinja2
Rewires scraper to use XhsSinglePostAdapter, renames/makes processing helpers private, updates field mapping (author, times, counts, media), consumes XHS_COOKIE_STRING/XHS_SIGN_SERVER_URL, and fixes template field references.
Removed Playwright crawler stack
apps/api/src/services/scrapers/xiaohongshu/xhs/...
Deletes legacy Playwright-based modules and helpers: core.py, client.py, login.py, help.py, utils.py, proxy_account_pool.py, base_crawler.py, field.py, exception.py, plus re-exports — removing crawler classes, client, signing helpers, pools, types, and exceptions.
Config & env
apps/api/src/config.py, packages/shared/fastfetchbot_shared/config.py, template.env
Adds SIGN_SERVER_URL and XHS_COOKIE_PATH to shared config and env template; API config loads cookie file (fallback to XIAOHONGSHU_* env vars) and exposes XHS_COOKIE_STRING and XHS_SIGN_SERVER_URL.
Docs & README
CLAUDE.md, README.md
Updates docs to describe httpx-based adapter and external sign-server workflow, adds Xiaohongshu setup instructions, cookie-file vs env options, and updates scraper lineage and supported content types.
Shared utils & parsing
packages/shared/fastfetchbot_shared/utils/config.py, packages/shared/fastfetchbot_shared/utils/parse.py
Adds GitHub repo regex to BANNED_PATTERNS; allows xiaohongshu as non-stripped source in URL finalization logic.
Miscellaneous
.gitignore
Adds ignore entry for apps/worker/celerybeat-schedule.db.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/API
    participant Adapter as XhsSinglePostAdapter
    participant SignServer as Sign Server
    participant XhsAPI as Xiaohongshu API
    participant Parser as HTML Parser

    Client->>Adapter: fetch_post(note_url, with_comments?)
    Adapter->>Adapter: parse_xhs_note_url(note_url)
    Adapter->>SignServer: request signature for uri/data
    SignServer-->>Adapter: returns signed headers
    Adapter->>XhsAPI: _signed_post or _signed_get with signed headers
    alt API returns JSON
        XhsAPI-->>Adapter: JSON response
        Adapter->>Adapter: _parse_api_response() and _normalize_note()
    else API fails or non-JSON
        Adapter->>XhsAPI: fetch note HTML
        XhsAPI-->>Parser: HTML content
        Parser-->>Adapter: extracted fields
        Adapter->>Adapter: _normalize_note()
    end
    opt with_comments
        Adapter->>XhsAPI: _fetch_comments (paged)
        XhsAPI-->>Adapter: comment pages
        Adapter->>Adapter: _normalize_comment() (incl. sub-comments)
    end
    Adapter-->>Client: normalized note dict (media, content, metadata, comments?)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Farewell to Playwright's midnight code,
A sign-server hums along the road.
HTTP hops, cookies snug and neat,
Notes and comments now march to the beat.
Hop, hop — new adapters make the world complete!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage (⚠️ Warning): docstring coverage is 11.54%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Description Check (✅ Passed): check skipped because CodeRabbit’s high-level summary is enabled.
Title Check (✅ Passed): the title accurately summarizes the main objective, reviving Xiaohongshu scraping. The changes replace the Playwright-based crawler with a new adapter-based implementation, update configuration and documentation, and remove legacy code, all aligned with the stated feature-revival goal.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@aturret
Owner Author

aturret commented Feb 23, 2026

@codex review this PR


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d8b5c3d8d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

HTTP_REQUEST_TIMEOUT = env.get("HTTP_REQUEST_TIMEOUT", 30)

# XHS (Xiaohongshu) shared configuration
SIGN_SERVER_URL = env.get("SIGN_SERVER_URL", "http://localhost:8989")


P1 Badge Preserve localhost fallback when SIGN_SERVER_URL is empty

This reads SIGN_SERVER_URL with env.get("SIGN_SERVER_URL", "http://localhost:8989"), which does not apply the default when the key exists but is blank. In this repo’s common setup (docker-compose.template.yml uses env_file: .env, and template.env now adds SIGN_SERVER_URL=), the value becomes an empty string, so XHS signing requests are built from an invalid base URL instead of the documented localhost default and scraping fails at runtime. Treat empty values as unset (e.g., env.get(...) or "http://localhost:8989").
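The distinction is easy to sketch with a plain dict standing in for the environment loader:

```python
# Sketch of the failure mode: .get() only falls back when the key is absent,
# not when it is present but blank, which is exactly what env_file produces
# from a "SIGN_SERVER_URL=" line in .env.
DEFAULT = "http://localhost:8989"

env = {"SIGN_SERVER_URL": ""}  # key exists but is empty

broken = env.get("SIGN_SERVER_URL", DEFAULT)   # default NOT applied -> ""
fixed = env.get("SIGN_SERVER_URL") or DEFAULT  # blank treated as unset
```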

Useful? React with 👍 / 👎.

Comment on lines +40 to +44
async with XhsSinglePostAdapter(
    cookies=XHS_COOKIE_STRING,
    sign_server_endpoint=XHS_SIGN_SERVER_URL,
) as adapter:
    result = await adapter.fetch_post(note_url=self.url)


P2 Badge Retry Xiaohongshu fetch on transient sign/API failures

The new flow performs a single fetch_post call and propagates errors directly, while the previous implementation retried up to 5 times with delay. Under transient sign-server/API/network failures (timeouts, brief 5xx, intermittent blocks), this now fails the entire scrape immediately and increases flaky user-facing errors. Reintroducing bounded retries here would restore the reliability behavior users previously had.
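A bounded-retry wrapper along these lines might look like the sketch below; the helper name, attempt count, delay, and the broad `except` are illustrative assumptions, not the project's actual code:

```python
import asyncio


async def fetch_with_retries(adapter, note_url, attempts=5, delay=2.0):
    """Retry transient fetch failures a bounded number of times.

    Hypothetical helper: in practice the except clause should be narrowed
    to transient errors (timeouts, 5xx responses) rather than Exception.
    """
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return await adapter.fetch_post(note_url=note_url)
        except Exception as exc:
            last_exc = exc
            if attempt < attempts:
                await asyncio.sleep(delay)
    raise last_exc
```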

Useful? React with 👍 / 👎.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/api/src/config.py (1)

82-90: ⚠️ Potential issue | 🟡 Minor

Remove unused legacy XHS configuration variables from config.py.

These variables (XIAOHONGSHU_COOKIES, XHS_PHONE_LIST, XHS_IP_PROXY_LIST, XHS_ENABLE_IP_PROXY, XHS_SAVE_LOGIN_STATE) are no longer referenced anywhere in the codebase after the migration to XhsSinglePostAdapter. The new adapter only uses XHS_COOKIE_STRING and XHS_SIGN_SERVER_URL. Remove these dead config variables to keep the configuration file clean.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/config.py` around lines 82 - 90, Remove the unused legacy XHS
config variables from config.py by deleting the definitions XIAOHONGSHU_COOKIES,
XHS_PHONE_LIST, XHS_IP_PROXY_LIST, XHS_ENABLE_IP_PROXY, and
XHS_SAVE_LOGIN_STATE; ensure you leave the current XHS settings
(XHS_COOKIE_STRING and XHS_SIGN_SERVER_URL) intact, and run a quick repo-wide
search for those removed symbols to confirm they are not referenced elsewhere
before committing.
🧹 Nitpick comments (1)
apps/api/src/config.py (1)

93-94: Move the mid-file imports to the top of the file.

PEP 8 specifies that all import statements should be "put at the top of the file, just after any module comments and docstrings, and before module globals and constants." These two imports are inserted after ~90 lines of variable assignments, which breaks the established pattern in the file and makes dependencies harder to locate.

♻️ Proposed fix

Remove lines 93–94 from their current position and add alongside the other fastfetchbot_shared import at the top of the file:

 from fastfetchbot_shared.utils.parse import get_env_bool
+from fastfetchbot_shared.config import SIGN_SERVER_URL as XHS_SIGN_SERVER_URL
+from fastfetchbot_shared.config import XHS_COOKIE_PATH as _XHS_COOKIE_PATH

Then delete them from lines 93–94.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/config.py` around lines 93 - 94, The two mid-file imports
(SIGN_SERVER_URL as XHS_SIGN_SERVER_URL and XHS_COOKIE_PATH as _XHS_COOKIE_PATH
from fastfetchbot_shared.config) must be moved to the top with the other
fastfetchbot_shared imports; remove the duplicate lines at lines 93–94 and add
those two import names alongside the existing fastfetchbot_shared import block
so XHS_SIGN_SERVER_URL and _XHS_COOKIE_PATH are declared together with the
module-level imports.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gitignore:
- Around line 275-276: Remove the duplicate gitignore entry
'apps/worker/celerybeat-schedule.db' so it only appears once (or remove both),
and rely on the existing '*.db' pattern to cover it; update the .gitignore by
deleting the repeated 'apps/worker/celerybeat-schedule.db' line to avoid
redundancy.

In `@apps/api/src/config.py`:
- Around line 104-106: Replace the print call in the exception handler with
Loguru logging: in the except block that catches (IOError, OSError) where
XHS_COOKIE_STRING is set to "", call logger.error(...) and include the exception
details (e) and context ("Error reading XHS cookie file") instead of using
print; ensure Loguru's logger is imported/available in this module (logger from
loguru) before using it so the error is recorded consistently with the project's
logging conventions.
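A minimal sketch of that replacement, using the stdlib `logging` module to stand in for Loguru (the call shape is the same; with Loguru it would be `from loguru import logger`). The `load_cookie_file` helper name is hypothetical:

```python
import logging

logger = logging.getLogger("xhs")


def load_cookie_file(path: str) -> str:
    """Read the XHS cookie file; log (rather than print) any I/O failure."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return fh.read().strip()
    except (IOError, OSError) as e:
        # Log with context and the exception details instead of print().
        logger.error("Error reading XHS cookie file: %s", e)
        return ""
```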

In `@apps/api/src/services/scrapers/xiaohongshu/adaptar.py`:
- Around line 43-61: In __init__, validate that a non-empty sign server URL is
available (either the provided sign_server_endpoint or the fallback
SIGN_SERVER_URL) and raise a clear error if both are empty to fail fast; update
the constructor logic around sign_server_endpoint (and its rstrip usage) so that
after resolving (sign_server_endpoint or SIGN_SERVER_URL).rstrip("/") you check
for empty string and raise ValueError (or a configuration-specific exception)
referencing the constructor/__init__ and _sign_headers behavior so
misconfiguration is surfaced immediately instead of causing httpx runtime URL
errors.
- Around line 225-260: The regex in _fetch_note_by_html currently uses ({.*})
which is greedy and won't match across newlines; change the search to use a
non-greedy capture and DOTALL mode so the JSON blob spanning multiple lines is
matched (e.g., use a pattern like "window.__INITIAL_STATE__=({.*?})</script>"
with the re.DOTALL flag) when calling re.search(...) so match.group(1) yields
the full JSON state for json.loads.
- Around line 438-459: The _extract_video_urls method builds a hardcoded HTTP
CDN URL using origin_video_key which can cause mixed-content warnings; update
the returned URL to use HTTPS (i.e., change the scheme for the
sns-video-bd.xhscdn.com URL built from origin_video_key in _extract_video_urls)
and keep the logic that falls back to collecting master_url entries from the
h264 stream unchanged.

In `@packages/shared/fastfetchbot_shared/utils/config.py`:
- Line 55: The regex entry in BANNED_PATTERNS for GitHub uses an unescaped dot
("github.com") which allows any character at that position; update the pattern
in BANNED_PATTERNS (the string currently
r"github.com\/[A-Za-z0-9_-]+\/[A-Za-z0-9_-]+") to escape the dot as
r"github\.com\/[A-Za-z0-9_-]+\/[A-Za-z0-9_-]+" so it only matches "github.com"
exactly.
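The difference between the two patterns is easy to demonstrate:

```python
import re

# Unescaped dot: matches ANY character at that position.
unescaped = r"github.com\/[A-Za-z0-9_-]+\/[A-Za-z0-9_-]+"
# Escaped dot: matches only a literal ".".
escaped = r"github\.com\/[A-Za-z0-9_-]+\/[A-Za-z0-9_-]+"

# The unescaped pattern lets lookalike strings through:
assert re.search(unescaped, "githubXcom/owner/repo")
# The escaped pattern matches the real domain and rejects the lookalike:
assert re.search(escaped, "github.com/owner/repo")
assert not re.search(escaped, "githubXcom/owner/repo")
```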

In `@README.md`:
- Around line 182-184: The fenced code block containing the cookie-file example
(the block showing "a1=xxxxxxxx; web_id=xxxxxxxx; web_session=xxxxxxxx") is
missing a language identifier and triggers markdownlint MD040; update that
fenced block to include a language specifier such as "text" (or "sh") so the
block becomes ```text ... ``` to mark it as plain text.

---

Outside diff comments:
In `@apps/api/src/config.py`:
- Around line 82-90: Remove the unused legacy XHS config variables from
config.py by deleting the definitions XIAOHONGSHU_COOKIES, XHS_PHONE_LIST,
XHS_IP_PROXY_LIST, XHS_ENABLE_IP_PROXY, and XHS_SAVE_LOGIN_STATE; ensure you
leave the current XHS settings (XHS_COOKIE_STRING and XHS_SIGN_SERVER_URL)
intact, and run a quick repo-wide search for those removed symbols to confirm
they are not referenced elsewhere before committing.

---

Nitpick comments:
In `@apps/api/src/config.py`:
- Around line 93-94: The two mid-file imports (SIGN_SERVER_URL as
XHS_SIGN_SERVER_URL and XHS_COOKIE_PATH as _XHS_COOKIE_PATH from
fastfetchbot_shared.config) must be moved to the top with the other
fastfetchbot_shared imports; remove the duplicate lines at lines 93–94 and add
those two import names alongside the existing fastfetchbot_shared import block
so XHS_SIGN_SERVER_URL and _XHS_COOKIE_PATH are declared together with the
module-level imports.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a023a29 and 0d8b5c3.

📒 Files selected for processing (20)
  • .gitignore
  • CLAUDE.md
  • README.md
  • apps/api/src/config.py
  • apps/api/src/services/scrapers/xiaohongshu/__init__.py
  • apps/api/src/services/scrapers/xiaohongshu/adaptar.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/__init__.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/base_crawler.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/client.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/core.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/exception.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/field.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/help.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/login.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/proxy_account_pool.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/utils.py
  • apps/api/src/templates/xiaohongshu_content.jinja2
  • packages/shared/fastfetchbot_shared/config.py
  • packages/shared/fastfetchbot_shared/utils/config.py
  • template.env
💤 Files with no reviewable changes (10)
  • apps/api/src/services/scrapers/xiaohongshu/xhs/login.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/exception.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/help.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/client.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/field.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/init.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/base_crawler.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/proxy_account_pool.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/utils.py
  • apps/api/src/services/scrapers/xiaohongshu/xhs/core.py

Comment on lines +225 to +260
async def _fetch_note_by_html(
    self,
    note_id: str,
    xsec_token: str,
    xsec_source: str,
) -> Optional[Dict[str, Any]]:
    detail_url = f"{XHS_WEB_URL}/explore/{note_id}"
    if xsec_token:
        detail_url += f"?xsec_token={xsec_token}&xsec_source={xsec_source}"

    resp = await self._http.get(detail_url, headers=self._base_headers())
    if resp.status_code != 200:
        return None

    text = resp.text or ""
    match = re.search(r"window.__INITIAL_STATE__=({.*})</script>", text)
    if not match:
        return None

    try:
        state = json.loads(match.group(1).replace("undefined", "null"))
        note = (
            state.get("note", {})
            .get("noteDetailMap", {})
            .get(note_id, {})
            .get("note")
        )
    except json.JSONDecodeError:
        return None

    if not note:
        return None
    note["xsec_token"] = xsec_token
    note["xsec_source"] = xsec_source
    return self._normalize_note(note)


⚠️ Potential issue | 🟠 Major

Make the HTML state regex DOTALL and non-greedy.

The HTML JSON blob often spans newlines, so the current regex is likely to miss and make the fallback path fail.

🔧 Suggested fix
-        match = re.search(r"window.__INITIAL_STATE__=({.*})</script>", text)
+        match = re.search(
+            r"window.__INITIAL_STATE__=(\{.*?\})</script>", text, re.S
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/services/scrapers/xiaohongshu/adaptar.py` around lines 225 -
260, The regex in _fetch_note_by_html currently uses ({.*}) which is greedy and
won't match across newlines; change the search to use a non-greedy capture and
DOTALL mode so the JSON blob spanning multiple lines is matched (e.g., use a
pattern like "window.__INITIAL_STATE__=({.*?})</script>" with the re.DOTALL
flag) when calling re.search(...) so match.group(1) yields the full JSON state
for json.loads.
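The failure and the fix can be checked directly against a minimal, made-up HTML snippet:

```python
import re

# Minimal fabricated page: the JSON state spans multiple lines, as it does
# on real XHS note pages.
html = (
    "<script>window.__INITIAL_STATE__={\n"
    '  "note": {"noteDetailMap": {}}\n'
    "}</script>"
)

# Greedy pattern without DOTALL: "." cannot cross newlines, so no match.
assert re.search(r"window.__INITIAL_STATE__=({.*})</script>", html) is None

# Non-greedy capture with re.S spans the lines and stops at the first </script>.
m = re.search(r"window\.__INITIAL_STATE__=(\{.*?\})</script>", html, re.S)
assert m is not None
```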

Comment on lines +438 to +459
@staticmethod
def _extract_video_urls(note_item: Dict[str, Any]) -> List[str]:
    if note_item.get("type") != "video":
        return []
    video = note_item.get("video", {}) or {}
    consumer = video.get("consumer", {}) or {}
    origin_video_key = consumer.get("origin_video_key") or consumer.get("originVideoKey")
    if origin_video_key:
        return [f"http://sns-video-bd.xhscdn.com/{origin_video_key}"]

    urls = []
    h264_list = (
        (video.get("media", {}) or {})
        .get("stream", {})
        .get("h264", [])
        or []
    )
    for item in h264_list:
        master_url = item.get("master_url")
        if master_url:
            urls.append(master_url)
    return urls

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists
git ls-files | grep -i "xiaohongshu"

Repository: aturret/FastFetchBot

Length of output: 275


🏁 Script executed:

# Search for HTTP URLs in the codebase
rg "http://" --type py | head -20

Repository: aturret/FastFetchBot

Length of output: 1272


🏁 Script executed:

# Check for any configuration or environment variables related to this CDN
rg "sns-video-bd|xhscdn" --type py

Repository: aturret/FastFetchBot

Length of output: 191


🏁 Script executed:

# Look for similar patterns with HTTPS in other scrapers
rg "https://.*\.xhscdn\.com" --type py

Repository: aturret/FastFetchBot

Length of output: 46


🏁 Script executed:

# Check for comments or documentation about xiaohongshu/CDN
rg -A 5 -B 5 "xhscdn|xiaohongshu.*cdn" --type py

Repository: aturret/FastFetchBot

Length of output: 1099


🏁 Script executed:

# Look at the adaptar.py file to see more context around this function
cat -n apps/api/src/services/scrapers/xiaohongshu/adaptar.py | head -100

Repository: aturret/FastFetchBot

Length of output: 4388


🏁 Script executed:

# Check if there's any environment configuration for URL schemes
rg "HTTPS|https_only|secure" --type py apps/api/src/

Repository: aturret/FastFetchBot

Length of output: 46


🌐 Web query:

xiaohongshu xhscdn HTTPS support CDN

💡 Result:

  • xhscdn.com / xhscdn.net are domains used by Xiaohongshu (XHS) for serving static assets (images/video) via many subdomains (it’s not a “public CDN you sign up for”). XHS’s own security center explicitly identifies *.xhscdn.com and *.xhscdn.net as their CDN domains used for user-uploaded content. [1]
  • HTTPS support is not consistent at the apex/root domain: multiple checkers report that xhscdn.com itself (the bare domain) does not support HTTPS. [2]
  • Many specific subdomains do support HTTPS/TLS (i.e., they present valid certificates), e.g. magpie.xhscdn.com is reported as having a valid SSL certificate. [3]

Implication: if you’re trying to fetch an XHS asset, you typically need to use the exact subdomain hostname XHS provides (often works over https://<subdomain>.xhscdn.com/...), rather than assuming https://xhscdn.com/... will work. [2][3]

Sources
[1] Xiaohongshu Security Response Center: CDN domains include *.xhscdn.com / *.xhscdn.net
[2] Hypestat SSL checker result for xhscdn.com (reports no HTTPS on apex)
[3] ScamAdviser report for magpie.xhscdn.com (reports valid SSL certificate)


Use HTTPS for CDN video URLs to prevent mixed content warnings.

The hardcoded HTTP URL will trigger mixed content warnings when served in HTTPS contexts. While the base xhscdn.com domain has inconsistent HTTPS support, individual subdomains like sns-video-bd.xhscdn.com typically support HTTPS. This aligns with other Xiaohongshu API calls in this file which already use HTTPS (XHS_API_URL, XHS_WEB_URL).

🔧 Suggested fix
-            return [f"http://sns-video-bd.xhscdn.com/{origin_video_key}"]
+            return [f"https://sns-video-bd.xhscdn.com/{origin_video_key}"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/services/scrapers/xiaohongshu/adaptar.py` around lines 438 -
459, The _extract_video_urls method builds a hardcoded HTTP CDN URL using
origin_video_key which can cause mixed-content warnings; update the returned URL
to use HTTPS (i.e., change the scheme for the sns-video-bd.xhscdn.com URL built
from origin_video_key in _extract_video_urls) and keep the logic that falls back
to collecting master_url entries from the h264 stream unchanged.

Comment on lines +182 to +184
```
a1=xxxxxxxx; web_id=xxxxxxxx; web_session=xxxxxxxx
```

⚠️ Potential issue | 🟡 Minor

Add a language identifier to the fenced code block.

The cookie-file example block has no language specifier, which triggers markdownlint MD040. Since it's a plain text example, use text (or sh).

🔧 Proposed fix
-  ```
+  ```text
   a1=xxxxxxxx; web_id=xxxxxxxx; web_session=xxxxxxxx
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 182-182: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 182 - 184, The fenced code block containing the
cookie-file example (the block showing "a1=xxxxxxxx; web_id=xxxxxxxx;
web_session=xxxxxxxx") is missing a language identifier and triggers
markdownlint MD040; update that fenced block to include a language specifier
such as "text" (or "sh") so the block becomes ```text ... ``` to mark it as
plain text.

aturret and others added 4 commits February 23, 2026 17:33
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@aturret aturret merged commit f664418 into main Feb 24, 2026
1 of 2 checks passed
@aturret aturret deleted the xhs-update branch February 24, 2026 00:58
@coderabbitai coderabbitai bot mentioned this pull request Mar 21, 2026
