aturret · Feb 24, 2026 · Feb 23, 2026
diff --git a/.gitignore b/.gitignore
@@ -272,3 +272,4 @@ conf/*
 .DS_Store
 /.claude/
 /apps/worker/conf/
+apps/worker/celerybeat-schedule.db
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -33,7 +33,7 @@ The Telegram Bot communicates with the API server over HTTP (`API_SERVER_URL`).
 - **`main.py`** — FastAPI app setup, Sentry integration, lifecycle management
 - **`config.py`** — Environment variable handling, platform credentials
 - **`routers/`** — `scraper.py` (generic endpoint), `scraper_routers.py` (platform-specific), `inoreader.py`, `wechat.py`
-- **`services/scrapers/`** — `scraper_manager.py` orchestrates platform scrapers (twitter, weibo, bluesky, xiaohongshu, reddit, instagram, zhihu, douban, threads, wechat, general)
+- **`services/scrapers/`** — `scraper_manager.py` orchestrates platform scrapers (twitter, weibo, bluesky, xiaohongshu, reddit, instagram, zhihu, douban, threads, wechat, general); the Xiaohongshu scraper uses `xiaohongshu/adaptar.py` (`XhsSinglePostAdapter`) with an external sign server instead of the old Playwright-based crawler
 - **`services/file_export/`** — PDF generation, audio transcription (OpenAI), video download
 - **`services/amazon/s3.py`** — S3 storage integration
 - **`services/telegraph/`** — Telegraph content publishing
@@ -50,7 +50,7 @@ The Telegram Bot communicates with the API server over HTTP (`API_SERVER_URL`).
 
 ### Shared Library (`packages/shared/fastfetchbot_shared/`)
 
-- **`config.py`** — URL patterns (SOCIAL_MEDIA_WEBSITE_PATTERNS, VIDEO_WEBSITE_PATTERNS, BANNED_PATTERNS)
+- **`config.py`** — URL patterns (SOCIAL_MEDIA_WEBSITE_PATTERNS, VIDEO_WEBSITE_PATTERNS, BANNED_PATTERNS); shared env vars including `SIGN_SERVER_URL` and `XHS_COOKIE_PATH`
 - **`models/`** — `classes.py` (NamedBytesIO), `metadata_item.py`, `telegraph_item.py`, `url_metadata.py`
 - **`utils/`** — `parse.py` (URL parsing, HTML processing, `get_env_bool`), `image.py`, `logger.py`, `network.py`
 
@@ -128,6 +128,8 @@ See `template.env` for a complete reference. Key variables:
 - Most scrapers require authentication cookies/tokens
 - Use browser extension "Get cookies.txt LOCALLY" to extract cookies
 - Store Zhihu cookies in `conf/zhihu_cookies.json`
+- Store Xiaohongshu cookies in `conf/xhs_cookies.txt` (single-line cookie string, e.g. `a1=x; web_id=x; web_session=x`)
+- Xiaohongshu also requires an external **sign server** reachable at `SIGN_SERVER_URL` (default `http://localhost:8989`); the sign server is currently closed-source — you must supply your own compatible implementation
 - See `template.env` for all platform-specific variables (Twitter, Weibo, Xiaohongshu, Reddit, Instagram, Bluesky, etc.)
 
 ### Database

diff --git a/README.md b/README.md
@@ -154,10 +154,46 @@ See `template.env` for a complete reference with comments.
 | Twitter | `TWITTER_CT0`, `TWITTER_AUTH_TOKEN` |
 | Reddit | `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USERNAME`, `REDDIT_PASSWORD` |
 | Weibo | `WEIBO_COOKIES` |
-| Xiaohongshu | `XIAOHONGSHU_A1`, `XIAOHONGSHU_WEBID`, `XIAOHONGSHU_WEBSESSION` |
+| Xiaohongshu | See [Xiaohongshu Setup](#xiaohongshu-setup) below |
 | Instagram | `X_RAPIDAPI_KEY` |
 | Zhihu | Store cookies in `conf/zhihu_cookies.json` |
 
+#### Xiaohongshu Setup
+
+Xiaohongshu (XHS) API requests require a cryptographic signature (`x-s`, `x-t`, etc.) that must be computed by a dedicated signing proxy. FastFetchBot delegates this to an external **sign server**.
+
+> **Note:** We currently use a closed-source sign server. You will need to run your own compatible signing proxy and point `SIGN_SERVER_URL` at it.
+
+The sign server must accept `POST /signsrv/v1/xhs/sign` with a JSON body:
+
+```json
+{"uri": "/api/sns/web/v1/feed", "data": {...}, "cookies": "a1=..."}
+```
+
+and return:
+
+```json
+{"isok": true, "data": {"x_s": "...", "x_t": "...", "x_s_common": "...", "x_b3_traceid": "..."}}
+```
+
+**Cookie configuration** (two options; file takes priority):
+
+- **File (recommended):** Create `apps/api/conf/xhs_cookies.txt` containing your XHS cookies as a single line:
+  ```
+  a1=xxxxxxxx; web_id=xxxxxxxx; web_session=xxxxxxxx
+  ```
+  Log in to [xiaohongshu.com](https://www.xiaohongshu.com) in your browser, then copy the cookie values from DevTools → Application → Cookies, or use the [Get cookies.txt LOCALLY](https://chrome.google.com/webstore/detail/get-cookiestxt-locally/cclelndahbckbenkjhflpdbgdldlbecc) extension.
+
+- **Environment variables (legacy fallback):** Set `XIAOHONGSHU_A1`, `XIAOHONGSHU_WEBID`, and `XIAOHONGSHU_WEBSESSION` individually. Used only when the cookie file is absent.
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `SIGN_SERVER_URL` | `http://localhost:8989` | URL of the XHS signing proxy |
+| `XHS_COOKIE_PATH` | `conf/xhs_cookies.txt` | Path to cookie file (overrides default location) |
+| `XIAOHONGSHU_A1` | `None` | `a1` cookie value (legacy fallback) |
+| `XIAOHONGSHU_WEBID` | `None` | `web_id` cookie value (legacy fallback) |
+| `XIAOHONGSHU_WEBSESSION` | `None` | `web_session` cookie value (legacy fallback) |
+
 #### Cloud Services
 
 | Variable | Description |
@@ -193,7 +229,7 @@ See `template.env` for a complete reference with comments.
 - [x] WeChat Public Account Articles
 - [x] Zhihu
 - [x] Douban
-- [ ] Xiaohongshu
+- [x] Xiaohongshu
 
 ### Video
 
@@ -211,7 +247,7 @@ The GitHub Actions pipeline (`.github/workflows/ci.yml`) automatically builds an
 
 The HTML to Telegra.ph converter function is based on [html-telegraph-poster](https://github.com/mercuree/html-telegraph-poster). I separated it from this project as an independent Python package: [html-telegraph-poster-v2](https://github.com/aturret/html-telegraph-poster-v2).
 
-The Xiaohongshu scraper is based on [MediaCrawler](https://github.com/NanmiCoder/MediaCrawler).
+The original Xiaohongshu scraper was based on [MediaCrawler](https://github.com/NanmiCoder/MediaCrawler). The current implementation uses a custom httpx-based adapter with an external signing proxy.
 
 The Weibo scraper is based on [weiboSpider](https://github.com/dataabc/weiboSpider).
 

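The sign-server contract described in the README change above can be sketched as two small helpers: one builds the JSON body for `POST /signsrv/v1/xhs/sign`, the other turns the response into request headers. This is a minimal sketch, not the adapter's actual code. The function names are hypothetical, and the `x_s` to `x-s` header mapping is an assumption inferred from the header names the README mentions.

```python
from typing import Any

# Endpoint path quoted from the README text; the host comes from SIGN_SERVER_URL.
SIGN_PATH = "/signsrv/v1/xhs/sign"

# Response fields listed in the README's example response.
SIGNATURE_FIELDS = ("x_s", "x_t", "x_s_common", "x_b3_traceid")


def build_sign_request(uri: str, data: dict[str, Any], cookies: str) -> dict[str, Any]:
    """Build the JSON body the sign server is documented to accept."""
    return {"uri": uri, "data": data, "cookies": cookies}


def headers_from_sign_response(payload: dict[str, Any]) -> dict[str, str]:
    """Convert a sign-server response into HTTP headers.

    Assumption: each field maps to a header by replacing "_" with "-"
    (for example x_s becomes x-s). The real adapter may differ.
    """
    if not payload.get("isok"):
        raise ValueError(f"sign server reported failure: {payload!r}")
    signed = payload.get("data") or {}
    missing = [name for name in SIGNATURE_FIELDS if name not in signed]
    if missing:
        raise ValueError(f"sign server response is missing fields: {missing}")
    return {name.replace("_", "-"): str(signed[name]) for name in SIGNATURE_FIELDS}
```

A compatible signing proxy only needs to honor this request and response shape; how it computes the signature values is outside the scope of this change.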
diff --git a/apps/api/src/config.py b/apps/api/src/config.py
@@ -6,6 +6,7 @@
 import gettext
 import secrets
 
+from fastfetchbot_shared.utils.logger import logger
 from fastfetchbot_shared.utils.parse import get_env_bool
 
 env = os.environ
@@ -89,6 +90,32 @@
 XHS_ENABLE_IP_PROXY = get_env_bool(env, "XHS_ENABLE_IP_PROXY", False)
 XHS_SAVE_LOGIN_STATE = get_env_bool(env, "XHS_SAVE_LOGIN_STATE", True)
 
+# XHS sign server and cookie file
+from fastfetchbot_shared.config import SIGN_SERVER_URL as XHS_SIGN_SERVER_URL
+from fastfetchbot_shared.config import XHS_COOKIE_PATH as _XHS_COOKIE_PATH
+
+xhs_cookie_path = _XHS_COOKIE_PATH or os.path.join(conf_dir, "xhs_cookies.txt")
+
+# Load XHS cookies from file (similar to Zhihu cookie loading)
+XHS_COOKIE_STRING = ""
+if os.path.exists(xhs_cookie_path):
+    try:
+        with open(xhs_cookie_path, "r", encoding="utf-8") as f:
+            XHS_COOKIE_STRING = f.read().strip()
+    except OSError as e:
+        logger.error(f"Error reading XHS cookie file: {e}")
+        XHS_COOKIE_STRING = ""
+else:
+    # Fallback: build cookie string from individual env vars (backward compat)
+    cookie_parts = []
+    if XIAOHONGSHU_A1:
+        cookie_parts.append(f"a1={XIAOHONGSHU_A1}")
+    if XIAOHONGSHU_WEBID:
+        cookie_parts.append(f"web_id={XIAOHONGSHU_WEBID}")
+    if XIAOHONGSHU_WEBSESSION:
+        cookie_parts.append(f"web_session={XIAOHONGSHU_WEBSESSION}")
+    XHS_COOKIE_STRING = "; ".join(cookie_parts)
+
 # Zhihu
 FXZHIHU_HOST = env.get("FXZHIHU_HOST", "fxzhihu.com")
 

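The cookie resolution order added to `config.py` above (file first, individual environment variables only when the file is absent) can be restated as a pure function, which makes the precedence easier to test. This is a sketch under assumptions: `resolve_xhs_cookie_string` is a hypothetical name, and it takes the values as arguments instead of reading module-level settings.

```python
import os
from typing import Optional


def resolve_xhs_cookie_string(
    cookie_path: str,
    a1: Optional[str] = None,
    web_id: Optional[str] = None,
    web_session: Optional[str] = None,
) -> str:
    """Return the XHS cookie string using the same precedence as the diff."""
    if os.path.exists(cookie_path):
        # The file wins whenever it exists, even if it is empty.
        try:
            with open(cookie_path, "r", encoding="utf-8") as f:
                return f.read().strip()
        except OSError:
            # Mirrors the diff: a read error yields an empty cookie string.
            return ""
    # Legacy fallback: join whichever individual values are set.
    pairs = (("a1", a1), ("web_id", web_id), ("web_session", web_session))
    return "; ".join(f"{name}={value}" for name, value in pairs if value)
```

One consequence worth noting: an existing but empty cookie file yields an empty string and does not fall through to the environment variables.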
diff --git a/apps/api/src/services/scrapers/xiaohongshu/__init__.py b/apps/api/src/services/scrapers/xiaohongshu/__init__.py
@@ -1,23 +1,14 @@
-import asyncio
 from typing import Any
-from urllib.parse import urlparse
-
-import httpx
-import jmespath
 
 from fastfetchbot_shared.models.metadata_item import MetadataItem, MediaFile, MessageType
-from fastfetchbot_shared.utils.network import HEADERS
-from src.config import JINJA2_ENV, HTTP_REQUEST_TIMEOUT
-from .xhs.core import XiaoHongShuCrawler
-from .xhs.client import XHSClient
-from .xhs import proxy_account_pool
-
 from fastfetchbot_shared.utils.logger import logger
 from fastfetchbot_shared.utils.parse import (
     unix_timestamp_to_utc,
     get_html_text_length,
     wrap_text_into_html,
 )
+from src.config import JINJA2_ENV, XHS_COOKIE_STRING, XHS_SIGN_SERVER_URL
+from .adaptar import XhsSinglePostAdapter
 
 environment = JINJA2_ENV
 short_text_template = environment.get_template("xiaohongshu_short_text.jinja2")
@@ -42,78 +33,51 @@ def __init__(self, url: str, data: Any, **kwargs):
         self.raw_content = None
 
     async def get_item(self) -> dict:
-        await self.get_xiaohongshu()
+        await self._get_xiaohongshu()
         return self.to_dict()
 
-    async def get_xiaohongshu(self) -> None:
-        if self.url.find("xiaohongshu.com") == -1:
-            async with httpx.AsyncClient() as client:
-                resp = await client.get(
-                    self.url,
-                    headers=HEADERS,
-                    follow_redirects=True,
-                    timeout=HTTP_REQUEST_TIMEOUT,
-                )
-                if (
-                    resp.history
-                ):  # if there is a redirect, the request will have a response chain
-                    for h in resp.history:
-                        print(h.status_code, h.url)
-                    self.url = str(resp.url)
-        urlparser = urlparse(self.url)
-        self.id = urlparser.path.split("/")[-1]
-        crawler = XiaoHongShuCrawler()
-        account_pool = proxy_account_pool.create_account_pool()
-        crawler.init_config("xhs", "cookie", account_pool)
-        note_detail = None
-        for _ in range(5):
-            try:
-                note_detail = await crawler.start(id=self.id)
-                break
-            except Exception as e:
-                await asyncio.sleep(3)
-                logger.error(f"error: {e}")
-                logger.error(f"retrying...")
-        if not note_detail:
-            raise Exception("重试了这么多次还是无法签名成功，寄寄寄")
-        # logger.debug(f"json_data: {json.dumps(note_detail, ensure_ascii=False, indent=4)}")
-        parsed_data = self.process_note_json(note_detail)
-        await self.process_xiaohongshu_note(parsed_data)
+    async def _get_xiaohongshu(self) -> None:
+        async with XhsSinglePostAdapter(
+            cookies=XHS_COOKIE_STRING,
+            sign_server_endpoint=XHS_SIGN_SERVER_URL,
+        ) as adapter:
+            result = await adapter.fetch_post(note_url=self.url)
+        note = result["note"]
+        self.id = note.get("note_id")
+        self.url = result["url"]
+        await self._process_xiaohongshu_note(note)
 
-    async def process_xiaohongshu_note(self, json_data: dict):
+    async def _process_xiaohongshu_note(self, json_data: dict):
+        user = json_data.get("user", {}) or {}
         self.title = json_data.get("title")
-        self.author = json_data.get("author")
+        self.author = user.get("nickname")
         if not self.title and self.author:
             self.title = f"{self.author}的小红书笔记"
-        self.author_url = "https://www.xiaohongshu.com/user/profile/" + json_data.get(
-            "user_id"
+        self.author_url = (
+            "https://www.xiaohongshu.com/user/profile/" + (user.get("user_id") or "")
         )
-        self.raw_content = json_data.get("raw_content")
-        logger.debug(f"{json_data.get('created')}")
+        self.raw_content = json_data.get("desc", "")
+        raw_time = json_data.get("time", 0)
+        raw_updated = json_data.get("last_update_time", 0)
         self.created = (
-            unix_timestamp_to_utc(json_data.get("created") / 1000)
-            if json_data.get("created")
-            else None
+            unix_timestamp_to_utc(int(raw_time) / 1000) if raw_time else None
         )
         self.updated = (
-            unix_timestamp_to_utc(json_data.get("updated") / 1000)
-            if json_data.get("updated")
-            else None
+            unix_timestamp_to_utc(int(raw_updated) / 1000) if raw_updated else None
         )
-        self.like_count = json_data.get("like_count")
+        self.like_count = json_data.get("liked_count")
         self.collected_count = json_data.get("collected_count")
         self.comment_count = json_data.get("comment_count")
         self.share_count = json_data.get("share_count")
         self.ip_location = json_data.get("ip_location")
-        if json_data.get("image_list"):
-            for image_url in json_data.get("image_list"):
-                self.media_files.append(MediaFile(url=image_url, media_type="image"))
-        if json_data.get("video"):
-            self.media_files.append(
-                MediaFile(url=json_data.get("video"), media_type="video")
-            )
+        for image_url in json_data.get("image_list", []) or []:
+            self.media_files.append(MediaFile(url=image_url, media_type="image"))
+        video_urls = json_data.get("video_urls", []) or []
+        if video_urls:
+            self.media_files.append(MediaFile(url=video_urls[0], media_type="video"))
         data = self.__dict__
-        data["raw_content"] = data["raw_content"].replace("\t", "")
+        raw_content = self.raw_content or ""
+        data["raw_content"] = raw_content.replace("\t", "")
         if data["raw_content"].endswith("\n"):
             data["raw_content"] = data["raw_content"][:-1]
         self.text = short_text_template.render(data=data)
@@ -124,30 +88,7 @@ async def process_xiaohongshu_note(self, json_data: dict):
             if media_file.media_type == "image":
                 data["raw_content"] += f'<p><img src="{media_file.url}" alt=""/></p>'
             elif media_file.media_type == "video":
-                data[
-                    "raw_content"
-                ] += (
+                data["raw_content"] += (
                     f'<p><video src="{media_file.url}" controls="controls"></video></p>'
                 )
         self.content = content_template.render(data=data)
-
-    @staticmethod
-    def process_note_json(json_data: dict):
-        expression = """
-        {
-        title: title,
-        raw_content: desc,
-        author: user.nickname,
-        user_id: user.user_id,
-        image_list: image_list[*].url,
-        video: video.media.stream.h264[0].master_url,
-        like_count: interact_info.liked_count,
-        collected_count: interact_info.collected_count,
-        comment_count: interact_info.comment_count,
-        share_count: interact_info.share_count,
-        ip_location: ip_location,
-        created: time,
-        updated: last_update_time
-        }
-        """
-        return jmespath.search(expression, json_data)
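
The field mapping in `_process_xiaohongshu_note` above can be illustrated with a standalone sketch. It assumes the note dict uses the keys read in the diff (`user`, `time`, `image_list`, `video_urls`) and that timestamps are epoch milliseconds, possibly delivered as strings. `ms_to_utc` and `summarize_note` are hypothetical names, and the standard library stands in for `unix_timestamp_to_utc`, whose exact return type is not shown in this diff.

```python
from datetime import datetime, timezone
from typing import Any, Optional

PROFILE_PREFIX = "https://www.xiaohongshu.com/user/profile/"


def ms_to_utc(raw: Any) -> Optional[datetime]:
    """Convert epoch milliseconds (int or numeric string) to an aware UTC datetime."""
    if not raw:
        return None
    return datetime.fromtimestamp(int(raw) / 1000, tz=timezone.utc)


def summarize_note(note: dict) -> dict:
    """Extract a few fields, tolerating keys that are missing or explicitly null."""
    user = note.get("user") or {}
    video_urls = note.get("video_urls") or []
    return {
        "author": user.get("nickname"),
        # `or ""` also covers a user_id that is present but null.
        "author_url": PROFILE_PREFIX + (user.get("user_id") or ""),
        "created": ms_to_utc(note.get("time")),
        "images": list(note.get("image_list") or []),
        # Only the first video URL is used, as in the diff.
        "video": video_urls[0] if video_urls else None,
    }
```

The `or {}` and `or []` guards matter because `dict.get(key, default)` returns the default only when the key is missing, not when its value is `None`.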