📝 Walkthrough

This PR extends the Zhihu scraper with a new "api" method alongside the existing "fxzhihu" approach, introducing cookie-based direct API authentication via the ZHIHU_Z_C0 / ZHIHU_COOKIES configuration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Config as Configuration
    participant Scraper as Zhihu Scraper
    participant API as Zhihu API
    participant Processor as Content Processor
    Config->>Scraper: Load ZHIHU_Z_C0 or ZHIHU_COOKIES
    Note over Scraper: Build ZHIHU_API_COOKIE
    Scraper->>Scraper: Set method="api" if ZHIHU_API_COOKIE present
    Scraper->>API: GET request with Cookie header
    API-->>Scraper: Return JSON response
    Scraper->>Processor: Pass raw_content to fix_images_and_links()
    Processor-->>Scraper: Fixed HTML (images, links)
    Scraper->>Processor: Pass HTML to unmask_zhihu_links()
    Processor-->>Scraper: Unmasked links (decoded targets)
    Scraper->>Scraper: Extract fields with .get() defaults
    Scraper-->>Scraper: Return parsed content
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (3)
apps/api/src/services/scrapers/zhihu/__init__.py (1)
796-798: Simplify the origin_pin check using `.get()`.

As flagged by static analysis, the key check before dictionary access can be simplified: the pattern `"origin_pin" in data and data["origin_pin"]` can be replaced with `data.get("origin_pin")`.

♻️ Proposed fix

```diff
-    if "origin_pin" in data and data["origin_pin"]:
-        result["origin_pin_id"] = str(data["origin_pin"]["id"])
-        result["origin_pin_data"] = Zhihu._resolve_status_api_data(data["origin_pin"])
+    origin_pin = data.get("origin_pin")
+    if origin_pin:
+        result["origin_pin_id"] = str(origin_pin["id"])
+        result["origin_pin_data"] = Zhihu._resolve_status_api_data(origin_pin)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/api/src/services/scrapers/zhihu/__init__.py` around lines 796-798, replace the explicit key-and-truthy check `"origin_pin" in data and data["origin_pin"]` with a single `get` call: retrieve `origin_pin = data.get("origin_pin")` and, if `origin_pin` is truthy, set `result["origin_pin_id"] = str(origin_pin["id"])` and `result["origin_pin_data"] = Zhihu._resolve_status_api_data(origin_pin)`; this simplifies the conditional and avoids a double dictionary lookup while preserving behavior.

apps/api/src/services/scrapers/zhihu/content_processing.py (1)
39-44: Unused loop variable `index`.

The loop variable `index` is not used in the loop body. Consider using an underscore prefix to indicate it is intentionally unused.

♻️ Proposed fix

```diff
-    for index, ref in sorted_refs:
+    for _index, ref in sorted_refs:
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/api/src/services/scrapers/zhihu/content_processing.py` around lines 39-44, the loop over `sorted_refs = sorted(references.items(), key=lambda x: int(x[0]))` assigns the first element to `index` but never uses it; change the loop header (currently `for index, ref in sorted_refs`) to use an unused-variable name (e.g. `for _, ref in sorted_refs`) so it is clear the index is intentionally ignored, while preserving the sorted_refs, references, items, and URL/html construction logic.

tests/test_zhihu_content_processing.py (1)
1-11: Consider using a proper import path instead of `sys.path` manipulation.

The `sys.path` manipulation is fragile and can break if the directory structure changes. Consider:

- Running tests from the project root with a proper `PYTHONPATH`
- Using relative imports if this is a package
- Adding a `conftest.py` that sets up the path once

However, the comment explains the rationale (avoiding heavy dependencies), so this is acceptable for now if the test infrastructure requires it.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/test_zhihu_content_processing.py` around lines 1-11, the test currently mutates sys.path to import content_processing, which is fragile; replace this inline sys.path manipulation by one of: (a) configuring PYTHONPATH or running tests from the project root so tests import the module normally, (b) converting the tests folder into a package and using a relative import, or (c) creating a conftest.py that adjusts sys.path once for the test suite; ensure tests still import fix_images_and_links, extract_references, and unmask_zhihu_links from content_processing, and remove the sys.path.insert and os.path.dirname usage from tests/test_zhihu_content_processing.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/api/src/services/scrapers/zhihu/content_processing.py`:
- Around line 56-63: The try/except in the URL unmasking block silently swallows
errors; update the block in content_processing.py that parses href (uses
urlparse, parse_qs, and unquote to set a_tag["href"]) to catch Exception as e
and log a debug-level message with the original href and exception details using
Loguru (ensure logger is imported/available), then proceed without raising;
include contextual text like "failed to unmask href" in the log so failures are
discoverable during debugging.
---
Nitpick comments:
In `@apps/api/src/services/scrapers/zhihu/__init__.py`:
- Around line 796-798: Replace the explicit key-and-truthy check "origin_pin" in
data and data["origin_pin"] with a single get call: retrieve origin_pin =
data.get("origin_pin") and if origin_pin: set result["origin_pin_id"] =
str(origin_pin["id"]) and result["origin_pin_data"] =
Zhihu._resolve_status_api_data(origin_pin); this simplifies the conditional and
avoids double dictionary lookup while preserving behavior.
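The suggested `.get()` pattern can be checked in isolation with a minimal stand-in. The function name and dictionary shape are assumptions taken from the review text; `Zhihu._resolve_status_api_data` is omitted here.

```python
def resolve_origin_pin(data: dict) -> dict:
    """Stand-in for the reviewed snippet: one .get() lookup replaces the
    '"key" in d and d["key"]' double access."""
    result: dict = {}
    origin_pin = data.get("origin_pin")
    if origin_pin:
        result["origin_pin_id"] = str(origin_pin["id"])
    return result
```

The single lookup behaves identically for a missing key and for a falsy value such as `None`, which is why the replacement preserves behavior.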
In `@apps/api/src/services/scrapers/zhihu/content_processing.py`:
- Around line 39-44: The loop in sorted_refs = sorted(references.items(),
key=lambda x: int(x[0])) assigns the first element to index but never uses it;
change the loop header in the block that builds items (currently "for index, ref
in sorted_refs") to use an unused-variable name (e.g., "for _, ref in
sorted_refs") so it's clear index is intentionally ignored while preserving the
sorted_refs, references, items, and URL/html construction logic.
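The loop in question can be exercised with a small stand-in. The function name and the shape of `references` are assumptions; only the sort-numerically-then-ignore-the-index behavior mirrors the review.

```python
def ordered_reference_texts(references: dict[str, str]) -> list[str]:
    # Keys are numeric strings; sort numerically so "10" comes after "2".
    sorted_refs = sorted(references.items(), key=lambda x: int(x[0]))
    # The index is unused, hence the underscore-prefixed name.
    return [ref for _index, ref in sorted_refs]
```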
In `@tests/test_zhihu_content_processing.py`:
- Around line 1-11: The test currently mutates sys.path to import
content_processing which is fragile; replace this inline sys.path manipulation
by one of: (a) configure PYTHONPATH or run tests from project root so tests
import the module with a normal import, (b) convert the tests folder into a
package and use a relative import to import content_processing, or (c) create a
conftest.py that adjusts sys.path once for the test suite; ensure tests still
import fix_images_and_links, extract_references, and unmask_zhihu_links from
content_processing and remove the sys.path.insert and os.path.dirname usage from
tests/test_zhihu_content_processing.py.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9954da61-4b3d-4755-9e5a-0d3388cc1e05
📒 Files selected for processing (6)
- apps/api/src/config.py
- apps/api/src/services/scrapers/zhihu/__init__.py
- apps/api/src/services/scrapers/zhihu/config.py
- apps/api/src/services/scrapers/zhihu/content_processing.py
- template.env
- tests/test_zhihu_content_processing.py
```python
try:
    parsed = urlparse(href)
    qs = parse_qs(parsed.query)
    target = qs.get("target", [None])[0]
    if target:
        a_tag["href"] = unquote(target)
except Exception:
    pass
```
Consider logging failed URL unmasking attempts.
The silent `except Exception: pass` makes debugging difficult when URL parsing fails unexpectedly. Per the coding guidelines, Loguru should be used for logging. Consider logging at debug level to aid troubleshooting without cluttering normal output.
🛠️ Proposed fix

```diff
+from fastfetchbot_shared.utils.logger import logger
 from urllib.parse import urlparse, parse_qs, unquote

 try:
     parsed = urlparse(href)
     qs = parse_qs(parsed.query)
     target = qs.get("target", [None])[0]
     if target:
         a_tag["href"] = unquote(target)
-except Exception:
-    pass
+except Exception as e:
+    logger.debug(f"Failed to unmask Zhihu link {href}: {e}")
```
🧰 Tools
🪛 Ruff (0.15.6)
[error] 62-63: try-except-pass detected, consider logging the exception (S110)

[warning] 62-62: Do not catch blind exception: Exception (BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/services/scrapers/zhihu/content_processing.py` around lines 56 -
63, The try/except in the URL unmasking block silently swallows errors; update
the block in content_processing.py that parses href (uses urlparse, parse_qs,
and unquote to set a_tag["href"]) to catch Exception as e and log a debug-level
message with the original href and exception details using Loguru (ensure logger
is imported/available), then proceed without raising; include contextual text
like "failed to unmask href" in the log so failures are discoverable during
debugging.
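For reference, the unmasking logic under discussion can be exercised standalone. Zhihu wraps external links as `https://link.zhihu.com/?target=<percent-encoded URL>`; the helper name and the fall-back-to-original-href behavior below are assumptions, and the logging call is replaced by a comment to keep the sketch dependency-free.

```python
from urllib.parse import urlparse, parse_qs, unquote


def unmask_href(href: str) -> str:
    """Return the decoded target of a link.zhihu.com redirect, or the
    original href when there is no target or parsing fails."""
    try:
        parsed = urlparse(href)
        # parse_qs already percent-decodes values; the extra unquote matches
        # the reviewed code and is a no-op once no %-escapes remain.
        target = parse_qs(parsed.query).get("target", [None])[0]
        if target:
            return unquote(target)
    except Exception:
        # The review recommends logger.debug(...) here instead of silence.
        pass
    return href
```

A non-redirect URL passes through unchanged, which is the behavior the surrounding BeautifulSoup traversal relies on.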
Summary by CodeRabbit

New Features

- Added an "api" scraping method for Zhihu with cookie-based authentication, alongside the existing "fxzhihu" approach.

Improvements

- Content post-processing now fixes images and links and unmasks Zhihu redirect links.

Tests

- Added unit tests for the Zhihu content-processing helpers.