Add GO question bank integration, multi-bank fallback mechanism, and answer matching optimization #598
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the repository settings.
📝 Walkthrough

Tiku provider loading now accepts ordered, comma-separated provider names resolved via a new PROVIDER_REGISTRY. Added TikuFallback (an ordered fallback chain) and TikuGo (a GO API client with rate limiting, retries, and validation). Answer normalization and single-answer checks were tightened; config and init parsing were updated.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant TikuManager as Tiku.get_tiku_from_config / TikuFallback
    participant ProviderA as TikuGo
    participant ProviderB as Provider (e.g., TikuYanxi)
    participant External as GO_API
    Client->>TikuManager: request(query, q_info)
    TikuManager->>ProviderA: _query(query, q_info)
    alt ProviderA uses external API
        ProviderA->>External: HTTP POST (rate-limited, retried)
        External-->>ProviderA: response (status, JSON)
        ProviderA->>TikuManager: answer or non-answer
    end
    alt ProviderA returned non-answer or failed
        TikuManager->>ProviderB: _query(query, q_info)
        ProviderB-->>TikuManager: answer or non-answer
    end
    TikuManager->>Client: final validated answer (after check_answer)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 1
🧹 Nitpick comments (1)
api/answer.py (1)
463-523: Consider releasing the lock before sleeping during retry backoff.

The `_request_lock` is acquired at line 466 and held during the entire request, including the retry sleep at lines 504 and 517. While this ensures strict rate limiting, it also blocks other threads from making requests during the backoff sleep period. If you want to allow other threads to proceed during retry backoff sleeps, consider restructuring to release the lock before sleeping:
```python
# Current: lock held during sleep
with self._request_lock:
    # ... request code ...
    if is_throttled:
        time.sleep(sleep_seconds)  # Lock still held

# Alternative: release lock before sleep
with self._request_lock:
    # ... request code ...
if is_throttled:
    time.sleep(sleep_seconds)  # Lock released, other threads can proceed
    continue
```

However, the current implementation is also valid if strict serialization is desired for rate limiting. This is a design trade-off worth documenting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/answer.py` around lines 463 - 523, The _query_once method currently holds self._request_lock during the entire request and also while sleeping on retry backoff; change the flow so the lock is released before any time.sleep for retry backoff: keep the request/last_request_time update inside the with self._request_lock block, but when you detect throttling (the branches that compute sleep_seconds and call logger.warning + time.sleep), move the time.sleep and continue to execute after the with block has exited (i.e., compute sleep_seconds and set a flag inside the locked section, then release the lock, perform time.sleep(sleep_seconds) and continue), updating both places that handle throttled retries (the code path checking code != 1 and the '李恒雅' path) to follow this pattern.
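The restructuring the reviewer describes can be sketched as follows. This is a minimal illustration, not the project's actual code: `do_request` and the module-level lock are hypothetical stand-ins for the real request logic, and `do_request` returning `True` signals a throttled response.

```python
import threading
import time

_request_lock = threading.Lock()

def post_with_backoff(do_request, retries=3, backoff=0.5):
    """Retry sketch: the lock guards only the request itself, so the
    backoff sleep happens with the lock released and other threads
    can issue their own requests in the meantime."""
    for attempt in range(1, retries + 1):
        with _request_lock:
            throttled = do_request()  # stand-in: True means rate-limited
        if not throttled:
            return True  # success, stop retrying
        if attempt < retries:
            time.sleep(backoff * attempt)  # lock already released here
    return False
```

The key point is that `time.sleep` for backoff sits outside the `with` block, while the request (and any shared-timestamp update) stays inside it.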
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/answer_check.py`:
- Around line 1-17: The function check_single currently treats certain
separators in strong_delimiters as evidence of multiple answers; remove the "/"
entry from the strong_delimiters list so answers like "TCP/IP" or "男/女" are not
falsely classified as multi-answer; update the strong_delimiters variable in
check_single to exclude "/" and keep the rest of the splitting logic unchanged
to only reject answers with clear multi-answer separators.
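The suggested fix can be sketched like this; the delimiter list below is illustrative, assuming a simple membership check, and may not match the project's exact set:

```python
# Sketch: "/" is dropped from strong_delimiters so answers like
# "TCP/IP" or "男/女" are no longer classified as multi-answer.
STRONG_DELIMITERS = ['#', ';', ',', ',', '、']  # "/" removed

def check_single(answer: str) -> bool:
    """Return True when the answer contains no multi-answer separator."""
    return not any(d in answer for d in STRONG_DELIMITERS)
```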
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c5aa53f1-2d6c-49e3-9c3e-60ab7d0bd566
📒 Files selected for processing (5)
api/answer.py, api/answer_check.py, api/base.py, config_template.ini, main.py
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/answer_check.py`:
- Around line 11-14: check_single() is using a wrong set of strong_delimiters
causing "A、C" to be mis-accepted and "R&D" mis-rejected; update the
strong_delimiters list used in check_single (the strong_delimiters variable
inside the check_single function) to match the delimiter behavior of cut() by
including the ideographic comma '、' and removing '&' (or otherwise align both
functions' delimiter sets), so both check_single and cut use the same
delimiters; optionally note this is a minimal fix and a future improvement is to
make these checks token-aware.
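One way to sketch the suggested alignment is a single shared delimiter set used by both functions; the names and the exact tuple here are illustrative, not the project's real implementation:

```python
# Shared delimiters keep check_single and cut consistent:
# '、' is included (so "A、C" counts as multi-answer) and '&' is
# excluded (so "R&D" stays a single answer).
DELIMITERS = ('#', ';', ',', ',', '、')

def check_single(answer: str) -> bool:
    return not any(d in answer for d in DELIMITERS)

def cut(answer: str) -> list:
    parts = [answer]
    for d in DELIMITERS:
        parts = [piece for chunk in parts for piece in chunk.split(d)]
    return [p.strip() for p in parts if p.strip()]
```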
In `@api/answer.py`:
- Around line 352-355: The loop in _query calls provider._query(q_info) directly
so any exception from a single provider (e.g., PermissionError from TikuYanxi or
JSON/request errors from TikuAdapter) aborts the fallback chain; wrap the call
to provider._query(q_info) in a try/except that catches Exception (or specific
known exceptions), log the error including provider identity and exception
details, and continue to the next provider so one failing provider doesn't stop
others from being attempted; keep the method signature and behavior (return
first non-empty answer or None) unchanged.
- Around line 431-435: The code currently sends sensitive go_authorization in
the Authorization header to an http:// endpoint (self.api) which leaks
credentials; update the implementation so requests never transmit Authorization
over plain HTTP: change the default API URL (self.api) to use https:// if the
upstream supports TLS, or add a runtime check in the request path (where
Authorization is attached — references: self._headers, self.api, and the code
that sets/sends go_authorization) to refuse or strip the Authorization header
when the URL scheme is http and raise/return an error instructing the caller to
use HTTPS; also remove reliance on verify=True as a fix for plain HTTP and
ensure TLS certificate verification is enabled for HTTPS requests.
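A minimal sketch of the suggested runtime guard, assuming a small helper is called wherever the request headers are assembled (the function name is hypothetical):

```python
from urllib.parse import urlparse

def guard_authorization(api_url: str, headers: dict) -> dict:
    """Refuse to send an Authorization header over plain HTTP, so
    credentials like go_authorization never travel unencrypted."""
    if "Authorization" in headers and urlparse(api_url).scheme != "https":
        raise ValueError(
            f"refusing to send Authorization to non-HTTPS endpoint: {api_url}"
        )
    return headers
```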
- Around line 272-280: The current validation uses globals() which treats any
global symbol as a valid provider; replace this with an explicit provider
registry: create/consume a PROVIDER_REGISTRY (dict) mapping provider names to
their classes and update the validation logic (where `providers` is checked and
`invalid_providers` is computed) to check membership in PROVIDER_REGISTRY
instead of globals(); when instantiating a single provider (the `new_cls =
globals()[providers[0]]()` line) instantiate via
PROVIDER_REGISTRY[providers[0]]() and, as an extra safeguard, ensure the mapped
class is a subclass/instance of the Tiku base class before calling config_set or
using it (and keep existing DISABLE and logger.error behavior for invalid
names).
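The registry approach the reviewer suggests could be sketched as follows; the class bodies are empty stand-ins for the real provider classes:

```python
class Tiku:                      # minimal stand-in for the real base class
    pass

class TikuGo(Tiku):
    pass

class TikuYanxi(Tiku):
    pass

# Explicit registry instead of looking provider names up in globals().
PROVIDER_REGISTRY = {"TikuGo": TikuGo, "TikuYanxi": TikuYanxi}

def resolve_provider(name: str):
    """Instantiate a provider only if it is registered and actually a
    Tiku subclass; anything else (e.g. 'os') is rejected."""
    cls = PROVIDER_REGISTRY.get(name)
    if cls is None or not issubclass(cls, Tiku):
        raise ValueError(f"unknown tiku provider: {name}")
    return cls()
```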
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 58ca5dd0-e882-455a-b742-083224daa048
📒 Files selected for processing (2)
api/answer.py, api/answer_check.py
🧹 Nitpick comments (2)
api/answer.py (2)
539-543: Consider adding validation for numeric config values.

The `float()` and `int()` conversions could raise `ValueError` if the user provides invalid config values. While this follows the existing pattern in other providers, adding try/except with sensible defaults would improve robustness.

♻️ Optional defensive parsing
```diff
 def _init_tiku(self):
     self._headers['Authorization'] = self._conf.get('go_authorization', self._headers['Authorization'])
-    self._min_interval = float(self._conf.get('go_min_interval', self._min_interval))
-    self._retry_times = int(self._conf.get('go_retry_times', self._retry_times))
-    self._retry_backoff = float(self._conf.get('go_retry_backoff', self._retry_backoff))
+    try:
+        self._min_interval = float(self._conf.get('go_min_interval', self._min_interval))
+    except (ValueError, TypeError):
+        logger.warning(f'{self.name} go_min_interval 配置无效,使用默认值 {self._min_interval}')
+    try:
+        self._retry_times = int(self._conf.get('go_retry_times', self._retry_times))
+    except (ValueError, TypeError):
+        logger.warning(f'{self.name} go_retry_times 配置无效,使用默认值 {self._retry_times}')
+    try:
+        self._retry_backoff = float(self._conf.get('go_retry_backoff', self._retry_backoff))
+    except (ValueError, TypeError):
+        logger.warning(f'{self.name} go_retry_backoff 配置无效,使用默认值 {self._retry_backoff}')
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/answer.py` around lines 539 - 543, In _init_tiku, wrap the numeric conversions for _min_interval (float), _retry_times (int), and _retry_backoff (float) in try/except blocks to catch ValueError/TypeError and fall back to the existing defaults (self._min_interval, self._retry_times, self._retry_backoff); also validate non-negative or sensible ranges if desired, and log or silently ignore malformed values; keep the Authorization header assignment as-is (self._headers['Authorization'] = ...).
477-496: Network exceptions bypass the retry mechanism.

When a `RequestException` occurs (e.g., timeout, connection error), the method returns `None` immediately without utilizing the retry loop. Transient network issues could benefit from retrying, similar to throttling responses.

♻️ Proposed fix to include network errors in retry logic
```diff
 def _query_once(self, question: str) -> Optional[str]:
     for attempt in range(1, self._retry_times + 1):
         try:
             with self._request_lock:
                 # 主动限速,降低触发 GO 题库并发/流控限制的概率。
                 now = time.time()
                 wait_time = self._min_interval - (now - self._last_request_time)
                 if wait_time > 0:
                     time.sleep(wait_time)
                 res = requests.post(
                     self.api,
                     data={'question': question},
                     headers=self._headers,
                     verify=True,
                     timeout=15
                 )
                 self._last_request_time = time.time()
         except requests.exceptions.RequestException as e:
-            logger.error(f'{self.name}查询异常: {e}')
-            return None
+            logger.warning(f'{self.name}查询异常 ({attempt}/{self._retry_times}): {e}')
+            if attempt < self._retry_times:
+                sleep_seconds = self._retry_backoff * attempt
+                time.sleep(sleep_seconds)
+                continue
+            logger.error(f'{self.name}查询失败,已达最大重试次数')
+            return None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/answer.py` around lines 477 - 496, The network exception handling in _query_once currently returns None inside the except block, which bypasses the retry loop; change it so RequestException is logged (include the exception) but does not return immediately — instead let the loop continue until self._retry_times is exhausted (and after each failed attempt optionally sleep/backoff respecting self._min_interval and updating self._last_request_time as appropriate), and only return None after all attempts fail; ensure you reference the same symbols (method _query_once, self._retry_times, self._min_interval, self._last_request_time, logger, requests.post) when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ff06211b-10d1-45c0-b38c-ddb6a46a6ac9
📒 Files selected for processing (2)
api/answer.py, api/answer_check.py
Actionable comments posted: 1
🧹 Nitpick comments (1)
api/answer.py (1)
480-493: Note: Request lock serializes all concurrent queries.

The lock at line 480 is held during the entire request (sleep + HTTP POST), not just for timestamp coordination. This serializes all `TikuGo` requests across threads. If this is intentional for strict rate-limit compliance, it's fine; otherwise, the lock could be released after updating timestamps.

Alternative: narrower lock scope (optional)
```diff
 try:
     with self._request_lock:
-        # 主动限速,降低触发 GO 题库并发/流控限制的概率。
         now = time.time()
         wait_time = self._min_interval - (now - self._last_request_time)
         if wait_time > 0:
             time.sleep(wait_time)
-        res = requests.post(
-            self.api,
-            data={'question': question},
-            headers=self._headers,
-            verify=True,
-            timeout=15
-        )
         self._last_request_time = time.time()
+    # 主动限速,降低触发 GO 题库并发/流控限制的概率。
+    res = requests.post(
+        self.api,
+        data={'question': question},
+        headers=self._headers,
+        verify=True,
+        timeout=15
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/answer.py` around lines 480 - 493, The lock self._request_lock is held across the entire HTTP call, serializing all TikuGo requests; instead, acquire the lock only to compute and reserve the timing window and to update the timestamp: under self._request_lock read self._last_request_time and self._min_interval to compute sleep_time (max(0, last + min_interval - now)), release the lock, time.sleep(sleep_time) outside the lock, perform requests.post(...) without the lock, then re-acquire self._request_lock to set self._last_request_time = time.time() so only timestamp coordination (not the network call) is serialized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/answer.py`:
- Around line 513-514: The current int(res_json.get('code', 0)) conversion in
api/answer.py can raise ValueError for non-numeric strings; update the code that
assigns code (using res_json) to safely parse integers by wrapping the
conversion in a try/except ValueError (or using a small helper like safe_int)
and defaulting to 0 on failure, e.g., attempt int(str(res_json.get('code',
'')).strip()) inside try/except and set code = 0 if parsing fails so the rest of
the logic using code and answer remains robust.
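The tolerant parsing the prompt describes could be a small helper; the name `safe_int` comes from the review prompt itself, and the sketch assumes the defaulting behavior described there:

```python
def safe_int(value, default=0):
    """Parse an API 'code' field that may be an int, a numeric string,
    a padded string, None, or garbage; fall back to `default` instead
    of raising ValueError/TypeError."""
    try:
        return int(str(value).strip())
    except (ValueError, TypeError):
        return default
```

Usage would then be `code = safe_int(res_json.get('code', 0))`, keeping the downstream logic unchanged.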
Feature overview
This PR contains three major updates: GO question bank integration, a multi-bank fallback mechanism, and answer matching optimization.
Main changes
1. New GO question bank support (TikuGo)
2. Multi-bank fallback (TikuFallback)
3. Answer matching optimization
4. Configuration updates
New config options: `go_authorization`, `go_min_interval`, `go_retry_times`, `go_retry_backoff`
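A hypothetical config fragment illustrating the four new keys; the key names come from this PR, but the section name, the `provider` key, and all values below are illustrative assumptions, not the project's actual defaults:

```ini
; Illustrative only — section name and values are assumptions.
[tiku]
provider = TikuGo,TikuYanxi     ; ordered, comma-separated fallback chain
go_authorization = your-token-here
go_min_interval = 1.0           ; minimum seconds between GO API requests
go_retry_times = 3              ; retry attempts on throttling/failure
go_retry_backoff = 2.0          ; backoff multiplier between retries
```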
Notes
Summary by CodeRabbit
New Features
Bug Fixes
Documentation