Skip to content

新增 GO 题库集成、多题库回退机制以及答案匹配优化#598

Open
Bnao-zh wants to merge 8 commits intoSamueli924:mainfrom
Bnao-zh:main
Open

新增 GO 题库集成、多题库回退机制以及答案匹配优化#598
Bnao-zh wants to merge 8 commits intoSamueli924:mainfrom
Bnao-zh:main

Conversation

@Bnao-zh
Copy link
Copy Markdown

@Bnao-zh Bnao-zh commented Mar 30, 2026

功能概述

这个 PR 主要包含三项重要更新:新增 GO 题库集成、多题库回退机制以及答案匹配优化。

主要变更

1. 新增 GO 题库支持 (TikuGo)

  • 集成网课小工具题库 API:http://q.icodef.com/wyn-nb
  • 支持多种题干格式自动适配
  • 内置流控保护机制
    • 自动限速防止触发频控
    • 流控重试机制,最多重试 3 次
    • 可配置的请求间隔和重试参数
  • 特殊处理"李恒雅正在努力撰写中..."等占位符

2. 多题库回退功能 (TikuFallback)

  • 支持在配置中指定多个 provider 并用逗号分隔
  • 当前题库未命中时自动回退到下一个题库
  • 智能验证答案类型,类型不符也会继续回退
  • 支持级联连接检查

3. 答案匹配优化

  • 改进答案验证逻辑(api/answer_check.py)
  • 增强答案匹配的准确性(api/base.py)
  • 更完善的错误处理和日志记录

4. 配置更新

  • 新增 go_authorizationgo_min_intervalgo_retry_timesgo_retry_backoff 配置项
  • 支持通过配置灵活调整 GO 题库的行为

修改文件

  • api/answer.py - 新增 TikuGo 和 TikuFallback 类,优化 provider 加载逻辑
  • api/answer_check.py - 改进答案类型检查逻辑
  • api/base.py - 增强答案匹配算法
  • config_template.ini - 新增配置项

备注

  • 回退机制按 provider 配置顺序依次尝试
  • GO 题库内置了并发控制,建议保持 min_interval >= 1.0
  • 答案类型不符也会触发回退,确保高准确率

Summary by CodeRabbit

  • New Features

    • Configure multiple ordered providers with sequential fallback; added a new GO provider with title normalization, rate limiting, retries/backoff, and provider registry.
  • Bug Fixes

    • Stricter single-answer validation and delimiter handling.
    • Improved single/multiple-choice matching with similarity-based fallback and deduplication.
    • Fallback skips failing providers and validates answers before accepting; connection checks now consider active providers.
  • Documentation

    • Updated config template with multi-provider examples and GO-specific settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Tiku provider loading now accepts ordered, comma-separated provider names resolved via a new PROVIDER_REGISTRY. Added TikuFallback (ordered fallback chain) and TikuGo (GO API client with rate-limiting, retries, and validation). Answer normalization and single-answer checks were tightened; config and init parsing updated.

Changes

Cohort / File(s) Summary
API: provider resolution & exports
api/answer.py
Replace wildcard import with named import; export TikuFallback, TikuGo; parse provider as ordered list and resolve via PROVIDER_REGISTRY; return single provider or a TikuFallback chain.
New/extended providers
api/answer.py (contains TikuGo, TikuFallback)
Add TikuGo: title normalization/dedup, rate-limited POSTs, retry/backoff, response validation, throttling detection, special non-answer handling. Add TikuFallback: sequentially query providers, continue on exceptions/falsy answers, accept answers via check_answer, require all active providers for connection check.
Provider registry
api/answer.py
Introduce PROVIDER_REGISTRY mapping canonical provider names to classes for validation/instantiation.
Answer validation
api/answer_check.py
Tighten check_single(answer): early-reject None/empty, coerce to str, and detect multi-segment answers using focused delimiters (e.g., \n, `
Normalization & matching
api/base.py
Add normalize_text, get_option_text, best_option_by_similarity (SequenceMatcher); use similarity fallback for single- and multiple-choice extraction; deduplicate option letters via set.
Config template
config_template.ini
Document TikuGo provider; allow comma-separated provider fallback; add go_authorization, go_min_interval, go_retry_times, go_retry_backoff keys with defaults.
Initialization
main.py
Parse tiku_config["provider"] as comma-separated list and detect membership using any(...) across items.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TikuManager as Tiku.get_tiku_from_config / TikuFallback
    participant ProviderA as TikuGo
    participant ProviderB as Provider (e.g., TikuYanxi)
    participant External as GO_API

    Client->>TikuManager: request(query, q_info)
    TikuManager->>ProviderA: _query(query, q_info)
    alt ProviderA uses external API
        ProviderA->>External: HTTP POST (rate-limited, retried)
        External-->>ProviderA: response (status, JSON)
        ProviderA->>TikuManager: answer or non-answer
    end
    alt ProviderA returned non-answer or failed
        TikuManager->>ProviderB: _query(query, q_info)
        ProviderB-->>TikuManager: answer or non-answer
    end
    TikuManager->>Client: final validated answer (after check_answer)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • Samueli924

Poem

"I nibbled logs beneath the code,
stitched providers in tidy mode,
retried the GO until it sang,
fallback branches softly sprang,
answers found where queries strode. 🐇"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.74% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the three main changes: GO library integration, multi-library fallback mechanism, and answer matching optimization.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
api/answer.py (1)

463-523: Consider releasing lock before sleeping during retry backoff.

The _request_lock is acquired at line 466 and held during the entire request, including the retry sleep at lines 504 and 517. While this ensures strict rate limiting, it also blocks other threads from making requests during the backoff sleep period.

If you want to allow other threads to proceed during retry backoff sleeps, consider restructuring to release the lock before sleeping:

# Current: lock held during sleep
with self._request_lock:
    # ... request code ...
    if is_throttled:
        time.sleep(sleep_seconds)  # Lock still held

# Alternative: release lock before sleep
with self._request_lock:
    # ... request code ...
if is_throttled:
    time.sleep(sleep_seconds)  # Lock released, other threads can proceed
    continue

However, the current implementation is also valid if strict serialization is desired for rate limiting. This is a design trade-off worth documenting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/answer.py` around lines 463 - 523, The _query_once method currently holds
self._request_lock during the entire request and also while sleeping on retry
backoff; change the flow so the lock is released before any time.sleep for retry
backoff: keep the request/last_request_time update inside the with
self._request_lock block, but when you detect throttling (the branches that
compute sleep_seconds and call logger.warning + time.sleep), move the time.sleep
and continue to execute after the with block has exited (i.e., compute
sleep_seconds and set a flag inside the locked section, then release the lock,
perform time.sleep(sleep_seconds) and continue), updating both places that
handle throttled retries (the code path checking code != 1 and the '李恒雅' path)
to follow this pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/answer_check.py`:
- Around line 1-17: The function check_single currently treats certain
separators in strong_delimiters as evidence of multiple answers; remove the "/"
entry from the strong_delimiters list so answers like "TCP/IP" or "男/女" are not
falsely classified as multi-answer; update the strong_delimiters variable in
check_single to exclude "/" and keep the rest of the splitting logic unchanged
to only reject answers with clear multi-answer separators.

---

Nitpick comments:
In `@api/answer.py`:
- Around line 463-523: The _query_once method currently holds self._request_lock
during the entire request and also while sleeping on retry backoff; change the
flow so the lock is released before any time.sleep for retry backoff: keep the
request/last_request_time update inside the with self._request_lock block, but
when you detect throttling (the branches that compute sleep_seconds and call
logger.warning + time.sleep), move the time.sleep and continue to execute after
the with block has exited (i.e., compute sleep_seconds and set a flag inside the
locked section, then release the lock, perform time.sleep(sleep_seconds) and
continue), updating both places that handle throttled retries (the code path
checking code != 1 and the '李恒雅' path) to follow this pattern.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c5aa53f1-2d6c-49e3-9c3e-60ab7d0bd566

📥 Commits

Reviewing files that changed from the base of the PR and between d4816a5 and e0ca38f.

📒 Files selected for processing (5)
  • api/answer.py
  • api/answer_check.py
  • api/base.py
  • config_template.ini
  • main.py

Comment thread api/answer_check.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/answer_check.py`:
- Around line 11-14: check_single() is using a wrong set of strong_delimiters
causing "A、C" to be mis-accepted and "R&D" mis-rejected; update the
strong_delimiters list used in check_single (the strong_delimiters variable
inside the check_single function) to match the delimiter behavior of cut() by
including the ideographic comma '、' and removing '&' (or otherwise align both
functions' delimiter sets), so both check_single and cut use the same
delimiters; optionally note this is a minimal fix and a future improvement is to
make these checks token-aware.

In `@api/answer.py`:
- Around line 352-355: The loop in _query calls provider._query(q_info) directly
so any exception from a single provider (e.g., PermissionError from TikuYanxi or
JSON/request errors from TikuAdapter) aborts the fallback chain; wrap the call
to provider._query(q_info) in a try/except that catches Exception (or specific
known exceptions), log the error including provider identity and exception
details, and continue to the next provider so one failing provider doesn't stop
others from being attempted; keep the method signature and behavior (return
first non-empty answer or None) unchanged.
- Around line 431-435: The code currently sends sensitive go_authorization in
the Authorization header to an http:// endpoint (self.api) which leaks
credentials; update the implementation so requests never transmit Authorization
over plain HTTP: change the default API URL (self.api) to use https:// if the
upstream supports TLS, or add a runtime check in the request path (where
Authorization is attached — references: self._headers, self.api, and the code
that sets/sends go_authorization) to refuse or strip the Authorization header
when the URL scheme is http and raise/return an error instructing the caller to
use HTTPS; also remove reliance on verify=True as a fix for plain HTTP and
ensure TLS certificate verification is enabled for HTTPS requests.
- Around line 272-280: The current validation uses globals() which treats any
global symbol as a valid provider; replace this with an explicit provider
registry: create/consume a PROVIDER_REGISTRY (dict) mapping provider names to
their classes and update the validation logic (where `providers` is checked and
`invalid_providers` is computed) to check membership in PROVIDER_REGISTRY
instead of globals(); when instantiating a single provider (the `new_cls =
globals()[providers[0]]()` line) instantiate via
PROVIDER_REGISTRY[providers[0]]() and, as an extra safeguard, ensure the mapped
class is a subclass/instance of the Tiku base class before calling config_set or
using it (and keep existing DISABLE and logger.error behavior for invalid
names).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 58ca5dd0-e882-455a-b742-083224daa048

📥 Commits

Reviewing files that changed from the base of the PR and between e0ca38f and 8d874ac.

📒 Files selected for processing (2)
  • api/answer.py
  • api/answer_check.py

Comment thread api/answer_check.py Outdated
Comment thread api/answer.py Outdated
Comment thread api/answer.py
Comment thread api/answer.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
api/answer.py (2)

539-543: Consider adding validation for numeric config values.

The float() and int() conversions could raise ValueError if the user provides invalid config values. While this follows the existing pattern in other providers, adding try/except with sensible defaults would improve robustness.

♻️ Optional defensive parsing
     def _init_tiku(self):
         self._headers['Authorization'] = self._conf.get('go_authorization', self._headers['Authorization'])
-        self._min_interval = float(self._conf.get('go_min_interval', self._min_interval))
-        self._retry_times = int(self._conf.get('go_retry_times', self._retry_times))
-        self._retry_backoff = float(self._conf.get('go_retry_backoff', self._retry_backoff))
+        try:
+            self._min_interval = float(self._conf.get('go_min_interval', self._min_interval))
+        except (ValueError, TypeError):
+            logger.warning(f'{self.name} go_min_interval 配置无效,使用默认值 {self._min_interval}')
+        try:
+            self._retry_times = int(self._conf.get('go_retry_times', self._retry_times))
+        except (ValueError, TypeError):
+            logger.warning(f'{self.name} go_retry_times 配置无效,使用默认值 {self._retry_times}')
+        try:
+            self._retry_backoff = float(self._conf.get('go_retry_backoff', self._retry_backoff))
+        except (ValueError, TypeError):
+            logger.warning(f'{self.name} go_retry_backoff 配置无效,使用默认值 {self._retry_backoff}')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/answer.py` around lines 539 - 543, In _init_tiku, wrap the numeric
conversions for _min_interval (float), _retry_times (int), and _retry_backoff
(float) in try/except blocks to catch ValueError/TypeError and fall back to the
existing defaults (self._min_interval, self._retry_times, self._retry_backoff);
also validate non-negative or sensible ranges if desired, and log or silently
ignore malformed values; keep the Authorization header assignment as-is
(self._headers['Authorization'] = ...).

477-496: Network exceptions bypass the retry mechanism.

When a RequestException occurs (e.g., timeout, connection error), the method returns None immediately without utilizing the retry loop. Transient network issues could benefit from retrying, similar to throttling responses.

♻️ Proposed fix to include network errors in retry logic
     def _query_once(self, question: str) -> Optional[str]:
         for attempt in range(1, self._retry_times + 1):
             try:
                 with self._request_lock:
                     # 主动限速,降低触发 GO 题库并发/流控限制的概率。
                     now = time.time()
                     wait_time = self._min_interval - (now - self._last_request_time)
                     if wait_time > 0:
                         time.sleep(wait_time)
                     res = requests.post(
                         self.api,
                         data={'question': question},
                         headers=self._headers,
                         verify=True,
                         timeout=15
                     )
                     self._last_request_time = time.time()
             except requests.exceptions.RequestException as e:
-                logger.error(f'{self.name}查询异常: {e}')
-                return None
+                logger.warning(f'{self.name}查询异常 ({attempt}/{self._retry_times}): {e}')
+                if attempt < self._retry_times:
+                    sleep_seconds = self._retry_backoff * attempt
+                    time.sleep(sleep_seconds)
+                    continue
+                logger.error(f'{self.name}查询失败,已达最大重试次数')
+                return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/answer.py` around lines 477 - 496, The network exception handling in
_query_once currently returns None inside the except block, which bypasses the
retry loop; change it so RequestException is logged (include the exception) but
does not return immediately — instead let the loop continue until
self._retry_times is exhausted (and after each failed attempt optionally
sleep/backoff respecting self._min_interval and updating self._last_request_time
as appropriate), and only return None after all attempts fail; ensure you
reference the same symbols (method _query_once, self._retry_times,
self._min_interval, self._last_request_time, logger, requests.post) when making
the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@api/answer.py`:
- Around line 539-543: In _init_tiku, wrap the numeric conversions for
_min_interval (float), _retry_times (int), and _retry_backoff (float) in
try/except blocks to catch ValueError/TypeError and fall back to the existing
defaults (self._min_interval, self._retry_times, self._retry_backoff); also
validate non-negative or sensible ranges if desired, and log or silently ignore
malformed values; keep the Authorization header assignment as-is
(self._headers['Authorization'] = ...).
- Around line 477-496: The network exception handling in _query_once currently
returns None inside the except block, which bypasses the retry loop; change it
so RequestException is logged (include the exception) but does not return
immediately — instead let the loop continue until self._retry_times is exhausted
(and after each failed attempt optionally sleep/backoff respecting
self._min_interval and updating self._last_request_time as appropriate), and
only return None after all attempts fail; ensure you reference the same symbols
(method _query_once, self._retry_times, self._min_interval,
self._last_request_time, logger, requests.post) when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ff06211b-10d1-45c0-b38c-ddb6a46a6ac9

📥 Commits

Reviewing files that changed from the base of the PR and between 8d874ac and 1e15ee2.

📒 Files selected for processing (2)
  • api/answer.py
  • api/answer_check.py

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
api/answer.py (1)

480-493: Note: Request lock serializes all concurrent queries.

The lock at line 480 is held during the entire request (sleep + HTTP POST), not just for timestamp coordination. This serializes all TikuGo requests across threads. If this is intentional for strict rate-limit compliance, it's fine; otherwise, the lock could be released after updating timestamps.

Alternative: narrower lock scope (optional)
             try:
                 with self._request_lock:
-                    # 主动限速,降低触发 GO 题库并发/流控限制的概率。
                     now = time.time()
                     wait_time = self._min_interval - (now - self._last_request_time)
                     if wait_time > 0:
                         time.sleep(wait_time)
-                    res = requests.post(
-                        self.api,
-                        data={'question': question},
-                        headers=self._headers,
-                        verify=True,
-                        timeout=15
-                    )
                     self._last_request_time = time.time()
+                # 主动限速,降低触发 GO 题库并发/流控限制的概率。
+                res = requests.post(
+                    self.api,
+                    data={'question': question},
+                    headers=self._headers,
+                    verify=True,
+                    timeout=15
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/answer.py` around lines 480 - 493, The lock self._request_lock is held
across the entire HTTP call, serializing all TikuGo requests; instead, acquire
the lock only to compute and reserve the timing window and to update the
timestamp: under self._request_lock read self._last_request_time and
self._min_interval to compute sleep_time (max(0, last + min_interval - now)),
release the lock, time.sleep(sleep_time) outside the lock, perform
requests.post(...) without the lock, then re-acquire self._request_lock to set
self._last_request_time = time.time() so only timestamp coordination (not the
network call) is serialized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/answer.py`:
- Around line 513-514: The current int(res_json.get('code', 0)) conversion in
api/answer.py can raise ValueError for non-numeric strings; update the code that
assigns code (using res_json) to safely parse integers by wrapping the
conversion in a try/except ValueError (or using a small helper like safe_int)
and defaulting to 0 on failure, e.g., attempt int(str(res_json.get('code',
'')).strip()) inside try/except and set code = 0 if parsing fails so the rest of
the logic using code and answer remains robust.

---

Nitpick comments:
In `@api/answer.py`:
- Around line 480-493: The lock self._request_lock is held across the entire
HTTP call, serializing all TikuGo requests; instead, acquire the lock only to
compute and reserve the timing window and to update the timestamp: under
self._request_lock read self._last_request_time and self._min_interval to
compute sleep_time (max(0, last + min_interval - now)), release the lock,
time.sleep(sleep_time) outside the lock, perform requests.post(...) without the
lock, then re-acquire self._request_lock to set self._last_request_time =
time.time() so only timestamp coordination (not the network call) is serialized.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 787bb4b8-19cb-43e7-945a-c1dd204da1b8

📥 Commits

Reviewing files that changed from the base of the PR and between 1e15ee2 and 6bef9a1.

📒 Files selected for processing (1)
  • api/answer.py

Comment thread api/answer.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant