Skip to content

fix: batch OCR crash when inline sub_name conflicts with pipeline node#1280

Merged
MistEO merged 2 commits intomainfrom
fix/batch-ocr-crash-on-sub-name-conflict
Apr 17, 2026
Merged

fix: batch OCR crash when inline sub_name conflicts with pipeline node#1280
MistEO merged 2 commits intomainfrom
fix/batch-ocr-crash-on-sub-name-conflict

Conversation

@MistEO
Copy link
Copy Markdown
Member

@MistEO MistEO commented Apr 17, 2026

Summary

  • Fixes crash (std::bad_variant_accessstd::terminate) when an And/Or sub-recognition's sub_name matches an existing pipeline node of a different type

Root Cause

prepare_batch_ocr used a two-phase approach:

  1. Collect phase: try_add_ocr_node stored OCR node names in node_names (using sub_name for inline subs)
  2. Build phase: Loop over node_names, call get_pipeline_data(name), and std::get<OCRerParam>(node_opt->reco_param)

For inline sub-recognitions inside And/Or, the sub_name is user-defined and doesn't necessarily correspond to an actual pipeline node. When sub_name happened to match a real pipeline node with a different recognition type (e.g. ColorMatch), std::get<OCRerParam> threw std::bad_variant_access, causing silent process termination.

Fix

Build BatchOCREntry directly in try_add_ocr_node using the already-available OCRerParam, eliminating the second lookup via get_pipeline_data. This ensures the param always comes from the actual OCR recognition, not from a potentially mismatched pipeline node.

Closes #1268

Summary by Sourcery

防止在行内 OCR 子识别名称与非 OCR 流水线节点名称冲突时导致批量 OCR 流水线崩溃,并在节点收集阶段直接构建批量 OCR 计划。

Bug 修复:

  • 修复由于行内 OCR 子识别名称与已有非 OCR 流水线节点名称发生冲突而导致的批量 OCR 崩溃问题。

增强:

  • 在收集阶段构建批量 OCR 条目,并使用收集到的条目列表来决定是否需要批量 OCR 计划,从而避免重复的流水线查找。
Original summary in English

Summary by Sourcery

Prevent batch OCR pipeline from crashing when inline OCR sub-recognition names collide with non-OCR pipeline nodes and build the batch OCR plan directly during node collection.

Bug Fixes:

  • Fix batch OCR crash caused by inline OCR sub-recognition names colliding with existing non-OCR pipeline node names.

Enhancements:

  • Construct batch OCR entries at collection time and use the collected entries list to decide whether a batch OCR plan is needed, avoiding redundant pipeline lookups.
Original summary in English

Summary by Sourcery

防止在行内 OCR 子识别名称与非 OCR 流水线节点名称冲突时导致批量 OCR 流水线崩溃,并在节点收集阶段直接构建批量 OCR 计划。

Bug 修复:

  • 修复由于行内 OCR 子识别名称与已有非 OCR 流水线节点名称发生冲突而导致的批量 OCR 崩溃问题。

增强:

  • 在收集阶段构建批量 OCR 条目,并使用收集到的条目列表来决定是否需要批量 OCR 计划,从而避免重复的流水线查找。
Original summary in English

Summary by Sourcery

Prevent batch OCR pipeline from crashing when inline OCR sub-recognition names collide with non-OCR pipeline nodes and build the batch OCR plan directly during node collection.

Bug Fixes:

  • Fix batch OCR crash caused by inline OCR sub-recognition names colliding with existing non-OCR pipeline node names.

Enhancements:

  • Construct batch OCR entries at collection time and use the collected entries list to decide whether a batch OCR plan is needed, avoiding redundant pipeline lookups.

When an And/Or sub-recognition's sub_name matched an actual pipeline
node of a different type (e.g. ColorMatch), the batch OCR plan builder
would crash with std::bad_variant_access.

Root cause: prepare_batch_ocr used a two-phase approach - first
collecting OCR node names via try_add_ocr_node, then re-looking up
pipeline data via get_pipeline_data(name) and calling
std::get<OCRerParam>. For inline subs, the name came from sub_name
which could resolve to a non-OCR pipeline node.

Fix: build BatchOCREntry directly in try_add_ocr_node using the
already-available OCRerParam, eliminating the second lookup.

Closes #1268

Made-with: Cursor
Copilot AI review requested due to automatic review settings April 17, 2026 08:56
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - 我发现了 1 个问题,并留下了一些整体性的反馈:

  • try_add_ocr_node 中向 ctx.plan.entries 添加元素时,建议参考 node_names 的去重语义(例如,仅当 emplace(name) 实际插入成功时才 push entry),以避免针对同一节点名称生成重复的 BatchOCREntry 条目。
给 AI Agent 的提示词
Please address the comments from this code review:

## Overall Comments
- When adding to `ctx.plan.entries` in `try_add_ocr_node`, consider mirroring the deduplication semantics of `node_names` (e.g., only pushing an entry if `emplace(name)` actually inserts) to avoid duplicate `BatchOCREntry` entries for the same node name.

## Individual Comments

### Comment 1
<location path="source/MaaFramework/Task/PipelineTask.cpp" line_range="401" />
<code_context>
     }

     ctx.plan.node_names.emplace(name);
+    ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
 }

</code_context>
<issue_to_address>
**issue (bug_risk):** Potential duplicate `BatchOCREntry` entries even though `node_names` stays unique.

`node_names` is still unique because it’s a set, but `entries` now appends on every call. Previously, `entries` was rebuilt from the unique `node_names`, so duplicates couldn’t occur. With this change, you can get multiple `BatchOCREntry`s for the same `name`, leading to redundant OCR or confusing downstream behavior. Please either only append when the `name` is new, or derive `entries` from the unique `node_names` as before.
</issue_to_address>

Sourcery 对开源项目是免费的 —— 如果你觉得我们的 Review 有帮助,欢迎分享 ✨
帮我变得更有用!请在每条评论上点 👍 或 👎,我会根据反馈来改进后续的代码审查。
Original comment in English

Hey - I've found 1 issue, and left some high level feedback:

  • When adding to ctx.plan.entries in try_add_ocr_node, consider mirroring the deduplication semantics of node_names (e.g., only pushing an entry if emplace(name) actually inserts) to avoid duplicate BatchOCREntry entries for the same node name.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- When adding to `ctx.plan.entries` in `try_add_ocr_node`, consider mirroring the deduplication semantics of `node_names` (e.g., only pushing an entry if `emplace(name)` actually inserts) to avoid duplicate `BatchOCREntry` entries for the same node name.

## Individual Comments

### Comment 1
<location path="source/MaaFramework/Task/PipelineTask.cpp" line_range="401" />
<code_context>
     }

     ctx.plan.node_names.emplace(name);
+    ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
 }

</code_context>
<issue_to_address>
**issue (bug_risk):** Potential duplicate `BatchOCREntry` entries even though `node_names` stays unique.

`node_names` is still unique because it’s a set, but `entries` now appends on every call. Previously, `entries` was rebuilt from the unique `node_names`, so duplicates couldn’t occur. With this change, you can get multiple `BatchOCREntry`s for the same `name`, leading to redundant OCR or confusing downstream behavior. Please either only append when the `name` is new, or derive `entries` from the unique `node_names` as before.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread source/MaaFramework/Task/PipelineTask.cpp Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a pipeline batch-OCR crash caused by inline And/Or sub-recognitions whose sub_name collides with an existing (non-OCR) pipeline node name, by ensuring batch OCR entries are built from the actual OCR params rather than re-looking up a pipeline node by name.

Changes:

  • Remove the “second lookup” phase in prepare_batch_ocr that re-fetches pipeline data by collected names and std::gets OCR params from potentially mismatched node types.
  • Build BatchOCREntry directly inside try_add_ocr_node using the already-known OCRerParam.
  • Adjust the “batch OCR not needed” threshold check to use the collected entries count.

@@ -410,6 +398,7 @@ void PipelineTask::try_add_ocr_node(OCRCollectContext& ctx, const std::string& n
}

ctx.plan.node_names.emplace(name);
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx.plan.node_names.emplace(name) is not checked for insertion success, but entries.emplace_back(...) always runs. If the same OCR name is collected multiple times (e.g., duplicated sub_name across inline subs), this will create duplicate BatchOCREntrys while node_names stays unique, which can lead to redundant/incorrect batch plans. Consider only pushing to entries when the set insertion succeeds (or otherwise dedupe entries by name).

Suggested change
ctx.plan.node_names.emplace(name);
const auto [_, inserted] = ctx.plan.node_names.emplace(name);
if (!inserted) {
LogDebug << "batch OCR skipping duplicate node" << VAR(name);
return;
}

Copilot uses AI. Check for mistakes.
Comment on lines +361 to +362
if (ctx.plan.entries.size() < 2) {
LogDebug << "batch OCR not needed, eligible OCR nodes < 2" << VAR(ctx.plan.entries.size());
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The eligibility check now uses ctx.plan.entries.size() < 2, but entries can diverge from the unique-node count (especially if duplicates slip in). Since the trigger logic uses node_names and the log message refers to “eligible OCR nodes”, it’s safer/clearer to base this threshold on the unique set (node_names.size()) or ensure entries is deduplicated so the count matches the intended semantics.

Suggested change
if (ctx.plan.entries.size() < 2) {
LogDebug << "batch OCR not needed, eligible OCR nodes < 2" << VAR(ctx.plan.entries.size());
if (ctx.plan.node_names.size() < 2) {
LogDebug << "batch OCR not needed, eligible OCR nodes < 2" << VAR(ctx.plan.node_names.size());

Copilot uses AI. Check for mistakes.
@MistEO
Copy link
Copy Markdown
Member Author

MistEO commented Apr 17, 2026

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - 我发现了 1 个问题,并给出了一些总体反馈:

  • prepare_batch_ocr 中,批处理决策是基于 entries.size() 做出的,而日志仍然输出的是 ctx.plan.node_names;建议两处都使用同一个容器,以避免它们在未来发生偏离时产生混淆。
  • try_add_ocr_node 中,如果多次使用相同的 name 但不同的 param 调用,该函数会因为集合插入检查而静默忽略新的参数;如果这不是预期行为,建议在遇到重复名称且参数不匹配时进行日志记录或断言。
面向 AI 代理的提示
Please address the comments from this code review:

## Overall Comments
- In `prepare_batch_ocr`, the batching decision is based on `entries.size()` while the log still reports `ctx.plan.node_names`; consider using the same container for both to avoid confusion if they ever diverge.
- In `try_add_ocr_node`, repeated calls with the same `name` and differing `param` will silently ignore the new param due to the set insertion check; if this is unexpected, consider logging or asserting when a duplicate name with mismatched parameters is encountered.

## Individual Comments

### Comment 1
<location path="source/MaaFramework/Task/PipelineTask.cpp" line_range="400-403" />
<code_context>
     }

-    ctx.plan.node_names.emplace(name);
+    auto [_, inserted] = ctx.plan.node_names.emplace(name);
+    if (inserted) {
+        ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
+    }
</code_context>
<issue_to_address>
**suggestion:** The insertion pattern can be simplified to avoid the unused structured binding element.

`auto [_, inserted]` introduces an unused variable that can be noisy for readers and linters. Consider using `if (ctx.plan.node_names.emplace(name).second) { ... }` or naming the iterator (e.g. `it`) and using it if needed later to avoid the dummy `_` binding.

```suggestion
    if (ctx.plan.node_names.emplace(name).second) {
        ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
    }
```
</issue_to_address>

Sourcery 对开源项目免费——如果你觉得这份评审对你有帮助,欢迎分享 ✨
帮我变得更有用!请在每条评论上点 👍 或 👎,我会根据你的反馈改进后续评审。
Original comment in English

Hey - I've found 1 issue, and left some high level feedback:

  • In prepare_batch_ocr, the batching decision is based on entries.size() while the log still reports ctx.plan.node_names; consider using the same container for both to avoid confusion if they ever diverge.
  • In try_add_ocr_node, repeated calls with the same name and differing param will silently ignore the new param due to the set insertion check; if this is unexpected, consider logging or asserting when a duplicate name with mismatched parameters is encountered.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `prepare_batch_ocr`, the batching decision is based on `entries.size()` while the log still reports `ctx.plan.node_names`; consider using the same container for both to avoid confusion if they ever diverge.
- In `try_add_ocr_node`, repeated calls with the same `name` and differing `param` will silently ignore the new param due to the set insertion check; if this is unexpected, consider logging or asserting when a duplicate name with mismatched parameters is encountered.

## Individual Comments

### Comment 1
<location path="source/MaaFramework/Task/PipelineTask.cpp" line_range="400-403" />
<code_context>
     }

-    ctx.plan.node_names.emplace(name);
+    auto [_, inserted] = ctx.plan.node_names.emplace(name);
+    if (inserted) {
+        ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
+    }
</code_context>
<issue_to_address>
**suggestion:** The insertion pattern can be simplified to avoid the unused structured binding element.

`auto [_, inserted]` introduces an unused variable that can be noisy for readers and linters. Consider using `if (ctx.plan.node_names.emplace(name).second) { ... }` or naming the iterator (e.g. `it`) and using it if needed later to avoid the dummy `_` binding.

```suggestion
    if (ctx.plan.node_names.emplace(name).second) {
        ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
    }
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread source/MaaFramework/Task/PipelineTask.cpp Outdated
@MistEO MistEO force-pushed the fix/batch-ocr-crash-on-sub-name-conflict branch from 27796fc to 0d1faf6 Compare April 17, 2026 10:21
@MistEO MistEO merged commit 7a8bf03 into main Apr 17, 2026
6 checks passed
@MistEO MistEO deleted the fix/batch-ocr-crash-on-sub-name-conflict branch April 17, 2026 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

【Bug】Pipeline 参数 sub_name 配置错误导致框架无日志静默闪退

2 participants