fix: batch OCR crash when inline sub_name conflicts with pipeline node by MistEO · Pull Request #1280 · MaaXYZ/MaaFramework

MistEO · 2026-04-17T08:56:33Z

Summary

Fixes a crash (std::bad_variant_access → std::terminate) that occurred when an And/Or sub-recognition's sub_name matched an existing pipeline node of a different recognition type.

Root Cause

prepare_batch_ocr used a two-phase approach:

- Collect phase: try_add_ocr_node stored OCR node names in node_names (using sub_name for inline subs).
- Build phase: loop over node_names, call get_pipeline_data(name), and std::get<OCRerParam>(node_opt->reco_param).

For inline sub-recognitions inside And/Or, the sub_name is user-defined and doesn't necessarily correspond to an actual pipeline node. When sub_name happened to match a real pipeline node with a different recognition type (e.g. ColorMatch), std::get<OCRerParam> threw std::bad_variant_access, causing silent process termination.

Fix

Build BatchOCREntry directly in try_add_ocr_node using the already-available OCRerParam, eliminating the second lookup via get_pipeline_data. This ensures the param always comes from the actual OCR recognition, not from a potentially mismatched pipeline node.
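
The failure mode and the fix can be illustrated with a minimal sketch. The types below (`RecoParam`, `ColorMatcherParam`, `Pipeline`) and both helper functions are hypothetical stand-ins, not the framework's actual definitions; only `OCRerParam` and `BatchOCREntry` are names taken from this PR, and their fields here are assumed.

```cpp
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the framework's recognition parameter types.
struct OCRerParam { std::string expected; };
struct ColorMatcherParam { int method = 0; };
using RecoParam = std::variant<std::monostate, OCRerParam, ColorMatcherParam>;

struct BatchOCREntry { std::string name; OCRerParam param; };

// Hypothetical pipeline: maps node names to their recognition params.
using Pipeline = std::unordered_map<std::string, RecoParam>;

// Old approach: look the name up again and assume the node is an OCR node.
// std::get throws std::bad_variant_access when the name resolves to a
// node that holds a different alternative (e.g. a color-match node).
inline BatchOCREntry build_by_lookup(const Pipeline& pipeline, const std::string& name)
{
    const RecoParam& reco = pipeline.at(name);
    return BatchOCREntry { name, std::get<OCRerParam>(reco) };
}

// New approach: the caller already holds the OCR param, so the entry is
// built from it directly and no second lookup by name is needed.
inline void collect_entry(std::vector<BatchOCREntry>& entries, const std::string& name, const OCRerParam& param)
{
    entries.push_back(BatchOCREntry { name, param });
}
```

If the exception from `build_by_lookup` is not caught anywhere, the runtime calls std::terminate, which matches the silent process termination described above.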

Closes #1268

Summary by Sourcery

Prevent batch OCR pipeline from crashing when inline OCR sub-recognition names collide with non-OCR pipeline nodes and build the batch OCR plan directly during node collection.

Bug Fixes:

Fix batch OCR crash caused by inline OCR sub-recognition names colliding with existing non-OCR pipeline node names.

Enhancements:

Construct batch OCR entries at collection time and use the collected entries list to decide whether a batch OCR plan is needed, avoiding redundant pipeline lookups.

When an And/Or sub-recognition's sub_name matched an actual pipeline node of a different type (e.g. ColorMatch), the batch OCR plan builder would crash with std::bad_variant_access.

Root cause: prepare_batch_ocr used a two-phase approach: first collecting OCR node names via try_add_ocr_node, then re-looking up pipeline data via get_pipeline_data(name) and calling std::get<OCRerParam>. For inline subs, the name came from sub_name, which could resolve to a non-OCR pipeline node.

Fix: build BatchOCREntry directly in try_add_ocr_node using the already-available OCRerParam, eliminating the second lookup.

Closes #1268

Made-with: Cursor

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

When adding to ctx.plan.entries in try_add_ocr_node, consider mirroring the deduplication semantics of node_names (e.g., only pushing an entry if emplace(name) actually inserts) to avoid duplicate BatchOCREntry entries for the same node name.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- When adding to `ctx.plan.entries` in `try_add_ocr_node`, consider mirroring the deduplication semantics of `node_names` (e.g., only pushing an entry if `emplace(name)` actually inserts) to avoid duplicate `BatchOCREntry` entries for the same node name.

## Individual Comments

### Comment 1
<location path="source/MaaFramework/Task/PipelineTask.cpp" line_range="401" />
<code_context>
     }

     ctx.plan.node_names.emplace(name);
+    ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
 }

</code_context>
<issue_to_address>
**issue (bug_risk):** Potential duplicate `BatchOCREntry` entries even though `node_names` stays unique.

`node_names` is still unique because it’s a set, but `entries` now appends on every call. Previously, `entries` was rebuilt from the unique `node_names`, so duplicates couldn’t occur. With this change, you can get multiple `BatchOCREntry`s for the same `name`, leading to redundant OCR or confusing downstream behavior. Please either only append when the `name` is new, or derive `entries` from the unique `node_names` as before.
</issue_to_address>
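
A minimal sketch of the "only append when the name is new" option follows. The plan layout (`BatchOCRPlan`, an `std::unordered_set` for `node_names`) and the helper `add_unique_entry` are assumptions made for illustration; the review only states that `node_names` is a set and that `entries` is appended to.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified versions of the types named in the review.
struct OCRerParam { std::string expected; };
struct BatchOCREntry { std::string name; OCRerParam param; };

struct BatchOCRPlan {
    std::unordered_set<std::string> node_names; // unique node names
    std::vector<BatchOCREntry> entries;         // at most one entry per name
};

// Appends an entry only when the name was not seen before.
// Returns true if a new entry was added, false for a duplicate name.
inline bool add_unique_entry(BatchOCRPlan& plan, const std::string& name, const OCRerParam& param)
{
    // emplace returns {iterator, bool}; the bool is true only on insertion.
    const bool inserted = plan.node_names.emplace(name).second;
    if (inserted) {
        plan.entries.push_back(BatchOCREntry { name, param });
    }
    return inserted;
}
```

With this shape, `entries.size()` and `node_names.size()` stay equal, so a threshold check on either container gives the same answer.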

Copilot

Pull request overview

Fixes a pipeline batch-OCR crash caused by inline And/Or sub-recognitions whose sub_name collides with an existing (non-OCR) pipeline node name, by ensuring batch OCR entries are built from the actual OCR params rather than re-looking up a pipeline node by name.

Changes:

- Remove the “second lookup” phase in prepare_batch_ocr that re-fetched pipeline data by collected names and called std::get for OCR params on potentially mismatched node types.
- Build BatchOCREntry directly inside try_add_ocr_node using the already-known OCRerParam.
- Adjust the “batch OCR not needed” threshold check to use the collected entries count.

Copilot · 2026-04-17T08:59:52Z

@@ -410,6 +398,7 @@ void PipelineTask::try_add_ocr_node(OCRCollectContext& ctx, const std::string& n
    }

    ctx.plan.node_names.emplace(name);


ctx.plan.node_names.emplace(name) is not checked for insertion success, but entries.emplace_back(...) always runs. If the same OCR name is collected multiple times (e.g., duplicated sub_name across inline subs), this will create duplicate BatchOCREntrys while node_names stays unique, which can lead to redundant/incorrect batch plans. Consider only pushing to entries when the set insertion succeeds (or otherwise dedupe entries by name).

Suggested change

-    ctx.plan.node_names.emplace(name);
+    const auto [_, inserted] = ctx.plan.node_names.emplace(name);
+    if (!inserted) {
+        LogDebug << "batch OCR skipping duplicate node" << VAR(name);
+        return;
+    }

Copilot · 2026-04-17T08:59:52Z

+    if (ctx.plan.entries.size() < 2) {
+        LogDebug << "batch OCR not needed, eligible OCR nodes < 2" << VAR(ctx.plan.entries.size());


The eligibility check now uses ctx.plan.entries.size() < 2, but entries can diverge from the unique-node count (especially if duplicates slip in). Since the trigger logic uses node_names and the log message refers to “eligible OCR nodes”, it’s safer/clearer to base this threshold on the unique set (node_names.size()) or ensure entries is deduplicated so the count matches the intended semantics.

Suggested change

-    if (ctx.plan.entries.size() < 2) {
-        LogDebug << "batch OCR not needed, eligible OCR nodes < 2" << VAR(ctx.plan.entries.size());
+    if (ctx.plan.node_names.size() < 2) {
+        LogDebug << "batch OCR not needed, eligible OCR nodes < 2" << VAR(ctx.plan.node_names.size());

MistEO · 2026-04-17T10:16:37Z

@sourcery-ai review

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

In prepare_batch_ocr, the batching decision is based on entries.size() while the log still reports ctx.plan.node_names; consider using the same container for both to avoid confusion if they ever diverge.
In try_add_ocr_node, repeated calls with the same name and differing param will silently ignore the new param due to the set insertion check; if this is unexpected, consider logging or asserting when a duplicate name with mismatched parameters is encountered.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `prepare_batch_ocr`, the batching decision is based on `entries.size()` while the log still reports `ctx.plan.node_names`; consider using the same container for both to avoid confusion if they ever diverge.
- In `try_add_ocr_node`, repeated calls with the same `name` and differing `param` will silently ignore the new param due to the set insertion check; if this is unexpected, consider logging or asserting when a duplicate name with mismatched parameters is encountered.

## Individual Comments

### Comment 1
<location path="source/MaaFramework/Task/PipelineTask.cpp" line_range="400-403" />
<code_context>
     }

-    ctx.plan.node_names.emplace(name);
+    auto [_, inserted] = ctx.plan.node_names.emplace(name);
+    if (inserted) {
+        ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
+    }
</code_context>
<issue_to_address>
**suggestion:** The insertion pattern can be simplified to avoid the unused structured binding element.

`auto [_, inserted]` introduces an unused variable that can be noisy for readers and linters. Consider using `if (ctx.plan.node_names.emplace(name).second) { ... }` or naming the iterator (e.g. `it`) and using it if needed later to avoid the dummy `_` binding.

```suggestion
    if (ctx.plan.node_names.emplace(name).second) {
        ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
    }
```
</issue_to_address>
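
The overall comment about a duplicate name arriving with a different param could be handled along these lines. This is only a sketch under stated assumptions: the `operator==` on `OCRerParam`, the `AddResult` enum, the `index_by_name` map and the helper `add_entry_checked` are all hypothetical and do not appear in the PR.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical param type; the equality operator is an assumption.
struct OCRerParam {
    std::string expected;
    bool operator==(const OCRerParam& other) const { return expected == other.expected; }
};
struct BatchOCREntry { std::string name; OCRerParam param; };

// Outcome of trying to add an entry, so the caller can log or assert.
enum class AddResult { Added, DuplicateSame, DuplicateMismatch };

struct BatchOCRPlan {
    std::unordered_map<std::string, std::size_t> index_by_name; // name -> index in entries
    std::vector<BatchOCREntry> entries;
};

inline AddResult add_entry_checked(BatchOCRPlan& plan, const std::string& name, const OCRerParam& param)
{
    const auto [it, inserted] = plan.index_by_name.emplace(name, plan.entries.size());
    if (inserted) {
        plan.entries.push_back(BatchOCREntry { name, param });
        return AddResult::Added;
    }
    // The first entry wins; report whether the ignored param differs from it.
    return plan.entries[it->second].param == param ? AddResult::DuplicateSame
                                                   : AddResult::DuplicateMismatch;
}
```

A caller could emit a debug log on `DuplicateMismatch` instead of silently dropping the second param. Whether that is worth the extra bookkeeping depends on how often inline subs are expected to reuse a name.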

Copilot AI review requested due to automatic review settings April 17, 2026 08:56

Copilot started reviewing on behalf of MistEO April 17, 2026 08:57

sourcery-ai bot reviewed Apr 17, 2026

Copilot AI reviewed Apr 17, 2026

sourcery-ai bot reviewed Apr 17, 2026

fix: only add BatchOCREntry when node name is new

0d1faf6

Made-with: Cursor

MistEO force-pushed the fix/batch-ocr-crash-on-sub-name-conflict branch from 27796fc to 0d1faf6 April 17, 2026 10:21

MistEO merged commit 7a8bf03 into main Apr 17, 2026
6 checks passed

MistEO deleted the fix/batch-ocr-crash-on-sub-name-conflict branch April 17, 2026 10:21

MistEO merged 2 commits into main from fix/batch-ocr-crash-on-sub-name-conflict
