fix: batch OCR crash when inline sub_name conflicts with pipeline node#1280
fix: batch OCR crash when inline sub_name conflicts with pipeline node#1280
Conversation
When an And/Or sub-recognition's sub_name matched an actual pipeline node of a different type (e.g. ColorMatch), the batch OCR plan builder would crash with std::bad_variant_access. Root cause: prepare_batch_ocr used a two-phase approach - first collecting OCR node names via try_add_ocr_node, then re-looking up pipeline data via get_pipeline_data(name) and calling std::get<OCRerParam>. For inline subs, the name came from sub_name which could resolve to a non-OCR pipeline node. Fix: build BatchOCREntry directly in try_add_ocr_node using the already-available OCRerParam, eliminating the second lookup. Closes #1268 Made-with: Cursor
There was a problem hiding this comment.
Hey - 我发现了 1 个问题,并留下了一些整体性的反馈:
- 在
try_add_ocr_node中向ctx.plan.entries添加元素时,建议参考node_names的去重语义(例如,仅当emplace(name)实际插入成功时才 push entry),以避免针对同一节点名称生成重复的BatchOCREntry条目。
给 AI Agent 的提示词
Please address the comments from this code review:
## Overall Comments
- When adding to `ctx.plan.entries` in `try_add_ocr_node`, consider mirroring the deduplication semantics of `node_names` (e.g., only pushing an entry if `emplace(name)` actually inserts) to avoid duplicate `BatchOCREntry` entries for the same node name.
## Individual Comments
### Comment 1
<location path="source/MaaFramework/Task/PipelineTask.cpp" line_range="401" />
<code_context>
}
ctx.plan.node_names.emplace(name);
+ ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
}
</code_context>
<issue_to_address>
**issue (bug_risk):** Potential duplicate `BatchOCREntry` entries even though `node_names` stays unique.
`node_names` is still unique because it’s a set, but `entries` now appends on every call. Previously, `entries` was rebuilt from the unique `node_names`, so duplicates couldn’t occur. With this change, you can get multiple `BatchOCREntry`s for the same `name`, leading to redundant OCR or confusing downstream behavior. Please either only append when the `name` is new, or derive `entries` from the unique `node_names` as before.
</issue_to_address>帮我变得更有用!请在每条评论上点 👍 或 👎,我会根据反馈来改进后续的代码审查。
Original comment in English
Hey - I've found 1 issue, and left some high level feedback:
- When adding to
ctx.plan.entriesintry_add_ocr_node, consider mirroring the deduplication semantics ofnode_names(e.g., only pushing an entry ifemplace(name)actually inserts) to avoid duplicateBatchOCREntryentries for the same node name.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- When adding to `ctx.plan.entries` in `try_add_ocr_node`, consider mirroring the deduplication semantics of `node_names` (e.g., only pushing an entry if `emplace(name)` actually inserts) to avoid duplicate `BatchOCREntry` entries for the same node name.
## Individual Comments
### Comment 1
<location path="source/MaaFramework/Task/PipelineTask.cpp" line_range="401" />
<code_context>
}
ctx.plan.node_names.emplace(name);
+ ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
}
</code_context>
<issue_to_address>
**issue (bug_risk):** Potential duplicate `BatchOCREntry` entries even though `node_names` stays unique.
`node_names` is still unique because it’s a set, but `entries` now appends on every call. Previously, `entries` was rebuilt from the unique `node_names`, so duplicates couldn’t occur. With this change, you can get multiple `BatchOCREntry`s for the same `name`, leading to redundant OCR or confusing downstream behavior. Please either only append when the `name` is new, or derive `entries` from the unique `node_names` as before.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Pull request overview
Fixes a pipeline batch-OCR crash caused by inline And/Or sub-recognitions whose sub_name collides with an existing (non-OCR) pipeline node name, by ensuring batch OCR entries are built from the actual OCR params rather than re-looking up a pipeline node by name.
Changes:
- Remove the “second lookup” phase in
prepare_batch_ocrthat re-fetches pipeline data by collected names andstd::gets OCR params from potentially mismatched node types. - Build
BatchOCREntrydirectly insidetry_add_ocr_nodeusing the already-knownOCRerParam. - Adjust the “batch OCR not needed” threshold check to use the collected entries count.
| @@ -410,6 +398,7 @@ void PipelineTask::try_add_ocr_node(OCRCollectContext& ctx, const std::string& n | |||
| } | |||
|
|
|||
| ctx.plan.node_names.emplace(name); | |||
There was a problem hiding this comment.
ctx.plan.node_names.emplace(name) is not checked for insertion success, but entries.emplace_back(...) always runs. If the same OCR name is collected multiple times (e.g., duplicated sub_name across inline subs), this will create duplicate BatchOCREntrys while node_names stays unique, which can lead to redundant/incorrect batch plans. Consider only pushing to entries when the set insertion succeeds (or otherwise dedupe entries by name).
| ctx.plan.node_names.emplace(name); | |
| const auto [_, inserted] = ctx.plan.node_names.emplace(name); | |
| if (!inserted) { | |
| LogDebug << "batch OCR skipping duplicate node" << VAR(name); | |
| return; | |
| } |
| if (ctx.plan.entries.size() < 2) { | ||
| LogDebug << "batch OCR not needed, eligible OCR nodes < 2" << VAR(ctx.plan.entries.size()); |
There was a problem hiding this comment.
The eligibility check now uses ctx.plan.entries.size() < 2, but entries can diverge from the unique-node count (especially if duplicates slip in). Since the trigger logic uses node_names and the log message refers to “eligible OCR nodes”, it’s safer/clearer to base this threshold on the unique set (node_names.size()) or ensure entries is deduplicated so the count matches the intended semantics.
| if (ctx.plan.entries.size() < 2) { | |
| LogDebug << "batch OCR not needed, eligible OCR nodes < 2" << VAR(ctx.plan.entries.size()); | |
| if (ctx.plan.node_names.size() < 2) { | |
| LogDebug << "batch OCR not needed, eligible OCR nodes < 2" << VAR(ctx.plan.node_names.size()); |
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - 我发现了 1 个问题,并给出了一些总体反馈:
- 在
prepare_batch_ocr中,批处理决策是基于entries.size()做出的,而日志仍然输出的是ctx.plan.node_names;建议两处都使用同一个容器,以避免它们在未来发生偏离时产生混淆。 - 在
try_add_ocr_node中,如果多次使用相同的name但不同的param调用,该函数会因为集合插入检查而静默忽略新的参数;如果这不是预期行为,建议在遇到重复名称且参数不匹配时进行日志记录或断言。
面向 AI 代理的提示
Please address the comments from this code review:
## Overall Comments
- In `prepare_batch_ocr`, the batching decision is based on `entries.size()` while the log still reports `ctx.plan.node_names`; consider using the same container for both to avoid confusion if they ever diverge.
- In `try_add_ocr_node`, repeated calls with the same `name` and differing `param` will silently ignore the new param due to the set insertion check; if this is unexpected, consider logging or asserting when a duplicate name with mismatched parameters is encountered.
## Individual Comments
### Comment 1
<location path="source/MaaFramework/Task/PipelineTask.cpp" line_range="400-403" />
<code_context>
}
- ctx.plan.node_names.emplace(name);
+ auto [_, inserted] = ctx.plan.node_names.emplace(name);
+ if (inserted) {
+ ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
+ }
</code_context>
<issue_to_address>
**suggestion:** The insertion pattern can be simplified to avoid the unused structured binding element.
`auto [_, inserted]` introduces an unused variable that can be noisy for readers and linters. Consider using `if (ctx.plan.node_names.emplace(name).second) { ... }` or naming the iterator (e.g. `it`) and using it if needed later to avoid the dummy `_` binding.
```suggestion
if (ctx.plan.node_names.emplace(name).second) {
ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
}
```
</issue_to_address>帮我变得更有用!请在每条评论上点 👍 或 👎,我会根据你的反馈改进后续评审。
Original comment in English
Hey - I've found 1 issue, and left some high level feedback:
- In
prepare_batch_ocr, the batching decision is based onentries.size()while the log still reportsctx.plan.node_names; consider using the same container for both to avoid confusion if they ever diverge. - In
try_add_ocr_node, repeated calls with the samenameand differingparamwill silently ignore the new param due to the set insertion check; if this is unexpected, consider logging or asserting when a duplicate name with mismatched parameters is encountered.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `prepare_batch_ocr`, the batching decision is based on `entries.size()` while the log still reports `ctx.plan.node_names`; consider using the same container for both to avoid confusion if they ever diverge.
- In `try_add_ocr_node`, repeated calls with the same `name` and differing `param` will silently ignore the new param due to the set insertion check; if this is unexpected, consider logging or asserting when a duplicate name with mismatched parameters is encountered.
## Individual Comments
### Comment 1
<location path="source/MaaFramework/Task/PipelineTask.cpp" line_range="400-403" />
<code_context>
}
- ctx.plan.node_names.emplace(name);
+ auto [_, inserted] = ctx.plan.node_names.emplace(name);
+ if (inserted) {
+ ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
+ }
</code_context>
<issue_to_address>
**suggestion:** The insertion pattern can be simplified to avoid the unused structured binding element.
`auto [_, inserted]` introduces an unused variable that can be noisy for readers and linters. Consider using `if (ctx.plan.node_names.emplace(name).second) { ... }` or naming the iterator (e.g. `it`) and using it if needed later to avoid the dummy `_` binding.
```suggestion
if (ctx.plan.node_names.emplace(name).second) {
ctx.plan.entries.emplace_back(BatchOCREntry { .name = name, .param = param });
}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Made-with: Cursor
27796fc to
0d1faf6
Compare
Summary
std::bad_variant_access→std::terminate) when anAnd/Orsub-recognition'ssub_namematches an existing pipeline node of a different typeRoot Cause
prepare_batch_ocrused a two-phase approach:try_add_ocr_nodestored OCR node names innode_names(usingsub_namefor inline subs)node_names, callget_pipeline_data(name), andstd::get<OCRerParam>(node_opt->reco_param)For inline sub-recognitions inside
And/Or, thesub_nameis user-defined and doesn't necessarily correspond to an actual pipeline node. Whensub_namehappened to match a real pipeline node with a different recognition type (e.g.ColorMatch),std::get<OCRerParam>threwstd::bad_variant_access, causing silent process termination.Fix
Build
BatchOCREntrydirectly intry_add_ocr_nodeusing the already-availableOCRerParam, eliminating the second lookup viaget_pipeline_data. This ensures the param always comes from the actual OCR recognition, not from a potentially mismatched pipeline node.Closes #1268
Summary by Sourcery
防止在行内 OCR 子识别名称与非 OCR 流水线节点名称冲突时导致批量 OCR 流水线崩溃,并在节点收集阶段直接构建批量 OCR 计划。
Bug 修复:
增强:
Original summary in English
Summary by Sourcery
Prevent batch OCR pipeline from crashing when inline OCR sub-recognition names collide with non-OCR pipeline nodes and build the batch OCR plan directly during node collection.
Bug Fixes:
Enhancements:
Original summary in English
Summary by Sourcery
防止在行内 OCR 子识别名称与非 OCR 流水线节点名称冲突时导致批量 OCR 流水线崩溃,并在节点收集阶段直接构建批量 OCR 计划。
Bug 修复:
增强:
Original summary in English
Summary by Sourcery
Prevent batch OCR pipeline from crashing when inline OCR sub-recognition names collide with non-OCR pipeline nodes and build the batch OCR plan directly during node collection.
Bug Fixes:
Enhancements: