Fix a bug in CachedOP.#12184
Conversation
| RunGraph(false, idx, arrays, 0, idx.num_nodes(), std::move(array_reqs), | ||
| std::move(ref_count), &states, dispatch_modes, | ||
| !recording || inlining_); | ||
| recording); |
There was a problem hiding this comment.
I recommend putting the recording logic in this line and not changing the semantics of recording variable in case in the future someone decides to use recording below RunGraph call.
RunGraph(false, idx, arrays, 0, idx.num_nodes(), std::move(array_reqs),
std::move(ref_count), &states, dispatch_modes,
recording && inlining_);
There was a problem hiding this comment.
i'm trying to follow the logic of the original code. A little logic error can cause unexpected problems.
There was a problem hiding this comment.
I find it easier to understand the logic: RunGraph must record only when autograd is recording and CachedOp is inlining. An easier to understand logic can prevent future confusion. Up to you to change or not.
| @@ -823,9 +823,10 @@ OpStatePtr CachedOp::DynamicForward( | |||
|
|
|||
| // If we are already recording, we don't need RunGraph to record all | |||
| // computation again. | |||
There was a problem hiding this comment.
I recommend clarifying in the comment that "When CachedOp is inlining, RunGraph is in charge of recording the graph, otherwise we are in charge of recording and RunGraph shall not be recording".
|
Do we have any unit-test framework to check the growth of the bufferpool that we can use to write a unit-test for this fix? |
|
@eric-haibin-lin @piiswrong This PR should be good to merge? |
This reverts commit cd9f9c8.
This reverts commit cd9f9c8.
* fix a bug. * address comments. * retrigger * address comments.
Description
As reported in #12116, the modification in #11951 causes excessive memory consumption. The reason is that #11951 actually turns on recording even in the inference mode.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments