Skip to content

Conversation

@adstraw
Copy link
Contributor

@adstraw adstraw commented Oct 17, 2022

Refactor HexagonBufferManager to handle post release buffer frees as a no-op using internal data structures versus returning state through the prior current_allocations method.

@tvm-bot
Copy link
Collaborator

tvm-bot commented Oct 17, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@adstraw
Copy link
Contributor Author

adstraw commented Oct 17, 2022

CC @janetsc and @Lunderberg RE post-review feedback on PR #13028

@adstraw
Copy link
Contributor Author

adstraw commented Oct 17, 2022

@tvm-bot rerun

Copy link
Contributor

@Lunderberg Lunderberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the general cleanup, though it looks like there's some potential threading issues that could occur.

void Reset() {
std::lock_guard<std::mutex> lock(mutex_);
hexagon_buffer_map_.clear();
released_buffers_.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the released_buffers_ be cleared here? As I understand it, maintaining two separate lists is intended to avoid throwing an error in cases where the following sequence occurs.

  1. Scope A enters
  2. Allocation of X
  3. Scope A exits, implicit de-allocation of X
  4. Scope B enters
  5. Request to de-allocate X.

If step 4 calls Reset and clears released_buffers_, then step 5 would raise an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to hear thoughts from @janetsc. My understanding was that once we call AcquireResources in the Hexagon Device API (step 4) which in turn calls this Reset method we want the any past free of buffers (step 5) to fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - once we call AcquireResources, we are starting a new session. Everything we had previously tracked should be forgotten.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is now handled with a Enable / Disable method. We check on Enable which is called from AcquireResources that we have freed everything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Piggybacking of @Lunderberg's example, this means that we will fail in step 4 if we have not de-allocated X prior.

if (it_released != released_buffers_.end()) {
released_buffers_.erase(it_released);
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we want to happen if a void* occurs in both released_buffers_ and hexagon_buffer_map_? I'm picturing a case where the an allocation returns the same data pointer as a previous allocation that had been left open. I think with the current implementation, it would preferentially treat it as a known-previously-released buffer, and would not de-allocate the memory would be ignored, because released_buffers_ gets checked first. I don't think that's the desired behavior, because that could result in an unexpectedly higher memory footprint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what we want here is to not allow any new allocations when released_buffers_ has any values.

In the current checked in code, that's what would happen because there would be no runtime_hexbuffs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I added a change to AllocateHexagonBuffer to remove ptr from the list of released_buffers_ so that it should only be in one data structure at a time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no longer any released_buffers_ structure so the original feedback is no longer valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what we want here is to not allow any new allocations when released_buffers_ has any values.

This PR will fail / assert if we Enable the HexagonBufferManager with unfreed buffers.

Copy link
Contributor

@janetsc janetsc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree that the currently implementation "breaks" some abstraction by allowing the list of allocated buffers to be saved before it is destroyed in ReleaseResources.

However, this introduces some new problems, I think, including the overall state of the device resources. By previously having a buffer management object created/destroyed in Acquire/Release, it allows is to manage resources very explicitly in a session. Now that the object lives in this PR, it has an additional state - "released," where no new allocations should be allowed.

I almost think it would be better to go for the solution where we clean up any frees that are currently happening after ReleaseResources, and make it an error to call Free after ReleaseResources.

<< "HexagonDeviceAPI::FreeWorkspace outside of a session. "
<< "Please call HexagonDeviceAPI::AcquireResources";
CHECK(runtime_hexbuffs->count(data) != 0)
CHECK(runtime_hexbuffs.count(data) != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has the side effect that it will fail for things that were released in Release(). It isn't as clear that's the case with this refactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check is redundant to what's already done in the FreeHexagonBuffer method.

void Reset() {
std::lock_guard<std::mutex> lock(mutex_);
hexagon_buffer_map_.clear();
released_buffers_.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - once we call AcquireResources, we are starting a new session. Everything we had previously tracked should be forgotten.

if (it_released != released_buffers_.end()) {
released_buffers_.erase(it_released);
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what we want here is to not allow any new allocations when released_buffers_ has any values.

In the current checked in code, that's what would happen because there would be no runtime_hexbuffs

@adstraw adstraw changed the title [Hexagon] Refactor HexagonBufferManager; add Release API [Hexagon] Refactor HexagonBufferManager Oct 18, 2022
CHECK(runtime_vtcm) << "runtime_vtcm was not created in AcquireResources";
runtime_vtcm.reset();

runtime_hexbuffs.Disable();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runtime_vtcm needs to be released last - otherwise if there are still VTCM buffers, it will not be able to free them.

Copy link
Contributor

@janetsc janetsc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This latest version does not clear the map (and free the buffers) on ReleaseResources. That is the most important functionality for the Acquire/Release feature.

@areusch areusch added needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it and removed needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it labels Oct 19, 2022
@adstraw adstraw closed this Oct 19, 2022
@adstraw adstraw deleted the straw-hex-refactor-hexbuffmgr branch October 19, 2022 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants