Skip to content

Fix #5175: Add an extra check for input dependencies of the async producer#5202

Open
vksnk wants to merge 9 commits intomainfrom
vksnk/async-order
Open

Fix #5175: Add an extra check for input dependencies of the async producer#5202
vksnk wants to merge 9 commits intomainfrom
vksnk/async-order

Conversation

@vksnk
Copy link
Member

@vksnk vksnk commented Aug 20, 2020

As described in the #5175, it's possible to write an illegal schedule which successfully goes through the code generation, but generated code produces incorrect results.

This PR adds an extra check to detect such illegal schedules by verifying that if in realize node A there is a producer node B and producer A is async and is consumer of B then B must be also scheduled as async.

@steven-johnson steven-johnson requested a review from zvookin August 20, 2020 21:04
@steven-johnson
Copy link
Contributor

LGTM but @zvookin should probably review

@zvookin
Copy link
Member

zvookin commented Aug 25, 2020

There's something I'm not getting here. Generally the things an async scheduled Func depends on do not need to be async themselves. I'm guessing this has to do with the storage being allocated outside the async invocations and the compute being inside. But then I'm not quite seeing how requiring the dependency to be async always makes it correct. I will sit down and look at it a little more closely, but I think updating the description to say something in terms of which schedules are valid and invalid, rather than describing the IR conditions they result in, might help.

@vksnk
Copy link
Member Author

vksnk commented Aug 25, 2020

Thank you for reviewing, that's a great question.

Generally the things an async scheduled Func depends on do not need to be async themselves.

Right, but input dependency need to be computed fully beforehand. If input dependency is still being produced, async_producer needs to have a way to know that data it depends on is ready.

If I understand correctly, async producers are lifted into separate "thread" based on their store_level, so if store_level is the same as compute_level then there is sort of implicit synchronization point between input dependency and producer:

for some_dim:
  compute input_dependency
  --> implicit sync point <--
  fork {
    compute async_producer  (depends on input_dependency)
    semaphores to sync between async_producer and consumer
  } {
    semaphores to sync between async_producer and consumer
    compute consumer of async_producer
  }

However, if store level is different from compute level then async producer will be lifted out of the loop and this implicit synchronization point is gone:

fork {
for some_dim:
  compute async_producer  (depends on input_dependency, but can't really know how much of it was produced in another thread) 
  semaphores to sync between async_producer and consumer
} {
for some_dim:
  compute input_dependency
  
  semaphores to sync between async_producer and consumer
  compute consumer of async_producer
}

If this logic is correct then solution can be either making a copy of input_producer in both "threads" or inserting explicit synchronization between input_producer and async_producer. The latter looks exactly like what async() already does, so it seemed to me like a more reasonable way to address it.

(Maybe, it's also possible to do forking differently, so it's not based on store level, but I am not sure and it sounds like a really big change).

@vksnk
Copy link
Member Author

vksnk commented Aug 31, 2020

Monday review ping.

@steven-johnson
Copy link
Contributor

Is this still just awating review?

@steven-johnson
Copy link
Contributor

Is this just waiting on review?

@vksnk
Copy link
Member Author

vksnk commented Sep 24, 2020

Yes, review is still needed.

Also, I recently realized that #5201 might be just a different type of the same issue, so maybe it's possible/worth trying to generalize check from this PR to detect the case from #5201.

@alexreinking alexreinking modified the milestones: v11.0.0, v12.0.0 Oct 16, 2020
@steven-johnson
Copy link
Contributor

Where does this PR stand?

@vksnk
Copy link
Member Author

vksnk commented Oct 22, 2020

I can try to generalize it to cover the other case, but it would be great to get a feedback first in case there is a flaw in the logic somewhere.

@steven-johnson
Copy link
Contributor

This PR is pretty old. Is it still active?

@vksnk
Copy link
Member Author

vksnk commented Apr 12, 2021

This is still a problem which needs fixing, so yes. That being said, I discovered a test which still fails, so this needs to be figured out first. I am planning to return to async in couple weeks for some internal stuff, so hopefully can get this fixed too then.

@abadams
Copy link
Member

abadams commented Sep 28, 2023

This is still a problem which needs fixing, so yes. That being said, I discovered a test which still fails, so this needs to be figured out first. I am planning to return to async in couple weeks for some internal stuff, so hopefully can get this fixed too then.

What was the test that still fails? Is it part of this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants