Skip to content

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented Apr 14, 2021

No description provided.

@pitrou pitrou requested a review from lidavidm April 14, 2021 14:29
@pitrou
Copy link
Member Author

pitrou commented Apr 14, 2021

cc @westonpace

@pitrou
Copy link
Member Author

pitrou commented Apr 14, 2021

I've checked locally that this fixes the TSan failures.

@pitrou
Copy link
Member Author

pitrou commented Apr 14, 2021

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue was that we might be in the middle of SpawnReal on an external thread while the main thread was destructing the state (e.g. it exited early due to an error or something)?

@pitrou
Copy link
Member Author

pitrou commented Apr 14, 2021

Yes.

@pitrou
Copy link
Member Author

pitrou commented Apr 14, 2021

Hm, wait, I shouldn't unlock before notifying in MarkFinished().

@lidavidm
Copy link
Member

Additionally, since we're handing around Executor* instead of shared_ptr or something, there's still the possibility that an I/O thread tries to use a dangling reference, right? The main thread has to make sure to block and wait for all I/O threads it spawns under all conditions.

@pitrou
Copy link
Member Author

pitrou commented Apr 14, 2021

Well, the idea is that the top-level task waits for all dependent tasks to finish, I think.

@lidavidm
Copy link
Member

True, it's just a potentially easy pitfall (though it comes with the territory I suppose).

@pitrou
Copy link
Member Author

pitrou commented Apr 14, 2021

Things may get tricky if cancellation or early exit is involved indeed...

@pitrou pitrou force-pushed the ARROW-12379-tsan branch from 9a07ed0 to 6e35a3f Compare April 14, 2021 14:58
@lidavidm
Copy link
Member

Seems Crossbow built Arrow master, not this branch. I threw up https://github.com/lidavidm/crossbow/actions/runs/748753207

@lidavidm
Copy link
Member

The crossbow job passes!

@pitrou
Copy link
Member Author

pitrou commented Apr 14, 2021

@lidavidm
Copy link
Member

Seemed the integration build timed out in Java but everything else passed.

@lidavidm lidavidm closed this in cb7a624 Apr 14, 2021
@pitrou pitrou deleted the ARROW-12379-tsan branch April 14, 2021 16:21
@westonpace
Copy link
Member

westonpace commented Apr 14, 2021

Thanks @pitrou . A few thoughts (nothing worrisome):

Why deque instead of queue? I don't mind the change. I just want to make sure I'm not missing some piece of knowledge.

Things may get tricky if cancellation or early exit is involved indeed...\

Cancellation should still flush the task queue. Early exit (via bad Status) should still flush the task queue. It is important we do because in both cases the remaining tasks might need to do some cleanup to handle the error. Exceptions/assertions might not but we're already in crash territory there.

@pitrou
Copy link
Member Author

pitrou commented Apr 14, 2021

It's only for consistency, since deque is used for the threadpool executor.

@github-actions
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants