Fix deadlock when an error occurs in the frame_generator #45
base: master
Conversation
@rskew, when I was looking into this, I suspected the try / catch around the frame_generator. I noticed your PR looks to release the lock in destroy_stream(). Did you consider releasing the lock in the _create_frames_generator function?
@jonochang The lock is released in destroy_stream() (see aiko_services/src/aiko_services/main/pipeline.py, lines 1579 to 1581 at f2e42a1). The bug was in …
@jonochang Note that in our testing using https://github.com/silverpond/aiko_services we saw slightly different behaviour, due to also using the commits from #42. However, the bug is the same.
Example graph:
  ___________
 /     \     \
A       B --- C --->
 \_____/_____/
has the following syntax in a pipeline definition:
"graph": [
"(A B (A.a_out_1: b_in_1 A.a_out_2: b_in_2) C (A.a_out_1: c_in_1 B.b_out_1: c_in_2 A.a_out_2: c_in_3))"
],
Note that output names must be fully-qualified, e.g. "B.b_out_1" instead
of "b_out_1". This is because the graph traversal does not yet handle
edges defined between B and C in the example graph, only edges between
A and B and between A and C.
PipelineImpl posts to the listening response_queue and/or response_topic when the stream is destroyed. A stream creator may pass a response queue or response topic when calling create_stream(), so that it can be notified as frames are processed. Previously, the listeners were notified if a stream exited due to an error in process_frame, but not if the stream exited without error.
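As a toy illustration of that notify-on-destroy behaviour, here is a self-contained sketch — not the aiko_services API; run_stream and the tuple message format are invented for this example:

```python
import queue
import threading

def run_stream(frame_count, response_queue):
    # Toy stand-in for a pipeline stream: one response per processed frame,
    # then a final notification when the stream ends. The try/finally
    # mirrors the behaviour this PR makes consistent for both error and
    # clean exits: listeners are always told the stream was destroyed.
    try:
        for frame_id in range(frame_count):
            response_queue.put(("frame", frame_id))
    finally:
        response_queue.put(("destroyed", None))

response_queue = queue.Queue()
threading.Thread(target=run_stream, args=(3, response_queue)).start()

while True:
    kind, payload = response_queue.get()
    if kind == "destroyed":   # the listener learns the stream has ended
        break
    print(f"processed frame {payload}")
```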
destroy_stream() in an error condition

Currently, when an error is raised, _process_stream_event() calls destroy_stream() directly so that the stream is immediately terminated and cleaned up. However, _process_stream_event() releases stream.lock before calling destroy_stream(), allowing another thread to update stream.state before destroy_stream() can stop and clean up the stream. This means stream.state cannot be used to signal that an error condition has occurred.
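The race can be seen in miniature below — a simplified sketch, not the actual PipelineImpl internals (only stream.lock and stream.state come from the description above; everything else is a stand-in):

```python
import threading

class Stream:
    def __init__(self):
        self.lock = threading.Lock()
        self.state = "running"

def destroy_stream(stream, error=None):
    # Stop and clean up the stream; 'error' tells us why, independently
    # of whatever stream.state holds by the time this runs
    with stream.lock:
        stream.state = "destroyed"

def process_stream_event_racy(stream):
    with stream.lock:
        stream.state = "error"
    # Lock released here: another thread can set stream.state back to
    # "running" before destroy_stream() runs, so the error signal is lost
    destroy_stream(stream)

def process_stream_event_safer(stream, error):
    with stream.lock:
        stream.state = "error"
    # Pass the error condition explicitly rather than relying on
    # stream.state surviving until destroy_stream() reads it
    destroy_stream(stream, error=error)
```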
geekscape left a comment:
Thanks for your fix.
I've made this change (just the fix) and pushed to master.
Since that change broke two of the unit tests, I've made a further change, which has also been pushed to master.
I have not yet included the unit test associated with the PR#45 fix, because it sounded like there is still a problem with that test to be resolved.
```python
raise RuntimeError("Simulated frame generator exception - this should cause unreleased lock!")
```

```python
def process_frame(self, stream, **kwargs) -> Tuple[aiko.StreamEvent, dict]:
    self.logger.warning(f"Processin frame {stream.frame_id}")
```
"Processin" --> "Processing"
Thanks @geekscape
When an exception was thrown in the frame generator, the stream lock was not being released, causing a deadlock. This PR adds a test demonstrating the problem, along with the fix. The test should pass; reverting the changes to pipeline.py will cause it to fail.
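In outline, the failure mode and fix look like this — a simplified sketch, not the actual pipeline.py code:

```python
import threading

lock = threading.Lock()          # stand-in for stream.lock

def frame_generator():
    raise RuntimeError("Simulated frame generator exception")

def run_frame_buggy():
    lock.acquire()
    frames = frame_generator()   # raises, so release() is never reached
    lock.release()               # any later acquire() now deadlocks
    return frames

def run_frame_fixed():
    lock.acquire()
    try:
        return frame_generator()
    finally:
        lock.release()           # released even when the generator raises

try:
    run_frame_fixed()
except RuntimeError:
    pass
assert not lock.locked()         # no deadlock: the lock was released
```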