-
-
Notifications
You must be signed in to change notification settings - Fork 72
Fixes multiple reliability-Problems around Pad.block() #186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
addinf Data-Probes are synchronized already
|
rebased to master |
neilcsmith-net
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. I have concerns about some of the changes, and think they should be split into separate PRs. Will add code comments.
| nb-configuration.xml | ||
|
|
||
| .idea | ||
| *.iml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above
| return PadProbeReturn.REMOVE; | ||
| } | ||
| }, GstPadAPI.GST_PAD_PROBE_TYPE_BLOCKING | GstPadAPI.GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM); | ||
| }, GstPadAPI.GST_PAD_PROBE_TYPE_IDLE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned this is changing the semantics of an existing method. Even if it currently has some issues. I'm not against it, as it's probably the right change. Are there any uses that might break with this change?
I am inclined to fix block() but also deprecate it with a view to adding generic addProbe() support. The idea of gst1-java-core is to stick close to implementing upstream GStreamer behaviour rather than add to it.
Having said that, wonder whether to deprecate block() and add a new method with your fixes - onIdle(Runnable r) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the Javadoc, .block does "Run a runnable under a blocked state".
GST_PAD_PROBE_TYPE_IDLE does that as well as GST_PAD_PROBE_TYPE_BLOCKING with the only difference, that GST_PAD_PROBE_TYPE_BLOCKING will only be called as soon as there is actual data flowing on a Pad (and the Data-Pushing Thread is then calling the Probe) whereas GST_PAD_PROBE_TYPE_IDLE can also be called directly from the probe-add.
The implication of this is, that a Probe for GST_PAD_PROBE_TYPE_BLOCKING might never be called, when there is no actual dataflow (ie. the source is some kind of network source, lets say an udpsrc) and you want to unlink it, then blocking for GST_PAD_PROBE_TYPE_BLOCKING might not work if the sending side does not send anymore packets and your probe will never be called.
Before GST_PAD_PROBE_TYPE_IDLE was introduced, one had to manually inject an event into the pipeline force a callback for GST_PAD_PROBE_TYPE_BLOCKING even when no data is flowing.
To to answer your question: I think that only blocking for GST_PAD_PROBE_TYPE_IDLE does exactly what the Javadoc and Method-Name indicates and what the caller expects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're mixing up GST_PAD_PROBE_TYPE_BLOCKING and GST_PAD_PROBE_TYPE_BLOCK. Now, I'm not sure if anyone could be relying on the behaviour of the former.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, I mixed them up in my comment.
In the usage GstPadAPI.GST_PAD_PROBE_TYPE_BLOCKING | GstPadAPI.GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM the Mask did match both, BLOCK and IDLE, which would then possibly call the Runnable twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my thought too, and why is the event mask there?! This was added before I took on the fork, but it does seem wrong - let's go with your change here and if anyone complains we can rethink it!
| /** | ||
| * Run a supplier (a callback which returns a value) under a blocked state and wait for the runnable to complete. Return the value to | ||
| * the waiting thread. | ||
| * The pad will remain blocked for as long as the callback is active. | ||
| * The callback may be called from this thread or another thread. | ||
| * | ||
| * @param supplier The code to run when pad is blocked | ||
| */ | ||
| public <T> T blockAndWait(final Supplier<T> supplier) { | ||
| CompletableFuture<T> future = new CompletableFuture<>(); | ||
| block(() -> { | ||
| T result = supplier.get(); | ||
| future.complete(result); | ||
| }); | ||
|
|
||
| try { | ||
| return future.get(1, TimeUnit.MINUTES); | ||
| } catch (InterruptedException | ExecutionException | TimeoutException e) { | ||
| throw new RuntimeException(e); | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Run a runnable under a blocked state and wait for the runnable to complete. | ||
| * The pad will remain blocked for as long as the callback is active. | ||
| * The callback may be called from this thread or another thread. | ||
| * | ||
| * @param callback The code to run when pad is blocked | ||
| */ | ||
| public void blockAndWait(final Runnable callback) { | ||
| blockAndWait(() -> { | ||
| callback.run(); | ||
| return null; | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not add these. They are things better implemented outside of the core library, and not sure about the hardcoded assumptions - eg. timeout. Probably better achieved by passing a FutureTask into block() / onIdle() and calling get() explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this is a fairly common use-case, for example when changing the Pipeline from a GUI Thread or from a Network Thread. In C there is no well defined way to synchronize these Paths, but in Java there is.
The hardcoded Timeout should probably better be a Parameter with a Default-Overload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a fairly common way to cause problems too! 😄 And encouraging people to block their GUI or network thread is not high on my agenda. Considering people can easily do, if they really need this and know there's no deadlock -
var task = new FutureTask(() -> connectSomething() ), null);
pad.block(task);
task.get(40, TimeUnit.DAYS);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I can see that this API might lead to unexpected Uses. I'm fine with removing it.
I just found that I wrote the same Boilerplate in multiple places around .block and chose to add it; but I can see why you do not want this to be part of the API Bindings. I'll remove it from the splitted PRs.
| }; | ||
|
|
||
| GCallback cb = new GCallback(handle.addProbe(GstPadAPI.GST_PAD_PROBE_TYPE_BUFFER, probe), probe) { | ||
| GCallback cb = new GCallback(handle.addProbe(GstPadAPI.GST_PAD_PROBE_TYPE_BUFFER, probe, this), probe) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't do this - see later comment!
| private synchronized NativeLong addProbe(int mask, GstPadAPI.PadProbeCallback probe) { | ||
|
|
||
| private synchronized NativeLong addProbe(int mask, GstPadAPI.PadProbeCallback probe, Pad pad) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but we can't keep reference to Pad in Pad.Handle. This will cause memory leaks, even if it fixes your immediate problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a Test-Program (see Gist in the original Issue) spin around .block() a couple million times and monitored the stack with VisualVM, and there is no memory-leak here (not only does the Heap not grow, but also the number of Classes on the Heap breathes with the GC; there is no accumulation of stale Instances).
The String Ref to the Pad is stored alongside with the Probe-Callback and removed when the Probe-Callback is removed. The GC is then free to clean the Pad up. The reference in Handle is only held as long as there is a Probe pending.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about other methods than block().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the Probe-Callback (not only for block but for all Probes) refers to the Pad, the Pad-Instance has to live as long as the Probe does. When the Probe goes away, it has to call Handle.removeProbe (it does so through GCallback.disconnect when the Callback is removed).
Thus I cannot see, how the Pad-Instance in the Handle could form a Memleak without the Probe also forming one. It should be fairly simple to generate a small piece of Code that exposes such a leak ant it should be fairly easy to spot in VisualVM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean because it's an inner class (not the method argument)? That the Pad and Probe reference each other isn't a problem. But Handle should not under any circumstances have a hard reference to the object it's a handle for. Handle has a very particular job to do around memory handling, and this risks bringing in huge problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that I mis-used (and did not understand) Handle here. Let's talk about alternatives. I though about implementing the Probe-Object as a Closure that holds an implicit reference to the surrounding Pad; something like
Pad padRef = this;
final GstPadAPI.PadProbeCallback probe = new GstPadAPI.PadProbeCallback() {
…
return listener.eventReceived(padRef, event);
…
}
This should also keep a Reference, as long as the inner Probe-Object exists. I have to check this, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that but it alone does not help. To copy from the Issue (and remind myself of the actual problem): The root cause is, that
- the GstPadAPI.PadProbeCallback instance is only held on the Pad object
- nothing guarantees that the Pad object lives long enough so that not Path, the Pad and the GstPadAPI.PadProbeCallback instance are GCed while the Probe is in place.
So somewhere the Pad OR the GstPadAPI.PadProbeCallback instance needs to be stored within the Java Object-Graph while the Probe is active within the GStreamer Stack.
OR, because we can attach a Reference to the Pad to the GstPadAPI.PadProbeCallback by making it a final reference as mentioned above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle is part of a dual hierarchy that is used in NativeObject.NativeRef - https://github.com/gstreamer-java/gst1-java-core/blob/master/src/org/freedesktop/gstreamer/glib/NativeObject.java#L241 Note that that is a weak reference to the native object with a strong reference to the handle - it's how the memory handling works and the handle having a strong reference back to the actual native object is likely to cause problems.
My inclination may still be to store the Pads inside the Element where necessary. But let's clarify where exactly the Pad is coming from, and the ownership / refcounts on it? I have some code I found that tests some probe calls for GC issues from earlier problems - not sure when - but that doesn't have an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual Code triggering the Problem reliably is this: https://github.com/MaZderMind/gintercom-mixing-core/blob/master/src/main/java/de/mazdermind/gintercom/mixingcore/Panel.java#L123-L132
As yo ucan see, the Pad-Object is generated from Pad mixerPad = ((GhostPad) pad).getTarget();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you see if adding @CallerOwnsReturn to gst_element_get_request_pad helps? eg. https://github.com/gstreamer-java/gst1-java-core/blob/master/src/org/freedesktop/gstreamer/lowlevel/GstElementAPI.java#L77 doesn't match up with transfer-full at https://developer.gnome.org/gstreamer/stable/GstElement.html#gst-element-get-request-pad
|
I'm fine with splitting the PR into multiple. Let's finish the discussion here and then I'll close this one and open individual PRs per group of changes. |
|
I extracted all changes related to the reliability of |
This PR fixes multiple Reliability-Problems around Pad.block() that are descussed in great length in #184. The PR contains multiple fixes and a new Api,
Pad.blockAndWait, so If you want I can split it up into multiple PRs. The different fixes and changes are in different well separated commits though, so maybe that's enough.fixes #184