Skip to content

Conversation

@MaZderMind
Copy link
Contributor

@MaZderMind MaZderMind commented Apr 5, 2020

This PR fixes multiple Reliability-Problems around Pad.block() that are descussed in great length in #184. The PR contains multiple fixes and a new Api, Pad.blockAndWait, so If you want I can split it up into multiple PRs. The different fixes and changes are in different well separated commits though, so maybe that's enough.

fixes #184

@MaZderMind MaZderMind changed the title Pad block Fixes multiple reliability-Problems around Pad.block() Apr 5, 2020
@MaZderMind
Copy link
Contributor Author

rebased to master

Copy link
Member

@neilcsmith-net neilcsmith-net left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I have concerns about some of the changes, and think they should be split into separate PRs. Will add code comments.

nb-configuration.xml

.idea
*.iml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

return PadProbeReturn.REMOVE;
}
}, GstPadAPI.GST_PAD_PROBE_TYPE_BLOCKING | GstPadAPI.GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM);
}, GstPadAPI.GST_PAD_PROBE_TYPE_IDLE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned this is changing the semantics of an existing method. Even if it currently has some issues. I'm not against it, as it's probably the right change. Are there any uses that might break with this change?

I am inclined to fix block() but also deprecate it with a view to adding generic addProbe() support. The idea of gst1-java-core is to stick close to implementing upstream GStreamer behaviour rather than add to it.

Having said that, wonder whether to deprecate block() and add a new method with your fixes - onIdle(Runnable r) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the Javadoc, .block does "Run a runnable under a blocked state".
GST_PAD_PROBE_TYPE_IDLE does that as well as GST_PAD_PROBE_TYPE_BLOCKING with the only difference, that GST_PAD_PROBE_TYPE_BLOCKING will only be called as soon as there is actual data flowing on a Pad (and the Data-Pushing Thread is then calling the Probe) whereas GST_PAD_PROBE_TYPE_IDLE can also be called directly from the probe-add.

The implication of this is, that a Probe for GST_PAD_PROBE_TYPE_BLOCKING might never be called, when there is no actual dataflow (ie. the source is some kind of network source, lets say an udpsrc) and you want to unlink it, then blocking for GST_PAD_PROBE_TYPE_BLOCKING might not work if the sending side does not send anymore packets and your probe will never be called.

Before GST_PAD_PROBE_TYPE_IDLE was introduced, one had to manually inject an event into the pipeline force a callback for GST_PAD_PROBE_TYPE_BLOCKING even when no data is flowing.

To to answer your question: I think that only blocking for GST_PAD_PROBE_TYPE_IDLE does exactly what the Javadoc and Method-Name indicates and what the caller expects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're mixing up GST_PAD_PROBE_TYPE_BLOCKING and GST_PAD_PROBE_TYPE_BLOCK. Now, I'm not sure if anyone could be relying on the behaviour of the former.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, I mixed them up in my comment.

In the usage GstPadAPI.GST_PAD_PROBE_TYPE_BLOCKING | GstPadAPI.GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM the Mask did match both, BLOCK and IDLE, which would then possibly call the Runnable twice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my thought too, and why is the event mask there?! This was added before I took on the fork, but it does seem wrong - let's go with your change here and if anyone complains we can rethink it!

Comment on lines +336 to +369
/**
* Run a supplier (a callback which returns a value) under a blocked state and wait for the runnable to complete. Return the value to
* the waiting thread.
* The pad will remain blocked for as long as the callback is active.
* The callback may be called from this thread or another thread.
*
* @param supplier The code to run when pad is blocked
*/
public <T> T blockAndWait(final Supplier<T> supplier) {
CompletableFuture<T> future = new CompletableFuture<>();
block(() -> {
T result = supplier.get();
future.complete(result);
});

try {
return future.get(1, TimeUnit.MINUTES);
} catch (InterruptedException | ExecutionException | TimeoutException e) {
throw new RuntimeException(e);
}
}

/**
* Run a runnable under a blocked state and wait for the runnable to complete.
* The pad will remain blocked for as long as the callback is active.
* The callback may be called from this thread or another thread.
*
* @param callback The code to run when pad is blocked
*/
public void blockAndWait(final Runnable callback) {
blockAndWait(() -> {
callback.run();
return null;
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not add these. They are things better implemented outside of the core library, and not sure about the hardcoded assumptions - eg. timeout. Probably better achieved by passing a FutureTask into block() / onIdle() and calling get() explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is a fairly common use-case, for example when changing the Pipeline from a GUI Thread or from a Network Thread. In C there is no well defined way to synchronize these Paths, but in Java there is.

The hardcoded Timeout should probably better be a Parameter with a Default-Overload.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fairly common way to cause problems too! 😄 And encouraging people to block their GUI or network thread is not high on my agenda. Considering people can easily do, if they really need this and know there's no deadlock -

var task = new FutureTask(() -> connectSomething() ), null);
pad.block(task);
task.get(40, TimeUnit.DAYS);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can see that this API might lead to unexpected Uses. I'm fine with removing it.
I just found that I wrote the same Boilerplate in multiple places around .block and chose to add it; but I can see why you do not want this to be part of the API Bindings. I'll remove it from the splitted PRs.

};

GCallback cb = new GCallback(handle.addProbe(GstPadAPI.GST_PAD_PROBE_TYPE_BUFFER, probe), probe) {
GCallback cb = new GCallback(handle.addProbe(GstPadAPI.GST_PAD_PROBE_TYPE_BUFFER, probe, this), probe) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do this - see later comment!

private synchronized NativeLong addProbe(int mask, GstPadAPI.PadProbeCallback probe) {

private synchronized NativeLong addProbe(int mask, GstPadAPI.PadProbeCallback probe, Pad pad) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but we can't keep reference to Pad in Pad.Handle. This will cause memory leaks, even if it fixes your immediate problem.

Copy link
Contributor Author

@MaZderMind MaZderMind Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a Test-Program (see Gist in the original Issue) spin around .block() a couple million times and monitored the stack with VisualVM, and there is no memory-leak here (not only does the Heap not grow, but also the number of Classes on the Heap breathes with the GC; there is no accumulation of stale Instances).

The String Ref to the Pad is stored alongside with the Probe-Callback and removed when the Probe-Callback is removed. The GC is then free to clean the Pad up. The reference in Handle is only held as long as there is a Probe pending.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about other methods than block().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the Probe-Callback (not only for block but for all Probes) refers to the Pad, the Pad-Instance has to live as long as the Probe does. When the Probe goes away, it has to call Handle.removeProbe (it does so through GCallback.disconnect when the Callback is removed).

Thus I cannot see, how the Pad-Instance in the Handle could form a Memleak without the Probe also forming one. It should be fairly simple to generate a small piece of Code that exposes such a leak ant it should be fairly easy to spot in VisualVM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean because it's an inner class (not the method argument)? That the Pad and Probe reference each other isn't a problem. But Handle should not under any circumstances have a hard reference to the object it's a handle for. Handle has a very particular job to do around memory handling, and this risks bringing in huge problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that I mis-used (and did not understand) Handle here. Let's talk about alternatives. I though about implementing the Probe-Object as a Closure that holds an implicit reference to the surrounding Pad; something like

Pad padRef = this;
final GstPadAPI.PadProbeCallback probe = new GstPadAPI.PadProbeCallback() {
   …
   return listener.eventReceived(padRef, event);
   …
}

This should also keep a Reference, as long as the inner Probe-Object exists. I have to check this, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that but it alone does not help. To copy from the Issue (and remind myself of the actual problem): The root cause is, that

  • the GstPadAPI.PadProbeCallback instance is only held on the Pad object
  • nothing guarantees that the Pad object lives long enough so that not Path, the Pad and the GstPadAPI.PadProbeCallback instance are GCed while the Probe is in place.

So somewhere the Pad OR the GstPadAPI.PadProbeCallback instance needs to be stored within the Java Object-Graph while the Probe is active within the GStreamer Stack.

OR, because we can attach a Reference to the Pad to the GstPadAPI.PadProbeCallback by making it a final reference as mentioned above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle is part of a dual hierarchy that is used in NativeObject.NativeRef - https://github.com/gstreamer-java/gst1-java-core/blob/master/src/org/freedesktop/gstreamer/glib/NativeObject.java#L241 Note that that is a weak reference to the native object with a strong reference to the handle - it's how the memory handling works and the handle having a strong reference back to the actual native object is likely to cause problems.

My inclination may still be to store the Pads inside the Element where necessary. But let's clarify where exactly the Pad is coming from, and the ownership / refcounts on it? I have some code I found that tests some probe calls for GC issues from earlier problems - not sure when - but that doesn't have an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual Code triggering the Problem reliably is this: https://github.com/MaZderMind/gintercom-mixing-core/blob/master/src/main/java/de/mazdermind/gintercom/mixingcore/Panel.java#L123-L132

As yo ucan see, the Pad-Object is generated from Pad mixerPad = ((GhostPad) pad).getTarget();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaZderMind
Copy link
Contributor Author

I'm fine with splitting the PR into multiple. Let's finish the discussion here and then I'll close this one and open individual PRs per group of changes.

@MaZderMind
Copy link
Contributor Author

I extracted all changes related to the reliability of Pad.block into their respective PRs; closing this PR.

@MaZderMind MaZderMind closed this Apr 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reliability-Problems around Pad.block()

2 participants