
Prevent panics by closing channels on unregister #31

Closed
daniellowtw wants to merge 2 commits into donovanhide:master from daniellowtw:master

Conversation

@daniellowtw

There is a possibility of writing to a closed channel in the code.

The select block chooses a case at random when multiple cases are ready, so there is a possible race where case pub := <-srv.pub: is chosen over case sub := <-srv.unregister:, resulting in a write to the closed channel.
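The randomness of select is easy to see in isolation. The sketch below uses hypothetical stand-in channels named after the ones above (it is not the library's actual code): when both cases are ready, Go picks one uniformly at random, so over many iterations both branches win.

```go
package main

import "fmt"

// countSelections runs a two-way select n times with both channels ready
// and counts how often each case wins. The channel names mirror the race
// described above; these are stand-ins, not the library's actual channels.
func countSelections(n int) (pubWins, unregWins int) {
	for i := 0; i < n; i++ {
		pub := make(chan struct{}, 1)
		unregister := make(chan struct{}, 1)
		pub <- struct{}{}
		unregister <- struct{}{}
		// Both cases are ready, so select chooses pseudo-randomly.
		select {
		case <-pub:
			pubWins++
		case <-unregister:
			unregWins++
		}
	}
	return
}

func main() {
	p, u := countSelections(1000)
	fmt.Printf("pub: %d, unregister: %d\n", p, u)
}
```

Because either branch can win, a publish can be delivered after the unregister that closes the subscriber's channel has already been requested.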

@donovanhide
Owner

Thanks for the patch, looks good! Did you experience a panic in running code? Do you think you could add a test which recreates the scenario and fails with the old code and passes with the new?

@daniellowtw
Author

Yes, we sometimes experienced a panic when a client disconnected and its buffer filled up.

I would be happy to write a test, but this bug is subtle, and the panic happens inside go srv.run, which is hard to assert on in a test. If you can provide some guidance on how the test should be structured, I can fill it in.

If you're interested, this is the test code that causes the panic but doesn't fail the test:


func TestOutChanGetsClosedWhenBufferIsFull(t *testing.T) {
	// Regression test: when the receiving channel is full, the server must
	// close it without racing a concurrent publish into the closed channel,
	// which would panic.
	for i := 0; i < 1000; i++ {
		server := NewServer()
		// Create a chan with buffer size 1, so it fills after one event and
		// any further send blocks.
		outChan := make(chan Event, 1)
		testChannelName := "foo"
		sub := &subscription{
			channel: testChannelName,
			out:     outChan,
		}
		// Add the server subscription.
		server.subs <- sub
		// Use wg as a start gate so all publishers fire at once.
		wg := sync.WaitGroup{}
		wg.Add(1)
		for j := 0; j < 100; j++ {
			go func() {
				wg.Wait()
				server.pub <- &outbound{
					channels: []string{testChannelName},
				}
			}()
		}
		wg.Done()
	}
}

@donovanhide
Owner

donovanhide commented Jul 17, 2017

Test panics for me :-)

Maybe have a look at these ideas for catching a panic using recover.

https://stackoverflow.com/questions/31595791/how-to-test-panics

Oops, just realised the panic happens in another goroutine.... Will have a think and get back!

@donovanhide
Owner

This is admittedly a hard thing to test....
What I'm trying to work out is if your patch can allow the "closing of an already closed channel" panic to occur when:

https://github.com/daniellowtw/eventsource/blob/dbe5ca5ad8ebe78f9650de72bfb83d084c360e65/server.go#L145
https://github.com/daniellowtw/eventsource/blob/dbe5ca5ad8ebe78f9650de72bfb83d084c360e65/server.go#L153

might both be called when a connection closes and invokes:
https://github.com/daniellowtw/eventsource/blob/dbe5ca5ad8ebe78f9650de72bfb83d084c360e65/server.go#L95

and the out buffered channel is already full. Any thoughts on that?

@daniellowtw
Author

Yes, the wrapping of go srv.run does make testing a lot trickier. I've included the test cases anyway, since running go test on the terminal does cause the test binary to fail when the goroutine panics.

As for closing an already closed channel, checking whether the subscription still exists should suffice: the out channel is open iff it is still in the subscription map.
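That invariant can be sketched as a guarded close. This is a minimal illustration with hypothetical names, not the patch itself: the channel is closed exactly once, at the moment the subscription leaves the map, so a second unregister is a no-op instead of a double close.

```go
package main

import "fmt"

// subscription is a stand-in for the type in server.go (hypothetical fields).
type subscription struct {
	out chan string
}

// unregister closes sub.out only if sub is still registered, preserving the
// invariant that the out channel is open iff the subscription is in the map.
func unregister(subs map[*subscription]struct{}, sub *subscription) {
	if _, ok := subs[sub]; ok {
		delete(subs, sub)
		close(sub.out) // first and only close
	}
}

func main() {
	sub := &subscription{out: make(chan string, 1)}
	subs := map[*subscription]struct{}{sub: {}}
	unregister(subs, sub)
	unregister(subs, sub) // no-op: would panic without the map check
	fmt.Println(len(subs)) // 0
}
```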

@daniellowtw daniellowtw changed the title from "Force unregister to be synchronous" to "Prevent panics by closing channels on unregister" Jul 17, 2017
@donovanhide
Owner

So, I tried to rewrite your tests to only use the exported methods of Server and Stream with BufferSize set to a small number, and quickly came to the conclusion that a misbehaving client can actually block all calls to Publish() on the server, which is perhaps an even bigger bug :-) A timeout mechanism is obviously needed, probably using context.Context.

I'd be interested in hearing your views on this and if it might affect your approach to this actual issue, given what sounds like high loads in your application.

@donovanhide
Owner

Apologies for the diversion, but related and interesting reads:
https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/
golang/go#16100

@daniellowtw
Author

https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/
golang/go#16100

Those are indeed interesting reads.

...a misbehaving client can actually block all calls to Publish() on the server

Is a misbehaving client here a slow-reading client, or one that refuses to close the connection? I don't see how it can block calls to Publish() on the server. The worst case I can see is that slow or inactive clients can leak file descriptors, as described in the post. However, I'm not really sure how big an issue inactive clients will be in general.

A timeout mechanism is obviously needed

Do you mean associating a 'write timeout' with each subscriber, so that if they take too long to read from the channel they get unregistered? That would certainly solve the inactive-client problem, but the bookkeeping might be tricky. I think the current approach of letting the buffer fill up and unregistering the subscriber as a slow client is a good compromise.
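The "let the buffer fill up" compromise amounts to a non-blocking send. The sketch below uses a hypothetical helper (not the library's API): a select with a default clause reports a full buffer so the caller can unregister the slow client instead of blocking on it.

```go
package main

import "fmt"

// trySend attempts a non-blocking send. A false return means the subscriber's
// buffer is full, i.e. the client is reading too slowly and the caller should
// unregister it. Hypothetical helper, not the library's actual API.
func trySend(out chan string, ev string) bool {
	select {
	case out <- ev:
		return true
	default:
		return false // buffer full: treat subscriber as a slow client
	}
}

func main() {
	out := make(chan string, 1)
	fmt.Println(trySend(out, "a")) // true: buffer has room
	fmt.Println(trySend(out, "b")) // false: buffer full, slow client detected
}
```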

@donovanhide
Owner

Sorry, got confused by the default clause in select when sending on a full channel vs sending on a closed channel. Will get back to you tomorrow about the test!
