Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
Original file line number Diff line number Diff line change
Expand Up @@ -1163,6 +1163,12 @@ property, when available, is noted below.
effect due to TLS handshake timeout when there are too many in-flight TLS
handshakes. Set it to something like 250 is good enough to avoid herd effect.

* *netty.server.earlyDropSecureConnectionHandshakes*
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this to be false by default. However, AFACT this behaviour was enabled in 3.7.0 by default. Maybe we should mention it in the documentation. (and also in the release doc of 3.7.1 and 3.8.0 if we don't forget)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that "feature" was enabled in 3.6. this is way Pulsar and Pravega users are not able to upgrade ZK.

I can update the docs and explain the story.
I would like this patch to land to 3.6. 3.7 and 3.8

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

(Java system property: **zookeeper.netty.server.earlyDropSecureConnectionHandshakes**)
If the ZooKeeper server is not fully started, drop TCP connections before performing the TLS handshake.
This is useful in order to prevent flooding the server with many concurreny TLS handshakes after a restart.
Please note that if you enable this flag the server won't answer to 'ruok' commands if it is not fully started.

* *throttledOpWaitTime*
(Java system property: **zookeeper.throttled_op_wait_time**)
The time in the RequestThrottler queue longer than which a request will be marked as throttled.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -522,7 +522,10 @@ private void receiveMessage(ByteBuf message) {
}
ZooKeeperServer zks = this.zkServer;
if (zks == null || !zks.isRunning()) {
throw new IOException("ZK down");
LOG.warn("Closing connection to {} because the server is not ready",
getRemoteSocketAddress());
close(DisconnectReason.IO_EXCEPTION);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I know we were throwing IOException before which resulted in DisconnectReason.IO_EXCEPTION, but now that we do proper close here, we might use some better DisconnectReason. I see e.g. DisconnectReason.SERVER_SHUTDOWN. Although I'm not sure when this part is triggered... I guess this can be triggered either during initialization or shutdown. Maybe DisconnectReason.IO_EXCEPTION is good enough.

Also: before this change we did log the 'getRemoteSocketAddress()' in line 534, which might be handy to add to this warning above. (if someone is trying to figure out in the ZK log why a session terminated)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have the same "ZK down" exception in line 477. Maybe the logic should be changed there as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the same behaviour as before.
I would like to not change the behaviour with this patch if it is not strictly needed.

before this change we did log the 'getRemoteSocketAddress()' in line 534, which might be handy to add to this warning above
we are still printing it. I cannot get this comment

return;
}
// checkRequestSize will throw IOException if request is rejected
zks.checkRequestSizeWhenReceivingMessage(len);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,7 @@ public class NettyServerCnxnFactory extends ServerCnxnFactory {
* Allow client-server sockets to accept both SSL and plaintext connections
*/
public static final String PORT_UNIFICATION_KEY = "zookeeper.client.portUnification";
public static final String EARLY_DROP_SECURE_CONNECTION_HANDSHAKES = "zookeeper.netty.server.earlyDropSecureConnectionHandshakes";
private final boolean shouldUsePortUnification;

/**
Expand Down Expand Up @@ -227,11 +228,14 @@ public void channelActive(ChannelHandlerContext ctx) throws Exception {

// Check the zkServer assigned to the cnxn is still running,
// close it before starting the heavy TLS handshake
if (!cnxn.isZKServerRunning()) {
LOG.warn("Zookeeper server is not running, close the connection before starting the TLS handshake");
ServerMetrics.getMetrics().CNXN_CLOSED_WITHOUT_ZK_SERVER_RUNNING.add(1);
channel.close();
return;
if (secure && !cnxn.isZKServerRunning()) {
boolean earlyDropSecureConnectionHandshakes = Boolean.getBoolean(EARLY_DROP_SECURE_CONNECTION_HANDSHAKES);
Comment thread
symat marked this conversation as resolved.
if (earlyDropSecureConnectionHandshakes) {
LOG.warn("Zookeeper server is not running, close the connection before starting the TLS handshake");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I know we had warning level before, but what do you think about INFO level instead? I like to have logs here, just not sure if this is really something the user should worry.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly the same thing we printed before in the "catch" block below, but without spamming the logs with a stacktrace and with a meaning less message.

So I did this way in order to not change the behaviour too much but at least removing the stacktrace

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to INFO

ServerMetrics.getMetrics().CNXN_CLOSED_WITHOUT_ZK_SERVER_RUNNING.add(1);
channel.close();
return;
}
}

if (handshakeThrottlingEnabled) {
Expand Down Expand Up @@ -510,6 +514,7 @@ private ServerBootstrap configureBootstrapAllocator(ServerBootstrap bootstrap) {
x509Util = new ClientX509Util();

boolean usePortUnification = Boolean.getBoolean(PORT_UNIFICATION_KEY);

LOG.info("{}={}", PORT_UNIFICATION_KEY, usePortUnification);
if (usePortUnification) {
try {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
import static org.junit.jupiter.api.Assertions.fail;
import static org.mockito.Mockito.doNothing;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.times;
import static org.mockito.Mockito.when;
import io.netty.channel.Channel;
import io.netty.channel.ChannelFuture;
Expand Down Expand Up @@ -180,42 +181,70 @@ public void testNonMTLSRemoteConn() throws Exception {
when(zks.isRunning()).thenReturn(true);
ServerStats.Provider providerMock = mock(ServerStats.Provider.class);
when(zks.serverStats()).thenReturn(new ServerStats(providerMock));
testNonMTLSRemoteConn(zks);
testNonMTLSRemoteConn(zks, false, false);
}

@Test
public void testNonMTLSRemoteConnZookKeeperServerNotReady() throws Exception {
testNonMTLSRemoteConn(null);
testNonMTLSRemoteConn(null, false, false);
}

@Test
public void testNonMTLSRemoteConnZookKeeperServerNotReadyEarlyDropEnabled() throws Exception {
testNonMTLSRemoteConn(null, false, true);
}

@Test
public void testMTLSRemoteConnZookKeeperServerNotReadyEarlyDropEnabled() throws Exception {
testNonMTLSRemoteConn(null, true, true);
}

@Test
public void testMTLSRemoteConnZookKeeperServerNotReadyEarlyDropDisabled() throws Exception {
testNonMTLSRemoteConn(null, true, true);
}

@SuppressWarnings("unchecked")
private void testNonMTLSRemoteConn(ZooKeeperServer zks) throws Exception {
Channel channel = mock(Channel.class);
ChannelId id = mock(ChannelId.class);
ChannelFuture success = mock(ChannelFuture.class);
ChannelHandlerContext context = mock(ChannelHandlerContext.class);
ChannelPipeline channelPipeline = mock(ChannelPipeline.class);

when(context.channel()).thenReturn(channel);
when(channel.pipeline()).thenReturn(channelPipeline);
when(success.channel()).thenReturn(channel);
when(channel.closeFuture()).thenReturn(success);

InetSocketAddress address = new InetSocketAddress(0);
when(channel.remoteAddress()).thenReturn(address);
when(channel.id()).thenReturn(id);
NettyServerCnxnFactory factory = new NettyServerCnxnFactory();
factory.setZooKeeperServer(zks);
Attribute atr = mock(Attribute.class);
Mockito.doReturn(atr).when(channel).attr(
Mockito.any()
);
doNothing().when(atr).set(Mockito.any());
factory.channelHandler.channelActive(context);

if (zks != null) {
assertEquals(0, zks.serverStats().getNonMTLSLocalConnCount());
assertEquals(1, zks.serverStats().getNonMTLSRemoteConnCount());
private void testNonMTLSRemoteConn(ZooKeeperServer zks, boolean secure, boolean earlyDrop) throws Exception {
System.setProperty(NettyServerCnxnFactory.EARLY_DROP_SECURE_CONNECTION_HANDSHAKES, earlyDrop + "");
try {
Channel channel = mock(Channel.class);
ChannelId id = mock(ChannelId.class);
ChannelFuture success = mock(ChannelFuture.class);
ChannelHandlerContext context = mock(ChannelHandlerContext.class);
ChannelPipeline channelPipeline = mock(ChannelPipeline.class);

when(context.channel()).thenReturn(channel);
when(channel.pipeline()).thenReturn(channelPipeline);
when(success.channel()).thenReturn(channel);
when(channel.closeFuture()).thenReturn(success);

InetSocketAddress address = new InetSocketAddress(0);
when(channel.remoteAddress()).thenReturn(address);
when(channel.id()).thenReturn(id);
NettyServerCnxnFactory factory = new NettyServerCnxnFactory();
factory.setSecure(secure);
factory.setZooKeeperServer(zks);
Attribute atr = mock(Attribute.class);
Mockito.doReturn(atr).when(channel).attr(
Mockito.any()
);
doNothing().when(atr).set(Mockito.any());
factory.channelHandler.channelActive(context);

if (zks != null) {
assertEquals(0, zks.serverStats().getNonMTLSLocalConnCount());
assertEquals(1, zks.serverStats().getNonMTLSRemoteConnCount());
} else {
if (earlyDrop && secure) {
// the channel must have been forcibly closed
Mockito.verify(channel, times(1)).close();
} else {
Mockito.verify(channel, times(0)).close();
}
}
} finally {
System.clearProperty(NettyServerCnxnFactory.EARLY_DROP_SECURE_CONNECTION_HANDSHAKES);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if you clear this in the finally block, then let's move the setProperty call into the try block. (now if the test killed just before the try block, the property won't be cleared. - very-very unlikely, but still...)

Alternatively I would be OK to put the clearProperty call to the afterEach() method and then you don't need the try-finally block.

Also just double-checking: we don't run multiple test classes (or methods in the same class) paralel in the same JVM, right? Let's make sure we avoid some flaky execution.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't run multiple test classes (or methods in the same class)
this is correct, and as ZK uses System properties it wont't be possible in the short term

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then let's move the setProperty call into the try block
@symat fixed

}
}

Expand Down