-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][meta] Fix ephemeral Zookeeper put which creates a persistent znode #23902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lhotari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great work @heesung-sn
| CompletableFuture<Void> result = new CompletableFuture<>(); | ||
| store.put(path, payload, Optional.of(version), EnumSet.of(CreateOption.Ephemeral)) | ||
| .thenAccept(stat -> { | ||
| if (!stat.isEphemeral()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hit this error from testLookupConnectionNotCloseIfFailedToAcquireOwnershipOfBundle, where the lock suddenly became persistent, after invalidating the cache and updating it with null.
This is a bit surprising to me that the ephemeral lock can suddenly become persistent.
cache.invalidateLocalOwnerCache();
final var lock = pulsar.getCoordinationService().getLockManager(NamespaceEphemeralData.class)
.acquireLock(ServiceUnitUtils.path(bundle), new NamespaceEphemeralData()).join();
lock.updateValue(null);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this problem is caused by org.apache.zookeeper.MockZooKeeper? org.apache.pulsar.zookeeper.ZookeeperServerTest would be a real Zookeeper server. It would be great to have support directly in org.apache.pulsar.broker.testcontext.PulsarTestContext to use a real Zookeeper server. It's not that hard to implement in a similar way as withMockZookeeper():
pulsar/pulsar-broker/src/test/java/org/apache/pulsar/broker/testcontext/PulsarTestContext.java
Lines 450 to 490 in b6cfecc
| /** | |
| * Configure this PulsarTestContext to use a mock ZooKeeper instance which is | |
| * shared for both the local and configuration metadata stores. | |
| * | |
| * @return the builder | |
| */ | |
| public Builder withMockZookeeper() { | |
| return withMockZookeeper(false); | |
| } | |
| /** | |
| * Configure this PulsarTestContext to use a mock ZooKeeper instance. | |
| * | |
| * @param useSeparateGlobalZk if true, the global (configuration) zookeeper will be a separate instance | |
| * @return the builder | |
| */ | |
| public Builder withMockZookeeper(boolean useSeparateGlobalZk) { | |
| try { | |
| mockZooKeeper(createMockZooKeeper()); | |
| if (useSeparateGlobalZk) { | |
| mockZooKeeperGlobal(createMockZooKeeper()); | |
| } | |
| } catch (Exception e) { | |
| throw new RuntimeException(e); | |
| } | |
| return this; | |
| } | |
| private MockZooKeeper createMockZooKeeper() throws Exception { | |
| MockZooKeeper zk = MockZooKeeper.newInstance(MoreExecutors.newDirectExecutorService()); | |
| List<ACL> dummyAclList = new ArrayList<>(0); | |
| ZkUtils.createFullPathOptimistic(zk, "/ledgers/available/192.168.1.1:" + 5000, | |
| "".getBytes(StandardCharsets.UTF_8), dummyAclList, CreateMode.PERSISTENT); | |
| zk.create("/ledgers/LAYOUT", "1\nflat:1".getBytes(StandardCharsets.UTF_8), dummyAclList, | |
| CreateMode.PERSISTENT); | |
| registerCloseable(zk::shutdown); | |
| return zk; | |
| } |
Adding similar
withRealZookeeper() (using org.apache.pulsar.zookeeper.ZookeeperServerTest) would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rings a bell, found #13066. There's probably more changes required to implement it properly in MockZooKeeper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's https://github.com/apache/pulsar/blob/master/pulsar-broker/src/test/java/org/apache/pulsar/broker/MultiBrokerTestZKBaseTest.java as a way how it's currently possible to use a real Zookeeper with tests. However, adding direct support to PulsarTestContext and making it easy to override some protected method in MockedPulsarServiceBaseTest to choose it would be more flexible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heesung-sn I pushed some changes to this PR for MockZooKeeperSession/MockZooKeeper to support ephemeral owner. The value for persistent nodes is now 0. Support was missing completely for multi-ops. In the mock zookeeper solution, the session id gets passed in a thread local since there's not another way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the testLookupConnectionNotCloseIfFailedToAcquireOwnershipOfBundle method with a real ZooKeeper before this fix, and the result was the same as after this fix; future.get() always managed to get a result, causing the unit test to fail. It appears that the testLookupConnectionNotCloseIfFailedToAcquireOwnershipOfBundle method previously relied on a flaw in MockZooKeeper, which incidentally allowed it to pass.
Next, should we:
- Optimize the
testLookupConnectionNotCloseIfFailedToAcquireOwnershipOfBundlemethod to ensure its flow can pass and guarantee the code merge. - Remove the part of the code in
ResourceLockImpl.javathat actively deletes the ZooKeeper node, as it is redundant. After fixing MockZooKeeper or using a real ZooKeeper, this piece of code will no longer be needed. - Add support for
withRealZookeeperinPulsarTestContext.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👆@heesung-sn do you have a chance to check what @Joforde suggested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried 2. and still see some test failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Planning to get back to this work on Thursday this week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heesung-sn I added .withTestZookeeper (uses TestZKServer which is real ZooKeeper) support to PulsarTestContext and an easy way to use this in tests, that's in commit 17ec37a. I modified BrokerServiceLookupTest to use it so that it runs for both in 083d3ed. That's useful for comparing the results and catching problems when using real ZooKeeper. It seems that there's a bigger mess to fix.
|
@heesung-sn I'm closing this PR since this is superseded by #23984. I'll push a new PR with the remaining fixes, including fixes to MockZooKeeper to properly support ephemeral nodes and properly return ZK's stat information. That appears to be broken in MockZooKeeper with many other details. |
|
PR to fix remaining issues: #23988 |
Fixes #23889
Motivation
Fixes #23889
zk.put creates persistent znode although it passes ephemeral node creation option.
Modifications
ref : https://zookeeper.apache.org/doc/r3.5.5/zookeeperProgrammers.html
Verifying this change
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: