Skip to content

[Testing] Improve integration test base classes#9672

Merged
merlimat merged 5 commits intoapache:masterfrom
lhotari:lh-improve-integration-test-base-classes
Feb 24, 2021
Merged

[Testing] Improve integration test base classes#9672
merlimat merged 5 commits intoapache:masterfrom
lhotari:lh-improve-integration-test-base-classes

Conversation

@lhotari
Copy link
Copy Markdown
Member

@lhotari lhotari commented Feb 22, 2021

Motivation

Integration test classes misuse BeforeSuite and AfterSuite annotations. When AfterSuite is used to cleanup resources, it results in resources piling up for the duration of the complete test suite run. It's like a memory leak since resources aren't properly cleaned up during test execution.

Some test classes implemented ITest, which caused misleading error messages from TestNG. There shouldn't be a need to implement ITest.

Besides the annotation issue, some resources were using static fields without a real reason.

The usage of BeforeSuite & static fields leads to issues such as the one described in the comment. For example for the backwards compatibility tests, the last container to get initialized will be used for all tests in the suite and the backwards compatibility isn't tested at all for other than the last container version.

Modifications

  • replace BeforeSuite -> BeforeClass and AfterSuite -> AfterClass
  • remove implements ITest
  • use instance fields instead of static fields. drop "static" from the methods that reference these fields.

@lhotari
Copy link
Copy Markdown
Member Author

lhotari commented Feb 22, 2021

/pulsarbot run-failure-checks

@merlimat
Copy link
Copy Markdown
Contributor

It looks like the failure on https://github.com/apache/pulsar/pull/9672/checks?check_run_id=1956401458 is genuine.

@lhotari
Copy link
Copy Markdown
Member Author

lhotari commented Feb 23, 2021

It looks like the failure on https://github.com/apache/pulsar/pull/9672/checks?check_run_id=1956401458 is genuine.

Thanks for pointing that out. I'm able to reproduce the issue locally too.

by cherry-picking the changes from #9626 , I can get the error message from the broker side.

command to run the test and leave the container running for inspection:

PULSAR_CONTAINERS_LEAVE_RUNNING=true TESTCONTAINERS_REUSE_ENABLE=true mvn -B -pl org.apache.pulsar.tests:integration test -DintegrationTests -DredirectTestOutputToFile=false -DtestRetryCount=0 -DfailIfNoTests=false -Dtest=SmokeTest2_2

with docker logs [container-id] , I can see this kind of exceptions in the logs:

09:39:04.343 [pulsar-io-49-5] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /172.28.0.1:34170
09:39:04.351 [pulsar-io-49-5] WARN  org.apache.pulsar.broker.service.ServerCnx - [/172.28.0.1:34170] Got exception UninitializedMessageException : Message was missing required fields.  (Lite runtime could not determine which fields were missing).
org.apache.pulsar.shaded.com.google.protobuf.v241.UninitializedMessageException: Message was missing required fields.  (Lite runtime could not determine which fields were missing).
        at org.apache.pulsar.shaded.com.google.protobuf.v241.AbstractMessageLite$Builder.newUninitializedMessageException(AbstractMessageLite.java:298) ~[org.apache.pulsar-protobuf-shaded-2.1.0-incubating.jar:2.1.0-incubating]
        at org.apache.pulsar.common.api.proto.PulsarApi$BaseCommand$Builder.build(PulsarApi.java:26263) ~[org.apache.pulsar-pulsar-common-2.2.0.jar:2.2.0]
        at org.apache.pulsar.common.api.PulsarDecoder.channelRead(PulsarDecoder.java:86) ~[org.apache.pulsar-pulsar-common-2.2.0.jar:2.2.0]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1414) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:945) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:806) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:413) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:313) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
09:39:04.355 [pulsar-io-49-5] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.28.0.1:34170

``

I'm just wondering what change in this PR makes this test fail. I'll continue investigation to find out.

@lhotari
Copy link
Copy Markdown
Member Author

lhotari commented Feb 23, 2021

@merlimat It seems that the original tests are invalid.

In this case, the TestNG test suite is defined this way in tests/integration/src/test/resources/pulsar-backwards-compatibility.xml:

<!DOCTYPE suite SYSTEM "https://testng.org/testng-1.0.dtd" >
<suite name="Pulsar Messaging Backwards Compatibility Tests" verbose="2" annotations="JDK">
    <test name="messaging-backwards-compatibility-suite" preserve-order="true">
        <classes>
            <class name="org.apache.pulsar.tests.integration.backwardscompatibility.SmokeTest2_2" />
            <class name="org.apache.pulsar.tests.integration.backwardscompatibility.SmokeTest2_3" />
            <class name="org.apache.pulsar.tests.integration.backwardscompatibility.SmokeTest2_4" />
            <class name="org.apache.pulsar.tests.integration.backwardscompatibility.SmokeTest2_5" />
            <class name="org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2" />
            <class name="org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_3" />
            <class name="org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_4" />
            <class name="org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_5" />
        </classes>
    </test>
</suite>

What happens in the original code (before this PR) is that the containers get initialized for all test classes up-front. Since each test class uses BeforeSuite to start the containers and the container is held in a static field, this results in the last container to be initialized to be used for all test runs.

I verified this behavior locally. Here is the output of docker ps when the first test starts executing:

❯ docker ps
CONTAINER ID   IMAGE                       COMMAND                  CREATED          STATUS          PORTS                                              NAMES
294c5cf14f6e   apachepulsar/pulsar:2.5.0   "bin/pulsar standalo…"   14 seconds ago   Up 14 seconds   0.0.0.0:49259->6650/tcp, 0.0.0.0:49258->8080/tcp   immhznye-standalone
d83abdf6798b   apachepulsar/pulsar:2.4.0   "bin/pulsar standalo…"   24 seconds ago   Up 24 seconds   0.0.0.0:49257->6650/tcp, 0.0.0.0:49256->8080/tcp   ntjmaswt-standalone
1135a3fdc63a   apachepulsar/pulsar:2.3.0   "bin/pulsar standalo…"   34 seconds ago   Up 34 seconds   0.0.0.0:49255->6650/tcp, 0.0.0.0:49254->8080/tcp   ehkeslmc-standalone
28d87c681157   apachepulsar/pulsar:2.2.0   "bin/pulsar standalo…"   44 seconds ago   Up 44 seconds   0.0.0.0:49253->6650/tcp, 0.0.0.0:49252->8080/tcp   idpnqdfo-standalone
f0404f5eb71c   testcontainers/ryuk:0.3.0   "/app"                   45 seconds ago   Up 44 seconds   0.0.0.0:49251->8080/tcp                            testcontainers-ryuk-23a90d7b-ca18-49c7-ace2-f86a719d1311

Therefore, I think that deleting the failing test (SmokeTest2_2.testBatchIndexAckDisabled) is the way to resolve this.

….3.0

- fails with exception on server side:
  "org.apache.pulsar.shaded.com.google.protobuf.v241.UninitializedMessageException:
   Message was missing required fields.
   (Lite runtime could not determine which fields were missing)."

- test wasn't executed before because of usage of BeforeSuite and static
  fields for initializing the test container
@lhotari lhotari force-pushed the lh-improve-integration-test-base-classes branch from e3267bf to 2359667 Compare February 23, 2021 14:42
@lhotari
Copy link
Copy Markdown
Member Author

lhotari commented Feb 23, 2021

/pulsarbot run-failure-checks

5 similar comments
@lhotari
Copy link
Copy Markdown
Member Author

lhotari commented Feb 23, 2021

/pulsarbot run-failure-checks

@lhotari
Copy link
Copy Markdown
Member Author

lhotari commented Feb 24, 2021

/pulsarbot run-failure-checks

@lhotari
Copy link
Copy Markdown
Member Author

lhotari commented Feb 24, 2021

/pulsarbot run-failure-checks

@lhotari
Copy link
Copy Markdown
Member Author

lhotari commented Feb 24, 2021

/pulsarbot run-failure-checks

@lhotari
Copy link
Copy Markdown
Member Author

lhotari commented Feb 24, 2021

/pulsarbot run-failure-checks

@merlimat merlimat added this to the 2.8.0 milestone Feb 24, 2021
@merlimat merlimat merged commit 2c77e01 into apache:master Feb 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants