Skip to content

Conversation

@merlimat
Copy link
Contributor

Motivation

There seems to be a race when mocking the compactor in
MockedPulsarServiceBaseTest which yields errors like:

Caused by: java.lang.ClassCastException: org.apache.pulsar.compaction.TwoPhaseCompactor$MockitoMock$1141048386 cannot be cast to org.apache.pulsar.broker.service.BrokerService

This obviously causes tests to fail in strange and surprising
ways. I've not been able to reproduce, but the issue seems to be with
how we mock the compactor.

We don't want to have to construct the whole compactor as we just want
to spy on it, so we get the current compactor, wrap it in spy and tell
the PulsarService instance to return it when getCompactor is called.

However, we are doing this after we have already called
PulsarService#start(). So there are threads already using the mock
structures. The mock structures themselves are not thread safe, so
modifying them while they are in use, is not a good idea.

The fix is to finish mocking before invoking #start(). Do do this,
I've broken out the Compactor construction method in PulsarService, so
that alone can be mocked to wrap the object in a spy.

There seems to be a race when mocking the compactor in
MockedPulsarServiceBaseTest which yields errors like:
```
Caused by: java.lang.ClassCastException: org.apache.pulsar.compaction.TwoPhaseCompactor$MockitoMock$1141048386 cannot be cast to org.apache.pulsar.broker.service.BrokerService
```

This obviously causes tests to fail in strange and surprising
ways. I've not been able to reproduce, but the issue seems to be with
how we mock the compactor.

We don't want to have to construct the whole compactor as we just want
to spy on it, so we get the current compactor, wrap it in spy and tell
the PulsarService instance to return it when getCompactor is called.

However, we are doing this after we have already called
PulsarService#start(). So there are threads already using the mock
structures. The mock structures themselves are not thread safe, so
modifying them while they are in use, is not a good idea.

The fix is to finish mocking before invoking #start(). Do do this,
I've broken out the Compactor construction method in PulsarService, so
that alone can be mocked to wrap the object in a spy.
@merlimat merlimat merged commit c579d53 into apache:master May 31, 2020
@merlimat merlimat deleted the pr-6ddc831f64 branch May 31, 2020 16:17
Huanli-Meng pushed a commit to Huanli-Meng/pulsar that referenced this pull request Jun 1, 2020
There seems to be a race when mocking the compactor in
MockedPulsarServiceBaseTest which yields errors like:
```
Caused by: java.lang.ClassCastException: org.apache.pulsar.compaction.TwoPhaseCompactor$MockitoMock$1141048386 cannot be cast to org.apache.pulsar.broker.service.BrokerService
```

This obviously causes tests to fail in strange and surprising
ways. I've not been able to reproduce, but the issue seems to be with
how we mock the compactor.

We don't want to have to construct the whole compactor as we just want
to spy on it, so we get the current compactor, wrap it in spy and tell
the PulsarService instance to return it when getCompactor is called.

However, we are doing this after we have already called
PulsarService#start(). So there are threads already using the mock
structures. The mock structures themselves are not thread safe, so
modifying them while they are in use, is not a good idea.

The fix is to finish mocking before invoking #start(). Do do this,
I've broken out the Compactor construction method in PulsarService, so
that alone can be mocked to wrap the object in a spy.

Co-authored-by: Ivan Kelly <ikelly@splunk.com>
Huanli-Meng pushed a commit to Huanli-Meng/pulsar that referenced this pull request Jun 1, 2020
There seems to be a race when mocking the compactor in
MockedPulsarServiceBaseTest which yields errors like:
```
Caused by: java.lang.ClassCastException: org.apache.pulsar.compaction.TwoPhaseCompactor$MockitoMock$1141048386 cannot be cast to org.apache.pulsar.broker.service.BrokerService
```

This obviously causes tests to fail in strange and surprising
ways. I've not been able to reproduce, but the issue seems to be with
how we mock the compactor.

We don't want to have to construct the whole compactor as we just want
to spy on it, so we get the current compactor, wrap it in spy and tell
the PulsarService instance to return it when getCompactor is called.

However, we are doing this after we have already called
PulsarService#start(). So there are threads already using the mock
structures. The mock structures themselves are not thread safe, so
modifying them while they are in use, is not a good idea.

The fix is to finish mocking before invoking #start(). Do do this,
I've broken out the Compactor construction method in PulsarService, so
that alone can be mocked to wrap the object in a spy.

Co-authored-by: Ivan Kelly <ikelly@splunk.com>
Huanli-Meng pushed a commit to Huanli-Meng/pulsar that referenced this pull request Jun 12, 2020
There seems to be a race when mocking the compactor in
MockedPulsarServiceBaseTest which yields errors like:
```
Caused by: java.lang.ClassCastException: org.apache.pulsar.compaction.TwoPhaseCompactor$MockitoMock$1141048386 cannot be cast to org.apache.pulsar.broker.service.BrokerService
```

This obviously causes tests to fail in strange and surprising
ways. I've not been able to reproduce, but the issue seems to be with
how we mock the compactor.

We don't want to have to construct the whole compactor as we just want
to spy on it, so we get the current compactor, wrap it in spy and tell
the PulsarService instance to return it when getCompactor is called.

However, we are doing this after we have already called
PulsarService#start(). So there are threads already using the mock
structures. The mock structures themselves are not thread safe, so
modifying them while they are in use, is not a good idea.

The fix is to finish mocking before invoking #start(). Do do this,
I've broken out the Compactor construction method in PulsarService, so
that alone can be mocked to wrap the object in a spy.

Co-authored-by: Ivan Kelly <ikelly@splunk.com>
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
There seems to be a race when mocking the compactor in
MockedPulsarServiceBaseTest which yields errors like:
```
Caused by: java.lang.ClassCastException: org.apache.pulsar.compaction.TwoPhaseCompactor$MockitoMock$1141048386 cannot be cast to org.apache.pulsar.broker.service.BrokerService
```

This obviously causes tests to fail in strange and surprising
ways. I've not been able to reproduce, but the issue seems to be with
how we mock the compactor.

We don't want to have to construct the whole compactor as we just want
to spy on it, so we get the current compactor, wrap it in spy and tell
the PulsarService instance to return it when getCompactor is called.

However, we are doing this after we have already called
PulsarService#start(). So there are threads already using the mock
structures. The mock structures themselves are not thread safe, so
modifying them while they are in use, is not a good idea.

The fix is to finish mocking before invoking #start(). Do do this,
I've broken out the Compactor construction method in PulsarService, so
that alone can be mocked to wrap the object in a spy.

Co-authored-by: Ivan Kelly <ikelly@splunk.com>
jiazhai pushed a commit to streamnative/kop that referenced this pull request Jul 29, 2021
### Motivation

The KoP CI tests take much more time than CI tests of branch-2.7.2. 

The main reason is the `cleanup()` phase takes a long time, each time a test is cleaned up, it will take over 10 seconds to complete. This behavior was introduced from apache/pulsar#10199, which made broker shutdown gracefully by default but it would take longer to shutdown.

The other reason is caused by rebalance time. According to my observes, when a Kafka consumer subscribes a topic in KoP, it will take at least 3 seconds. Finally I found it's caused by the GroupInitialRebalanceDelayMs config, which has the same semantics with Kafka's [group.initial.rebalance.delay.ms](https://kafka.apache.org/documentation/#brokerconfigs_group.initial.rebalance.delay.ms). It makes Kafka server wait longer for `JOIN_GROUP` request for more consumers to join so that the rebalance count can reduce. However, it should be set zero  in tests.

After fixing these problems, sometimes the following error may happen and cause flakiness.

```
TwoPhaseCompactor$MockitoMock$1432102834 cannot be returned by getConfiguration()
getConfiguration() should return ServiceConfiguration
***
If you're unsure why you're getting above error read on.
Due to the nature of the syntax above problem might occur because:
1. This exception *might* occur in wrongly written multi-threaded tests.
   Please refer to Mockito FAQ on limitations of concurrency testing.
2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub spies - 
   - with doReturn|Throw() family of methods. More in javadocs for Mockito.spy() method.
```

It's because `PulsarService#newCompactor` is not mocked well, see apache/pulsar#7102 for detail.

### Modifications

- Configure `GroupInitialRebalanceDelayMs` and `BrokerShutdownTimeoutMs` for each mocked `BrokerService`.
- Fix the flakiness caused by mocking compactor.

After the changes, the tests time has a significant improvement.

For example, `GroupCoordinatorTest` takes only 3 minutes now but it could take 9 minutes before. Because the `cleanup()` method is marked as `@AfterMethod` and would be called each time a single test finished.

Another example is that `BasicEndToEndKafkaTest` takes only 37 seconds now but it could take 56 seconds before. The `cleanup()` is marked as `@AfterClass` and only happens once, but many consumers will be created during the whole tests and each time a subscribe call can take 3 seconds.

* Speed up tests and fix flakiness

* Speed up tests that creat their own configs

* Ignore golang-sarama tests
BewareMyPower added a commit to streamnative/kop that referenced this pull request Aug 5, 2021
### Motivation

The KoP CI tests take much more time than CI tests of branch-2.7.2. 

The main reason is the `cleanup()` phase takes a long time, each time a test is cleaned up, it will take over 10 seconds to complete. This behavior was introduced from apache/pulsar#10199, which made broker shutdown gracefully by default but it would take longer to shutdown.

The other reason is caused by rebalance time. According to my observes, when a Kafka consumer subscribes a topic in KoP, it will take at least 3 seconds. Finally I found it's caused by the GroupInitialRebalanceDelayMs config, which has the same semantics with Kafka's [group.initial.rebalance.delay.ms](https://kafka.apache.org/documentation/#brokerconfigs_group.initial.rebalance.delay.ms). It makes Kafka server wait longer for `JOIN_GROUP` request for more consumers to join so that the rebalance count can reduce. However, it should be set zero  in tests.

After fixing these problems, sometimes the following error may happen and cause flakiness.

```
TwoPhaseCompactor$MockitoMock$1432102834 cannot be returned by getConfiguration()
getConfiguration() should return ServiceConfiguration
***
If you're unsure why you're getting above error read on.
Due to the nature of the syntax above problem might occur because:
1. This exception *might* occur in wrongly written multi-threaded tests.
   Please refer to Mockito FAQ on limitations of concurrency testing.
2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub spies - 
   - with doReturn|Throw() family of methods. More in javadocs for Mockito.spy() method.
```

It's because `PulsarService#newCompactor` is not mocked well, see apache/pulsar#7102 for detail.

### Modifications

- Configure `GroupInitialRebalanceDelayMs` and `BrokerShutdownTimeoutMs` for each mocked `BrokerService`.
- Fix the flakiness caused by mocking compactor.

After the changes, the tests time has a significant improvement.

For example, `GroupCoordinatorTest` takes only 3 minutes now but it could take 9 minutes before. Because the `cleanup()` method is marked as `@AfterMethod` and would be called each time a single test finished.

Another example is that `BasicEndToEndKafkaTest` takes only 37 seconds now but it could take 56 seconds before. The `cleanup()` is marked as `@AfterClass` and only happens once, but many consumers will be created during the whole tests and each time a subscribe call can take 3 seconds.

* Speed up tests and fix flakiness

* Speed up tests that creat their own configs

* Ignore golang-sarama tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants