
use producer to publish message for metrics collected by pulsar #228

Merged: BewareMyPower merged 5 commits into streamnative:master from dockerzhang:producer-publish on Nov 13, 2020

Conversation

@dockerzhang (Contributor)

This solves #226, and the patch improves producing performance by about 5 times.

@BewareMyPower (Collaborator) commented Nov 12, 2020

The main issue is that a Pulsar consumer cannot consume messages produced by a Kafka producer. It looks like the original implementation does some extra work.

The following is just an experiment. Let's change doPublishMessages to

                        // >1. the old way
                        persistentTopic.publishMessage(
                                headerAndPayload,
                                MessagePublishContext.get(
                                        offsetFuture, persistentTopic, System.nanoTime()));
                        // >2. the new way
                        topicManager.registerProducerInPersistentTopic(topic.toString(), persistentTopic);
                        topicManager.getReferenceProducer(topic.toString()).publishMessage(0, 0,
                                headerAndPayload, size, false);
                        offsetFuture.complete(Long.valueOf(size));

so that the same message is produced twice.

And change testKafkaProducePulsarConsumeMessageOrder to receive each message twice:

            msg = consumer.receive(1000, TimeUnit.MILLISECONDS);
            assertNotNull(msg);
            msg = consumer.receive(1000, TimeUnit.MILLISECONDS);
            assertNotNull(msg);

Then the unit test passed even if >1 and >2 were swapped. So maybe we need to figure out what extra work persistentTopic.publishMessage does.

By the way, the NPE at Producer$MessagePublishContext.completed(Producer.java:377) seems not to be related to this issue.

@BewareMyPower (Collaborator)

It looks like the original implementation, with a stats update added, could solve the problem:

                        topicManager.getReferenceProducer(topic.toString())
                                .getTopic().incrementPublishCount(size, headerAndPayload.readableBytes());
                        persistentTopic.publishMessage(
                                headerAndPayload,
                                MessagePublishContext.get(
                                        offsetFuture, persistentTopic, System.nanoTime()));

In addition, once the problem is solved, a unit test should be added to verify that the stats have been updated (a minimal sketch follows the list below):

  1. Use a Kafka producer to produce some messages;
  2. Use the Pulsar admin client to get the topic stats and verify bytesInCounter and msgInCounter.
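
A minimal sketch of such a test, assuming TestNG and an embedded broker like the existing KoP tests; newKafkaProducerProperties() is a hypothetical helper that configures the KoP endpoint and string serializers, and admin is the PulsarAdmin client from the test base:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.pulsar.common.policies.data.TopicStats;
    import org.testng.annotations.Test;
    import static org.testng.Assert.assertEquals;
    import static org.testng.Assert.assertTrue;

    @Test(timeOut = 20000)
    public void testPublishStatsUpdated() throws Exception {
        final String kafkaTopic = "test-publish-stats";
        final String pulsarTopic = "persistent://public/default/" + kafkaTopic;
        final int numMessages = 10;

        // 1. Use a Kafka producer to produce some messages (send synchronously).
        try (KafkaProducer<String, String> producer =
                new KafkaProducer<>(newKafkaProducerProperties())) {
            for (int i = 0; i < numMessages; i++) {
                producer.send(new ProducerRecord<>(kafkaTopic, "msg-" + i)).get();
            }
        }

        // 2. Use the Pulsar admin client to verify the publish stats were updated.
        TopicStats stats = admin.topics().getStats(pulsarTopic);
        assertEquals(stats.msgInCounter, numMessages);
        assertTrue(stats.bytesInCounter > 0);
    }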

@BewareMyPower (Collaborator)

Also, I found a possible reason. It may still be related to the NPE at Producer$MessagePublishContext.completed(Producer.java:377).

When service.Producer#publishMessage is used, it eventually calls the same method, PersistentTopic#publishMessage:

    public void publishMessage(ByteBuf headersAndPayload, PublishContext publishContext) {
        pendingWriteOps.incrementAndGet(); // [1] pendingWriteOps++ here; it must be decremented later
        /* ... */
        switch (status) {
            case NotDup:
                // this is a PersistentTopic instance that has implemented addComplete/addFailed methods
                ledger.asyncAddEntry(headersAndPayload, this, publishContext);
                break;
            /* other cases... */
        }
    }

The only difference is the publishContext. The original implementation uses io.streamnative.pulsar.handlers.kop.MessagePublishContext, whose completed method only updates the publish latency and completes the future.
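
For reference, a minimal sketch of that behavior (assumed, not a verbatim copy of the KoP source; offsetFuture, topic, and startTimeNs are the fields passed to MessagePublishContext.get above, and MessageIdUtils.getOffset is KoP's position-to-offset helper):

    @Override
    public void completed(Exception exception, long ledgerId, long entryId) {
        if (exception != null) {
            offsetFuture.completeExceptionally(exception);
        } else {
            // only update the publish latency stats...
            topic.recordAddLatency(System.nanoTime() - startTimeNs, TimeUnit.NANOSECONDS);
            // ...and complete the future with the offset parsed from (ledgerId, entryId)
            offsetFuture.complete(MessageIdUtils.getOffset(ledgerId, entryId));
        }
    }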

Producer#publishMessage, by contrast, uses Producer.MessagePublishContext, whose completed method is:

        public void completed(Exception exception, long ledgerId, long entryId) {
            if (exception != null) {
                /* ... */
            } else {
                /* ... */
                this.ledgerId = ledgerId;
                this.entryId = entryId;
                // NPE happens here, producer.cnx.ctx() == null
                producer.cnx.ctx().channel().eventLoop().execute(this);
            }
        }

And see PersistentTopic#addComplete:

    public void addComplete(Position pos, Object ctx) {
        PublishContext publishContext = (PublishContext) ctx;
        PositionImpl position = (PositionImpl) pos;

        messageDeduplication.recordMessagePersisted(publishContext, position);
        // NPE thrown from Producer.MessagePublishContext#completed, so the following
        // pendingWriteOps-- would be skipped.
        publishContext.completed(null, position.getLedgerId(), position.getEntryId());
        // [2] pendingWriteOps-- here, matching the ++ in publishMessage
        decrementPendingWriteOpsAndCheck();
    }

I'm not sure whether the reference count leak is the cause, but the NPE here is a real issue. See the example stack trace:

java.lang.NullPointerException: null
	at org.apache.pulsar.broker.service.Producer$MessagePublishContext.completed(Producer.java:377) ~[pulsar-broker-2.6.2.jar:2.6.2]
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.addComplete(PersistentTopic.java:370) ~[pulsar-broker-2.6.2.jar:2.6.2]
	at org.apache.bookkeeper.mledger.impl.OpAddEntry.safeRun(OpAddEntry.java:192) ~[managed-ledger-2.6.2.jar:2.6.2]

@BewareMyPower (Collaborator)

Add an explanation for

> improves producing performance by about 5 times.

After discussing with @dockerzhang, the reason is that the original commit used Producer#publishMessage, whose default PublishContext doesn't parse the MessageId but simply sets the offset to 0. (The parsed offset is not used anywhere right now; see #230.)
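
For context, the offset parsing that the default PublishContext skips amounts to packing the persisted position into a single long. An illustrative version (the bit split is an assumption modeled on KoP's MessageIdUtils, not a verbatim copy):

    // Pack (ledgerId, entryId) into one Kafka-style long offset; the 28-bit
    // entry field is an assumption for illustration.
    public static long getOffset(long ledgerId, long entryId) {
        return (ledgerId << 28) | entryId;
    }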

BewareMyPower merged commit b8528a3 into streamnative:master on Nov 13, 2020.

Linked issue (may be closed by this PR): [BUG] the producer metrics of kop topic is not changed when producing
