This repository was archived by the owner on Jan 24, 2024. It is now read-only.

Conversation

@eolivelli
Contributor

@eolivelli eolivelli commented Dec 16, 2021

When the Kafka client issues a Fetch and sets a maxWait time, we already schedule a DelayedFetch, but there is no way to trigger that fetch early, so it is doomed to wait for the full timeout.
This adds latency spikes to the Kafka consumer.

With this patch we trigger any pending DelayedFetch whenever a record is written to one of the partitions involved in the fetch.

This is only a first implementation; in the future we can improve it so that it does not trigger on the first record but waits for more records to arrive.
With this implementation the fetch result will usually contain only one record, but that is enough to let the Kafka client start a new fetch cycle instead of idling until maxWait expires.

Changes:

  • trigger pending fetches while producing to the topic (a rough sketch is shown below)
  • add a new metric, WAITING_FETCHES_TRIGGERED
  • add DelayedOperation#wakeup, which signals that the operation should wake up due to some trigger (in this case, records being produced to the topic)
  • add a new test that would fail without this patch (because the test asserts that there is no idle cycle in the consumer loop)
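
A rough sketch of the producer-side trigger follows (the purgatory field, the key passed to checkAndComplete, and the counter name are placeholders for illustration, not the exact code):

    // Sketch only: called from the produce callback after records have been written
    // to a partition, so that any pending DelayedFetch watching that partition can
    // complete early instead of waiting for the full maxWait.
    public void notifyPendingFetches(TopicPartition topicPartition) {
        // checkAndComplete walks the watchers registered under this key and tries to
        // complete every pending DelayedFetch for the partition.
        final int completed = fetchPurgatory.checkAndComplete(topicPartition);
        if (completed > 0) {
            // Placeholder counter backing the new WAITING_FETCHES_TRIGGERED metric.
            waitingFetchesTriggered.add(completed);
        }
    }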

Collaborator

@BewareMyPower BewareMyPower left a comment


I think adding a parameter to tryComplete brings too many changes. Most of the call sites would only pass a false argument.

I think it's better to add a wakeup() method to DelayedOperation:

    public boolean wakeup() {
        // No ops
        return true;
    }

Then in DelayedFetch, override this method.

    @Override
    public boolean wakeup() {
        // If we get here, the fetch was waiting for data and someone has just
        // written messages to one of the topics it watches,
        // so trigger the fetch again from scratch.
        restarted.set(true);
        messageFetchContext.onDataWrittenToSomePartition();
        return true;
    }

Finally, modify DelayedOperationPurgatory#Watchers#tryCompleteWatched:

                } else if (curr.wakeup() && curr.maybeTryComplete()) {

BTW, I used a small trick here: wakeup() returns a boolean so that the wakeup() and maybeTryComplete() calls can be combined on one line. The wakeup() method always returns true.

Contributor Author

@eolivelli eolivelli left a comment


@BewareMyPower I have addressed your comments.

Nice suggestions!

@BewareMyPower
Collaborator

I've left some comments, PTAL. BTW, please update the PR description because the current design has changed a bit from the initial one.

eolivelli and others added 2 commits December 21, 2021 10:34
…essageFetchContext.java

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
…elayedFetch.java

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
@eolivelli
Contributor Author

@BewareMyPower description updated.
If you want, I can drop the "key" parameter as it is currently unused.

@eolivelli
Contributor Author

@BewareMyPower I have removed that parameter.
PTAL

@eolivelli
Contributor Author

@BewareMyPower @Demogorgon314 CI passed

@BewareMyPower BewareMyPower merged commit 37f0583 into streamnative:master Dec 22, 2021
BewareMyPower pushed a commit that referenced this pull request Dec 29, 2021
BewareMyPower pushed a commit that referenced this pull request Dec 29, 2021
@BewareMyPower
Collaborator

BewareMyPower commented Jan 26, 2022

I'll continue the discussion here. For the previous discussion:

  1. See [BUG]openmessage Rebalance failed. Unable to consume #1032 for the bug caused by this PR.
  2. Fix NPE caused by empty polls for a consumer of multiple partitions #1033 tried to fix the bug, but it turned out to be far harder than I thought.
  3. Then Revert "Fetch: trigger pending fetches when producing messages. (#973)" #1034 reverted this PR.

The root cause is that the delayed fetch (DelayedFetch) holds a fetch context (MessageFetchContext), but the delayed fetch doesn't know whether the fetch context has been recycled. This PR calls KafkaRequestHandler#notifyPendingFetches in the produce callback, which runs DelayedOperationPurgatory#checkAndComplete in a pulsar-io thread. The stack is:

DelayedOperationPurgatory#checkAndComplete
  DelayedOperationPurgatory#Watchers#tryCompleteWatched
    DelayedOperation#maybeTryComplete
      DelayedFetch#tryComplete
        MessageFetchContext#onDataWrittenToSomePartition  [1]
          MessageFetchContext#handleFetch                 [2]

However, the methods of MessageFetchContext are usually called in a BookKeeperClientWorker-OrderedExecutor thread, so a race condition occurs.

The bug happens when a consumer fetches multiple partitions (the exact reason is still to be determined).

The first commit of #1033 tries to remove the associated delayed fetches from the purgatory. It fixes the NPE at [1], though the patch hurts performance. However, even ignoring the performance overhead, it's very hard to solve the race condition at [2].

The underlying problem is that this PR makes MessageFetchContext#handleFetch run in a pulsar-io thread, which is too complicated to handle safely.

A typical error is

io.streamnative.pulsar.handlers.kop.utils.KopTopic$KoPTopicIllegalArgumentException: Invalid short topic name 'my-topic', it should be in the format of <tenant>/<namespace>/<topic> or <topic>
    at io.streamnative.pulsar.handlers.kop.utils.KopTopic.expandToFullName(KopTopic.java:77) ~[?:?]
    at io.streamnative.pulsar.handlers.kop.utils.KopTopic.<init>(KopTopic.java:59) ~[?:?]
    at io.streamnative.pulsar.handlers.kop.utils.KopTopic.toString(KopTopic.java:96) ~[?:?]
    at io.streamnative.pulsar.handlers.kop.MessageFetchContext.lambda$handleFetch$4(MessageFetchContext.java:330) ~[?:?]

This is because, by the time handleFetch is called, the fetch context has already been recycled and namespacePrefix is null. Then the following code crashes:

            final String fullTopicName = KopTopic.toString(topicPartition, namespacePrefix);

I tried adding some null checks, but it's hard to solve the problem thoroughly.
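
For illustration, here is a minimal sketch of that kind of null-check guard (isRecycled and the fields it checks are assumptions, not the actual MessageFetchContext code); it only narrows the race window, because the context can still be recycled between the check and the rest of handleFetch:

    // Sketch only: skip the fetch if the context has already been recycled,
    // i.e. its fields have been reset to null by recycle().
    private boolean isRecycled() {
        return namespacePrefix == null || requestHandler == null;
    }

    void handleFetch() {
        if (isRecycled()) {
            // The fetch this context belonged to has already been completed and
            // recycled, so a wakeup coming from the produce path must be a no-op.
            return;
        }
        // ... original handleFetch logic ...
    }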

In addition, here is example code that can reproduce the bug against a KoP standalone (the KafkaUtils helpers it uses are sketched after the snippet).

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        final String topic = "my-topic";
        try (AdminClient client = AdminClient.create(KafkaUtils.newAdminProperties())) {
            client.createTopics(Collections.singletonList(new NewTopic(topic, 16, (short) 2))).all().get();
        }

        int n = 0;
        final int numMessages = 10000;
        final AtomicInteger numReceived = new AtomicInteger(0);
        final Object object = new Object();
        final AtomicBoolean consumeFailed = new AtomicBoolean(false);

        final ExecutorService executor = Executors.newSingleThreadExecutor();
        final Future<?> future = executor.submit(() -> {
            final Properties props = KafkaUtils.newKafkaProducerProperties();
            props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1048576);
            final CountDownLatch latch = new CountDownLatch(numMessages);
            final String value = newValue(100);
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                synchronized (object) {
                    object.wait();
                }
                for (int i = 0; i < numMessages; i++) {
                    if (consumeFailed.get()) {
                        break;
                    }
                    final int index = i;
                    producer.send(new ProducerRecord<>(topic, value), (recordMetadata, e) -> {
                        if (e != null) {
                            log.error("Failed to send {}: {}", index, e.getMessage());
                        }
                        latch.countDown();
                    });
                    Thread.sleep(1);
                }
                if (!consumeFailed.get()) {
                    latch.await();
                }
            } catch (Exception e) {
                log.error("Failed to consume", e);
            }
        });

        final Properties props = KafkaUtils.newKafkaConsumerProperties();
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 3000);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton(topic), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> collection) {
                    // No ops
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> collection) {
                    synchronized (object) {
                        object.notifyAll();
                    }
                }
            });
            consumer.poll(Duration.ofMillis(3000));
            while (n < 10000) {
                final Map<TopicPartition, OffsetAndMetadata> offsetMap = new HashMap<>();
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
                    log.info("Received from {}-{}@{}", record.topic(), record.partition(), record.offset());
                    offsetMap.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
                    n++;
                    numReceived.incrementAndGet();
                }
                consumer.commitAsync(offsetMap, null);
            }
        } catch (Exception e) {
            log.error("Failed to consume at {}", numReceived, e);
            consumeFailed.set(true);
        }

        future.get();
        executor.shutdown();
    }

    private static String newValue(int size) {
        final byte[] bytes = new byte[size];
        Arrays.fill(bytes, (byte) 'a');
        return new String(bytes);
    }
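
The KafkaUtils helpers used above are not shown; something like the following is assumed (localhost:9092 for a default KoP standalone, the rest is ordinary Kafka client configuration):

    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    // Assumed helper class for the reproduction code above.
    public class KafkaUtils {

        private static final String BOOTSTRAP_SERVERS = "localhost:9092";

        public static Properties newAdminProperties() {
            final Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
            return props;
        }

        public static Properties newKafkaProducerProperties() {
            final Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            return props;
        }

        public static Properties newKafkaConsumerProperties() {
            final Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "repro-group");
            // Offsets are committed manually via commitAsync in the reproduction code.
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            return props;
        }
    }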

For example, with my latest patch, it could still fail with

2022-01-26 22:22:14:342 [main] ERROR ProduceConsumeDemo - Failed to consume at 4933
java.lang.IllegalStateException: Unexpected error code 7 while fetching at offset 309 from topic-partition my-topic-0
	at org.apache.kafka.clients.consumer.internals.Fetcher.initializeCompletedFetch(Fetcher.java:1339)
	at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:613)
	at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1303)
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1237)
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210)

@eolivelli eolivelli deleted the impl/trigger-delayed-fetch-kop branch January 26, 2022 15:06
@eolivelli
Contributor Author

Thank you @BewareMyPower
I will work on a fix.
Probably early next week

Demogorgon314 added a commit that referenced this pull request Jan 27, 2022
BewareMyPower pushed a commit that referenced this pull request Feb 9, 2022
eolivelli pushed a commit to eolivelli/kop that referenced this pull request Feb 24, 2022