Fetch: trigger pending fetches when producing messages. #973
Conversation
I think adding a parameter to `tryComplete` brings too many changes. Most call sites only pass a `false` argument.

I think it's better to add a `wakeup()` method to `DelayedOperation`:

```java
public boolean wakeup() {
    // No ops
    return true;
}
```

Then in `DelayedFetch`, override this method:

```java
@Override
public boolean wakeup() {
    // If we are here, then we were waiting for the condition:
    // someone wrote some messages to one of the topics.
    // Trigger the Fetch from scratch.
    restarted.set(true);
    messageFetchContext.onDataWrittenToSomePartition();
    return true;
}
```

Finally, modify `DelayedOperationPurgatory#Watchers#tryCompleteWatched`:

```java
} else if (curr.wakeup() && curr.maybeTryComplete()) {
```

BTW, I used a trick here: `wakeup()` returns a boolean so that the `wakeup()` and `maybeTryComplete()` calls can be combined on one line. The `wakeup()` method always returns true.
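The suggested design can be consolidated into a small, self-contained model. Note this is only a sketch: `WakeupSketch`, its nested classes, and `tryCompleteWatched` here are simplified stand-ins, not KoP's actual classes, which carry much more machinery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal model of the suggested wakeup() design (illustrative names only).
public class WakeupSketch {

    static class DelayedOperation {
        final AtomicBoolean completed = new AtomicBoolean(false);

        // Default: no-op; returns true so it can be chained with maybeTryComplete().
        public boolean wakeup() {
            return true;
        }

        public boolean maybeTryComplete() {
            // Complete at most once.
            return completed.compareAndSet(false, true);
        }
    }

    static class DelayedFetch extends DelayedOperation {
        final AtomicBoolean restarted = new AtomicBoolean(false);

        @Override
        public boolean wakeup() {
            // A producer wrote messages to a watched partition: restart the fetch.
            restarted.set(true);
            return true;
        }
    }

    // Modeled after the Watchers loop: wake each operation, then try to complete it.
    static int tryCompleteWatched(List<DelayedOperation> watched) {
        int completed = 0;
        for (DelayedOperation curr : watched) {
            if (curr.wakeup() && curr.maybeTryComplete()) {
                completed++;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        List<DelayedOperation> ops = new ArrayList<>();
        DelayedFetch fetch = new DelayedFetch();
        ops.add(fetch);
        System.out.println(tryCompleteWatched(ops) + " " + fetch.restarted.get());
    }
}
```

Because `wakeup()` always returns true, the `&&` chain degenerates to "wake up, then try to complete", without changing the `tryComplete` signature anywhere.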
Review threads:
- kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/RequestStats.java (resolved)
- kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/KafkaRequestHandler.java (outdated, resolved)
eolivelli left a comment
@BewareMyPower I have addressed your comments.
Nice suggestions!
Review threads:
- kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/DelayedFetch.java (outdated, resolved)
- kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/MessageFetchContext.java (outdated, resolved)
- ...c/main/java/io/streamnative/pulsar/handlers/kop/utils/delayed/DelayedOperationPurgatory.java (resolved)
I've left some comments, PTAL. BTW, please update the PR description because the current design differs a bit from the initial design.
…essageFetchContext.java Co-authored-by: Yunze Xu <xyzinfernity@163.com>
…elayedFetch.java Co-authored-by: Yunze Xu <xyzinfernity@163.com>
@BewareMyPower description updated.
@BewareMyPower I have removed that parameter
@BewareMyPower @Demogorgon314 CI passed
When the Kafka client issues a Fetch and sets a maxWait time, we already schedule a DelayedFetch, but there is no way to trigger that Fetch early, so it is doomed to wait out the full timeout. This adds latency spikes on the Kafka consumer.

With this patch we trigger any pending DelayedFetch whenever a record is written to one of the partitions the Fetch is interested in.

This is only a first implementation; in the future we can improve it so that it does not trigger on the first record but waits for more records to arrive. With this implementation the Fetch result will usually contain only 1 record, but this is enough to let the Kafka client start a new Fetch cycle instead of wasting time doing nothing (waiting for maxWait).

Changes:
- Trigger pending Fetches while producing to the topic.
- Add a new metric, WAITING_FETCHES_TRIGGERED.
- Add DelayedOperation#wakeup, meaning the operation should wake up due to some trigger (in this case, the production of records to the topic).
- Add a new test that would fail without this patch (because the test asserts that there is no idle cycle in the consumer loop).
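The produce-side trigger described above can be sketched as a minimal, self-contained model. All names here (`FetchTriggerSketch`, `PendingFetch`, `registerPendingFetch`, `onDataWritten`) are hypothetical, not KoP's actual API; the real implementation goes through the purgatory and `MessageFetchContext`.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Simplified model: pending fetches are registered per partition, and a produce
// to that partition wakes them up instead of letting them wait out maxWait.
public class FetchTriggerSketch {

    interface PendingFetch {
        void wakeup(); // in the real PR, this restarts the fetch cycle
    }

    private final Map<String, List<PendingFetch>> pendingByPartition = new ConcurrentHashMap<>();

    public void registerPendingFetch(String partition, PendingFetch fetch) {
        pendingByPartition
                .computeIfAbsent(partition, k -> new CopyOnWriteArrayList<>())
                .add(fetch);
    }

    // Called from the produce path after records are written to a partition.
    // The returned count could back a metric like WAITING_FETCHES_TRIGGERED.
    public int onDataWritten(String partition) {
        List<PendingFetch> pending = pendingByPartition.remove(partition);
        if (pending == null) {
            return 0;
        }
        pending.forEach(PendingFetch::wakeup);
        return pending.size();
    }

    public static void main(String[] args) {
        FetchTriggerSketch sketch = new FetchTriggerSketch();
        sketch.registerPendingFetch("my-topic-0", () -> System.out.println("fetch woken up"));
        System.out.println("triggered: " + sketch.onDataWritten("my-topic-0"));
    }
}
```

The key property is that a consumer blocked in a delayed fetch is released by the first produced record rather than by the maxWait timer expiring.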
…amnative#973)" This reverts commit 37f0583
I'll continue the discussion here. For the previous discussion:

The root cause is that the delayed fetch (…). However, the methods of (…). The bug happens when a consumer tries to fetch multiple partitions. (The reason is to be figured out.)

The first commit of #1033 tries to remove the associated delayed fetches from the purgatory. It fixes the NPE of (…).

The root cause is that this PR makes (…). A typical error is (…). It's because when (…):

```java
final String fullTopicName = KopTopic.toString(topicPartition, namespacePrefix);
```

I tried to perform some null checks. However, it's hard to solve the problem thoroughly.

In addition, here is an example code that can reproduce the bug with a KoP standalone:

```java
public static void main(String[] args) throws ExecutionException, InterruptedException {
    final String topic = "my-topic";
    try (AdminClient client = AdminClient.create(KafkaUtils.newAdminProperties())) {
        client.createTopics(Collections.singletonList(new NewTopic(topic, 16, (short) 2))).all().get();
    }
    int n = 0;
    final int numMessages = 10000;
    final AtomicInteger numReceived = new AtomicInteger(0);
    final Object object = new Object();
    final AtomicBoolean consumeFailed = new AtomicBoolean(false);
    final ExecutorService executor = Executors.newSingleThreadExecutor();
    final Future<?> future = executor.submit(() -> {
        final Properties props = KafkaUtils.newKafkaProducerProperties();
        props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1048576);
        final CountDownLatch latch = new CountDownLatch(numMessages);
        final String value = newValue(100);
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            synchronized (object) {
                object.wait();
            }
            for (int i = 0; i < numMessages; i++) {
                if (consumeFailed.get()) {
                    break;
                }
                final int index = i;
                producer.send(new ProducerRecord<>(topic, value), (recordMetadata, e) -> {
                    if (e != null) {
                        log.error("Failed to send {}: {}", index, e.getMessage());
                    }
                    latch.countDown();
                });
                Thread.sleep(1);
            }
            if (!consumeFailed.get()) {
                latch.await();
            }
        } catch (Exception e) {
            log.error("Failed to produce", e);
        }
    });
    final Properties props = KafkaUtils.newKafkaConsumerProperties();
    props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 3000);
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(Collections.singleton(topic), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> collection) {
                // No ops
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> collection) {
                synchronized (object) {
                    object.notifyAll();
                }
            }
        });
        consumer.poll(Duration.ofMillis(3000));
        while (n < 10000) {
            final Map<TopicPartition, OffsetAndMetadata> offsetMap = new HashMap<>();
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
                log.info("Received from {}-{}@{}", record.topic(), record.partition(), record.offset());
                offsetMap.put(new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1));
                n++;
                numReceived.incrementAndGet();
            }
            consumer.commitAsync(offsetMap, null);
        }
    } catch (Exception e) {
        log.error("Failed to consume at {}", numReceived, e);
        consumeFailed.set(true);
    }
    future.get();
    executor.shutdown();
}

private static String newValue(int size) {
    final byte[] bytes = new byte[size];
    Arrays.fill(bytes, (byte) 'a');
    return new String(bytes);
}
```

For example, with my latest patch, it could still fail with (…).
Thank you @BewareMyPower
…amnative#973)" (streamnative#1034) This reverts commit 37f0583 (cherry picked from commit cb9aa4e)