Skip to content
This repository was archived by the owner on Jan 24, 2024. It is now read-only.

Conversation

@Demogorgon314
Copy link
Member

@Demogorgon314 Demogorgon314 commented Jan 4, 2022

Fixes #957 #1007

Motivation

Currently, the KoP use Kafka's implantation to check duplicate message, but it is hard to support MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION > 1.

However, we should reuse the Pulsar message duplication check in KoP by mapping baseSequence to sequenceId and lastSequence to highestSequenceId.

Pulsar is using producer name to persist sequenced, in KoP we want to follow the Kafka behavior. So we need to use a name role to build a producer name.

Because Kafka will reuse PID when the transaction ID is the same but will increase the producer Enoch. So we need to ensure the producer name is not the same.

So the producer name role is PID_PREFIX-{producerId}-{producerEpoch}.

Modifications

Support MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION > 1, and reuse Pulsar message deuplication.

@Demogorgon314 Demogorgon314 self-assigned this Jan 4, 2022
@Demogorgon314 Demogorgon314 changed the title [WIP] Reusing the Pulsar message deduplication [FEATURE] Reusing the Pulsar message deduplication Jan 4, 2022
@Demogorgon314 Demogorgon314 force-pushed the impl/reuse-pulsar-message-deduplication branch from 0f2ed27 to 0f847cf Compare January 18, 2022 02:13
@Demogorgon314 Demogorgon314 marked this pull request as ready for review January 19, 2022 02:16
@BewareMyPower BewareMyPower added the doc-required This pr needs a document label Jan 28, 2022
@BewareMyPower
Copy link
Collaborator

Add the doc-required label since we need to enable deduplication at Pulsar side.

Copy link
Collaborator

@BewareMyPower BewareMyPower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. I've left some comments, PTAL. I'll review again after comments are addressed.

@Demogorgon314 Demogorgon314 force-pushed the impl/reuse-pulsar-message-deduplication branch from 940b142 to 1310cd7 Compare January 29, 2022 06:18
@BewareMyPower BewareMyPower merged commit 1841b82 into streamnative:master Jan 30, 2022
@Demogorgon314 Demogorgon314 deleted the impl/reuse-pulsar-message-deduplication branch January 30, 2022 07:50
BewareMyPower pushed a commit that referenced this pull request Feb 9, 2022
Fixes #957 #1007

## Motivation
Currently, the KoP use Kafka's implantation to check duplicate message, but it is hard to support `MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION > 1`.

However, we should reuse the Pulsar message duplication check in KoP by mapping `baseSequence` to `sequenceId` and `lastSequence` to `highestSequenceId`.

Pulsar is using producer name to persist sequenced, in KoP we want to follow the Kafka behavior. So we need to use a name role to build a producer name.

Because Kafka will reuse PID when the transaction ID is the same but will increase the producer Enoch. So we need to ensure the producer name is not the same.

So the producer name role is `PID_PREFIX-{producerId}-{producerEpoch}`.

## Modifications
Support `MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION > 1`, and reuse Pulsar message deuplication.

(cherry picked from commit 1841b82)
BewareMyPower pushed a commit that referenced this pull request Feb 9, 2022
Fixes #957 #1007

## Motivation
Currently, the KoP use Kafka's implantation to check duplicate message, but it is hard to support `MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION > 1`.

However, we should reuse the Pulsar message duplication check in KoP by mapping `baseSequence` to `sequenceId` and `lastSequence` to `highestSequenceId`.

Pulsar is using producer name to persist sequenced, in KoP we want to follow the Kafka behavior. So we need to use a name role to build a producer name.

Because Kafka will reuse PID when the transaction ID is the same but will increase the producer Enoch. So we need to ensure the producer name is not the same.

So the producer name role is `PID_PREFIX-{producerId}-{producerEpoch}`.

## Modifications
Support `MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION > 1`, and reuse Pulsar message deuplication.
BewareMyPower pushed a commit that referenced this pull request Feb 9, 2022
Fixes #957 #1007

## Motivation
Currently, the KoP use Kafka's implantation to check duplicate message, but it is hard to support `MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION > 1`.

However, we should reuse the Pulsar message duplication check in KoP by mapping `baseSequence` to `sequenceId` and `lastSequence` to `highestSequenceId`.

Pulsar is using producer name to persist sequenced, in KoP we want to follow the Kafka behavior. So we need to use a name role to build a producer name.

Because Kafka will reuse PID when the transaction ID is the same but will increase the producer Enoch. So we need to ensure the producer name is not the same.

So the producer name role is `PID_PREFIX-{producerId}-{producerEpoch}`.

## Modifications
Support `MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION > 1`, and reuse Pulsar message deuplication.

(cherry picked from commit 1841b82)
eolivelli pushed a commit to eolivelli/kop that referenced this pull request Feb 24, 2022
Fixes streamnative#957 streamnative#1007

Currently, the KoP use Kafka's implantation to check duplicate message, but it is hard to support `MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION > 1`.

However, we should reuse the Pulsar message duplication check in KoP by mapping `baseSequence` to `sequenceId` and `lastSequence` to `highestSequenceId`.

Pulsar is using producer name to persist sequenced, in KoP we want to follow the Kafka behavior. So we need to use a name role to build a producer name.

Because Kafka will reuse PID when the transaction ID is the same but will increase the producer Enoch. So we need to ensure the producer name is not the same.

So the producer name role is `PID_PREFIX-{producerId}-{producerEpoch}`.

Support `MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION > 1`, and reuse Pulsar message deuplication.

(cherry picked from commit 1841b82)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Support idempotence with max.in.flight.requests.per.connection > 1

2 participants