Add Kafka message metadata fields (offset, partition, topic) to batch processing for deduplication support #141
Conversation
```diff
     return self

-def write(self, bs):
+def write(self, bs, offset=None, partition=None, topic=None):
```
I think this is great for right now, and I love that you have concrete use cases for this!
We may have to rethink this in a future version. Kafka is, by far, the most popular source, but the dream is to have other sources available as well.
I think this is great; just commenting for the future!
Pull Request Overview
This PR adds Kafka message metadata fields (offset, partition, topic) to the Message class and handlers to enable idempotent consumer patterns and deduplication logic. The changes expose Kafka-specific metadata that can be used to track processed messages and avoid reprocessing.
- Extended Message class to include topic, partition, and offset metadata fields
- Updated Kafka source to populate new metadata fields when creating Message objects
- Modified pipeline and handlers to pass and utilize Kafka metadata for deduplication
- Added example configuration demonstrating idempotent consumption patterns
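The metadata-injection step in the InferredMemBatch handler can be sketched roughly as follows. This is a minimal illustration, not the handler's actual implementation: it assumes JSON messages decoded to dicts, and uses the `kafka_topic`/`kafka_partition`/`kafka_offset` field names that the example deduplication SQL in this PR refers to.

```python
import json

def inject_kafka_metadata(raw: bytes, topic: str, partition: int, offset: int) -> dict:
    """Decode a JSON message and attach Kafka metadata fields.

    Hypothetical helper sketching the idea; the real InferredMemBatch
    handler in sqlflow may differ in names and structure.
    """
    obj = json.loads(raw)
    # Field names match the kafka_* columns used in the example SQL.
    obj["kafka_topic"] = topic
    obj["kafka_partition"] = partition
    obj["kafka_offset"] = offset
    return obj

row = inject_kafka_metadata(b'{"user_id": 1}', "clicks", 0, 42)
```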
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sqlflow/sources/base.py | Extended Message class constructor and added getter methods for Kafka metadata |
| sqlflow/sources/kafka.py | Updated Kafka message creation to include topic, partition, and offset metadata |
| sqlflow/pipeline.py | Modified message processing to pass Kafka metadata to handlers |
| sqlflow/handlers.py | Enhanced InferredMemBatch handler to inject Kafka metadata into decoded objects |
| dev/config/examples/kafka.motherduck.idempotent.yml | Added example configuration for idempotent Kafka consumption with deduplication |
```diff
 class Message:
-    def __init__(self, value: bytes):
+    def __init__(self, value: bytes, topic: str | None, partition: int | None, offset: int | None):
```
The constructor signature introduces a breaking change by making previously optional parameters required. Consider using keyword-only arguments or providing default values (None) to maintain backward compatibility with existing Message instantiations.
```diff
-    def __init__(self, value: bytes, topic: str | None, partition: int | None, offset: int | None):
+    def __init__(self, value: bytes, topic: str | None = None, partition: int | None = None, offset: int | None = None):
```
Yes! I need to figure out why CI didn't run on a fork! 👀
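With the suggested defaults, the backward-compatibility point can be shown concretely. A minimal sketch, using the signature from the diff above; the field storage and getter names are assumptions, not the actual sqlflow class body:

```python
from __future__ import annotations

class Message:
    """Sketch of the extended Message with optional Kafka metadata.

    The constructor signature comes from the suggested diff; the
    attribute and getter names below are hypothetical.
    """

    def __init__(
        self,
        value: bytes,
        topic: str | None = None,
        partition: int | None = None,
        offset: int | None = None,
    ):
        self._value = value
        self._topic = topic
        self._partition = partition
        self._offset = offset

    def value(self) -> bytes:
        return self._value

    def topic(self) -> str | None:
        return self._topic

    def partition(self) -> int | None:
        return self._partition

    def offset(self) -> int | None:
        return self._offset

# Existing call sites keep working with no changes...
legacy = Message(b"payload")
# ...while the Kafka source can attach metadata when it has it.
tagged = Message(b"payload", topic="clicks", partition=3, offset=100)
```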
```sql
SELECT
    kafka_partition AS partition,
    MAX(kafka_offset) AS "offset",
    kafka_topic AS topic,
```
Trailing comma after 'kafka_topic AS topic' in the SELECT statement. This will cause a syntax error in standard SQL.
```diff
-    kafka_topic AS topic,
+    kafka_topic AS topic
```
ooo
https://duckdb.org/docs/stable/sql/dialect/friendly_sql.html
<3 I had no idea that DuckDB does this!
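The query above computes a high-watermark offset per (topic, partition), which downstream dedup logic can compare against. Here is a runnable sketch of that pattern using the stdlib sqlite3 driver in place of DuckDB/MotherDuck (note that sqlite, unlike DuckDB's friendly SQL, would reject the trailing comma, so it is omitted here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (kafka_topic TEXT, kafka_partition INTEGER, kafka_offset INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("clicks", 0, 10), ("clicks", 0, 12), ("clicks", 1, 7)],
)

# High-watermark per (topic, partition): only replay offsets above these.
rows = conn.execute(
    """
    SELECT
        kafka_partition AS partition,
        MAX(kafka_offset) AS "offset",
        kafka_topic AS topic
    FROM events
    GROUP BY kafka_topic, kafka_partition
    """
).fetchall()
```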
This PR exposes additional Kafka message metadata fields in the InferredMemBatch handler to enable idempotent consumer patterns and deduplication logic, addressing issue #140.
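The idempotent-consumer pattern this metadata enables boils down to: skip any message at or below the stored watermark for its (topic, partition). A minimal sketch with hypothetical names, not sqlflow's actual dedup code:

```python
def should_process(watermarks: dict, topic: str, partition: int, offset: int) -> bool:
    """Return True if this message is new, advancing the watermark.

    watermarks maps (topic, partition) -> highest processed offset.
    Hypothetical helper illustrating the dedup check.
    """
    key = (topic, partition)
    if offset <= watermarks.get(key, -1):
        return False  # already processed; skip on replay
    watermarks[key] = offset
    return True

wm = {}
# A replayed/out-of-order stream: offsets 11 and 9 are duplicates.
results = [should_process(wm, "clicks", 0, o) for o in (10, 11, 11, 9, 12)]
```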