Add Apache Kafka integration #21767
eladkal left a comment
Can you add a simple example DAG?
Since this is not an Airflow hook, I think it would be best to use another name to avoid confusion.
Also, this class is not covered by unit tests.
Can we test this function?
airflow/ui/src/views/Docs.tsx
Outdated
I don't remember us ever editing a UI file when adding a provider.
cc @bbovenzi
Yeah I would ignore what's in /ui for now. It needs a refresh post 2.3.
Hi @eladkal. Thank you for the review. There are already hooks implemented for PubSub and Kinesis, so I decided to create integrations for Kafka and Pulsar. It would be very convenient if they were unified.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
There are a few high-level questions for this integration.
1. How to implement this hook?
The first way is similar to PubSubHook, where all functionality lives in the hook itself, like PubSubHook.publish().
The second way is similar to FirehoseHook, where the hook is just a wrapper around the boto client, which is then used to interact with Kinesis.
As for Apache Pulsar (#21618), it has a good client that manages and reuses sessions for producers and consumers. That's why the second way is better for Pulsar.
But for Kafka, there are a few separate classes for the producer and the consumer.
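To make the two styles concrete, here is a rough sketch. It is not code from this PR: `InMemoryProducer` is a hypothetical stand-in for a real Kafka producer, and `PublishingHook` and `ClientWrapperHook` are illustrative names that do not inherit from Airflow's `BaseHook`, so the snippet runs without Airflow or a broker.

```python
from typing import Callable, List, Optional, Tuple


class InMemoryProducer:
    """Hypothetical stand-in for a real Kafka producer client."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, bytes]] = []

    def produce(self, topic: str, value: bytes) -> None:
        # A real client would send to a broker; here we only record the call.
        self.sent.append((topic, value))


class PublishingHook:
    """Style 1 (PubSubHook-like): the hook exposes the operations itself."""

    def __init__(self, producer: InMemoryProducer) -> None:
        self._producer = producer

    def publish(self, topic: str, messages: List[bytes]) -> int:
        # Callers never touch the underlying client directly.
        for message in messages:
            self._producer.produce(topic, message)
        return len(messages)


class ClientWrapperHook:
    """Style 2 (FirehoseHook-like): the hook only builds and hands out the client."""

    def __init__(self, client_factory: Callable[[], InMemoryProducer]) -> None:
        self._client_factory = client_factory
        self._client: Optional[InMemoryProducer] = None

    def get_conn(self) -> InMemoryProducer:
        # Create the client lazily and reuse it on later calls.
        if self._client is None:
            self._client = self._client_factory()
        return self._client


if __name__ == "__main__":
    # Style 1: the operator calls a hook method.
    PublishingHook(InMemoryProducer()).publish("events", [b"a", b"b"])

    # Style 2: the operator asks for the client and uses its API directly.
    client = ClientWrapperHook(InMemoryProducer).get_conn()
    client.produce("events", b"c")
```

With style 2 and Kafka's separate producer and consumer classes, the wrapper would presumably need either one hook per client type or a factory argument like the one assumed above.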
2. Which Python package to use?
kafka-python is more popular, but it has had no new commits in the last 2 years.
On the other hand, confluent-kafka-python is actively developed but
less popular, and it is developed by Confluent.