Add avro_ocf to supported Kafka/Kinesis InputFormats #11865
a2l007 merged 6 commits into apache:master
@jacobtolar LGTM. Could you please resolve the conflicts?

Ah, looks like a later PR (#11912) entirely reworked the Kafka ingestion docs.

I'm a bit curious: Avro OCF is a file format, so is it common to put these files in streaming ingest messages? There's no technical reason this wouldn't work if the files are small enough to fit in the messages, since it's all just binary blobs in the end, but I was mostly wondering whether this is a common use case compared to the streaming-oriented Avro formats we support (inline schema, multi-inline-schema, schema repo, schema registry).

I don't know that it's a common use case, but we have some scenarios where we do this. There's obviously some overhead to providing the schema in every message (the cost is amortized somewhat by packing many records into a single Kafka message), but it's nice not to need an extra component (a schema registry). The `avro_ocf` support currently works by writing every message to a file on the local filesystem, which isn't ideal for streaming in one 'file' per message (but it technically works, if your disks are fast enough or your data volume is low enough 🙃). When I get some time, I plan to submit a PR that lets you configure this to happen in memory, which should make it more usable.
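The trade-off described above can be made concrete with a small sketch. An OCF payload carries its writer schema in its header, so each Kafka message is self-describing and the header cost is shared by every record packed into that message. The stdlib-only Python below builds and parses such a payload entirely in memory, without a temp file. It is a simplified illustration of the Avro object container format, not Druid's reader: it assumes the `null` codec, a writer that emits a single data block, and records with only `long` and `string` fields; the `Event` schema and all function names are hypothetical.

```python
import io
import json
import os

MAGIC = b"Obj\x01"  # Avro object container file magic bytes

# Hypothetical example schema; this sketch only supports long/string fields.
SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

def write_long(buf, n):
    """Avro long: zigzag-encode, then write as a base-128 varint."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        buf.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    buf.write(bytes([n]))

def read_long(buf):
    shift = result = 0
    while True:
        b = buf.read(1)[0]
        result |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (result >> 1) ^ -(result & 1)  # undo zigzag

def write_bytes(buf, data):
    write_long(buf, len(data))  # Avro bytes/string: length prefix + raw bytes
    buf.write(data)

def read_bytes(buf):
    return buf.read(read_long(buf))

# Field encoders/decoders for the two primitive types this sketch supports.
ENCODERS = {
    "long": write_long,
    "string": lambda buf, s: write_bytes(buf, s.encode("utf-8")),
}
DECODERS = {
    "long": read_long,
    "string": lambda buf: read_bytes(buf).decode("utf-8"),
}

def encode_ocf(records, schema=SCHEMA):
    """Serialize records into one in-memory OCF payload (null codec, one block)."""
    out = io.BytesIO()
    out.write(MAGIC)
    meta = {"avro.schema": json.dumps(schema).encode(), "avro.codec": b"null"}
    write_long(out, len(meta))  # file metadata is an Avro map<bytes>
    for key, value in meta.items():
        write_bytes(out, key.encode("utf-8"))
        write_bytes(out, value)
    write_long(out, 0)  # end of metadata map
    sync = os.urandom(16)
    out.write(sync)
    block = io.BytesIO()
    for rec in records:
        for field in schema["fields"]:
            ENCODERS[field["type"]](block, rec[field["name"]])
    write_long(out, len(records))  # object count
    write_long(out, len(block.getvalue()))  # block size in bytes
    out.write(block.getvalue())
    out.write(sync)
    return out.getvalue()

def decode_ocf(payload):
    """Parse an OCF payload from bytes, using the schema embedded in its header."""
    buf = io.BytesIO(payload)
    if buf.read(4) != MAGIC:
        raise ValueError("not an Avro OCF payload")
    meta = {}
    count = read_long(buf)
    while count != 0:
        if count < 0:  # a negative count is followed by the block's byte size
            count = -count
            read_long(buf)
        for _ in range(count):
            key = read_bytes(buf).decode("utf-8")
            meta[key] = read_bytes(buf)
        count = read_long(buf)
    if meta.get("avro.codec", b"null") != b"null":
        raise NotImplementedError("only the null codec is handled here")
    schema = json.loads(meta["avro.schema"])
    sync = buf.read(16)
    records = []
    while buf.tell() < len(payload):
        n = read_long(buf)
        read_long(buf)  # block size; not needed by this reader
        for _ in range(n):
            records.append({f["name"]: DECODERS[f["type"]](buf) for f in schema["fields"]})
        if buf.read(16) != sync:
            raise ValueError("sync marker mismatch")
    return schema, records
```

For example, calling `encode_ocf` on 100 events yields one payload whose fixed cost (magic bytes, schema JSON, sync markers) is paid once rather than per record, and `decode_ocf(payload)` recovers both the embedded schema and the records without touching the filesystem.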
… (Revert changes from apache#11865) (apache#16807)
Description
Update docs to add `avro_ocf` to the list of supported input formats for Kafka/Kinesis. Also updated the Kinesis docs to more closely match the Kafka docs (importing some of the changes from this PR: https://github.com/apache/druid/pull/11624/files).

The `avro_ocf` input format was added here: #9671

This PR has:
`avro_ocf`, so I know that's working as documented here. I haven't tested with Kinesis, but I have no reason to believe it wouldn't also work.
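For reference, here is roughly where `avro_ocf` plugs into a streaming supervisor spec. This sketch expresses only the `ioConfig` fragments as Python dicts and prints them as JSON; the topic, stream, broker, and endpoint values are placeholders, and the rest of the spec (`dataSchema`, `tuningConfig`) is omitted.

```python
import json

# Shared input format: by default avro_ocf reads the writer schema from each
# payload's OCF header, so no schema registry or inline schema is configured.
AVRO_OCF = {"type": "avro_ocf"}

# Kafka ioConfig fragment; topic and broker address are placeholders.
kafka_io_config = {
    "type": "kafka",
    "topic": "events",  # hypothetical topic name
    "inputFormat": AVRO_OCF,
    "consumerProperties": {"bootstrap.servers": "localhost:9092"},
}

# Kinesis ioConfig fragment; stream name and endpoint are placeholders.
kinesis_io_config = {
    "type": "kinesis",
    "stream": "events",  # hypothetical stream name
    "endpoint": "kinesis.us-east-1.amazonaws.com",
    "inputFormat": AVRO_OCF,
}

if __name__ == "__main__":
    print(json.dumps({"ioConfig": kafka_io_config}, indent=2))
    print(json.dumps({"ioConfig": kinesis_io_config}, indent=2))
```

Both fragments share the same `inputFormat` object, which reflects the point of this PR: the format works the same way for either stream source.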