Add thrift input format #11360

Closed
bananaaggle wants to merge 7 commits into apache:master from bananaaggle:add_thrift_input_format
@bananaaggle
Contributor

Because parseSpec is deprecated, I developed ThriftInputFormat for the new interface. It supports stream ingestion of Thrift-encoded data.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@bananaaggle
Contributor Author

Hi, @clintropolis! I think Thrift is the last extension that still uses a parser. Once this input format is finished, we can remove the parser implementations from the code and fix all the documentation that refers to them. Do you think we should open an issue for it?

@bananaaggle bananaaggle mentioned this pull request Aug 6, 2021
9 tasks
Member

@clintropolis clintropolis left a comment

very sorry for the delayed review, the changes overall lgtm 👍

|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract nested values from a Parquet file. Note that only 'path' expression are supported ('jq' is unavailable).| no (default will auto-discover 'root' level properties) |
| binaryAsString | Boolean | Specifies if the bytes parquet column which is not logically marked as a string or enum type should be treated as a UTF-8 encoded string. | no (default = false) |

### Thrift Stream
Member

Hmm, I'm not sure we have any other 'contrib' extensions described in this section, so it might be best if this lives in https://github.com/apache/druid/blob/master/docs/development/extensions-contrib/thrift.md for now. On the other hand, I think Thrift is the only data format that isn't a core extension (maybe in the future we should consider adding integration tests and making it a core extension?), so maybe it is OK for it to be here. @techdocsmith, do you have any thoughts?

Also, looking closer at the code, I guess this might work with batch ingestion too, since the deserializer detects the format based on the bytes given to it, though I haven't personally used this extension or tested this scenario. I'll see if I can find some time to pull your branch and test it out.

Contributor

+1 for @clintropolis's suggestion to keep the doc in /docs/development/extensions-contrib/thrift.md until the extension is made core.

<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.10.2</version>
Member

i think versions on a lot of these should be already defined in the top level pom (the dependency checker in travis sometimes suggests more than is necessary to fix the issue)
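
To illustrate the point above: when a parent pom already sets an artifact's version under `<dependencyManagement>`, a module pom can declare the dependency without a `<version>` element and inherit it. The fragment below is a minimal sketch that assumes the top-level pom manages `jackson-core`; that assumption has not been checked here.

```xml
<!-- Sketch only: assumes the parent pom's <dependencyManagement>
     (directly or through an imported BOM) already sets this artifact's version. -->
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <!-- no <version> element: the version is inherited from the parent pom -->
</dependency>
```

If the parent pom does not manage the artifact, Maven fails the build with a missing-version error, so the assumption is easy to verify.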

@stale

stale Bot commented Apr 27, 2022

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale Bot added the stale label Apr 27, 2022
@abhishekagarwal87
Contributor

@bananaaggle - can you resolve these conflicts? It would be great to get rid of the dependency on InputRowParser completely.

@stale

stale Bot commented Jun 24, 2022

This issue is no longer marked as stale.

@stale stale Bot removed the stale label Jun 24, 2022
@github-actions

github-actions Bot commented Nov 7, 2023

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions Bot added the stale label Nov 7, 2023
@github-actions

github-actions Bot commented Dec 5, 2023

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions Bot closed this Dec 5, 2023
4 participants