Skip to content

PARQUET-2355: Deprecate parquet-thrift#1175

Closed
Fokko wants to merge 7 commits intoapache:masterfrom
Fokko:fd-deprecate-thrift
Closed

PARQUET-2355: Deprecate parquet-thrift#1175
Fokko wants to merge 7 commits intoapache:masterfrom
Fokko:fd-deprecate-thrift

Conversation

@Fokko
Copy link
Contributor

@Fokko Fokko commented Oct 18, 2023

Make sure you have checked all steps below.

parquet-thrift relies on Elephantbird: https://github.com/twitter/elephant-bird

The current version of elephant-bird that we use is still on Thrift 0.7.0, and this doesn't work with 0.19.0 that we're trying to upgrade to. Updating is non-trivial since a lot of the code that was used for testing has been made private: #1156

I would suggest removing this from the repository since it looks like no one is using it anymore:

apache-thrift does not seem to be used anymore on Maven Central (usages of 0 since 1.12.2):

image

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

@Fokko Fokko force-pushed the fd-deprecate-thrift branch from e60d633 to bc38a73 Compare October 18, 2023 14:43
@Fokko Fokko force-pushed the fd-deprecate-thrift branch from a5b9f1c to 460edb3 Compare October 18, 2023 15:11
<exclude>org.apache.parquet.hadoop.ColumnChunkPageWriteStore</exclude>
<exclude>org.apache.parquet.hadoop.ParquetRecordWriter</exclude>
<!-- Already deprecated classes/methods/constants -->
<exclude>org.apache.parquet.hadoop.thrift.ThriftReadSupport#setProjectionPushdown(org.apache.hadoop.mapred.JobConf,java.lang.String)</exclude>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<exclude>org.apache.parquet.hadoop.ParquetRecordWriter</exclude>
<!-- Already deprecated classes/methods/constants -->
<exclude>org.apache.parquet.hadoop.thrift.ThriftReadSupport#setProjectionPushdown(org.apache.hadoop.mapred.JobConf,java.lang.String)</exclude>
<exclude>org.apache.parquet.hadoop.thrift.ThriftReadSupport#THRIFT_COLUMN_FILTER_KEY</exclude>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<!-- Already deprecated classes/methods/constants -->
<exclude>org.apache.parquet.hadoop.thrift.ThriftReadSupport#setProjectionPushdown(org.apache.hadoop.mapred.JobConf,java.lang.String)</exclude>
<exclude>org.apache.parquet.hadoop.thrift.ThriftReadSupport#THRIFT_COLUMN_FILTER_KEY</exclude>
<exclude>org.apache.parquet.thrift.projection.deprecated.DeprecatedFieldProjectionFilter</exclude>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pom.xml Outdated
Got this from:
org.apache.parquet.thrift.projection.FieldProjectionFilter[org.apache.parquet.thrift.projection.FieldProjectionFilter]:INTERFACE_REMOVED

But it didn't change, only the @Deprecated annotation was added
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is a bug, let's try to bump it: #1176

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, the bump fixed it :)

@Fokko Fokko force-pushed the fd-deprecate-thrift branch 2 times, most recently from b67bf21 to 52472b3 Compare October 18, 2023 21:02
@Fokko Fokko force-pushed the fd-deprecate-thrift branch from 52472b3 to 857b84c Compare October 18, 2023 21:52
* This is used for parsing values assigned to ThriftReadSupport.THRIFT_COLUMN_FILTER_KEY
*/
@Deprecated
public class DeprecatedFieldProjectionFilter implements FieldProjectionFilter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this file removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These classes are marked as deprecated in 2015, and released in Parquet 1.8.0 :) Removing these was also suggested by @steveloughran in #979

* Deprecated. Use {@link #STRICT_THRIFT_COLUMN_FILTER_KEY}
* Accepts a ";" delimited list of globs in the syntax implemented by {@link DeprecatedFieldProjectionFilter}
*/
@Deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keeping them as-is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are marked as deprecated in 2015. I think removing these is a first step before removing the whole class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part of my attempts to move up to later hadoop versions hit real problems here. and along with thrift in general -i had to grab a copy of an x86 MBP for my mac m1, but even there homebrew had decided to upgrade it, so it was a backup in a corner I ended up with. At which point I discovered that homebrew isn't a real package manager as you can't roll back...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also on a M1, and it takes quite a bit of effort to get Thrift compiled. I got a patch ready to update Thrift to the latest version. I'm not sure if we need to update parquet-format first, since the release is still on an older version. I'll give it a try in a moment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that or stick the old version into the repo...

@Fokko
Copy link
Contributor Author

Fokko commented Oct 20, 2023

@gszadovszky
Copy link
Contributor

@Fokko,
If you want to deprecate the whole module I think, it is better to do something like we have done with parquet-tools and other before. See #851.
It might also worth a note on the dev list to raise awareness.

@Fokko Fokko force-pushed the fd-deprecate-thrift branch from a3961d4 to 09c3d41 Compare October 20, 2023 12:05
@SinghAsDev
Copy link
Contributor

Hi @Fokko thanks for bringing this up. We at Pinterest still use Parquet thrift quite extensively. How urgent is thrift 0.19.0 upgrade and how much time can we wait before we absolutely need to upgrade thrift? Also, since elephantbird is not maintained anymore, one option would be to just get rid of elephantbird dependency completely, even if it means moving some code in parquet-mr. Thoughts?

@SinghAsDev
Copy link
Contributor

@tlazaro, @ttim, @piyushnarang, @isnotinvain in case you all still use parquet-thrift, this discussion maybe of interest to you as well.

@tlazaro
Copy link

tlazaro commented Oct 21, 2023

Thanks so much for pinging, though I'm no longer at Twitter, I will try to find someone who might be interested in the conversation.

@Fokko
Copy link
Contributor Author

Fokko commented Nov 8, 2023

@SinghAsDev and @tlazaro I think we can make it work with the latest version of Elephantbird. Let me close this for now.

@Fokko Fokko closed this Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants