Skip to content

Utility method for length estimation of utf8#2661

Merged
fjy merged 1 commit intoapache:masterfrom
navis:utf8-estimated-length
Mar 31, 2016
Merged

Utility method for length estimation of utf8#2661
fjy merged 1 commit intoapache:masterfrom
navis:utf8-estimated-length

Conversation

@navis
Copy link
Copy Markdown
Contributor

@navis navis commented Mar 15, 2016

Simple estimation on length of utf8 string

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Mar 15, 2016

👍

@himanshug
Copy link
Copy Markdown
Contributor

can you pls have a short description in the PR about why it is needed?

@drcrallen
Copy link
Copy Markdown
Contributor

Before merging, it would be nice to be able to document this change more.

If people are relying on the prior behavior to estimate data "size", then this change could make subsequent computations of that size to be different from prior versions.

Support for the prior methodology should be retained (and potentially deprecated) unless tests are in place to ensure they return the same results.

(optional) The static method would probably fit under java-util StringUtils better than GuavaUtils

@navis
Copy link
Copy Markdown
Contributor Author

navis commented Mar 15, 2016

@himanshug to remove overhead of encoding? UTF8 makes byte[] which is 3x of string.length() and copies again to make final result. If it's just for estimation, we can skip that part.

@drcrallen Yes, some UTs would be good and I also agree on StringUtils is right place to be. Really really wish someone move api and utils included in druid itself.

@navis
Copy link
Copy Markdown
Contributor Author

navis commented Mar 16, 2016

Moved to java-util (metamx/java-util#45)

@himanshug
Copy link
Copy Markdown
Contributor

this is possibly going to break the integration tests which make segmentMetadata queries, pls check.

@navis
Copy link
Copy Markdown
Contributor Author

navis commented Mar 16, 2016

@himanshug confirmed result is not changed. testcase is included.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason we put this in GuavaUtils? It does not really seem to be related to Guava.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xvrl moved to StringUtils in java-util (metamx/java-util#45)

@xvrl
Copy link
Copy Markdown
Member

xvrl commented Mar 16, 2016

@navis I think it makes sense adding a string utility class to druid to make it easier to do those kinds of changes. A lot of String optimizations in that area are going to be druid specific anyway.

@xvrl xvrl added this to the 0.9.1 milestone Mar 16, 2016
@navis
Copy link
Copy Markdown
Contributor Author

navis commented Mar 22, 2016

@xvrl Could I understand that as this would be better in druid, not in java-util?

@xvrl
Copy link
Copy Markdown
Member

xvrl commented Mar 25, 2016

@navis yes, I think it doesn't need to live in java-util since this is only used by Druid itself.

@navis navis force-pushed the utf8-estimated-length branch from 3d1711a to 00dd9eb Compare March 30, 2016 07:26
@navis
Copy link
Copy Markdown
Contributor Author

navis commented Mar 30, 2016

@xvrl rebased and added test cases

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename to estimatedBinaryLengthAsUTF8 if this should only be used for estimation; will make things much clearer for future developers.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed that. consider it done.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Mar 30, 2016

👍 after the renaming comment.

@navis out of curiosity do you use this "size" analysis type for anything? I had thought it was not really that useful and I assumed most people issuing segment metadata queries disable it.

@navis
Copy link
Copy Markdown
Contributor Author

navis commented Mar 31, 2016

@gianm Yes, I don't think we can use it meaningfully but it can still make someone happy (UI guys want to show numbers whenever it's possible). By the way I'm thinking of using this to estimate the size of incremental index to be flushed automatically rather than row count. Ah, I've forgot I've already done that in #2459. You can check that if interested.

@navis navis force-pushed the utf8-estimated-length branch from 00dd9eb to e0cfd9e Compare March 31, 2016 01:07
@fjy fjy merged commit 3d68da9 into apache:master Mar 31, 2016
@navis navis deleted the utf8-estimated-length branch March 31, 2016 09:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants