Utility method for length estimation of utf8 #2661
👍
Can you please add a short description to the PR explaining why this is needed?
Before merging, it would be nice to document this change more. If people rely on the prior behavior to estimate data "size", this change could make subsequent computations of that size differ from prior versions. Support for the prior methodology should be retained (and potentially deprecated) unless tests are in place to ensure both return the same results. (optional) The static method would probably fit better in java-util's StringUtils than in GuavaUtils.
@himanshug To remove the overhead of encoding. Encoding to UTF-8 allocates a byte[] of 3x string.length() and then copies it again to produce the final result. If the value is only used for estimation, we can skip that step. @drcrallen Yes, some unit tests would be good, and I also agree on
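To illustrate the point about skipping the encode-and-copy step, here is a minimal sketch, assuming the goal is to return exactly what `value.getBytes(StandardCharsets.UTF_8).length` would return: walk the chars once and add 1, 2, 3, or 4 bytes per code point, so no byte[] is allocated. The class and method names (`Utf8LengthSketch`, `utf8Length`) are hypothetical; this is not the code from the PR. Unpaired surrogates are counted as 1 byte because `getBytes` replaces them with `'?'`.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthSketch
{
  // Hypothetical helper (not the PR's code): returns the number of bytes
  // value.getBytes(StandardCharsets.UTF_8) would produce, without allocating.
  public static int utf8Length(String value)
  {
    int length = 0;
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      if (c < 0x80) {
        length += 1;            // U+0000..U+007F
      } else if (c < 0x800) {
        length += 2;            // U+0080..U+07FF
      } else if (!Character.isSurrogate(c)) {
        length += 3;            // rest of the Basic Multilingual Plane
      } else if (Character.isHighSurrogate(c)
                 && i + 1 < value.length()
                 && Character.isLowSurrogate(value.charAt(i + 1))) {
        length += 4;            // valid surrogate pair: one supplementary code point
        i++;                    // the low surrogate is already accounted for
      } else {
        length += 1;            // unpaired surrogate: getBytes() writes '?' instead
      }
    }
    return length;
  }

  public static void main(String[] args)
  {
    // ASCII (1 byte each), U+00E9 (2), U+4E2D (3), U+1F600 as a surrogate pair (4)
    String s = "abc\u00e9\u4e2d\uD83D\uDE00";
    int viaEncoding = s.getBytes(StandardCharsets.UTF_8).length; // allocates, then trims a copy
    int viaCounting = utf8Length(s);                              // no allocation
    System.out.println(viaEncoding + " " + viaCounting);          // prints "12 12"
  }
}
```

The trade-off is a branch per char instead of an allocation plus a copy, which matters mostly when the value is computed repeatedly and only the length is needed.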
Moved to java-util (metamx/java-util#45)
This could break the integration tests that issue segmentMetadata queries; please check.
@himanshug Confirmed that the result is unchanged. A test case is included.
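A check in the spirit of the "result is unchanged" test could compare a counting estimator against the real encoded length over every code point. This is a hypothetical sketch, not the test case from the PR; it assumes the estimator is meant to match `String.getBytes(StandardCharsets.UTF_8).length` exactly, and the names `Utf8LengthCheck`, `utf8Length`, and `countMismatches` are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthCheck
{
  // Hypothetical estimator under test, using the same rules as getBytes(UTF_8).
  static int utf8Length(String value)
  {
    int length = 0;
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      if (c < 0x80) {
        length += 1;
      } else if (c < 0x800) {
        length += 2;
      } else if (!Character.isSurrogate(c)) {
        length += 3;
      } else if (Character.isHighSurrogate(c) && i + 1 < value.length()
                 && Character.isLowSurrogate(value.charAt(i + 1))) {
        length += 4;
        i++;
      } else {
        length += 1; // unpaired surrogate is encoded as '?'
      }
    }
    return length;
  }

  // Counts inputs where the estimate differs from the actual encoded length.
  static int countMismatches()
  {
    int mismatches = 0;
    // Every code point; U+D800..U+DFFF come back from toChars() as lone surrogate chars.
    for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
      String s = new String(Character.toChars(cp));
      if (utf8Length(s) != s.getBytes(StandardCharsets.UTF_8).length) {
        mismatches++;
      }
    }
    // A high surrogate followed by an ordinary char is malformed as well.
    String broken = "\uD83Da";
    if (utf8Length(broken) != broken.getBytes(StandardCharsets.UTF_8).length) {
      mismatches++;
    }
    return mismatches;
  }

  public static void main(String[] args)
  {
    System.out.println("mismatches: " + countMismatches());
  }
}
```

An exhaustive loop over about 1.1 million code points runs quickly and covers the surrogate edge cases that a handful of hand-picked strings can miss.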
Any reason we put this in GuavaUtils? It doesn't really seem related to Guava.
@xvrl moved to StringUtils in java-util (metamx/java-util#45)
@navis I think it makes sense to add a string utility class to Druid to make those kinds of changes easier. A lot of String optimizations in that area are going to be Druid-specific anyway.
@xvrl Should I take that to mean this would be better in Druid rather than in java-util?
@navis Yes, I don't think it needs to live in java-util, since only Druid itself uses it.
Branch updated from 3d1711a to 00dd9eb
@xvrl Rebased and added test cases.
Please rename it to estimatedBinaryLengthAsUTF8 if it should only be used for estimation; that will make things much clearer for future developers.
Missed that. Consider it done.
👍 after the renaming comment is addressed. @navis Out of curiosity, do you use this "size" analysis type for anything? I had thought it wasn't really that useful and assumed most people issuing segment metadata queries disable it.
@gianm Yes. I don't think we can use it meaningfully, but it can still make someone happy (UI folks want to show numbers whenever possible). By the way, I'm thinking of using this to estimate the size of the incremental index, so that it is flushed automatically based on size rather than row count. Ah, I forgot I've already done that in #2459. You can check it if you're interested.
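The size-based flushing idea could look roughly like the following. This is a hypothetical sketch, not the code from #2459: `SizeBoundedBuffer`, the `maxBytes` threshold, and the per-row accounting (summing estimated UTF-8 lengths of string dimension values, assumed non-null) are all assumptions made for illustration. The estimator here is a deliberately rough one built on `String.codePoints()`.

```java
import java.util.ArrayList;
import java.util.List;

public class SizeBoundedBuffer
{
  private final long maxBytes;                 // hypothetical flush threshold in bytes
  private final List<String[]> rows = new ArrayList<>();
  private long estimatedBytes = 0;

  public SizeBoundedBuffer(long maxBytes)
  {
    this.maxBytes = maxBytes;
  }

  // Rough UTF-8 length estimate. Unpaired surrogates come back from codePoints()
  // as single values and are counted as 3 bytes here, whereas getBytes(UTF_8)
  // writes a 1-byte '?' for them.
  static int estimateUtf8Length(String value)
  {
    return value.codePoints()
        .map(cp -> cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4)
        .sum();
  }

  // Adds a row of (non-null) string dimension values;
  // returns true once the estimated size reaches the threshold.
  public boolean add(String... dims)
  {
    for (String d : dims) {
      estimatedBytes += estimateUtf8Length(d);
    }
    rows.add(dims);
    return estimatedBytes >= maxBytes;
  }

  // Hands back the buffered rows and resets the size estimate.
  public List<String[]> flush()
  {
    List<String[]> out = new ArrayList<>(rows);
    rows.clear();
    estimatedBytes = 0;
    return out;
  }

  public long estimatedBytes()
  {
    return estimatedBytes;
  }

  public static void main(String[] args)
  {
    SizeBoundedBuffer buffer = new SizeBoundedBuffer(16);
    System.out.println(buffer.add("hello", "world"));       // 10 bytes: prints "false"
    System.out.println(buffer.add("\u4e2d\u6587", "abc"));   // 10 + 6 + 3 = 19: prints "true"
    int flushed = buffer.flush().size();
    System.out.println(flushed + " " + buffer.estimatedBytes()); // prints "2 0"
  }
}
```

Compared with a row-count limit, a byte budget like this adapts to rows with very long or very short values, at the cost of computing an estimate per value.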
Branch updated from 00dd9eb to e0cfd9e
Simple estimation on length of utf8 string