Utility method for length estimation of utf8 #45
}
}

// should be used only for estimation.. does not check validity of format
Can you add javadoc that this is usually equivalent to StringUtils.toUtf8(value).length?
It would also be good to document the cases in which it is not equal to StringUtils.toUtf8(value).length. Does it tend to under-estimate or over-estimate?
Basically the same, but it does not validate whether the string is well-formed UTF. For invalid chars, this method over-estimates the length (when actually encoded, an invalid char has length 1).
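To make the over-estimation concrete, here is a minimal sketch of a non-validating estimator. It is not the code from this PR: `Utf8LengthSketch` and `estimateUtf8Length` are hypothetical names, and the comparison assumes the exact length is obtained through `String.getBytes(StandardCharsets.UTF_8)`, which replaces an unpaired surrogate with a single `?` byte.

```java
import java.nio.charset.StandardCharsets;

final class Utf8LengthSketch {
  // Hypothetical estimator, not the PR's code: sums a per-char byte count
  // and never checks that surrogates are correctly paired.
  static int estimateUtf8Length(String value) {
    int length = 0;
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      if (c < 0x80) {
        length += 1;            // ASCII
      } else if (c < 0x800) {
        length += 2;            // two-byte sequences
      } else if (Character.isSurrogate(c)) {
        length += 2;            // half of a 4-byte pair, assumed well-formed
      } else {
        length += 3;            // rest of the Basic Multilingual Plane
      }
    }
    return length;
  }

  public static void main(String[] args) {
    // The last sample is a lone high surrogate, i.e. an invalid char.
    String[] samples = {"abc", "h\u00e9llo", "\uD83D\uDE00", "\uD800"};
    for (String s : samples) {
      int exact = s.getBytes(StandardCharsets.UTF_8).length;
      System.out.println(estimateUtf8Length(s) + " estimated vs " + exact + " encoded");
    }
  }
}
```

Under these assumptions the two numbers agree for well-formed strings, and the estimate is larger only for the lone surrogate, because the encoder emits one replacement byte where the estimator counted two.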
@navis there's a lot of fixes in here, can you either split by commit or split them into different PRs?
Assert.assertEquals(StringUtils.toUtf8(string).length, StringUtils.binaryLengthAsUTF8(string));
}
}
Let's also add tests where the lengths are not equal, to clarify that this is the expected behavior.
ok. I'll add invalid char cases.
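For reference, the invalid char cases could be enumerated roughly as below. This sketch only shows the encoder side, using `String.getBytes(StandardCharsets.UTF_8)` as a stand-in for `StringUtils.toUtf8` (an assumption about how that helper encodes); `InvalidCharCases` and `encodedLength` are hypothetical names, and the actual tests added to the PR may differ.

```java
import java.nio.charset.StandardCharsets;

final class InvalidCharCases {
  // Exact encoded size as the JDK computes it; malformed input is replaced, not rejected.
  static int encodedLength(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    String[] cases = {
        "\uD800",        // lone high surrogate
        "\uDC00",        // lone low surrogate
        "a\uD800b",      // high surrogate followed by a non-surrogate
        "\uDC00\uD800",  // pair in the wrong order
        "\uD800\uDC00"   // well-formed pair, for contrast
    };
    for (String s : cases) {
      System.out.println(s.length() + " chars -> " + encodedLength(s) + " bytes");
    }
  }
}
```

Each unpaired surrogate encodes to one byte, so any estimator that counts more than one byte per surrogate is expected to exceed the encoded length on the first four cases.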
72c6e67 to 64445d0
It's merged into druid.
From apache/druid#2661