Skip to content

Utility method for length estimation of utf8#45

Closed
navis wants to merge 3 commits intometamx:masterfrom
navis:utf8-estimated-length
Closed

Utility method for length estimation of utf8#45
navis wants to merge 3 commits intometamx:masterfrom
navis:utf8-estimated-length

Conversation

@navis
Copy link
Copy Markdown
Contributor

@navis navis commented Mar 16, 2016

}
}

// should be used only for estimation.. does not check validity of format
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add javadoc that this is usually equivalent to StringUtils.toUtf8(value).length?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would also be good to document in the cases in which it is not equal to StringUtils.toUtf8(value).length. Does it tend to under-estimate or does it tend to over-estimate?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically the same but it does not validate whether the string has valid UTF format. For invalid chars, this method over-estimates the length(length of invalid char = 1 for encoding).

@drcrallen
Copy link
Copy Markdown
Contributor

@navis there's a lot of fixes in here, can you either split by commit or split them into different PRs?

Assert.assertEquals(StringUtils.toUtf8(string).length, StringUtils.binaryLengthAsUTF8(string));
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's also add a tests where the length are not equal to clarify that this is the expected behavior

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I'll add invalid char cases.

@navis navis force-pushed the utf8-estimated-length branch from 72c6e67 to 64445d0 Compare March 17, 2016 01:44
@navis
Copy link
Copy Markdown
Contributor Author

navis commented Apr 7, 2016

It's merged into druid

@navis navis closed this Apr 7, 2016
@navis navis deleted the utf8-estimated-length branch April 7, 2016 06:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants