Skip to content

Features: ClosableCharSequence and support for custom charset decoder + 2 fixes#9

Open
akhomchenko wants to merge 8 commits intofge:masterfrom
akhomchenko:feature/custom-charset-decoder
Open

Features: ClosableCharSequence and support for custom charset decoder + 2 fixes#9
akhomchenko wants to merge 8 commits intofge:masterfrom
akhomchenko:feature/custom-charset-decoder

Conversation

@akhomchenko
Copy link
Copy Markdown

@akhomchenko akhomchenko commented Apr 26, 2021

Evening.

I used you library and it works great. I found it lacking 2 features and I decided to contribute them back. Also I found 2 issues that I wrote patches for and decided to contribute them back as well. I can split this into multiple PRs if you want to include only parts of it.

Feature 1: ClosableCharSequence

This change enabled me to hide implementation detail and back reading different sources behind single interface. It is breaking as I have replaced Closable, CharSequence with ClosableCharSequence. I am willing to work on a better strategy for this one.

Feature 2: support for custom charset decoder

This change enabled me to handle possible decoding errors better and provide similar behavior for different file-reading implementations. I am not an expert in encodings so please double-check.

Concurrent issues

Discovered both issues at the time I was running tests multiple times while working on "support for custom charset decoder" feature.

Notes

Change includes:
* https instead of http for spring repo as http endpoint is deprecated
* move properties after plugins as otherwise some properties are not resolved
* replace '<<' with `doLast` as '<<' is deprecated since Gradle 5.0
* remove osgi plugin as it is deprecated and I am not sure it does something
  useful
* remove wrapper task as updates can be done via built-in gradle
  wrapper task
* replace broken javadoc links with working
ClosableCharSequence maintains same properties as Closable & CharSequence
and is more convenient for cases when not only LargeText is used.
For example: for files < 10Mb code can just read to CharSequence while
LargeText is used for anything above threshold.
Default (REPORT) action of CharsetDecoder is not always desired. With new API
supplier of CharsetDecoder can be provided. This allows to customize
CharsetDecoder to user needs.

TextDecoder#nextRange was update to account for different CodingErrorActions.
endOfInput for decode method is now determined dynamically. This allows to
get access to UNDERFLOW CoderResult and avoid replacement of 1+ byte
character that happened to be at the upper bound of range.

NotThreadSafeLargeTextTest was updated to test all supported
CodingErrorActions.

Strings.repeat in tests were replaced with RandomStringUtils from
apache-commons lang3. RandomStringUtils generate string with characters of
various byte length (1-4 for UTF8) that is better for randomized testing
than fixed 1 byte string that was used before.

Also:

* updated guava to 30.1-android (guava versions 21+ require jdk8, android
  flavor is compatible with jdk7, see:
  https://github.com/google/guava/wiki/Compatibility#older-jdks)
* updated assertj-core to latest 2.x version (3.x is not compatible with jdk8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant