Closed
Description
When using UTF-8 encoding and inserting a character whose Unicode code point is higher than U+FFFF, the Java ResultSet#getString() method returns the same incorrect character for every such character. The byte representation (as returned by ResultSet#getBytes()) seems to be correct.
I guess that the bug is in the native (C) JNI implementation: https://github.com/xerial/sqlite-jdbc/blob/master/src/main/java/org/sqlite/core/NativeDB.c#L503
How to reproduce
- Linux x64
- org.xerial:sqlite-jdbc:3.8.11.2
- java version 1.8.0_45 - Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
# HINTS:
# - LANG=en_US.UTF-8
echo "CREATE TABLE TEST (id INTEGER PRIMARY KEY, name CHARSET);" | sqlite3 /tmp/sqlite.db -batch
echo "PRAGMA encoding = \"UTF-8\";" | sqlite3 /tmp/sqlite.db -batch
# LATIN CAPITAL LETTER A
# http://unicode-table.com/de/0041/
# http://www.fileformat.info/info/unicode/char/0041/index.htm
echo -e "INSERT INTO TEST (name) VALUES ('\x41');" | sqlite3 /tmp/sqlite.db -batch
# Miao Letter Archaic Ma
# http://unicode-table.com/de/16F06/
# http://www.fileformat.info/info/unicode/char/16F06/index.htm
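echo -e "INSERT INTO TEST (name) VALUES ('\xF0\x96\xBC\x86');" | sqlite3 /tmp/sqlite.db -batch

import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;
public class Utf8Repro {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.setProperty("characterEncoding", "UTF-8");
        properties.setProperty("encoding", "\"UTF-8\"");
        try (Connection connection = DriverManager.getConnection("jdbc:sqlite:/tmp/sqlite.db", properties);
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery("select name from TEST")) {
            while (rs.next()) {
                // VALUE A: raw bytes decoded as UTF-8 (correct)
                byte[] b = rs.getBytes("name");
                System.out.println("VALUE A = " + new String(b, StandardCharsets.UTF_8));
                // VALUE B: getString() (wrong for code points above U+FFFF)
                String value = rs.getString("name");
                System.out.println("VALUE B = " + value);
            }
        }
    }
}
Hints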
Maybe it's a Modified UTF-8 (CESU-8-like) vs. standard UTF-8 encoding issue. According to the JNI specification:
...Characters with code points above U+FFFF (so-called supplementary characters) are represented
by separately encoding the two surrogate code units of their UTF-16 representation...
https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/types.html#modified_utf_8_strings
http://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542
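To make the difference quoted above concrete, here is a minimal Java sketch that encodes U+16F06 (the character from the reproduction) both ways. The class and method names are hypothetical and chosen for illustration only; `DataOutputStream.writeUTF` is used because it is documented to write Modified UTF-8 behind a two-byte length prefix.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class Utf8Forms {
    // Standard UTF-8: one 4-byte sequence for a supplementary character.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8: each UTF-16 surrogate is encoded separately (3 bytes each).
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buffer.toByteArray();
        // Drop the two-byte length prefix that writeUTF emits.
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x16F06));
        System.out.println("standard UTF-8: " + hex(standardUtf8(s))); // F0 96 BC 86
        System.out.println("modified UTF-8: " + hex(modifiedUtf8(s))); // ED A0 9B ED BC 86
    }
}
```

The database stores the four-byte form, so a JNI call that expects the six-byte form would plausibly misread it. That is a guess consistent with the symptom, not a confirmed diagnosis.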
Ideas
- Probably the native JNI code needs to check whether the database column value is encoded in CESU-8 or UTF-8, and convert it to a Java String (UTF-16) object accordingly.
- Use the sqlite3_column_text16 and sqlite3_column_bytes16 functions together with NewString to create the Java String object.
- Use the PRAGMA encoding setting to decode VARCHAR fields to a Java String.
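The first two ideas can be illustrated on the Java side. The sketch below is not the driver's code and uses hypothetical names: it checks whether a UTF-8 byte array contains a four-byte sequence (which Modified UTF-8 cannot represent), and prints the UTF-16 code units that a correct conversion should yield, i.e. the kind of data a UTF-16 based path such as sqlite3_column_text16 plus NewString would work with. It assumes the input is valid UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class SupplementaryCheck {
    // In valid UTF-8, bytes 0xF0..0xF4 only appear as the lead byte of a
    // 4-byte sequence, i.e. a code point above U+FFFF.
    static boolean hasFourByteSequence(byte[] utf8) {
        for (byte b : utf8) {
            int u = b & 0xFF;
            if (u >= 0xF0 && u <= 0xF4) {
                return true;
            }
        }
        return false;
    }

    // Decode standard UTF-8 and list the resulting UTF-16 code units in hex.
    static String utf16Units(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] miao = {(byte) 0xF0, (byte) 0x96, (byte) 0xBC, (byte) 0x86};
        System.out.println(hasFourByteSequence(miao)); // true
        System.out.println(utf16Units(miao));          // D81B DF06
    }
}
```

A value for which `hasFourByteSequence` returns true is exactly the case where the two encodings diverge, so it is the case a fix in the native code would need to handle.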
FYI
I already reported this bug at https://bitbucket.org/xerial/sqlite-jdbc/issues/200