Skip to content

Added serialization of the trie into a memory buffer.#12

Merged
thep merged 1 commit intotlwg:masterfrom
File2Package:fileless
Jan 23, 2021
Merged

Added serialization of the trie into a memory buffer.#12
thep merged 1 commit intotlwg:masterfrom
File2Package:fileless

Conversation

@KOLANICH
Copy link
Copy Markdown
Contributor

No description provided.

@KOLANICH
Copy link
Copy Markdown
Contributor Author

KOLANICH commented Jan 19, 2020

@thep, @tlwg

Copy link
Copy Markdown
Contributor

@thep thep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symbols in libdatrie.map are versioned. Instead of inserting new symbols to old version block, please create a new version block (DATRIE_0.2.13), deriving from DATRIE_0.2.7, and add new symbols there. For libdatrie.def, simple insertion is fine.

Copy link
Copy Markdown
Contributor

@thep thep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer function naming to start with target object type, so they look like OOP object methods. Please rename get_trie_serialized_size(). get_alpha_map_serialized_size(), get_tail_serialized_size() to trie_get_serialized_size(), alpha_map_get_serialized_size(), tail_get_serialized_size() respectively.

With this, please also update documentation for trie_get_serialized_size(), as it's not for file creation.

Comment thread datrie/tail.c Outdated
Comment thread datrie/tail.c Outdated
Comment thread datrie/tail.c
Comment thread datrie/tail.c
Comment thread tests/test_serialization.c Outdated
@KOLANICH KOLANICH force-pushed the fileless branch 3 times, most recently from 3c8d12f to 4d01b86 Compare January 19, 2021 11:17
@KOLANICH KOLANICH requested a review from thep January 19, 2021 11:23
@thep
Copy link
Copy Markdown
Contributor

thep commented Jan 20, 2021

Sorry for some comments that might introduce some confusion. TrieChar is supposed to be the type of raw characters used in trie node transitions, as opposed to AlphaChar for the Unicode characters in the real world. TrieChar happens to be unsigned char in the current code, but it's not guaranteed to always be so. For example, there is a request in Issue #1 to increase TrieChar size.

In low-level code like serialization, I think we need to be strict on bytes and use unsigned char where byte is actually meant. So, please revert the TrieChar replacements and use the explicit unsigned char instead. You can assume TrieChar to be of one-byte size for now. We can fix this later if TrieChar enhancement actually happens. Sorry for the confusion.

@KOLANICH
Copy link
Copy Markdown
Contributor Author

I think we need to be strict on bytes and use unsigned char

unsigned char is not guaranteed to be a byte. uint8_t is.

@thep
Copy link
Copy Markdown
Contributor

thep commented Jan 21, 2021

I think we need to be strict on bytes and use unsigned char

unsigned char is not guaranteed to be a byte. uint8_t is.

Yeah, we can do so to the whole project once we decide to move to C99. However, this reminds me of the similar typedefs in datrie/typedefs.h. We can use uint8 for this purpose, then.

@KOLANICH KOLANICH force-pushed the fileless branch 5 times, most recently from bf760ef to fa19e1a Compare January 23, 2021 06:45
@thep
Copy link
Copy Markdown
Contributor

thep commented Jan 23, 2021

Please also add this change set to tests/Makefile.am, to get the test run.

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 1c32dc2..41b55cb 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -7,6 +7,7 @@ TESTS = \
 	test_iterator \
 	test_store-retrieve \
 	test_file \
+	test_serialization \
 	test_nonalpha \
 	test_null_trie \
 	test_term_state \
@@ -19,6 +20,7 @@ check_PROGRAMS = \
 	test_iterator \
 	test_store-retrieve \
 	test_file \
+	test_serialization \
 	test_nonalpha \
 	test_null_trie \
 	test_term_state \
@@ -54,6 +56,12 @@ test_file_SOURCES = \
 	$(NULL)
 test_file_LDADD = $(top_builddir)/datrie/libdatrie.la
 
+test_serialization_SOURCES = \
+	test_serialization.c \
+	utils.c \
+	$(NULL)
+test_serialization_LDADD = $(top_builddir)/datrie/libdatrie.la
+
 test_nonalpha_SOURCES = \
 	test_nonalpha.c \
 	utils.c \

Co-Authored-By: Theppitak Karoonboonyanan <theppitak@gmail.com>
@thep thep merged commit 11f28b7 into tlwg:master Jan 23, 2021
@thep
Copy link
Copy Markdown
Contributor

thep commented Jan 23, 2021

Merged. Thank you very much for your contribution!

@KOLANICH KOLANICH deleted the fileless branch January 23, 2021 19:05
@KOLANICH
Copy link
Copy Markdown
Contributor Author

And thank you for the reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants