Orjson serialize set to list #5267

guzzijones · 2021-05-13T20:44:40Z

orjson does not serialize a set produced by the Yaql .toSet() function when publishing a variable.
Fixes issue #5265

Kami · 2021-05-15T19:32:08Z

Thanks for the contribution.

Per #5265 (comment), this is a performance critical change and will need more work.

Since none of the existing Orquesta or other tests caught this issue, it looks like this functionality is not widely used, so we should understand how much overhead the change adds (if any) and then decide how to proceed.

If it turns out that it adds a non-trivial amount of overhead, we should consider another approach instead of passing default to every single orjson.dumps call (e.g. only fall back to orjson.dumps with default if the regular call without default raises an exception, or similar). This matters especially because, afaik, Orquesta workflows are the only place where a set may be a valid type; everywhere else the values are always native JSON types, which don't include sets.
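The fallback idea described above could look roughly like the sketch below. It uses the standard library `json` module as a stand-in so that it runs without extra dependencies; `orjson.dumps` also accepts a `default` callable, and its `JSONEncodeError` is a subclass of `TypeError`, so the same pattern should carry over. The names `default` and `dumps_with_fallback` are hypothetical and are not taken from the st2 code base.

```python
import json


def default(obj):
    """Hook for types the encoder cannot handle natively (hypothetical name)."""
    if isinstance(obj, set):
        return list(obj)
    raise TypeError(f"Type {type(obj).__name__} is not JSON serializable")


def dumps_with_fallback(data):
    """Try the plain call first; retry with `default` only if it fails."""
    try:
        return json.dumps(data)
    except TypeError:
        # Reached only when the payload contains a non-JSON type such as a set.
        return json.dumps(data, default=default)
```

The trade-off is that payloads containing a set are serialized twice (one failed attempt, one successful one), which is only worthwhile if sets are rare and the `default` argument turns out to have a measurable cost.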

Another alternative would be to modify this DB field type class to only support sets where they are valid (i.e. in the Orquesta context). But again, we first need to understand the impact; maybe that won't be needed.

And to understand how much / if any overhead it adds, we need to add some micro-benchmarks at the very least.

There are already some examples you can use as a starting point. New micro-benchmarks need to cover multiple scenarios so we can understand how the change affects performance (small, medium and large dicts, both with and without a set item, and with the set in various places in the dict, e.g. deeply nested, top-level attribute value, etc.).
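As a rough illustration of the scenarios listed above, here is a minimal sketch that uses `timeit` and the standard library `json` module. It is an assumption-laden stand-in: it does not follow the structure of the existing st2 micro-benchmark file, the payload shapes and sizes are made up, and the absolute numbers say nothing about orjson. The helper names `make_payload` and `run_benchmark` are hypothetical.

```python
import json
import timeit


def default(obj):
    """Convert sets to lists; reject everything else (hypothetical helper)."""
    if isinstance(obj, set):
        return list(obj)
    raise TypeError(f"Type {type(obj).__name__} is not JSON serializable")


def make_payload(size, set_location=None):
    """Build a dict with `size` keys, optionally containing a set."""
    data = {f"key_{i}": {"value": i, "items": [i, i + 1]} for i in range(size)}
    if set_location == "top":
        data["set_value"] = {1, 2, 3}
    elif set_location == "nested":
        data["a"] = {"b": {"c": {"set_value": {1, 2, 3}}}}
    return data


def run_benchmark(number=20):
    """Return {(size label, scenario): seconds} for each combination."""
    results = {}
    for label, size in (("small", 10), ("medium", 100), ("large", 1000)):
        plain = make_payload(size)
        top = make_payload(size, "top")
        nested = make_payload(size, "nested")
        scenarios = {
            "no default, no set": lambda: json.dumps(plain),
            "default, no set": lambda: json.dumps(plain, default=default),
            "default, top-level set": lambda: json.dumps(top, default=default),
            "default, nested set": lambda: json.dumps(nested, default=default),
        }
        for name, func in scenarios.items():
            results[(label, name)] = timeit.timeit(func, number=number)
    return results
```

Comparing the "no default, no set" and "default, no set" rows for each size gives the overhead of merely passing `default`; the two set rows show the cost when the hook is actually invoked.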

Kami · 2021-05-15T19:34:13Z

st2common/tests/unit/test_db_fields.py

 class JSONDictFieldTestCase(unittest2.TestCase):
+    def test_set_to_mongo(self):
+        field = JSONDictField(use_header=False)
+        result = field.to_mongo({"test": {1, 2}})


A round-trip test would also be good, i.e. to ensure that when we deserialize the value, we get a list back.
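A round-trip check of this kind might be sketched as follows. This is not the test that was added to the PR: it uses the standard library `json` module instead of `JSONDictField`, and the `default` helper and test class name are hypothetical.

```python
import json
import unittest


def default(obj):
    """Convert sets to lists; reject everything else (hypothetical helper)."""
    if isinstance(obj, set):
        return list(obj)
    raise TypeError(f"Type {type(obj).__name__} is not JSON serializable")


class SetRoundTripTestCase(unittest.TestCase):
    def test_set_round_trip(self):
        original = {"test": {1, 2}}

        serialized = json.dumps(original, default=default)
        restored = json.loads(serialized)

        # JSON has no set type, so the value comes back as a list.
        self.assertIsInstance(restored["test"], list)
        # Set iteration order is not guaranteed, so compare sorted values.
        self.assertEqual(sorted(restored["test"]), [1, 2])
```

The sorted comparison avoids depending on set ordering, which would otherwise make the assertion fragile.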

guzzijones · 2021-05-15T19:38:21Z

Pretty sure it only runs this code if it bumps into a set. Can you point me to the existing benchmark code? The major issue is fixed with your zstd compression. But alas I agree benchmarks will prove it.
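The claim that the hook only runs when the encoder meets a set can be checked by recording calls. The sketch below does this with the standard library `json` module as a stand-in; orjson likewise documents `default` as a callable for types it does not serialize natively, but this snippet does not measure any fixed cost of passing the argument, which is what the benchmarks are for. The `calls` list and `default` helper are hypothetical.

```python
import json

calls = []  # records the type name of every object handed to the hook


def default(obj):
    calls.append(type(obj).__name__)
    if isinstance(obj, set):
        return list(obj)
    raise TypeError(f"Type {type(obj).__name__} is not JSON serializable")


# Only native JSON types: the hook should never be invoked.
json.dumps({"a": 1, "b": [1, 2], "c": {"d": "x"}}, default=default)
calls_without_set = len(calls)

# One nested set: the hook should be invoked exactly once, for the set.
json.dumps({"a": 1, "nested": {"tags": {"x", "y"}}}, default=default)
calls_with_set = len(calls) - calls_without_set
```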

Kami · 2021-05-15T19:43:18Z

Can add it here - https://github.com/StackStorm/st2/blob/master/st2common/benchmarks/micro/test_json_serialization_and_deserialization.py#L32

Also, not sure what you mean by zstandard - that code is not actually used in prod.

It was one of the proposed approaches, but in the end we decided to go with a simpler approach which doesn't include field-level compression (since the MongoDB server already handles compression, and thus storage size, on the server side).

guzzijones · 2021-05-15T19:45:41Z

The blob storage for running workflows. That fixed most of my problems.

Kami · 2021-05-22T19:08:31Z

I've added a micro benchmark (fb2dce4) and results show that the overhead is indeed very small / negligible.

Having said that, since a set is only valid in Orquesta workflows, it still makes sense to only support it for Orquesta DB models.

Kami · 2021-05-22T19:25:48Z

I added a round-trip test case and will go ahead and merge it into master as-is.

I added a comment to the code. If it turns out that it indeed adds more overhead in some other scenarios the micro-benchmarks don't cover, we can change the code then to only use default for workflow-related models.

Kami · 2021-05-22T19:56:31Z

Merged, thanks again for catching this.

guzzijones added 2 commits May 13, 2021 20:08

orjson handle set

7ff2b1f

unit test for set fix to to_mongo

b182108

pull-request-size bot added the size/S label (PR that changes 10-29 lines. Very easy to review.) May 13, 2021

guzzijones mentioned this pull request May 13, 2021

SET Type is not JSON serializable #5265

Closed

guzzijones added 2 commits May 14, 2021 13:07

black fixes

eb810ef

add no header test for serialize set

62f7e35

arm4b added the bug label May 14, 2021

Kami reviewed May 15, 2021

Kami added 2 commits May 22, 2021 11:49

Merge branch 'master' into set_fix

33a3af5

Add micro-benchmark which measures the overhead of using default function for orjson.dumps.

fb2dce4

pull-request-size bot added the size/M label (PR that changes 30-99 lines. Good size to review.) and removed the size/S label (PR that changes 10-29 lines. Very easy to review.) May 22, 2021

Add a comment on why we need custom default function for orjson.dumps.

eec022e

Kami added the mongodb label May 22, 2021

Kami added this to the 3.5.0 milestone May 22, 2021

Kami linked an issue May 22, 2021 that may be closed by this pull request

SET Type is not JSON serializable #5265

Closed

Also add round-trip test.

bdbce41

Kami approved these changes May 22, 2021

Kami merged commit c8eb15d into StackStorm:master May 22, 2021

arm4b mentioned this pull request Aug 23, 2021

Add AJ (@guzzijones) to the TSC Maintainers #5340

Merged
