Skip to content

Conversation

@guzzijones
Copy link
Contributor

orjson not serializing set from Yaql .toSet() function when publishing a variable.
Fixes issue #5625

@pull-request-size pull-request-size bot added the size/S PR that changes 10-29 lines. Very easy to review. label May 13, 2021
@arm4b arm4b added the bug label May 14, 2021
@Kami
Copy link
Member

Kami commented May 15, 2021

Thanks for the contribution.

Per #5265 (comment), this is a performance critical change and will need more work.

Since none of the existing Orquesta and other tests caught this issue it looks like this functionality is not that widely used so we should understand how much overhead it adds (if any) and then decide how to proceed.

If it turns out it adds non-trivial amount of overhead, we should consider some other approach instead of using default for every single orjson.dumps call (aka only fall back to orjson.dumps with defaults if regular call without default raises an exception or similar) - especially, because, afaik, orquesta workflows are only place where set may be valid type, in other places it's always just native JSON types which doesn't include sets.

Another alternative would be to modify this field DB field type class to only support sets where they are valid (for Orquesta context. But again, we need to first understand the impact. Maybe that won't be needed, or similar.

And to understand how much / if any overhead it adds, we need to add some micro-benchmarks at the very least.

There are already some examples you can use as a starting point. New micro benchmarks need to cover multiple scenarios so we can understand how it affects performance (small dict size, medium dict size, large dict with and having a set item in various places in the dict - e.g. deeply nested, top level attribute value, etc.).

class JSONDictFieldTestCase(unittest2.TestCase):
def test_set_to_mongo(self):
field = JSONDictField(use_header=False)
result = field.to_mongo({"test": {1, 2}})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round trip test would also be good - aka to ensure that when we unserialize the value, we get a list back.

@guzzijones
Copy link
Contributor Author

guzzijones commented May 15, 2021 via email

@Kami
Copy link
Member

Kami commented May 15, 2021

Can add it here - https://github.com/StackStorm/st2/blob/master/st2common/benchmarks/micro/test_json_serialization_and_deserialization.py#L32

Also, not sure what you mean with zstandard - that code is not actually used in prod.

It was one of the proposed approaches, but in the end we decided to go with a simpler approach which doesn't include field level compression (since MongoDB server already handles compression on the server aka storage size).

@guzzijones
Copy link
Contributor Author

guzzijones commented May 15, 2021 via email

@pull-request-size pull-request-size bot added size/M PR that changes 30-99 lines. Good size to review. and removed size/S PR that changes 10-29 lines. Very easy to review. labels May 22, 2021
@Kami
Copy link
Member

Kami commented May 22, 2021

I've added a micro benchmark (fb2dce4) and results show that the overhead is indeed very small / negligible.

Having said that, since set is only valid in Orquesta workflow it still makes sense to only support it for Orquesta DB models.

@Kami Kami added the mongodb label May 22, 2021
@Kami Kami added this to the 3.5.0 milestone May 22, 2021
@Kami Kami linked an issue May 22, 2021 that may be closed by this pull request
@Kami
Copy link
Member

Kami commented May 22, 2021

I added a round-trip test case and will ago ahead and merge it into master as-is.

I added a comment to the code and if it turns out if indeed adds more overhead in some other scenarios micro benchmarks don't cover, we can change the code then to only use default for Workflow related models.

@Kami Kami merged commit c8eb15d into StackStorm:master May 22, 2021
@Kami
Copy link
Member

Kami commented May 22, 2021

Merged, thanks again for catching this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug mongodb size/M PR that changes 30-99 lines. Good size to review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SET Type is not JSON serializable

3 participants