Skip to content

Conversation

@dstandish
Copy link
Contributor

@dstandish dstandish commented Sep 27, 2022

We can make the serializer handle arbitrary custom objects provided they implement methods airflow_serialize and airflow_deserialize.

Why?

Well..... one potential use is it could provide flexibility to handle user-defined objects through XCom.

Additionally, it could be desirable to manage serialization protocol in the objects themselves rather than always having to update the base serialization class.

We can make the serializer handle arbitrary custom objects provided they implement methods `airflow_serialize` and `airflow_deserialize`.
@dstandish dstandish marked this pull request as draft September 27, 2022 22:50
@ashb
Copy link
Member

ashb commented Sep 27, 2022

One thought: We need to be careful we don't open up artibrarty object inflation vulnerabilities this way.

(There were security problems in Rails where you could give it some session data and it would treat it as YAML, and due to oddness in YAML spec, end up creating arbitrary ruby objects which was used to pop reverse shells on Rails installs.)

@uranusjr
Copy link
Member

Yes, preventing security issues would be the biggest thing. Currently we do that by requiring custom types to be registered via a plugin (e.g. timetables), but if we were to do that for any custom types, it may be easier to use a custom serialiser pattern instead, similar to how json.dumps handles this. A plugin can provide a set of serislise/deserialise hooks that would be called for any unknown object is encountered by the (de)serialiser.

@dstandish
Copy link
Contributor Author

dstandish commented Sep 28, 2022

but if we were to do that for any custom types, it may be easier to use a custom serialiser pattern instead, similar to how json.dumps handles this. A plugin can provide a set of serislise/deserialise hooks that would be called for any unknown object is encountered by the (de)serialiser.

can you add more detail? i'm interested in what you're talking about but don't follow.

separately, concerning security risks... perhaps we need to be specific about the context. suppose we allow custom serialization in the xcom context, not in the base serialization code which is used in many places. if someone wanted to do something malicious, and they had the ability to write a task that sent this malicious object through xcom, why would they need to bother sending it through xcom -- they could do whatever malicious work they wanted in the task itself? we're not talking about e.g. taking user input strings from the web UI for example... and if it's just in the task execution context, it's not run in the scheduler or webserver.

@uranusjr
Copy link
Member

Something like this

class MyAirflowPlugin(AirflowPlugin):
    custom_serde = MyCustomSerialization()

class MyCustomSerialization:
    def serialize(self, obj):
        if isinstance(obj, MyObj):
            return "MyObj", {"a": obj.a, "b": obj.get_b()}

    def deserialize(self, type_, data):
        if type_ == "MyObj":
            return MyObj(a=data["a"], ...)

When the serializer encounters something it doesn’t recognise, it’d call the custom serialiser. Similarly, if the serialised data contains a __type that’s not in the known list, it calls the custom deserialiser to see if something can be produced.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 13, 2022
@github-actions github-actions bot closed this Nov 18, 2022
@ashb
Copy link
Member

ashb commented Nov 18, 2022

This is handled for some use cases by #27540 I think

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:serialization stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants