
Avoid MV recreation in prod: implement instantaneous version switching #5365

@xardasos

Description

Consider the following DAG: A (materialized view) <- B (materialized view), where B selects from A.

Suppose these models are deployed in prod and were assigned the physical tables A__1 and B__1, respectively. If we make a non-breaking change to A, a new table A__2 is assigned to it. Additionally, the materialized view B must be recreated; otherwise it would continue to point at the old A__1 table. This means that B__1 is dropped and then created again, this time pointing at A__2.

The problem with this approach is that B's data is unavailable in prod for the entire duration of the recreation, which can be very long for larger MVs. Rollbacks also require another change in the physical layer, since B__1 would have to be recreated once more, pointing back at A__1.
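To make the downtime window concrete, here is a minimal simulation using Python's built-in `sqlite3`. It is only a sketch under stated assumptions: SQLite has no materialized views, so each "MV" is modeled as a table built with `CREATE TABLE ... AS SELECT`, and the table names (`a__1`, `a__2`, `b__1`) and the `read_b` helper are hypothetical stand-ins, not SQLMesh code.

```python
import sqlite3

# Hypothetical simulation: SQLite has no materialized views, so each "MV"
# is modeled as a physical table built with CREATE TABLE ... AS SELECT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a__1 AS SELECT 1 AS id, 10 AS amount")
conn.execute("CREATE TABLE b__1 AS SELECT id, amount * 2 AS doubled FROM a__1")

# A non-breaking change to A produces a new physical table a__2.
conn.execute("CREATE TABLE a__2 AS SELECT 1 AS id, 10 AS amount, 'x' AS note")


def read_b(conn):
    """Hypothetical prod reader of B; returns None when B is unavailable."""
    try:
        return conn.execute("SELECT * FROM b__1").fetchall()
    except sqlite3.OperationalError:
        return None


# Current behavior: B keeps its physical name, so b__1 is dropped and rebuilt.
conn.execute("DROP TABLE b__1")
during_rebuild = read_b(conn)  # B is missing while it is being rebuilt

conn.execute("CREATE TABLE b__1 AS SELECT id, amount * 2 AS doubled FROM a__2")
after_rebuild = read_b(conn)
```

Here `during_rebuild` is `None` and `after_rebuild` is `[(1, 20)]`. With a real MV, the gap between the drop and the end of the rebuild is as long as the full refresh takes.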

A possible improvement would be to first create a new physical table, B__2, and then switch to it almost instantaneously in the virtual layer, avoiding the long downtime.
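A minimal sketch of the proposed flow, again using `sqlite3` with tables standing in for MVs. It assumes that prod consumers only query a virtual-layer view named `b`, and that `switch_view` is a hypothetical helper (not a SQLMesh API). B__2 is built while `b` keeps serving B__1, and the switch is a single transaction that only touches the view.

```python
import sqlite3

# Autocommit mode so that BEGIN/COMMIT below are managed explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE a__1 AS SELECT 1 AS id, 10 AS amount")
conn.execute("CREATE TABLE b__1 AS SELECT id, amount * 2 AS doubled FROM a__1")
# Virtual layer: prod consumers query the view "b", never a physical table.
conn.execute("CREATE VIEW b AS SELECT * FROM b__1")

# A changes; build a__2 and then b__2 next to the existing b__1.
conn.execute("CREATE TABLE a__2 AS SELECT 1 AS id, 15 AS amount")
conn.execute("CREATE TABLE b__2 AS SELECT id, amount * 2 AS doubled FROM a__2")
before_switch = conn.execute("SELECT * FROM b").fetchall()  # still served by b__1


def switch_view(conn, view, target):
    """Hypothetical helper: repoint a virtual-layer view in one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute(f'DROP VIEW "{view}"')
        conn.execute(f'CREATE VIEW "{view}" AS SELECT * FROM "{target}"')
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


switch_view(conn, "b", "b__2")
after_switch = conn.execute("SELECT * FROM b").fetchall()

# Rollback is just another view switch, since b__1 was never touched.
switch_view(conn, "b", "b__1")
after_rollback = conn.execute("SELECT * FROM b").fetchall()
```

`before_switch` and `after_rollback` are `[(1, 20)]`, and `after_switch` is `[(1, 30)]`. Keeping B__1 around until it is no longer needed is what makes the rollback cheap. On engines that support `CREATE OR REPLACE VIEW`, the drop-and-create pair would collapse into a single statement.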

This solution requires changing how snapshot versions (e.g., __1 and __2) are calculated: the MV version would need to be based on both the parent data hash and the model's own data hash, similar to how dev versions are currently calculated.
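The versioning difference can be illustrated with a toy hash function. This is a hypothetical sketch, not SQLMesh's actual fingerprinting code: `model_version`, the `include_parents` flag, and the string hashes (`"a1"`, `"a2"`, `"b1"`) are all assumptions made for illustration. It shows why a version derived only from B's own data hash stays the same after a non-breaking change to A (so B__1 is reused and must be recreated in place), while a version that also folds in the parent data hashes changes (so a fresh B__2 can be built).

```python
import hashlib


def _hash(*parts: str) -> str:
    # Short, deterministic digest of the joined parts.
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:8]


def model_version(data_hash: str, parent_data_hashes: list[str], include_parents: bool) -> str:
    """Hypothetical version function.

    include_parents=False: version depends only on the model's own data hash.
    include_parents=True: version also depends on the parents' data hashes,
    as proposed for MVs in this issue.
    """
    if include_parents:
        return _hash(data_hash, *sorted(parent_data_hashes))
    return _hash(data_hash)


# B's own definition is unchanged ("b1"); A's data hash moves from "a1" to "a2".
old_b = model_version("b1", ["a1"], include_parents=False)
new_b = model_version("b1", ["a2"], include_parents=False)

old_b_proposed = model_version("b1", ["a1"], include_parents=True)
new_b_proposed = model_version("b1", ["a2"], include_parents=True)
```

With `include_parents=False`, `old_b == new_b`, so B keeps the same physical table. With `include_parents=True`, the two versions differ, so B gets a new physical table. A per-model flag like the hypothetical `include_parents` is one way the behavior could be made optional.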

Perhaps this new way of calculating MV versions could be made optional or configurable?

Feel free to share your thoughts and objections - I'd love to hear your perspective.
/cc @izeigerman
