Skip to content

mssql: FULL materialization strategy is extremely slow using MERGE INTO existing table ... ON 1 = 0 #4472

@martinburch

Description

@martinburch

Following SQLMesh's initial (fast) plan of a FULL model on the mssql adapter, subsequent SQLMesh runs, which rebuild the model completely, are very slow (hours or more).

This is caused by using the pattern MERGE INTO ... ON (1 = 0) existing table, like this

MERGE INTO [schema].[schema__table__1460139764] AS [__MERGE_TARGET__]
USING ( ... ) AS [__MERGE_SOURCE__]
ON (1 = 0)
WHEN NOT MATCHED BY SOURCE THEN DELETE
WHEN NOT MATCHED THEN INSERT ( ... ) VALUES ( ... );

I can see this query is suspended in the MS SQL query engine with 100% CPU use for hours.

The materialisation strategy for FULL mssql models is not specified in the docs, but no other database uses MERGE for FULL.
https://sqlmesh.readthedocs.io/en/stable/concepts/models/model_kinds/?h=merge#materialization-strategy_2
I suggest using the Postgres method for mssql: DROP TABLE, CREATE TABLE, INSERT

How do I know the MERGE pattern is the problem?

When I terminate the stuck run process and then

TRUNCATE TABLE [schema].[schema__table__1460139764];

and re-run sqlmesh run then the run is as fast as the initial plan. So SQLMesh is still doing the MERGE INTO except now it's comparing 0 records.

The problem is one of squares: if I have a table of 2 million records, and I'm doing a full rebuild, I have 2 million new records which must be compared, and that's 4 trillion comparisons. Even if each comparison only took 1 millisecond, the comparisons would take over 100 years to complete!

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions