Skip to content

chore: refactor to remove merge mixin#5380

Merged
eakmanrq merged 1 commit intomainfrom
eakmanrq/chore_remove_merge_mixin
Sep 15, 2025
Merged

chore: refactor to remove merge mixin#5380
eakmanrq merged 1 commit intomainfrom
eakmanrq/chore_remove_merge_mixin

Conversation

@eakmanrq
Copy link
Collaborator

Deprecating InsertOverwriteWithMergeMixin and instead rely solely on INSERT_OVERWRITE_STRATEGY. Basically before we had two ways of doing the same thing so this consolidates into a single way.

Comment on lines -345 to -382
def test_insert_overwrite_by_time_partition_replace_where_pandas(
make_mocked_engine_adapter: t.Callable, mocker: MockerFixture, make_temp_table_name: t.Callable
):
mocker.patch(
"sqlmesh.core.engine_adapter.mssql.MSSQLEngineAdapter.table_exists",
return_value=False,
)

adapter = make_mocked_engine_adapter(MSSQLEngineAdapter)
adapter.INSERT_OVERWRITE_STRATEGY = InsertOverwriteStrategy.REPLACE_WHERE

temp_table_mock = mocker.patch("sqlmesh.core.engine_adapter.EngineAdapter._get_temp_table")
table_name = "test_table"
temp_table_id = "abcdefgh"
temp_table_mock.return_value = make_temp_table_name(table_name, temp_table_id)

df = pd.DataFrame({"a": [1, 2], "ds": ["2022-01-01", "2022-01-02"]})
adapter.insert_overwrite_by_time_partition(
table_name,
df,
start="2022-01-01",
end="2022-01-02",
time_formatter=lambda x, _: exp.Literal.string(to_ds(x)),
time_column="ds",
target_columns_to_types={
"a": exp.DataType.build("INT"),
"ds": exp.DataType.build("STRING"),
},
)
adapter._connection_pool.get().bulk_copy.assert_called_with(
f"__temp_test_table_{temp_table_id}", [(1, "2022-01-01"), (2, "2022-01-02")]
)

assert to_sql_calls(adapter) == [
f"""IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = '__temp_test_table_{temp_table_id}') EXEC('CREATE TABLE [__temp_test_table_{temp_table_id}] ([a] INTEGER, [ds] VARCHAR(MAX))');""",
f"""MERGE INTO [test_table] AS [__MERGE_TARGET__] USING (SELECT [a] AS [a], [ds] AS [ds] FROM (SELECT CAST([a] AS INTEGER) AS [a], CAST([ds] AS VARCHAR(MAX)) AS [ds] FROM [__temp_test_table_{temp_table_id}]) AS [_subquery] WHERE [ds] BETWEEN '2022-01-01' AND '2022-01-02') AS [__MERGE_SOURCE__] ON (1 = 0) WHEN NOT MATCHED BY SOURCE AND [ds] BETWEEN '2022-01-01' AND '2022-01-02' THEN DELETE WHEN NOT MATCHED THEN INSERT ([a], [ds]) VALUES ([a], [ds]);""",
f"DROP TABLE IF EXISTS [__temp_test_table_{temp_table_id}];",
]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test didn't make sense. It was changing the MSSQL engine adapter to use REPLACE_WHERE and then showing that it did nothing. It would now do something but that is expected. I think it will likely a copy/paste mistake from other tests without understanding the intent.

@eakmanrq eakmanrq force-pushed the eakmanrq/chore_remove_merge_mixin branch from f3c9410 to 9fd4e8e Compare September 15, 2025 19:50
@eakmanrq eakmanrq requested a review from erindru September 15, 2025 20:47
Copy link
Collaborator

@erindru erindru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it was implemented separately to begin with because it's pretty much only MSSQL that supports WHEN NOT MATCHED [BY SOURCE] that makes this actually work in practice.

But consolidating it does simplify things somewhat

@eakmanrq eakmanrq merged commit 2462298 into main Sep 15, 2025
36 checks passed
@eakmanrq eakmanrq deleted the eakmanrq/chore_remove_merge_mixin branch September 15, 2025 22:45
@MarcusRisanger
Copy link

MarcusRisanger commented Sep 18, 2025

Hey @eakmanrq ! I believe this change is causing my plan to attempt creation of tables twice - which isn't a huge deal, but it means that my macros that create indexes on the tables fire off twice and cause failures (since .. that index already exists on the table!)

Not entirely sure what causes it, but it seems in my project that all queries without dependencies to partitioned queries are run first, and then a second "executing model batches" runs with the partition models and downstream dependencies. If I recall correctly this has been the behavior for a good while. However (starting with 0.219.0) some tables are attempted created again in this second batch. This causes my macro to fire off (evaluation stage "create"), even though the SQL didn't execute due to the IF EXISTS clause.

So I get log outputs like this:

Log row 245:

2025-09-18 19:30:02,556 - ThreadPoolExecutor-3_2 - sqlmesh.core.engine_adapter.base - INFO - Executing SQL: /* SQLMESH_PLAN: 8152d1322f594e14ac1e48f1a3822cf0 */ IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'edm__stg_bha_component__2951525291' AND TABLE_SCHEMA = 'sqlmesh__edm') EXEC('SELECT * INTO [sqlmesh__edm].[edm__stg_bha_component__2951525291] ....

Log row 827:

2025-09-18 19:30:24,707 - ThreadPoolExecutor-5_0 - sqlmesh.core.engine_adapter.base - INFO - Executing SQL: /* SQLMESH_PLAN: 8152d1322f594e14ac1e48f1a3822cf0 */ IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'edm__stg_bha_component__2951525291' AND TABLE_SCHEMA = 'sqlmesh__edm') EXEC('CREATE TABLE [sqlmesh__edm].[edm__stg_bha_component__2951525291] .... 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants