Feat: Allow some control of table naming at the physical layer #4982

Merged
erindru merged 4 commits into main from erin/physical-table-naming-convention
Jul 27, 2025

@erindru (Collaborator) commented Jul 17, 2025

Fixes #4403

Until now, there has been no control over how SQLMesh generates table names for the physical layer. This causes problems on engines like Postgres, whose default identifier length limit is short. The limit gets hit unexpectedly because the table names SQLMesh generates contain extra metadata alongside the model name.

This PR introduces a project-level property called physical_table_naming_convention that has the following values:

  • schema_and_table - include both the schema name and table name from the model in the snapshot table name (current behaviour; default)
  • table_only - include only the table name from the model in the snapshot table name, on the assumption that the table is created within a physical schema that already contains the model's schema name
  • hash_md5 - produce snapshot table names based on an MD5 hash of what the table would have been called under schema_and_table. This gives predictable identifier lengths at the expense of readability
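
To make the three options concrete, here is a minimal sketch of how they could map a model name to a physical table name. It is not SQLMesh's implementation: `physical_table_name`, `fits_postgres`, the `__` separators and the `md5__` prefix are assumptions made for illustration. The 63-byte figure is Postgres's default identifier limit.

```python
import hashlib

# Postgres truncates identifiers longer than 63 bytes by default.
POSTGRES_MAX_IDENTIFIER_BYTES = 63


def physical_table_name(
    schema: str,
    table: str,
    fingerprint: str,
    convention: str = "schema_and_table",
) -> str:
    """Hypothetical helper: build a physical table name for one convention."""
    full_name = f"{schema}__{table}__{fingerprint}"
    if convention == "schema_and_table":
        return full_name
    if convention == "table_only":
        # Assumes the physical schema already carries the model schema name.
        return f"{table}__{fingerprint}"
    if convention == "hash_md5":
        # Hash the schema_and_table name; the "md5__" prefix is made up here.
        return "md5__" + hashlib.md5(full_name.encode("utf-8")).hexdigest()
    raise ValueError(f"unknown naming convention: {convention!r}")


def fits_postgres(name: str) -> bool:
    """Return True if the identifier fits within the default Postgres limit."""
    return len(name.encode("utf-8")) <= POSTGRES_MAX_IDENTIFIER_BYTES
```

Under these assumptions, the `hash_md5` result always has the same length (the prefix plus 32 hex characters), whatever the length of the model name.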

```
warehouse.sqlmesh__finance_mart.transaction_events_over_threshold__<fingerprint>
```

Notice that the model schema name is no longer part of the physical table name. This allows for slightly longer model names on engines with low identifier length limits, which may be useful for your project.

Contributor

Reading the docs right now doesn't make it obvious why the schema is included in the physical table's name by default. Should we include an example to explain the rationale behind that choice?

Collaborator Author

To be honest, I don't know what the rationale behind the original choice was.

The only thing I can think of is to allow someone to set:

physical_schema_mapping:
  '.*': some_schema

This would map every model to the same physical schema regardless of the model schema. In that situation, including the model schema in the physical table name helps disambiguate foo.model_a and bar.model_a, which would otherwise both be written as some_schema.model_a at the physical layer.
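
The collision can be sketched in a few lines. This is an illustration only: `physical_schema` and `physical_name` are hypothetical helpers, the use of `re.fullmatch` is an assumption about how patterns are applied, and the fingerprint suffix is left out for brevity.

```python
import re

# Mirrors the mapping from the comment: every model schema -> some_schema.
PHYSICAL_SCHEMA_MAPPING = {r".*": "some_schema"}


def physical_schema(model_schema: str) -> str:
    """Return the first mapped schema whose pattern matches, else a default."""
    for pattern, target in PHYSICAL_SCHEMA_MAPPING.items():
        if re.fullmatch(pattern, model_schema):
            return target
    return f"sqlmesh__{model_schema}"


def physical_name(model: str, include_schema: bool) -> str:
    """Build '<physical schema>.<table>' with or without the model schema."""
    schema, table = model.split(".")
    table_part = f"{schema}__{table}" if include_schema else table
    return f"{physical_schema(schema)}.{table_part}"


# Without the model schema in the table name, the two models collide.
collide = physical_name("foo.model_a", False) == physical_name("bar.model_a", False)
# With it, they stay distinct.
distinct = physical_name("foo.model_a", True) != physical_name("bar.model_a", True)
```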

Collaborator Author

So I believe the default exists because it was the lowest common denominator that worked across any rendition of physical_schema_mapping.

I've updated the docs

@georgesittas (Contributor)

Did a quick first pass; the implementation seems reasonable.

@erindru force-pushed the erin/physical-table-naming-convention branch from ce93b54 to 911bcfc on July 23, 2025 21:06
@erindru marked this pull request as ready for review on July 23, 2025 21:54

| `environment_suffix_target` | Whether SQLMesh views should append their environment name to the `schema` or `table` - [additional details](../guides/configuration.md#view-schema-override). (Default: `schema`) | string | N |
| `gateway_managed_virtual_layer` | Whether SQLMesh views of the virtual layer will be created by the default gateway or model specified gateways - [additional details](../guides/multi_engine.md#gateway-managed-virtual-layer). (Default: False) | boolean | N |
| `infer_python_dependencies` | Whether SQLMesh will statically analyze Python code to automatically infer Python package requirements. (Default: True) | boolean | N |
| `physical_table_naming_convention` | Sets which parts of the model name are included in the physical table names. Options are `schema_and_table` or `table_only` - [additional details](../guides/configuration.md#physical-table-naming-convention). (Default: `schema_and_table`) | string | N |

Collaborator

Note: The section you're adding this to is called Environment. I don't think this is the right place for it, or we should rename the section.

Collaborator Author

I put it here because the other physical_* properties were here, but you're right this could use a refactor. I've:

  • renamed this section to Environments (Virtual Layer)
  • moved the project properties like log_limit to the Projects section
  • moved the model properties like time_column_format to the Models section
  • added a section called Database (Physical Layer) to house the physical_* properties

environment_suffix_target: EnvironmentSuffixTarget = Field(
    default=EnvironmentSuffixTarget.default
)
physical_table_naming_convention: t.Optional[TableNamingConvention] = None

Collaborator

Why not use schema_and_table as a default?

# This can be removed from this model once Pydantic 1 support is dropped (must remain in `Snapshot` though)
base_table_name_override: t.Optional[str]
dev_table_suffix: str
table_naming_convention: t.Optional[TableNamingConvention] = None

Collaborator

Ditto: I believe we have an appropriate default for this

Collaborator Author

The nice thing about using None is that it makes the serialized JSON payloads smaller. But I don't have a strong opinion on this, so I've changed these to default to TableNamingConvention.default
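
The payload-size trade-off can be illustrated with a small standard-library sketch. It assumes the serializer omits None-valued fields; `TableInfoSketch` and `to_json` are hypothetical stand-ins, not the Pydantic models touched by this PR.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TableInfoSketch:
    """Hypothetical stand-in for a serialized snapshot model."""

    name: str
    table_naming_convention: Optional[str] = None


def to_json(info: TableInfoSketch) -> str:
    """Serialize to JSON, dropping fields whose value is None."""
    return json.dumps({k: v for k, v in asdict(info).items() if v is not None})


# A None default is omitted; an explicit default value is always serialized.
small = to_json(TableInfoSketch("model_a"))
large = to_json(TableInfoSketch("model_a", "schema_and_table"))
```

The cost of the None default is that every reader has to know what None means, which is the argument for an explicit default.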

base_table_name_override: t.Optional[str] = None
next_auto_restatement_ts: t.Optional[int] = None
dev_table_suffix: str = "dev"
table_naming_convention_: t.Optional[TableNamingConvention] = Field(

Collaborator

When reusing the physical version (e.g. FORWARD_ONLY, METADATA), we should inherit the convention of the previous snapshot, just like we do the physical schema: https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/snapshot/definition.py#L1015

Collaborator Author

Ahh, thanks. I had a suspicion something like this was lurking.

I've added a test to show that categorizing a snapshot as FORWARD_ONLY inherits the table_naming_convention from the previous_version
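
The inheritance rule discussed in this thread can be sketched as follows. This is a simplified model, not the code in definition.py: `SnapshotSketch` and `reuse_physical_version` are hypothetical names, and the real snapshot carries many more fields.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SnapshotSketch:
    """Hypothetical, heavily reduced snapshot."""

    name: str
    version: str
    physical_schema: str
    table_naming_convention: str = "schema_and_table"


def reuse_physical_version(
    new: SnapshotSketch, previous: SnapshotSketch
) -> SnapshotSketch:
    """Point `new` at the previous physical table.

    Everything that determines the physical table's name is copied from the
    previous snapshot, so the new snapshot resolves to the table that already
    exists rather than to a differently named one.
    """
    return replace(
        new,
        version=previous.version,
        physical_schema=previous.physical_schema,
        table_naming_convention=previous.table_naming_convention,
    )
```

If the convention were taken from the current project config instead, changing the config between plans would make a reused version point at a table name that was never created.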

@erindru force-pushed the erin/physical-table-naming-convention branch from 911bcfc to a5496e3 on July 25, 2025 01:15
@erindru merged commit 9b04f45 into main on Jul 27, 2025
27 checks passed
@erindru deleted the erin/physical-table-naming-convention branch on July 27, 2025 21:13

Development

Successfully merging this pull request may close these issues.

Ability to create physical layer tables using a hash function

3 participants