Feat: Allow some control of table naming at the physical layer#4982
| warehouse.sqlmesh__finance_mart.transaction_events_over_threshold__<fingerprint> | ||
| ``` | ||
|
|
||
| Notice that the model schema name is no longer part of the physical table name. This allows for slightly longer model names on engines with low identifier length limits, which may be useful for your project. |
Reading the docs right now doesn't make it obvious why the schema is included in the physical table's name by default. Should we include an example to explain the rationale behind that choice?
To be honest, I don't know what the rationale behind the original choice was.
The only thing I can think of is to allow someone to set:
physical_schema_mapping:
  '.*': some_schema
That would map every model to the same physical schema regardless of its model schema. In that situation, including the model schema in the physical table name disambiguates foo.model_a and bar.model_a, which would otherwise both be written to some_schema.model_a at the physical layer.
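To make the collision concrete, here is a minimal Python sketch. The helper `resolve_physical_schema`, its regex full-match semantics, and the `sqlmesh__` fallback prefix are illustrative assumptions and not SQLMesh's actual implementation.

```python
import re
from typing import Dict


def resolve_physical_schema(model_schema: str, mapping: Dict[str, str]) -> str:
    """Hypothetical helper: return the target of the first pattern that matches."""
    for pattern, target in mapping.items():
        if re.fullmatch(pattern, model_schema):
            return target
    # Assumed fallback, mirroring the `sqlmesh__<schema>` names seen in the docs.
    return f"sqlmesh__{model_schema}"


mapping = {".*": "some_schema"}
models = [("foo", "model_a"), ("bar", "model_a")]

# Table name only: both models resolve to the same physical table.
table_only = {
    f"{resolve_physical_schema(schema, mapping)}.{table}" for schema, table in models
}

# Schema and table: the model schema keeps the two names distinct.
schema_and_table = {
    f"{resolve_physical_schema(schema, mapping)}.{schema}__{table}"
    for schema, table in models
}

print(sorted(table_only))
print(sorted(schema_and_table))
```

With the catch-all mapping, `table_only` contains a single name while `schema_and_table` contains two, which is the ambiguity the default convention avoids.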
So I believe the default exists because it was the lowest common denominator that worked across any rendition of physical_schema_mapping.
I've updated the docs.
Did a quick first pass; the implementation seems reasonable.
ce93b54 to 911bcfc
docs/reference/configuration.md
| | `environment_suffix_target` | Whether SQLMesh views should append their environment name to the `schema` or `table` - [additional details](../guides/configuration.md#view-schema-override). (Default: `schema`) | string | N | | ||
| | `gateway_managed_virtual_layer` | Whether SQLMesh views of the virtual layer will be created by the default gateway or model specified gateways - [additional details](../guides/multi_engine.md#gateway-managed-virtual-layer). (Default: False) | boolean | N | | ||
| | `infer_python_dependencies` | Whether SQLMesh will statically analyze Python code to automatically infer Python package requirements. (Default: True) | boolean | N | | ||
| | `physical_table_naming_convention`| Sets which parts of the model name are included in the physical table names. Options are `schema_and_table` or `table_only` - [additional details](../guides/configuration.md#physical-table-naming-convention). (Default: `schema_and_table`) | string | N | |
Note: The section you're adding this to is called Environment. I don't think this is the right place for it, or we should rename the section.
I put it here because the other `physical_*` properties were here, but you're right, this could use a refactor. I've:
- renamed this section to `Environments (Virtual Layer)`
- moved the project properties like `log_limit` to the `Projects` section
- moved the model properties like `time_column_format` to the `Models` section
- added a section called `Database (Physical Layer)` to house the `physical_*` properties
sqlmesh/core/config/root.py
| environment_suffix_target: EnvironmentSuffixTarget = Field( | ||
| default=EnvironmentSuffixTarget.default | ||
| ) | ||
| physical_table_naming_convention: t.Optional[TableNamingConvention] = None |
Why not use schema_and_table as a default?
sqlmesh/core/snapshot/definition.py
| # This can be removed from this model once Pydantic 1 support is dropped (must remain in `Snapshot` though) | ||
| base_table_name_override: t.Optional[str] | ||
| dev_table_suffix: str | ||
| table_naming_convention: t.Optional[TableNamingConvention] = None |
Ditto: I believe we have an appropriate default for this
The nice thing about using None is that it makes the serialized JSON payloads smaller. But I don't have a strong opinion on this, so I've changed these to default to TableNamingConvention.default.
sqlmesh/core/snapshot/definition.py
| base_table_name_override: t.Optional[str] = None | ||
| next_auto_restatement_ts: t.Optional[int] = None | ||
| dev_table_suffix: str = "dev" | ||
| table_naming_convention_: t.Optional[TableNamingConvention] = Field( |
When reusing the physical version (e.g. FORWARD_ONLY, METADATA), we should inherit the convention of the previous snapshot, just like we do with the physical schema: https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/snapshot/definition.py#L1015
Ahh, thanks. I had a suspicion something like this was lurking.
I've added a test to show that categorizing a snapshot as FORWARD_ONLY inherits the table_naming_convention from the previous_version.
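A rough Python sketch of why the inheritance matters: when the physical version is reused, the table already exists under the old name, so the name has to be computed with the old convention. `SnapshotSketch` and `categorize` are hypothetical stand-ins for illustration and do not mirror the real `Snapshot` API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SnapshotSketch:
    """Hypothetical, heavily simplified stand-in for a snapshot."""

    name: str
    version: str
    table_naming_convention: str


def categorize(
    new: SnapshotSketch,
    previous: Optional[SnapshotSketch],
    reuse_previous_version: bool,
) -> SnapshotSketch:
    """Inherit the physical version and naming convention when the table is reused."""
    if reuse_previous_version and previous is not None:
        return replace(
            new,
            version=previous.version,
            table_naming_convention=previous.table_naming_convention,
        )
    return new


previous = SnapshotSketch("db.model_a", "v1", "schema_and_table")
# The project config was switched to table_only after `previous` was created.
new = SnapshotSketch("db.model_a", "v2", "table_only")

forward_only = categorize(new, previous, reuse_previous_version=True)
breaking = categorize(new, previous, reuse_previous_version=False)
```

Here `forward_only` keeps `schema_and_table` because it points at the existing table, while `breaking` gets a new physical table and picks up the new convention.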
911bcfc to a5496e3
Fixes #4403
Until now, there has been no control over how SQLMesh generates table names for the physical layer. This causes problems on engines like Postgres, which has a short default identifier length limit (63 bytes): the limit can be hit unexpectedly because SQLMesh-generated table names contain extra metadata alongside the model name.
This PR introduces a project-level property called `physical_table_naming_convention` that has the following values:
- `schema_and_table` - include both the schema name and table name from the model in the snapshot table name (current behaviour; default)
- `table_only` - include only the table name from the model in the snapshot table name, on the assumption that the table is created within a physical schema that already contains the model's schema name
- `hash_md5` - produce snapshot table names based on an MD5 hash of what the table would have been called under `schema_and_table`. This leads to predictable identifier lengths at the expense of readability
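The three values can be sketched roughly as follows. This is an illustration under stated assumptions and not the code in this PR: `NamingConventionSketch`, `physical_table_name`, the `__` separators, and the `sqlmesh_md5__` prefix are all hypothetical.

```python
import hashlib
from enum import Enum


class NamingConventionSketch(Enum):
    """Hypothetical mirror of the three values described above."""

    SCHEMA_AND_TABLE = "schema_and_table"
    TABLE_ONLY = "table_only"
    HASH_MD5 = "hash_md5"


def physical_table_name(
    schema: str, table: str, fingerprint: str, convention: NamingConventionSketch
) -> str:
    """Build an illustrative physical table name for one model version."""
    full_name = f"{schema}__{table}__{fingerprint}"
    if convention is NamingConventionSketch.SCHEMA_AND_TABLE:
        return full_name
    if convention is NamingConventionSketch.TABLE_ONLY:
        return f"{table}__{fingerprint}"
    if convention is NamingConventionSketch.HASH_MD5:
        # An MD5 hex digest is always 32 characters, so the length is fixed
        # no matter how long the schema and table names are.
        digest = hashlib.md5(full_name.encode("utf-8")).hexdigest()
        return f"sqlmesh_md5__{digest}"  # prefix is an assumption
    raise ValueError(f"Unsupported convention: {convention}")


names = {
    convention.value: physical_table_name(
        "finance_mart", "transaction_events_over_threshold", "123", convention
    )
    for convention in NamingConventionSketch
}
for label, name in names.items():
    print(f"{label}: {name} ({len(name)} chars)")
```

Only the hashed form has a length that is independent of the model name, which is the trade-off against readability noted above.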