From fb621f7a1695a9f59db3dd3ac3e6462b296ebc53 Mon Sep 17 00:00:00 2001 From: vaggelisd Date: Thu, 6 Mar 2025 11:52:17 +0200 Subject: [PATCH 1/7] Chore: Add Linter docs --- docs/concepts/linter.md | 83 +++++++++++++++++++++++++++ docs/concepts/models/overview.md | 4 +- docs/guides/configuration.md | 57 ++++++++++++++++++ docs/reference/model_configuration.md | 2 +- 4 files changed, 143 insertions(+), 3 deletions(-) create mode 100644 docs/concepts/linter.md diff --git a/docs/concepts/linter.md b/docs/concepts/linter.md new file mode 100644 index 0000000000..087be5b913 --- /dev/null +++ b/docs/concepts/linter.md @@ -0,0 +1,83 @@ +# Linter + +Linting enables you to validate the model definition, ensuring it adheres to best practices. When a new [plan](plans.md) is created, each model is checked against a set of built-in and user-defined rules to verify that its attributes comply with the defined standards; This improves code quality, consistency, and helps to detect issues early in the development cycle. + +For more information regarding linter configuration visit the relevant [guide here](../guides/configuration.md). + +# Rules + +Each rule is responsible for detecting a specific issue or pattern within a model. Rules are defined as classes that implement the logic for validation by subclassing `Rule` (redacted form): + +```Python3 +class Rule: + """The base class for a rule.""" + + @abc.abstractmethod + def check_model(self, model: Model) -> t.Optional[RuleViolation]: + """The evaluation function that'll check for a violation of this rule.""" + + @property + def summary(self) -> str: + """A summary of what this rule checks for.""" + return self.__doc__ or "" + + def violation(self, violation_msg: t.Optional[str] = None) -> RuleViolation: + """Create a RuleViolation instance for this rule""" + return RuleViolation(rule=self, violation_msg=violation_msg or self.summary) + +``` + +Thus, each `Rule` can be broken down to it's vital components: +- The name (or code) of the rule is it's class name in lowercase. +- The core logic is implemented in `Rule::check_model(...)` which can examine any `Model` attribute. +- If an issue is detected, a `RuleViolation` object should be returned with the proper context. This can be created manually or through the `Rule::violation()` helper. +- A short explanation of the Rule's workings should be added in the form of a class docstring or by subclassing `Rule::summary`. + + + +## Built-in +SQLMesh comes with a set of predefined rules which check for potential SQL errors or enforce stylistic opinions. An example of the latter the `NoSelectStar` rule which prohibits users from writing `SELECT *` in the outer-most select: + + +```Python +class NoSelectStar(Rule): + """Query should not contain SELECT * on its outer most projections, even if it can be expanded.""" + + def check_model(self, model: Model) -> t.Optional[RuleViolation]: + if not isinstance(model, SqlModel): + return None + + return self.violation() if model.query.is_star else None +``` + + +The list of all built-in rules is the following: + + +| Name | Explanation +|--------------------------------------|--------------------------------------------------------------------------------------------------------| +| ambiguousorinvalidcolumns | The optimizer was unable to trace or found duplicate columns | +| invalidselectstarexpansion | The optimizer was unable to expand the top-level `SELECT *` | +| noselectstar | The query's top-level projection should not be `SELECT *`, even if it can be expanded by the optimizer | + + +## User defined rules +Users can implement their own custom rules in a similar fashion. For that matter, SQLMesh will attempt to load any subclass of `Rule` under the `linter/` directory. + +For instance, if an organization wanted to ensure all models are owned by an engineer, one solution would be to implement the following rule: + +```Python +# linter/user.py + +import typing as t + +from sqlmesh.core.linter.rule import Rule, RuleViolation, RuleSet +from sqlmesh.core.model import Model + +class NoMissingOwner(Rule): + """Model owner always should be specified.""" + + def check_model(self, model: Model) -> t.Optional[RuleViolation]: + return self.violation() if not model.owner else None + +``` diff --git a/docs/concepts/models/overview.md b/docs/concepts/models/overview.md index 6546e6edfa..6156348e5e 100644 --- a/docs/concepts/models/overview.md +++ b/docs/concepts/models/overview.md @@ -443,8 +443,8 @@ to `false` causes SQLMesh to disable query canonicalization & simplification. Th !!! warning Turning off the optimizer may prevent column-level lineage from working for the affected model and its descendants, unless all columns in the model's query are qualified and it contains no star projections (e.g. `SELECT *`). -### validate_query -: Whether the model's query will be validated at compile time. This attribute is `false` by default. Setting it to `true` causes SQLMesh to raise an error instead of emitting warnings. This will display invalid columns in your SQL statements along with models containing `SELECT *` that cannot be automatically expanded to list out all columns. This ensures SQL is verified locally before time and money are spent running the SQL in your data warehouse. +### ignored_rules +: Specifies which linter rules should be ignored/excluded for this model. ## Incremental Model Properties diff --git a/docs/guides/configuration.md b/docs/guides/configuration.md index d96980c427..ff3d1ff3e6 100644 --- a/docs/guides/configuration.md +++ b/docs/guides/configuration.md @@ -1109,6 +1109,63 @@ def grant_schema_usage(evaluator): As demonstrated in these examples, the `environment_naming_info` is available within the macro evaluator for macros invoked within the `before_all` and `after_all` statements. Additionally, the macro `this_env` provides access to the current environment name, which can be helpful for more advanced use cases that require fine-grained control over their behaviour. +### Linter + +The [linter](../concepts/linter.md) utilizes rules to analyze `Model` definitions (e.g its query) in order to flag errors, enforce stylistic opinions or find suspicious constructs. + +It can be configured under the `linter` key, with the rules being defined as lists of rule names or `"ALL"`. An example: + +=== "YAML" + + ```yaml linenums="1" + linter: + enabled: True + + rules: ["rule1", "rule2"] + warn_rules: ["rule3"] + ignored_rules: ["rule4"] + ``` + +=== "Python" + + ```python linenums="1" + from sqlmesh.core.config import Config, LinterConfig + + config = Config( + linter=LinterConfig( + enabled=True, + rules=["rule1", "rule2"], + warn_rules=["rule3"], + ignored_rules=["rule4"] + ) + ) + ``` + +To enable different levels of escalation, SQLMesh defines the following keys: +- `rules`: Any violation will raise an error, essentially halting execution until it's fixed +- `warning_rules`: Violations will only log warnings for the user +- `ignored_rules`: Violations will not have any visual or actual effect + + +By default, the linter configuration is disabled and all of the rules are excluded. + + +The usage of `ignored_rules` can prove useful when `rules` or `warning_rules` are defined as `'ALL'`, thus avoiding listing out individual rules. However, SQLMesh will detect if there's overlap in `rules` and `warning_rules`, since these should be mutually exclusive. An example of this: + + +=== "YAML" + + ```yaml linenums="1" + linter: + enabled: True + + rules: "ALL" + ignored_rules: ["rule4"] + ``` + +Users can also use `ignored_rules` as a model attribute to exclude certain rules on a per-model basis. + + ### Debug mode To enable debug mode set the `SQLMESH_DEBUG` environment variable to one of the following values: "1", "true", "t", "yes" or "y". diff --git a/docs/reference/model_configuration.md b/docs/reference/model_configuration.md index 54d14900b1..a74640b4a5 100644 --- a/docs/reference/model_configuration.md +++ b/docs/reference/model_configuration.md @@ -39,7 +39,7 @@ Configuration options for SQLMesh model properties. Supported by all model kinds | `enabled` | Whether the model is enabled. This attribute is `true` by default. Setting it to `false` causes SQLMesh to ignore this model when loading the project. | bool | N | | `gateway` | Specifies the gateway to use for the execution of this model. When not specified, the default gateway is used. | str | N | | `optimize_query` | Whether the model's query should be optimized. This attribute is `true` by default. Setting it to `false` causes SQLMesh to disable query canonicalization & simplification. This should be turned off only if the optimized query leads to errors such as surpassing text limit. | bool | N | -| `validate_query` | Whether the model's query will be strictly validated at compile time. This attribute is `false` by default. Setting it to `true` causes SQLMesh to raise an error instead of emitting warnings. This will display invalid columns in your SQL statements along with models containing `SELECT *` that cannot be automatically expanded to list out all columns. | bool | N | +| `ignored_rules` | A list of linter rule names (or "ALL") to be ignored/excluded for this model | str \| array[str] | N | ### Model defaults From 9a18ecd8c6ef391e404dab8447565de19d204428 Mon Sep 17 00:00:00 2001 From: vaggelisd Date: Fri, 7 Mar 2025 11:59:55 +0200 Subject: [PATCH 2/7] PR Feedback 1 --- docs/concepts/linter.md | 55 ++++++++++++++++++++++------ docs/concepts/models/overview.md | 6 ++++ docs/guides/configuration.md | 62 +++++++++++++++++++++++++------- 3 files changed, 100 insertions(+), 23 deletions(-) diff --git a/docs/concepts/linter.md b/docs/concepts/linter.md index 087be5b913..e9ab42dfbc 100644 --- a/docs/concepts/linter.md +++ b/docs/concepts/linter.md @@ -27,8 +27,8 @@ class Rule: ``` -Thus, each `Rule` can be broken down to it's vital components: -- The name (or code) of the rule is it's class name in lowercase. +Thus, each `Rule` can be broken down to its vital components: +- The name (or code) of the rule is defined as its class name in lowercase. - The core logic is implemented in `Rule::check_model(...)` which can examine any `Model` attribute. - If an issue is detected, a `RuleViolation` object should be returned with the proper context. This can be created manually or through the `Rule::violation()` helper. - A short explanation of the Rule's workings should be added in the form of a class docstring or by subclassing `Rule::summary`. @@ -54,24 +54,24 @@ class NoSelectStar(Rule): The list of all built-in rules is the following: -| Name | Explanation -|--------------------------------------|--------------------------------------------------------------------------------------------------------| -| ambiguousorinvalidcolumns | The optimizer was unable to trace or found duplicate columns | -| invalidselectstarexpansion | The optimizer was unable to expand the top-level `SELECT *` | -| noselectstar | The query's top-level projection should not be `SELECT *`, even if it can be expanded by the optimizer | +| Name | Check | Explanation +|--------------------------------------|----------------------------------------------------------------------------------------------------------------------| +| ambiguousorinvalidcolumns | Correctness | The optimizer was unable to trace or found duplicate columns | +| invalidselectstarexpansion | Correctness | The optimizer was unable to expand the top-level `SELECT *` | +| noselectstar | Stylistic | The query's top-level projection should not be `SELECT *`, even if it can be expanded by the optimizer | ## User defined rules -Users can implement their own custom rules in a similar fashion. For that matter, SQLMesh will attempt to load any subclass of `Rule` under the `linter/` directory. +Users can implement their own custom rules in a similar fashion. For that matter, SQLMesh will load any subclass of `Rule` under the `linter/` directory. -For instance, if an organization wanted to ensure all models are owned by an engineer, one solution would be to implement the following rule: +For instance, if an organization wanted to ensure all models are owned by an engineer, one solution would be to implement the following check: ```Python # linter/user.py import typing as t -from sqlmesh.core.linter.rule import Rule, RuleViolation, RuleSet +from sqlmesh.core.linter.rule import Rule, RuleViolation from sqlmesh.core.model import Model class NoMissingOwner(Rule): @@ -81,3 +81,38 @@ class NoMissingOwner(Rule): return self.violation() if not model.owner else None ``` + +This can then be configured to raise an error (or log a warning): + +=== "YAML" + + ```yaml linenums="1" + linter: + enabled: True + + rules: [..., "nomissingowner"] + ``` + +When models are loaded again (e.g through a `sqlmesh plan`) the linter will run, yielding an error if a violation occurs: + +``` +$ sqlmesh plan + +Linter errors for .../models/full_model.sql: + - nomissingowner: Model owner always should be specified. + +Error: Linter detected errors in the code. Please fix them before proceeding. +``` + +This helps users trace the offending model(s) during compilation time i.e models that are not owned for this case: + +=== "YAML" + + ```sql linenums="1" + MODEL( + name docs_example.full_model, + kind FULL, + cron '@daily', + grain item_id, + ); + ``` \ No newline at end of file diff --git a/docs/concepts/models/overview.md b/docs/concepts/models/overview.md index 6156348e5e..eaa3ce7393 100644 --- a/docs/concepts/models/overview.md +++ b/docs/concepts/models/overview.md @@ -443,6 +443,12 @@ to `false` causes SQLMesh to disable query canonicalization & simplification. Th !!! warning Turning off the optimizer may prevent column-level lineage from working for the affected model and its descendants, unless all columns in the model's query are qualified and it contains no star projections (e.g. `SELECT *`). +### validate_query +: Whether the model's query will be validated at compile time. This attribute is `false` by default. Setting it to `true` causes SQLMesh to raise an error instead of emitting warnings. This will display invalid columns in your SQL statements along with models containing `SELECT *` that cannot be automatically expanded to list out all columns. This ensures SQL is verified locally before time and money are spent running the SQL in your data warehouse. + +!!! warning + This flag is deprecated as of v.0.159.7+ in favor of the [linter](../linter.md). To preserve validation during compilation, the [built-in rules](../linter.md#built-in) that check for correctness should be [configured](../../guides/configuration.md#linter) to error severity. + ### ignored_rules : Specifies which linter rules should be ignored/excluded for this model. diff --git a/docs/guides/configuration.md b/docs/guides/configuration.md index ff3d1ff3e6..add70d905c 100644 --- a/docs/guides/configuration.md +++ b/docs/guides/configuration.md @@ -1113,7 +1113,7 @@ As demonstrated in these examples, the `environment_naming_info` is available wi The [linter](../concepts/linter.md) utilizes rules to analyze `Model` definitions (e.g its query) in order to flag errors, enforce stylistic opinions or find suspicious constructs. -It can be configured under the `linter` key, with the rules being defined as lists of rule names or `"ALL"`. An example: +It can be configured under the `linter` key, with the rules being defined as lists of rule names: === "YAML" @@ -1121,9 +1121,9 @@ It can be configured under the `linter` key, with the rules being defined as lis linter: enabled: True - rules: ["rule1", "rule2"] - warn_rules: ["rule3"] - ignored_rules: ["rule4"] + rules: ["ambiguousorinvalidcolumns"] + warn_rules: ["invalidselectstarexpansion"] + ignored_rules: ["noselectstar"] ``` === "Python" @@ -1134,23 +1134,49 @@ It can be configured under the `linter` key, with the rules being defined as lis config = Config( linter=LinterConfig( enabled=True, - rules=["rule1", "rule2"], - warn_rules=["rule3"], - ignored_rules=["rule4"] + rules=["ambiguousorinvalidcolumns"] + warn_rules=["invalidselectstarexpansion"] + ignored_rules=["noselectstar"] ) ) ``` -To enable different levels of escalation, SQLMesh defines the following keys: -- `rules`: Any violation will raise an error, essentially halting execution until it's fixed +Or through the `"ALL"` specifier: + +=== "YAML" + + ```yaml linenums="1" + linter: + enabled: True + + rules: "ALL" + ``` + +=== "Python" + + ```python linenums="1" + from sqlmesh.core.config import Config, LinterConfig + + config = Config( + linter=LinterConfig( + enabled=True, + rules="all", + ) + ) + ``` + +#### Rule severity +To enable different levels of severity, SQLMesh defines the following keys: +- `rules`: Violations will raise an error, essentially halting execution until they're fixed - `warning_rules`: Violations will only log warnings for the user -- `ignored_rules`: Violations will not have any visual or actual effect +- `ignored_rules`: The linter will exclude these rules from running completely By default, the linter configuration is disabled and all of the rules are excluded. +SQLMesh will detect if there's overlap in `rules` and `warning_rules`, since these should be mutually exclusive. -The usage of `ignored_rules` can prove useful when `rules` or `warning_rules` are defined as `'ALL'`, thus avoiding listing out individual rules. However, SQLMesh will detect if there's overlap in `rules` and `warning_rules`, since these should be mutually exclusive. An example of this: +The usage of `ignored_rules` can prove useful when `rules` or `warning_rules` are defined as `'ALL'`, thus avoiding listing out individual rules. An example of this: === "YAML" @@ -1160,10 +1186,20 @@ The usage of `ignored_rules` can prove useful when `rules` or `warning_rules` ar enabled: True rules: "ALL" - ignored_rules: ["rule4"] + ignored_rules: ["noselectstar"] + ``` + +Users can also override the global configuration on a per-model basis by using `ignored_rules` as a model attribute: + +=== "YAML" + + ```sql linenums="1" + MODEL( + name docs_example.full_model, + ignored_rules: ["invalidselectstarexpansion"] # or "ALL" + ); ``` -Users can also use `ignored_rules` as a model attribute to exclude certain rules on a per-model basis. ### Debug mode From 0c3be80751e52daa300bdc2b667094db0ea31790 Mon Sep 17 00:00:00 2001 From: vaggelisd Date: Fri, 7 Mar 2025 15:21:26 +0200 Subject: [PATCH 3/7] Fix wording --- docs/concepts/linter.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/concepts/linter.md b/docs/concepts/linter.md index e9ab42dfbc..d6f62e9880 100644 --- a/docs/concepts/linter.md +++ b/docs/concepts/linter.md @@ -29,9 +29,9 @@ class Rule: Thus, each `Rule` can be broken down to its vital components: - The name (or code) of the rule is defined as its class name in lowercase. -- The core logic is implemented in `Rule::check_model(...)` which can examine any `Model` attribute. +- The core logic is implemented in `Rule::check_model(...)` which can analyze any `Model` attribute. - If an issue is detected, a `RuleViolation` object should be returned with the proper context. This can be created manually or through the `Rule::violation()` helper. -- A short explanation of the Rule's workings should be added in the form of a class docstring or by subclassing `Rule::summary`. +- A short explanation of the rule's purpose should be added in the form of a class docstring or by subclassing `Rule::summary`. @@ -58,7 +58,7 @@ The list of all built-in rules is the following: |--------------------------------------|----------------------------------------------------------------------------------------------------------------------| | ambiguousorinvalidcolumns | Correctness | The optimizer was unable to trace or found duplicate columns | | invalidselectstarexpansion | Correctness | The optimizer was unable to expand the top-level `SELECT *` | -| noselectstar | Stylistic | The query's top-level projection should not be `SELECT *`, even if it can be expanded by the optimizer | +| noselectstar | Stylistic | The query's top-level selection should not be `SELECT *`, even if it can be expanded by the optimizer | ## User defined rules @@ -104,7 +104,7 @@ Linter errors for .../models/full_model.sql: Error: Linter detected errors in the code. Please fix them before proceeding. ``` -This helps users trace the offending model(s) during compilation time i.e models that are not owned for this case: +This helps users trace the offending model(s) during compilation time i.e models that are not owned: === "YAML" From 55949ce7e248d9be7d3dc52fcf4827da5aad2a47 Mon Sep 17 00:00:00 2001 From: vaggelisd Date: Fri, 7 Mar 2025 17:46:47 +0200 Subject: [PATCH 4/7] Remove validate_query --- docs/reference/model_configuration.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/reference/model_configuration.md b/docs/reference/model_configuration.md index a74640b4a5..31a874acc7 100644 --- a/docs/reference/model_configuration.md +++ b/docs/reference/model_configuration.md @@ -123,7 +123,6 @@ The SQLMesh project-level `model_defaults` key supports the following options, d - on_destructive_change (described [below](#incremental-models)) - audits (described [here](../concepts/audits.md#generic-audits)) - optimize_query -- validate_query - allow_partials - enabled - interval_unit From e58f03150f1b897b83d98a36c237fd9e7c20e4d1 Mon Sep 17 00:00:00 2001 From: vaggelisd Date: Fri, 7 Mar 2025 19:10:17 +0200 Subject: [PATCH 5/7] PR Feedback 2 --- docs/concepts/linter.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/concepts/linter.md b/docs/concepts/linter.md index d6f62e9880..cc6390881d 100644 --- a/docs/concepts/linter.md +++ b/docs/concepts/linter.md @@ -1,12 +1,12 @@ # Linter -Linting enables you to validate the model definition, ensuring it adheres to best practices. When a new [plan](plans.md) is created, each model is checked against a set of built-in and user-defined rules to verify that its attributes comply with the defined standards; This improves code quality, consistency, and helps to detect issues early in the development cycle. +Linting enables you to validate the model definition, ensuring it adheres to best practices. When a project is loaded in SQLMesh, each model is checked against a set of rules to verify that its definitions complies with the project's standards; This improves code quality, consistency, and helps to detect issues early in the development cycle. For more information regarding linter configuration visit the relevant [guide here](../guides/configuration.md). # Rules -Each rule is responsible for detecting a specific issue or pattern within a model. Rules are defined as classes that implement the logic for validation by subclassing `Rule` (redacted form): +Each rule is responsible for detecting a specific issue or pattern in a model. Rules are defined as classes that implement the logic for validation by subclassing `Rule` (redacted form): ```Python3 class Rule: @@ -36,7 +36,7 @@ Thus, each `Rule` can be broken down to its vital components: ## Built-in -SQLMesh comes with a set of predefined rules which check for potential SQL errors or enforce stylistic opinions. An example of the latter the `NoSelectStar` rule which prohibits users from writing `SELECT *` in the outer-most select: +SQLMesh comes with a set of predefined rules which check for potential SQL errors or enforce stylistic opinions. An example of the latter is the `NoSelectStar` rule, prohibiting users from writing `SELECT *` in the outer-most select: ```Python @@ -62,9 +62,9 @@ The list of all built-in rules is the following: ## User defined rules -Users can implement their own custom rules in a similar fashion. For that matter, SQLMesh will load any subclass of `Rule` under the `linter/` directory. +Users can implement their own custom rules in a similar fashion. SQLMesh will load any subclass of `Rule` under the `linter/` directory. -For instance, if an organization wanted to ensure all models are owned by an engineer, one solution would be to implement the following check: +For instance, if an organization wanted to ensure all models have an owner, one solution would be to implement the following check: ```Python # linter/user.py From 3cad6f672e506121568427229fb47ae015a6bef4 Mon Sep 17 00:00:00 2001 From: vaggelisd Date: Fri, 7 Mar 2025 20:00:10 +0200 Subject: [PATCH 6/7] Wrap rule impl to snippet --- docs/concepts/linter.md | 35 +++++++++++++++++++---------------- 1 file changed, 19 insertions(+), 16 deletions(-) diff --git a/docs/concepts/linter.md b/docs/concepts/linter.md index cc6390881d..7f38ebd261 100644 --- a/docs/concepts/linter.md +++ b/docs/concepts/linter.md @@ -1,31 +1,34 @@ # Linter -Linting enables you to validate the model definition, ensuring it adheres to best practices. When a project is loaded in SQLMesh, each model is checked against a set of rules to verify that its definitions complies with the project's standards; This improves code quality, consistency, and helps to detect issues early in the development cycle. +Linting enables you to validate the model definition, ensuring it adheres to best practices. When a project is loaded in SQLMesh, each model is checked against a set of rules to verify that its definitions complies with the project's standards. This improves code quality, consistency, and helps to detect issues early in the development cycle. For more information regarding linter configuration visit the relevant [guide here](../guides/configuration.md). # Rules -Each rule is responsible for detecting a specific issue or pattern in a model. Rules are defined as classes that implement the logic for validation by subclassing `Rule` (redacted form): +Each rule is responsible for detecting a specific issue or pattern in a model. Rules are defined as classes that implement the logic for validation by subclassing `Rule`: -```Python3 -class Rule: - """The base class for a rule.""" +??? "Rule class implementation" + This is an outline of the `Rule` class and it's critical parts, the actual implementation can be found [here](https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/linter/rule.py): - @abc.abstractmethod - def check_model(self, model: Model) -> t.Optional[RuleViolation]: - """The evaluation function that'll check for a violation of this rule.""" + ```Python3 + class Rule: + """The base class for a rule.""" - @property - def summary(self) -> str: - """A summary of what this rule checks for.""" - return self.__doc__ or "" + @abc.abstractmethod + def check_model(self, model: Model) -> t.Optional[RuleViolation]: + """The evaluation function that'll check for a violation of this rule.""" - def violation(self, violation_msg: t.Optional[str] = None) -> RuleViolation: - """Create a RuleViolation instance for this rule""" - return RuleViolation(rule=self, violation_msg=violation_msg or self.summary) + @property + def summary(self) -> str: + """A summary of what this rule checks for.""" + return self.__doc__ or "" -``` + def violation(self, violation_msg: t.Optional[str] = None) -> RuleViolation: + """Create a RuleViolation instance for this rule""" + return RuleViolation(rule=self, violation_msg=violation_msg or self.summary) + + ``` Thus, each `Rule` can be broken down to its vital components: - The name (or code) of the rule is defined as its class name in lowercase. From 4fd38795ff7ae31addd6dec71d33ca860946c6fa Mon Sep 17 00:00:00 2001 From: Trey Spiller <1831878+treysp@users.noreply.github.com> Date: Fri, 7 Mar 2025 14:33:54 -0600 Subject: [PATCH 7/7] Docs: trey linter doc edits (#3957) --- docs/concepts/linter.md | 244 ++++++++++++++++++++++++++--------- docs/guides/configuration.md | 94 +------------- mkdocs.yml | 1 + 3 files changed, 187 insertions(+), 152 deletions(-) diff --git a/docs/concepts/linter.md b/docs/concepts/linter.md index 7f38ebd261..14b28bd0ab 100644 --- a/docs/concepts/linter.md +++ b/docs/concepts/linter.md @@ -1,121 +1,243 @@ -# Linter +# Linter guide -Linting enables you to validate the model definition, ensuring it adheres to best practices. When a project is loaded in SQLMesh, each model is checked against a set of rules to verify that its definitions complies with the project's standards. This improves code quality, consistency, and helps to detect issues early in the development cycle. +Linting is a powerful tool for improving code quality and consistency. It enables you to automatically validate model definition, ensuring they adhere to your team's best practices. -For more information regarding linter configuration visit the relevant [guide here](../guides/configuration.md). +When a SQLMesh command is executed and the project is loaded, each model's code is checked for compliance with a set of rules you choose. -# Rules +SQLMesh provides built-in rules, and you can define custom rules. This improves code quality and helps detect issues early in the development cycle when they are simpler to debug. -Each rule is responsible for detecting a specific issue or pattern in a model. Rules are defined as classes that implement the logic for validation by subclassing `Rule`: +## Rules -??? "Rule class implementation" - This is an outline of the `Rule` class and it's critical parts, the actual implementation can be found [here](https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/linter/rule.py): +Each linting rule is responsible for identifying a pattern in a model's code. - ```Python3 - class Rule: - """The base class for a rule.""" +Some rules validate that a pattern is *not* present, such as not allowing `SELECT *` in a model's outermost query. Other rules validate that a pattern *is* present, like ensuring that every model's `owner` field is specified. We refer to both of these below as "validating a pattern". - @abc.abstractmethod - def check_model(self, model: Model) -> t.Optional[RuleViolation]: - """The evaluation function that'll check for a violation of this rule.""" +Rules are defined in Python. Each rule is an individual Python class that inherits from SQLMesh's `Rule` base class and defines the logic for validating a pattern. - @property - def summary(self) -> str: - """A summary of what this rule checks for.""" - return self.__doc__ or "" +We display a portion of the `Rule` base class's code below ([full source code](https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/linter/rule.py)). Its methods and properties illustrate the most important components of the subclassed rules you define. - def violation(self, violation_msg: t.Optional[str] = None) -> RuleViolation: - """Create a RuleViolation instance for this rule""" - return RuleViolation(rule=self, violation_msg=violation_msg or self.summary) +Each rule class you create has four vital components: - ``` +1. Name: the class's name is used as the rule's name. +2. Description: the class should define a docstring that provides a short explanation of the rule's purpose. +3. Pattern validation logic: the class should define a `check_model()` method containing the core logic that validates the rule's pattern. The method can access any `Model` attribute. +4. Rule violation logic: if a rule's pattern is not validated, the rule is "violated" and the class should return a `RuleViolation` object. The `RuleViolation` object should include the contextual information a user needs to understand and fix the problem. -Thus, each `Rule` can be broken down to its vital components: -- The name (or code) of the rule is defined as its class name in lowercase. -- The core logic is implemented in `Rule::check_model(...)` which can analyze any `Model` attribute. -- If an issue is detected, a `RuleViolation` object should be returned with the proper context. This can be created manually or through the `Rule::violation()` helper. -- A short explanation of the rule's purpose should be added in the form of a class docstring or by subclassing `Rule::summary`. +``` python linenums="1" +# Class name used as rule's name +class Rule: + # Docstring provides rule's description + """The base class for a rule.""" + # Pattern validation logic goes in `check_model()` method + @abc.abstractmethod + def check_model(self, model: Model) -> t.Optional[RuleViolation]: + """The evaluation function that checks for a violation of this rule.""" + + # Rule violation object returned by `violation()` method + def violation(self, violation_msg: t.Optional[str] = None) -> RuleViolation: + """Return a RuleViolation instance if this rule is violated""" + return RuleViolation(rule=self, violation_msg=violation_msg or self.summary) +``` + +### Built-in rules +SQLMesh includes a set of predefined rules that check for potential SQL errors or enforce code style. -## Built-in -SQLMesh comes with a set of predefined rules which check for potential SQL errors or enforce stylistic opinions. An example of the latter is the `NoSelectStar` rule, prohibiting users from writing `SELECT *` in the outer-most select: +An example of the latter is the `NoSelectStar` rule, which prohibits a model from using `SELECT *` in its query's outer-most select statement. +Here is code for the built-in `NoSelectStar` rule class, with the different components annotated: -```Python +``` python linenums="1" +# Rule's name is the class name `NoSelectStar` class NoSelectStar(Rule): + # Docstring explaining rule """Query should not contain SELECT * on its outer most projections, even if it can be expanded.""" def check_model(self, model: Model) -> t.Optional[RuleViolation]: + # If this model does not contain a SQL query, there is nothing to validate if not isinstance(model, SqlModel): return None + # Use the query's `is_star` property to detect the `SELECT *` pattern. + # If present, call the `violation()` method to return a `RuleViolation` object. return self.violation() if model.query.is_star else None ``` +Here are all of SQLMesh's built-in linting rules: -The list of all built-in rules is the following: +| Name | Check type | Explanation | +| -------------------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------ | +| ambiguousorinvalidcolumn | Correctness | SQLMesh found duplicate columns or was unable to determine whether a column is duplicated or not | +| invalidselectstarexpansion | Correctness | The query's top-level selection may be `SELECT *`, but only if SQLMesh can expand the `SELECT *` into individual columns | +| noselectstar | Stylistic | The query's top-level selection may not be `SELECT *`, even if SQLMesh can expand the `SELECT *` into individual columns | -| Name | Check | Explanation -|--------------------------------------|----------------------------------------------------------------------------------------------------------------------| -| ambiguousorinvalidcolumns | Correctness | The optimizer was unable to trace or found duplicate columns | -| invalidselectstarexpansion | Correctness | The optimizer was unable to expand the top-level `SELECT *` | -| noselectstar | Stylistic | The query's top-level selection should not be `SELECT *`, even if it can be expanded by the optimizer | +### User-defined rules +You may define custom rules to implement your team's best practices. -## User defined rules -Users can implement their own custom rules in a similar fashion. SQLMesh will load any subclass of `Rule` under the `linter/` directory. - -For instance, if an organization wanted to ensure all models have an owner, one solution would be to implement the following check: - -```Python -# linter/user.py +For instance, you could ensure all models have an `owner` by defining the following linting rule: +``` python linenums="1" title="linter/user.py" import typing as t from sqlmesh.core.linter.rule import Rule, RuleViolation from sqlmesh.core.model import Model class NoMissingOwner(Rule): - """Model owner always should be specified.""" + """Model owner should always be specified.""" def check_model(self, model: Model) -> t.Optional[RuleViolation]: + # Rule violated if the model's owner field (`model.owner`) is not specified return self.violation() if not model.owner else None ``` -This can then be configured to raise an error (or log a warning): +Place a rule's code in the project's `linter/` directory. SQLMesh will load all subclasses of `Rule` from that directory. + +If the rule is specified in the project's [configuration file](#applying-linting-rules), SQLMesh will run it when the project is loaded. All SQLMesh commands will load the project, except for `create_external_models`, `migrate`, `rollback`, `run`, `environments`, and `invalidate`. + +SQLMesh will error if a model violates the rule, informing you which model(s) violated the rule. In this example, `full_model.sql` violated the `NoMissingOwner` rule: + +``` bash +$ sqlmesh plan + +Linter errors for .../models/full_model.sql: + - nomissingowner: Model owner should always be specified. + +Error: Linter detected errors in the code. Please fix them before proceeding. +``` + +## Applying linting rules + +Specify which linting rules a project should apply in the project's [configuration file](../guides/configuration.md). + +Rules are specified as lists of rule names under the `linter` key. Globally enable or disable linting with the `enabled` key, which is `false` by default. + +NOTE: you **must** set the `enabled` key to `true` key to apply the project's linting rules. + +### Specific linting rules + +This example specifies that the `"ambiguousorinvalidcolumn"` and `"invalidselectstarexpansion"` linting rules should be enforced: === "YAML" ```yaml linenums="1" linter: - enabled: True + enabled: true + rules: ["ambiguousorinvalidcolumn", "invalidselectstarexpansion"] + ``` + +=== "Python" + + ```python linenums="1" + from sqlmesh.core.config import Config, LinterConfig - rules: [..., "nomissingowner"] + config = Config( + linter=LinterConfig( + enabled=True, + rules=["ambiguousorinvalidcolumn", "invalidselectstarexpansion"] + ) + ) ``` -When models are loaded again (e.g through a `sqlmesh plan`) the linter will run, yielding an error if a violation occurs: +### All linting rules -``` -$ sqlmesh plan +Apply every built-in and user-defined rule by specifying `"ALL"` instead of a list of rules: -Linter errors for .../models/full_model.sql: - - nomissingowner: Model owner always should be specified. +=== "YAML" -Error: Linter detected errors in the code. Please fix them before proceeding. + ```yaml linenums="1" + linter: + enabled: True + rules: "ALL" + ``` + +=== "Python" + + ```python linenums="1" + from sqlmesh.core.config import Config, LinterConfig + + config = Config( + linter=LinterConfig( + enabled=True, + rules="all", + ) + ) + ``` + +If you want to apply all rules except for a few, you can specify `"ALL"` and list the rules to ignore in the `ignored_rules` key: + +=== "YAML" + + ```yaml linenums="1" + linter: + enabled: True + rules: "ALL" # apply all built-in and user-defined rules and error if violated + ignored_rules: ["noselectstar"] # but don't run the `noselectstar` rule + ``` + +=== "Python" + + ```python linenums="1" + from sqlmesh.core.config import Config, LinterConfig + + config = Config( + linter=LinterConfig( + enabled=True, + # apply all built-in and user-defined linting rules and error if violated + rules="all", + # but don't run the `noselectstar` rule + ignored_rules=["noselectstar"] + ) + ) + ``` + +### Exclude a model from linting + +You can specify that a specific *model* ignore a linting rule by specifying `ignored_rules` in its `MODEL` block. + +This example specifies that the model `docs_example.full_model` should not run the `invalidselectstarexpansion` rule: + +```sql linenums="1" +MODEL( + name docs_example.full_model, + ignored_rules: ["invalidselectstarexpansion"] # or "ALL" to turn off linting completely +); ``` -This helps users trace the offending model(s) during compilation time i.e models that are not owned: +### Rule violation behavior + +Linting rule violations raise an error by default, preventing the project from running until the violation is addressed. + +You may specify that a rule's violation should not error and only log a warning by specifying it in the `warning_rules` key instead of the `rules` key. === "YAML" - ```sql linenums="1" - MODEL( - name docs_example.full_model, - kind FULL, - cron '@daily', - grain item_id, - ); - ``` \ No newline at end of file + ```yaml linenums="1" + linter: + enabled: True + # error if `ambiguousorinvalidcolumn` rule violated + rules: ["ambiguousorinvalidcolumn"] + # but only warn if "invalidselectstarexpansion" is violated + warning_rules: ["invalidselectstarexpansion"] + ``` + +=== "Python" + + ```python linenums="1" + from sqlmesh.core.config import Config, LinterConfig + + config = Config( + linter=LinterConfig( + enabled=True, + # error if `ambiguousorinvalidcolumn` rule violated + rules=["ambiguousorinvalidcolumn"], + # but only warn if "invalidselectstarexpansion" is violated + warning_rules=["invalidselectstarexpansion"], + ) + ) + ``` + +SQLMesh will raise an error if the same rule is included in more than one of the `rules`, `warning_rules`, and `ignored_rules` keys since they should be mutually exclusive. \ No newline at end of file diff --git a/docs/guides/configuration.md b/docs/guides/configuration.md index add70d905c..a54b0661c8 100644 --- a/docs/guides/configuration.md +++ b/docs/guides/configuration.md @@ -1108,99 +1108,11 @@ def grant_schema_usage(evaluator): As demonstrated in these examples, the `environment_naming_info` is available within the macro evaluator for macros invoked within the `before_all` and `after_all` statements. Additionally, the macro `this_env` provides access to the current environment name, which can be helpful for more advanced use cases that require fine-grained control over their behaviour. +### Linting -### Linter - -The [linter](../concepts/linter.md) utilizes rules to analyze `Model` definitions (e.g its query) in order to flag errors, enforce stylistic opinions or find suspicious constructs. - -It can be configured under the `linter` key, with the rules being defined as lists of rule names: - -=== "YAML" - - ```yaml linenums="1" - linter: - enabled: True - - rules: ["ambiguousorinvalidcolumns"] - warn_rules: ["invalidselectstarexpansion"] - ignored_rules: ["noselectstar"] - ``` - -=== "Python" - - ```python linenums="1" - from sqlmesh.core.config import Config, LinterConfig - - config = Config( - linter=LinterConfig( - enabled=True, - rules=["ambiguousorinvalidcolumns"] - warn_rules=["invalidselectstarexpansion"] - ignored_rules=["noselectstar"] - ) - ) - ``` - -Or through the `"ALL"` specifier: - -=== "YAML" - - ```yaml linenums="1" - linter: - enabled: True - - rules: "ALL" - ``` - -=== "Python" - - ```python linenums="1" - from sqlmesh.core.config import Config, LinterConfig - - config = Config( - linter=LinterConfig( - enabled=True, - rules="all", - ) - ) - ``` - -#### Rule severity -To enable different levels of severity, SQLMesh defines the following keys: -- `rules`: Violations will raise an error, essentially halting execution until they're fixed -- `warning_rules`: Violations will only log warnings for the user -- `ignored_rules`: The linter will exclude these rules from running completely - - -By default, the linter configuration is disabled and all of the rules are excluded. - -SQLMesh will detect if there's overlap in `rules` and `warning_rules`, since these should be mutually exclusive. - -The usage of `ignored_rules` can prove useful when `rules` or `warning_rules` are defined as `'ALL'`, thus avoiding listing out individual rules. An example of this: - - -=== "YAML" - - ```yaml linenums="1" - linter: - enabled: True - - rules: "ALL" - ignored_rules: ["noselectstar"] - ``` - -Users can also override the global configuration on a per-model basis by using `ignored_rules` as a model attribute: - -=== "YAML" - - ```sql linenums="1" - MODEL( - name docs_example.full_model, - ignored_rules: ["invalidselectstarexpansion"] # or "ALL" - ); - ``` - +SQLMesh provides a linter that checks for potential issues in your models' code. Enable it and specify which linting rules to apply in the configuration file's `linter` key. +Learn more about linting configuration on the [linting concepts page](../concepts/linter.md). ### Debug mode diff --git a/mkdocs.yml b/mkdocs.yml index 8b68731325..fbac53aa99 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -32,6 +32,7 @@ nav: - SQLMesh tools: - guides/ui.md - guides/tablediff.md + - concepts/linter.md - guides/observer.md - Concepts: - concepts/overview.md