diff --git a/docs/concepts/linter.md b/docs/concepts/linter.md index 7f38ebd261..14b28bd0ab 100644 --- a/docs/concepts/linter.md +++ b/docs/concepts/linter.md @@ -1,121 +1,243 @@ -# Linter +# Linter guide -Linting enables you to validate the model definition, ensuring it adheres to best practices. When a project is loaded in SQLMesh, each model is checked against a set of rules to verify that its definitions complies with the project's standards. This improves code quality, consistency, and helps to detect issues early in the development cycle. +Linting is a powerful tool for improving code quality and consistency. It enables you to automatically validate model definitions, ensuring they adhere to your team's best practices. -For more information regarding linter configuration visit the relevant [guide here](../guides/configuration.md). +When a SQLMesh command is executed and the project is loaded, each model's code is checked for compliance with a set of rules you choose. -# Rules +SQLMesh provides built-in rules, and you can define custom rules. This improves code quality and helps detect issues early in the development cycle when they are simpler to debug. -Each rule is responsible for detecting a specific issue or pattern in a model. Rules are defined as classes that implement the logic for validation by subclassing `Rule`: +## Rules -??? "Rule class implementation" - This is an outline of the `Rule` class and it's critical parts, the actual implementation can be found [here](https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/linter/rule.py): - ```Python3 - class Rule: - """The base class for a rule.""" +Each linting rule is responsible for identifying a pattern in a model's code. +Some rules validate that a pattern is *not* present, such as not allowing `SELECT *` in a model's outermost query. Other rules validate that a pattern *is* present, like ensuring that every model's `owner` field is specified. 
We refer to both of these below as "validating a pattern". - @abc.abstractmethod - def check_model(self, model: Model) -> t.Optional[RuleViolation]: - """The evaluation function that'll check for a violation of this rule.""" +Rules are defined in Python. Each rule is an individual Python class that inherits from SQLMesh's `Rule` base class and defines the logic for validating a pattern. - @property - def summary(self) -> str: - """A summary of what this rule checks for.""" - return self.__doc__ or "" +We display a portion of the `Rule` base class's code below ([full source code](https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/linter/rule.py)). Its methods and properties illustrate the most important components of the subclassed rules you define. - def violation(self, violation_msg: t.Optional[str] = None) -> RuleViolation: - """Create a RuleViolation instance for this rule""" - return RuleViolation(rule=self, violation_msg=violation_msg or self.summary) +Each rule class you create has four vital components: - ``` +1. Name: the class's name is used as the rule's name. +2. Description: the class should define a docstring that provides a short explanation of the rule's purpose. +3. Pattern validation logic: the class should define a `check_model()` method containing the core logic that validates the rule's pattern. The method can access any `Model` attribute. +4. Rule violation logic: if a rule's pattern is not validated, the rule is "violated" and the class should return a `RuleViolation` object. The `RuleViolation` object should include the contextual information a user needs to understand and fix the problem. -Thus, each `Rule` can be broken down to its vital components: -- The name (or code) of the rule is defined as its class name in lowercase. -- The core logic is implemented in `Rule::check_model(...)` which can analyze any `Model` attribute. -- If an issue is detected, a `RuleViolation` object should be returned with the proper context. 
This can be created manually or through the `Rule::violation()` helper. -- A short explanation of the rule's purpose should be added in the form of a class docstring or by subclassing `Rule::summary`. +``` python linenums="1" +# Class name used as rule's name +class Rule: + # Docstring provides rule's description + """The base class for a rule.""" + # Pattern validation logic goes in `check_model()` method + @abc.abstractmethod + def check_model(self, model: Model) -> t.Optional[RuleViolation]: + """The evaluation function that checks for a violation of this rule.""" + + # Rule violation object returned by `violation()` method + def violation(self, violation_msg: t.Optional[str] = None) -> RuleViolation: + """Return a RuleViolation instance if this rule is violated""" + return RuleViolation(rule=self, violation_msg=violation_msg or self.summary) +``` + +### Built-in rules +SQLMesh includes a set of predefined rules that check for potential SQL errors or enforce code style. -## Built-in -SQLMesh comes with a set of predefined rules which check for potential SQL errors or enforce stylistic opinions. An example of the latter is the `NoSelectStar` rule, prohibiting users from writing `SELECT *` in the outer-most select: +An example of the latter is the `NoSelectStar` rule, which prohibits a model from using `SELECT *` in its query's outer-most select statement. +Here is code for the built-in `NoSelectStar` rule class, with the different components annotated: -```Python +``` python linenums="1" +# Rule's name is the class name `NoSelectStar` class NoSelectStar(Rule): + # Docstring explaining rule """Query should not contain SELECT * on its outer most projections, even if it can be expanded.""" def check_model(self, model: Model) -> t.Optional[RuleViolation]: + # If this model does not contain a SQL query, there is nothing to validate if not isinstance(model, SqlModel): return None + # Use the query's `is_star` property to detect the `SELECT *` pattern. 
+ # If present, call the `violation()` method to return a `RuleViolation` object. return self.violation() if model.query.is_star else None ``` +Here are all of SQLMesh's built-in linting rules: -The list of all built-in rules is the following: +| Name | Check type | Explanation | +| -------------------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------ | +| ambiguousorinvalidcolumn | Correctness | SQLMesh found duplicate columns or was unable to determine whether a column is duplicated or not | +| invalidselectstarexpansion | Correctness | The query's top-level selection may be `SELECT *`, but only if SQLMesh can expand the `SELECT *` into individual columns | +| noselectstar | Stylistic | The query's top-level selection may not be `SELECT *`, even if SQLMesh can expand the `SELECT *` into individual columns | -| Name | Check | Explanation -|--------------------------------------|----------------------------------------------------------------------------------------------------------------------| -| ambiguousorinvalidcolumns | Correctness | The optimizer was unable to trace or found duplicate columns | -| invalidselectstarexpansion | Correctness | The optimizer was unable to expand the top-level `SELECT *` | -| noselectstar | Stylistic | The query's top-level selection should not be `SELECT *`, even if it can be expanded by the optimizer | +### User-defined rules +You may define custom rules to implement your team's best practices. -## User defined rules -Users can implement their own custom rules in a similar fashion. SQLMesh will load any subclass of `Rule` under the `linter/` directory. 
- -For instance, if an organization wanted to ensure all models have an owner, one solution would be to implement the following check: - -```Python -# linter/user.py +For instance, you could ensure all models have an `owner` by defining the following linting rule: +``` python linenums="1" title="linter/user.py" import typing as t from sqlmesh.core.linter.rule import Rule, RuleViolation from sqlmesh.core.model import Model class NoMissingOwner(Rule): - """Model owner always should be specified.""" + """Model owner should always be specified.""" def check_model(self, model: Model) -> t.Optional[RuleViolation]: + # Rule violated if the model's owner field (`model.owner`) is not specified return self.violation() if not model.owner else None ``` -This can then be configured to raise an error (or log a warning): +Place a rule's code in the project's `linter/` directory. SQLMesh will load all subclasses of `Rule` from that directory. + +If the rule is specified in the project's [configuration file](#applying-linting-rules), SQLMesh will run it when the project is loaded. All SQLMesh commands will load the project, except for `create_external_models`, `migrate`, `rollback`, `run`, `environments`, and `invalidate`. + +SQLMesh will error if a model violates the rule, informing you which model(s) violated the rule. In this example, `full_model.sql` violated the `NoMissingOwner` rule: + +``` bash +$ sqlmesh plan + +Linter errors for .../models/full_model.sql: + - nomissingowner: Model owner should always be specified. + +Error: Linter detected errors in the code. Please fix them before proceeding. +``` + +## Applying linting rules + +Specify which linting rules a project should apply in the project's [configuration file](../guides/configuration.md). + +Rules are specified as lists of rule names under the `linter` key. Globally enable or disable linting with the `enabled` key, which is `false` by default. 
+ +NOTE: you **must** set the `enabled` key to `true` to apply the project's linting rules. + +### Specific linting rules + +This example specifies that the `"ambiguousorinvalidcolumn"` and `"invalidselectstarexpansion"` linting rules should be enforced: === "YAML" ```yaml linenums="1" linter: - enabled: True + enabled: true + rules: ["ambiguousorinvalidcolumn", "invalidselectstarexpansion"] + ``` + +=== "Python" + + ```python linenums="1" + from sqlmesh.core.config import Config, LinterConfig - rules: [..., "nomissingowner"] + config = Config( + linter=LinterConfig( + enabled=True, + rules=["ambiguousorinvalidcolumn", "invalidselectstarexpansion"] + ) + ) ``` -When models are loaded again (e.g through a `sqlmesh plan`) the linter will run, yielding an error if a violation occurs: +### All linting rules -``` -$ sqlmesh plan +Apply every built-in and user-defined rule by specifying `"ALL"` instead of a list of rules: -Linter errors for .../models/full_model.sql: - - nomissingowner: Model owner always should be specified. +=== "YAML" -Error: Linter detected errors in the code. Please fix them before proceeding. 
+ ```yaml linenums="1" + linter: + enabled: True + rules: "ALL" + ``` + +=== "Python" + + ```python linenums="1" + from sqlmesh.core.config import Config, LinterConfig + + config = Config( + linter=LinterConfig( + enabled=True, + rules="all", + ) + ) + ``` + +If you want to apply all rules except for a few, you can specify `"ALL"` and list the rules to ignore in the `ignored_rules` key: + +=== "YAML" + + ```yaml linenums="1" + linter: + enabled: True + rules: "ALL" # apply all built-in and user-defined rules and error if violated + ignored_rules: ["noselectstar"] # but don't run the `noselectstar` rule + ``` + +=== "Python" + + ```python linenums="1" + from sqlmesh.core.config import Config, LinterConfig + + config = Config( + linter=LinterConfig( + enabled=True, + # apply all built-in and user-defined linting rules and error if violated + rules="all", + # but don't run the `noselectstar` rule + ignored_rules=["noselectstar"] + ) + ) + ``` + +### Exclude a model from linting + +You can exempt a specific *model* from a linting rule by listing the rule in the `ignored_rules` key of its `MODEL` block. + +This example specifies that the model `docs_example.full_model` should not run the `invalidselectstarexpansion` rule: + +```sql linenums="1" +MODEL( + name docs_example.full_model, + ignored_rules: ["invalidselectstarexpansion"] # or "ALL" to turn off linting completely +); ``` -This helps users trace the offending model(s) during compilation time i.e models that are not owned: +### Rule violation behavior + +Linting rule violations raise an error by default, preventing the project from running until the violation is addressed. + +To make a rule's violation log a warning instead of raising an error, list the rule in the `warning_rules` key instead of the `rules` key. 
=== "YAML" - ```sql linenums="1" - MODEL( - name docs_example.full_model, - kind FULL, - cron '@daily', - grain item_id, - ); - ``` \ No newline at end of file + ```yaml linenums="1" + linter: + enabled: True + # error if `ambiguousorinvalidcolumn` rule violated + rules: ["ambiguousorinvalidcolumn"] + # but only warn if "invalidselectstarexpansion" is violated + warning_rules: ["invalidselectstarexpansion"] + ``` + +=== "Python" + + ```python linenums="1" + from sqlmesh.core.config import Config, LinterConfig + + config = Config( + linter=LinterConfig( + enabled=True, + # error if `ambiguousorinvalidcolumn` rule violated + rules=["ambiguousorinvalidcolumn"], + # but only warn if "invalidselectstarexpansion" is violated + warning_rules=["invalidselectstarexpansion"], + ) + ) + ``` + +SQLMesh will raise an error if the same rule is included in more than one of the `rules`, `warning_rules`, and `ignored_rules` keys since they should be mutually exclusive. \ No newline at end of file diff --git a/docs/guides/configuration.md b/docs/guides/configuration.md index add70d905c..a54b0661c8 100644 --- a/docs/guides/configuration.md +++ b/docs/guides/configuration.md @@ -1108,99 +1108,11 @@ def grant_schema_usage(evaluator): As demonstrated in these examples, the `environment_naming_info` is available within the macro evaluator for macros invoked within the `before_all` and `after_all` statements. Additionally, the macro `this_env` provides access to the current environment name, which can be helpful for more advanced use cases that require fine-grained control over their behaviour. +### Linting -### Linter - -The [linter](../concepts/linter.md) utilizes rules to analyze `Model` definitions (e.g its query) in order to flag errors, enforce stylistic opinions or find suspicious constructs. 
- -It can be configured under the `linter` key, with the rules being defined as lists of rule names: - -=== "YAML" - - ```yaml linenums="1" - linter: - enabled: True - - rules: ["ambiguousorinvalidcolumns"] - warn_rules: ["invalidselectstarexpansion"] - ignored_rules: ["noselectstar"] - ``` - -=== "Python" - - ```python linenums="1" - from sqlmesh.core.config import Config, LinterConfig - - config = Config( - linter=LinterConfig( - enabled=True, - rules=["ambiguousorinvalidcolumns"] - warn_rules=["invalidselectstarexpansion"] - ignored_rules=["noselectstar"] - ) - ) - ``` - -Or through the `"ALL"` specifier: - -=== "YAML" - - ```yaml linenums="1" - linter: - enabled: True - - rules: "ALL" - ``` - -=== "Python" - - ```python linenums="1" - from sqlmesh.core.config import Config, LinterConfig - - config = Config( - linter=LinterConfig( - enabled=True, - rules="all", - ) - ) - ``` - -#### Rule severity -To enable different levels of severity, SQLMesh defines the following keys: -- `rules`: Violations will raise an error, essentially halting execution until they're fixed -- `warning_rules`: Violations will only log warnings for the user -- `ignored_rules`: The linter will exclude these rules from running completely - - -By default, the linter configuration is disabled and all of the rules are excluded. - -SQLMesh will detect if there's overlap in `rules` and `warning_rules`, since these should be mutually exclusive. - -The usage of `ignored_rules` can prove useful when `rules` or `warning_rules` are defined as `'ALL'`, thus avoiding listing out individual rules. 
An example of this: - - -=== "YAML" - - ```yaml linenums="1" - linter: - enabled: True - - rules: "ALL" - ignored_rules: ["noselectstar"] - ``` - -Users can also override the global configuration on a per-model basis by using `ignored_rules` as a model attribute: - -=== "YAML" - - ```sql linenums="1" - MODEL( - name docs_example.full_model, - ignored_rules: ["invalidselectstarexpansion"] # or "ALL" - ); - ``` - +SQLMesh provides a linter that checks for potential issues in your models' code. Enable it and specify which linting rules to apply in the configuration file's `linter` key. +Learn more about linting configuration on the [linting concepts page](../concepts/linter.md). ### Debug mode diff --git a/mkdocs.yml b/mkdocs.yml index 8b68731325..fbac53aa99 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -32,6 +32,7 @@ nav: - SQLMesh tools: - guides/ui.md - guides/tablediff.md + - concepts/linter.md - guides/observer.md - Concepts: - concepts/overview.md