Skip to content
This repository was archived by the owner on Nov 30, 2022. It is now read-only.

Conversation

@adamsachs
Copy link
Contributor

@adamsachs adamsachs commented Aug 26, 2022

Purpose

Make our strategies more extensible by using a unified strategy factory that allows for decorator-based registration.

Before this change, many of our strategies (with the exception of MaskingStrategys, which had been refactored in #560) were registered by means of a hardcoded enums in the core fidesops codebase. If a developer wanted to implement their own strategy, it required an update to the core fidesops codebase.

With this change, developers outside of core fidesops can implement their own strategy (whether that's an AuthenticationStrategy, MaskingStrategy, PaginationStrategy, or PostProcessorStrategy) and leverage it in the system by simply including the @register decorator on their strategy class, and ensuring it has the expected class variables and constructor as specified in the Strategy abstract base class. As an example:

from fidesops.ops.schemas.saas.strategy_configuration import StrategyConfiguration
from fidesops.ops.service.pagination.pagination_strategy import PaginationStrategy
from fidesops.ops.service.strategy_factory import register

class SomeStrategyConfiguration(StrategyConfiguration):
            some_key: str = "default value"

@register
class SomePaginationStrategy(PaginationStrategy):
    name = "some strategy"
    configuration_model = SomeStrategyConfiguration

    def __init__(self, configuration: SomeStrategyConfiguration):
        self.some_config = configuration.some_key
        super().__init__(configuration)

Additionally, new abstract Strategy subtypes (e.g. a PreProcessorStrategy module, if the need arises) can be defined and hooked into this framework simply by subclassing the newly added Strategy abstract base class. A new factory does not need to be defined.

Changes

  • a unified strategy factory that provides a register class decorator that can be used as a hook to dynamically register Strategy implementations
    • remove existing *StrategyFactory classes as they are no longer needed
  • a Strategy abstract base class that provides the general specification for what's needed in a Strategy implementation, which at this point is just:
    • class variables name and configuration_model which define the Strategy implementations name to be registered under, and pydantic model configuration class to use, respectively
    • a constructor that takes an argument whose type is the pydantic model configuration class specified by the configuration_model class variable
  • updates to existing Strategy implementations so that they are compatible with this new framework. there should be no user-facing effects of these changes, just rewiring on the backend.

Checklist

  • Update CHANGELOG.md file
    • Merge in main so the most recent CHANGELOG.md file is being appended to
    • Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
    • Add a link to this PR at the end of the description with the PR number as the text. example: #1
  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram.
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #562

…act Strategy class with class variables to declare strategy name and configuration model class
@adamsachs adamsachs requested a review from galvana August 26, 2022 12:49
@adamsachs adamsachs self-assigned this Aug 26, 2022
@adamsachs adamsachs added the run unsafe ci checks Triggers running of unsafe CI checks label Aug 26, 2022
strategy = MaskingStrategyFactory.get_strategy(
masking_strategy.strategy, masking_strategy.configuration
masking_strategy_spec = request.masking_strategy
masking_strategy: MaskingStrategy = strategy( # type: ignore
Copy link
Contributor Author

@adamsachs adamsachs Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't like having to suppress the type checker here, but i couldn't find a better way to get around this. by using a single, global factory, we lose the ability to strongly type the return type of the strategy getter -- since the factory only knows about the abstract Strategy class, it must return a Strategy here and not a more specific subclass. i couldn't find a clean way for the calling ("client") code to specify what subtype of Strategy it was expecting back.

also noting that in previous iterations discussed with @PSalant726, i'd had an approach where there was not a single global factory, but instead an abstract factory with generics, and then that factory was subclassed by each corresponding Strategy subclass (e.g. MaskingStrategyFactory, PaginationStrategyFactory). the subclassed factory just provided a specific type to the generic that was declared in the abstract factory. i believe that approach would allow us to strongly type the return types here, as we'd be specifically retrieving a strategy from (in this case) the MaskingStrategyFactory (rather than a global strategy factory). but we decided against that approach, since it led to bloat and a somewhat confusing implementation of StrategyFactory subclasses that did not provide any additional functionality beyond typing. for more information, e6416f4 shows what this approach would generally look like

… is no longer needed with updated pylint version
@adamsachs adamsachs requested a review from PSalant726 August 26, 2022 13:16
@adamsachs
Copy link
Contributor Author

@ethyca/docs-authors i updated the Extensibility section of the "masking strategies" guide page to incorporate the changes in this PR - would appreciate your review on those changes!

i don't think these changes affect any other documentation - i don't believe we've yet documented how a developer can "implement their own" AuthencationStrategy, PaginationStrategy, or PostProcessorStrategy. This change does make it significantly easier for a developer to do so outside of the core fidesops repo, so I wonder if it is worth us creating a backlog ticket to provide some documented guidance on how a developer can do that (similar to what we have in the Extensibility section of the masking strategies page). I don't think it's a requirement for this PR, though, since at this point it's still likely going to be internal developers who are creating more strategies, and that knowledge can be shared appropriately internally.

@adamsachs adamsachs requested a review from pattisdr August 26, 2022 15:38
@adamsachs adamsachs marked this pull request as ready for review August 26, 2022 15:38
Copy link
Contributor

@conceptualshark conceptualshark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good with the docs updates - agree that a new ticket for docs to cover some of the ways this can be used/extended is a good idea, if you haven't made it yet!

@adamsachs
Copy link
Contributor Author

Good with the docs updates - agree that a new ticket for docs to cover some of the ways this can be used/extended is a good idea, if you haven't made it yet!

created #1169 to track this doc request. no real priority on it afaik

Copy link
Contributor

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pausing review here, there are issues in bringing up the webserver itself with this change and generally I favor some of the alternatives you mentioned in the original issue. I also don't know if all of these strategies should be lumped together.

ConnectorTemplate,
create_connection_config_from_template_no_save,
create_dataset_config_from_template,
load_registry,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_registry runs when we bring up the webserver - this now gives you a lot of validation errors:

webserver_1         | Traceback (most recent call last):
webserver_1         |   File "/usr/local/bin/fidesops", line 8, in <module>
webserver_1         |     sys.exit(cli())
webserver_1         |   File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
webserver_1         |     return self.main(*args, **kwargs)
webserver_1         |   File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
webserver_1         |     rv = self.invoke(ctx)
webserver_1         |   File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
webserver_1         |     return _process_result(sub_ctx.command.invoke(sub_ctx))
webserver_1         |   File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
webserver_1         |     return ctx.invoke(self.callback, **ctx.params)
webserver_1         |   File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
webserver_1         |     return __callback(*args, **kwargs)
webserver_1         |   File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
webserver_1         |     return f(get_current_context(), *args, **kwargs)
webserver_1         |   File "/fidesops/src/fidesops/ops/cli.py", line 25, in webserver
webserver_1         |     start_webserver()
webserver_1         |   File "/fidesops/src/fidesops/main.py", line 234, in start_webserver
webserver_1         |     load_registry(registry_file)
webserver_1         |   File "/fidesops/src/fidesops/ops/service/connectors/saas/connector_registry_service.py", line 133, in load_registry
webserver_1         |     _registry = ConnectorRegistry.parse_obj(load_toml([config_file]))
webserver_1         |   File "pydantic/main.py", line 521, in pydantic.main.BaseModel.parse_obj
webserver_1         |   File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
webserver_1         | pydantic.error_wrappers.ValidationError: 29 validation errors for ConnectorRegistry
webserver_1         | __root__ -> datadog -> config -> endpoints -> 0 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> hubspot -> config -> endpoints -> 0 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> hubspot -> config -> endpoints -> 1 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> hubspot -> config -> endpoints -> 3 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> mailchimp -> config -> endpoints -> 1 -> requests -> read -> __root__
webserver_1         |   Strategy 'offset' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> outreach -> config -> endpoints -> 1 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> segment -> config -> endpoints -> 1 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> segment -> config -> endpoints -> 2 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> segment -> config -> endpoints -> 3 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> sentry -> config -> endpoints -> 0 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> sentry -> config -> endpoints -> 2 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> sentry -> config -> endpoints -> 3 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> sentry -> config -> endpoints -> 4 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> sentry -> config -> endpoints -> 5 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> stripe -> config -> endpoints -> 1 -> requests -> read -> __root__
webserver_1         |   Strategy 'cursor' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> stripe -> config -> endpoints -> 2 -> requests -> read -> __root__
webserver_1         |   Strategy 'cursor' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> stripe -> config -> endpoints -> 3 -> requests -> read -> __root__
webserver_1         |   Strategy 'cursor' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> stripe -> config -> endpoints -> 4 -> requests -> read -> __root__
webserver_1         |   Strategy 'cursor' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> stripe -> config -> endpoints -> 5 -> requests -> read -> __root__
webserver_1         |   Strategy 'cursor' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> stripe -> config -> endpoints -> 6 -> requests -> read -> __root__
webserver_1         |   Strategy 'cursor' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> stripe -> config -> endpoints -> 7 -> requests -> read -> __root__
webserver_1         |   Strategy 'cursor' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> stripe -> config -> endpoints -> 8 -> requests -> read -> __root__
webserver_1         |   Strategy 'cursor' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> stripe -> config -> endpoints -> 9 -> requests -> read -> __root__
webserver_1         |   Strategy 'cursor' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> stripe -> config -> endpoints -> 10 -> requests -> read -> __root__
webserver_1         |   Strategy 'cursor' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> stripe -> config -> endpoints -> 11 -> requests -> read -> __root__
webserver_1         |   Strategy 'cursor' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> stripe -> config -> endpoints -> 12 -> requests -> read -> __root__
webserver_1         |   Strategy 'cursor' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> zendesk -> config -> endpoints -> 1 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> zendesk -> config -> endpoints -> 2 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
webserver_1         | __root__ -> zendesk -> config -> endpoints -> 3 -> requests -> read -> __root__
webserver_1         |   Strategy 'link' does not exist. Valid strategies are [aes_encrypt, hash, hmac, null_rewrite, random_string_rewrite, string_rewrite, oauth2] (type=value_error.nosuchstrategyexception)
fidesops_webserver_1 exited with code 1
mongodb_example_1   | {"t":{"$date":"2022-08-29T16:19:07.403+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"Checkpointer","msg":"WiredTiger message","attr":{"message":"[1661789947:403539][1:0xffff7d17ad00], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 36, snapshot max: 36 snapshot count: 

strategy_factory = StrategyFactory()
register = strategy_factory.register()
strategy = strategy_factory.strategy
strategies = strategy_factory.strategies
Copy link
Contributor

@pattisdr pattisdr Aug 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you're probably aware, strategies are only available here if they have been imported into a file that was already loaded prior to calling this function.  I would expect "strategies" to include all classes marked by the "register" decorator. 

As an example, if you open up a fidesops shell, start python and import the strategies callable, the strategies are empty. 

Another example, if you look at what strategies are available in the strategies() call in tests/ops/api/v1/endpoints/test_masking_endpoints.py::TestGetMaskingStrategies::test_read_strategies, it shows 7 while it looks like you have about 15 decorated.  These seven come from strategies imported in application fixtures and in src/fidesops/ops/service/masking/__init__.py.

At minimum, I might import the strategies in the individual init files, similar to what was done in fidesops/ops/service/masking/init.py, (so add imports to fidesops/ops/service/processors/post_processor_strategy/init.py, fidesops/ops/service/pagination/init.py, fidesops/ops/service/authentication/init.py, etc.). However, this still seems brittle.


def get_strategy_name(self) -> str:
return STRATEGY_NAME
super().__init__(configuration)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, it feels like we're trying to force all of these disparate strategies to all be registered in the same way, when they are distinct types of objects. Their individual configurations are hardly related and I think it would cause confusion for someone trying to create their own strategy.

I might recommend keeping them separate, and allowing a user to creating their own "masking strategy", "pagination strategy", "post processor strategy", or "authentication strategy".

Comment on lines +14 to 15
# this constant is kept around because it is referenced in generated alembic migrations
STRING_REWRITE_STRATEGY_NAME = "string_rewrite"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of importing this into that migration then, I might just move this string directly into the migration.

Comment on lines +18 to +42
def register(self) -> Callable[[Type[Strategy]], Type[Strategy]]:
def wrapper(
strategy_class: Type[Strategy],
) -> Type[Strategy]:

_validate_strategy_class(strategy_class)

name = strategy_class.name
logger.debug(
("Registering new strategy '%s' under name '%s'", strategy_class, name)
)

if name in self.registry:
logger.warning(
(
"Strategy with name '%s' already exists. It previously referred to class '%s', but will now refer to '%s'",
name,
self.registry[name],
strategy_class,
)
)

self.registry[name] = strategy_class
self.valid_strategies = ", ".join(self.registry.keys())
return self.registry[name]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this register decorator, you might explore Metaclasses, or the __subclasses__ method that returns all child classes. I see you mentioned these as alternatives in the original issue. I might lean on something more built-in, the register decorator adds an extra step and makes me think on first glance that it's magically going to add the strategy to the registry with its usage alone, when the strategy still needs to be imported to be registered.

Comment on lines +89 to +90
strategy = strategy_factory.strategy
strategies = strategy_factory.strategies
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel very Pythonic to me, but if we're exposing it, I'd at least show they are callable types, and rename to something like get_strategy and get_strategies.

@adamsachs
Copy link
Contributor Author

closing this PR, as we went with a different approach and opened up #1254 instead.

@adamsachs adamsachs closed this Sep 2, 2022
@galvana galvana deleted the 562-consider-refactoring-factories-for-increased-extensibility-merge branch September 23, 2022 22:00
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Needs doc review run unsafe ci checks Triggers running of unsafe CI checks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider refactoring factories for increased extensibility

4 participants