This repository was archived by the owner on Nov 30, 2022. It is now read-only.

@pattisdr
Contributor

@pattisdr pattisdr commented Jun 10, 2022

❗ Contains migration; verify downrev before merging.

Purpose

Support the execution requirements for deleted and disabled collections. Much of this behavior already existed; this PR adds the new concept of a "disabled" collection, plus integration tests confirming the behavior below:

  • New privacy requests: Deleting a collection prevents it from being added to the graph at all, while disabling a collection keeps it in the graph but skips running any queries and just returns default values downstream.
  • In-progress privacy requests: If a privacy request is in progress, the graph has already been built and individual tasks are executing. If a collection is disabled or deleted at this point, we still attempt to access it, because we are using the collection details already held in memory.
  • Restarting from failure: A privacy request has failed, and a user restarts it to re-process the failed collection and any remaining collections. Deleting a collection before reprocessing omits it from the new graph, so we won't attempt to connect. Disabling a collection keeps it in the graph, but we skip it and just return default values to dependent collections.
  • Completed privacy requests: Disabling or deleting a collection after the privacy request has completed will have no effect. Execution logs on disabled or deleted collections will remain intact.
  • Denied privacy requests: Disabling or deleting a collection after the privacy request was denied will have no effect. Execution logs on disabled or deleted collections will remain intact.
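The rules above reduce to three points: deleted configs are dropped when the graph is built, disabled configs stay in the graph but run no queries, and a running request keeps using the copies it loaded at build time. The following is a minimal plain-Python sketch under those assumptions; `ConnectionConfigSnapshot`, `build_graph`, `should_query`, and the collection names are hypothetical and are not fidesops APIs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConnectionConfigSnapshot:
    """Hypothetical stand-in for a ConnectionConfig as loaded when a graph is built."""

    key: str
    disabled: bool = False


def build_graph(
    stored_configs: Dict[str, ConnectionConfigSnapshot],
    collections: Dict[str, str],
) -> Dict[str, ConnectionConfigSnapshot]:
    """Map each collection address to the config it runs against.

    Deleted configs (missing from stored_configs) drop their collections from
    the graph; disabled configs keep them in so dependencies stay intact.
    """
    graph: Dict[str, ConnectionConfigSnapshot] = {}
    for address, config_key in collections.items():
        config = stored_configs.get(config_key)
        if config is None:
            continue  # deleted: never enters the graph, so no connection is attempted
        graph[address] = config  # frozen object: later edits to the store can't alter it
    return graph


def should_query(graph: Dict[str, ConnectionConfigSnapshot], address: str) -> bool:
    """Disabled collections stay in the graph but run no queries."""
    return not graph[address].disabled


stored = {
    "postgres_example": ConnectionConfigSnapshot("postgres_example"),
    "mongo_example": ConnectionConfigSnapshot("mongo_example", disabled=True),
}
collections = {
    "postgres_example:customer": "postgres_example",
    "mongo_example:customer_details": "mongo_example",
    "deleted_example:orders": "deleted_example",  # its config was deleted
}

graph = build_graph(stored, collections)
print(sorted(graph))  # ['mongo_example:customer_details', 'postgres_example:customer']
print(should_query(graph, "mongo_example:customer_details"))  # False

# Disabling mid-request: the running graph still holds the old snapshot.
stored["postgres_example"] = ConnectionConfigSnapshot("postgres_example", disabled=True)
print(should_query(graph, "postgres_example:customer"))  # True

# Restart from failure builds a fresh graph, which picks up the change.
print(should_query(build_graph(stored, collections), "postgres_example:customer"))  # False
```

Holding an immutable snapshot per collection is what makes the in-progress case behave as described: only building a new graph (for example, on restart) observes a disable or delete.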

Changes

  • Adds disabled field to ConnectionConfig (default value of False), as well as disabled_at.
    • disabled_at is automatically set if disabled is updated.
    • This field can be set through the existing PATCH ConnectionConfig endpoint.
  • Adds a skipped status to ExecutionLog.
  • Disabled collections are still part of the graph, but they are skipped at execution time.
    • This is accomplished by adding a skip_if_disabled check to the retry decorator that wraps access and erasure requests. If a collection's associated ConnectionConfig is disabled, we create a skipped execution log and return empty data to downstream collections. This way we still get the logs, and the overall graph doesn't have to change when a collection is disabled, but no queries are run for that collection.
  • Add tests for both deleted and disabled collections for when a collection is deleted/disabled before starting a new request, while a request is in progress, or when we restart a deleted/disabled collection from a failed state.
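The skip check inside the retry decorator can be sketched in plain Python. This is a minimal sketch that assumes a task object exposing its ConnectionConfig and a way to write execution logs. `Task`, `CollectionDisabled`, `max_attempts`, and the log messages are hypothetical stand-ins, and the real fidesops decorator, which also wraps erasure requests, differs in detail.

```python
import copy
import functools
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, List, Tuple


class ExecutionLogStatus(Enum):
    in_processing = "in_processing"
    complete = "complete"
    error = "error"
    skipped = "skipped"  # new status: the collection was not queried


class CollectionDisabled(Exception):
    """Hypothetical signal that a collection's ConnectionConfig is disabled."""


def retry(default_return: Any, max_attempts: int = 3) -> Callable:
    """Retry a task; if its ConnectionConfig is disabled, skip it instead."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(task: Any, *args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, max_attempts + 1):
                try:
                    task.skip_if_disabled()
                    result = func(task, *args, **kwargs)
                except CollectionDisabled as exc:
                    # No query runs: log "skipped" and give dependents the default.
                    task.log(ExecutionLogStatus.skipped, str(exc))
                    return copy.copy(default_return)  # fresh copy, never a shared list
                except Exception as exc:
                    if attempt == max_attempts:
                        task.log(ExecutionLogStatus.error, str(exc))
                        raise
                    # Otherwise fall through and try again.
                else:
                    task.log(ExecutionLogStatus.complete, "success")
                    return result

        return wrapper

    return decorator


@dataclass
class ConnectionConfig:
    key: str
    disabled: bool = False


@dataclass
class Task:
    collection: str
    connection_config: ConnectionConfig
    logs: List[Tuple[ExecutionLogStatus, str]] = field(default_factory=list)

    def log(self, status: ExecutionLogStatus, message: str) -> None:
        self.logs.append((status, message))

    def skip_if_disabled(self) -> None:
        if self.connection_config.disabled:
            raise CollectionDisabled(
                f"Skipping collection {self.collection}. "
                f"ConnectionConfig {self.connection_config.key} is disabled."
            )

    @retry(default_return=[])
    def access_request(self, rows: List[dict]) -> List[dict]:
        return rows  # stands in for the real datastore query


skipped = Task("mongo_example:customer_details", ConnectionConfig("mongo_example", disabled=True))
print(skipped.access_request([{"id": 1}]))  # []
print(skipped.logs[-1][0])  # ExecutionLogStatus.skipped
```

Raising a dedicated exception inside the retry loop keeps the skip path separate from genuine failures: a disabled collection is never retried or marked as an error, and downstream collections simply receive `default_return`.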

Checklist

  • Update CHANGELOG.md file
    • Merge in main so the most recent CHANGELOG.md file is being appended to
    • Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
    • Add a link to this PR at the end of the description with the PR number as the text. example: #1
  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram).
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #602

pattisdr added 11 commits June 8, 2022 17:17
…nlogstatus.skipped.

- Allow setting the disabled field in the PATCH ConnectionConfig endpoint
- Set disabled_at if disabled is updated
- Skip running a collection and return the default value if its associated connectionconfig is disabled.
… deleting a collection while the privacy request is in progress.
- Update copy/pasted docstring
- Add test for deleting collection and then restarting from failure.
- Add new connection config disabled key to api response
@pattisdr
Contributor Author

@ethyca/docs-authors minor one-line change added to docs about the disabled field for a connectionconfig

@pattisdr pattisdr marked this pull request as ready for review June 13, 2022 14:58
Comment on lines +153 to +163
```python
def integration_mongodb_config(db) -> ConnectionConfig:
    connection_config = ConnectionConfig(
        key="mongo_example",
        connection_type=ConnectionType.mongodb,
        access=AccessLevel.write,
        secrets=integration_secrets["mongo_example"],
        name="mongo_example",
    )
    connection_config.save(db)
    yield connection_config
    connection_config.delete(db)
```
Contributor Author

This connectionconfig wasn't previously saved to the db, which was fine, but now I'd like to persist it so I can test, for example, disabling it mid-privacy request. I'm also changing the fixture scope so that updates to disabled are reset before the next test.

Comment on lines +495 to +505
```python
@pytest.mark.integration
def test_restart_graph_from_failure(
    db,
    policy,
    example_datasets,
    integration_postgres_config,
    integration_mongodb_config,
    mongo_postgres_dataset_graph,
) -> None:
    """Run a failed privacy request and restart from failure"""
```

Contributor Author

This test isn't new; I've just renamed the file and added more execution-related tests.

f"ConnectionConfig {connection_config.key} is disabled.",
)

@retry(action_type=ActionType.access, default_return=[])
Contributor Author

We used to have this default_return parameter, but it was removed. I'm restoring it because it's handy when a collection is skipped: we just return the default_return.

Comment on lines +47 to +48
* `disabled` determines whether the ConnectionConfig is active. If True, we skip running queries for any collection associated with that ConnectionConfig.

Contributor

👍 all good from me - thank you for including!

Contributor

@eastandwestwind eastandwestwind left a comment

Looks great to me @pattisdr! I especially appreciate the thorough test additions and edits here, as well as the clear naming of methods and variables throughout the PR. Thanks!

)
op.add_column(
"connectionconfig",
sa.Column("disabled_at", sa.DateTime(timezone=True), nullable=True),
Contributor

Good foresight to add this disabled_at column

return super().delete(db=db)


@event.listens_for(ConnectionConfig.disabled, "set")
Contributor

I didn't know about this sqlalchemy feature, awesome!
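The `"set"` attribute event referenced above calls its listener with `(target, value, oldvalue, initiator)` each time the mapped attribute is assigned. Below is a minimal sketch of using it to keep `disabled_at` in sync, assuming SQLAlchemy 1.4 or later and a trimmed-down model; the listener name and exact timestamp logic are illustrative, not copied from this PR.

```python
from datetime import datetime, timezone
from typing import Any

from sqlalchemy import Boolean, Column, DateTime, String, event
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class ConnectionConfig(Base):
    """Trimmed-down model: only the columns relevant to disabling."""

    __tablename__ = "connectionconfig"

    key = Column(String, primary_key=True)
    disabled = Column(Boolean, nullable=False, default=False)
    disabled_at = Column(DateTime(timezone=True), nullable=True)


@event.listens_for(ConnectionConfig.disabled, "set")
def sync_disabled_at(target: ConnectionConfig, value: Any, oldvalue: Any, initiator: Any) -> None:
    """Stamp disabled_at when disabled is turned on; clear it when it is turned off."""
    if value != oldvalue:
        target.disabled_at = datetime.now(timezone.utc) if value else None


config = ConnectionConfig(key="mongo_example")
config.disabled = True
print(config.disabled_at is not None)  # True
config.disabled = False
print(config.disabled_at)  # None
```

Because the listener fires on attribute assignment rather than on flush, any code path that sets `disabled` (for example, a PATCH handler) gets `disabled_at` updated without having to remember to set it explicitly.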

@pattisdr
Contributor Author

thanks for the quick turnaround @eastandwestwind!

@eastandwestwind eastandwestwind merged commit 3828b5b into main Jun 13, 2022
@eastandwestwind eastandwestwind deleted the fidesops_602_disable_delete_datastore branch June 13, 2022 19:22
sanders41 pushed a commit that referenced this pull request Sep 22, 2022
* WIP Add disabled and disabled_at to connectionconfig and add executionlogstatus.skipped.

- Allow setting the disabled field in the PATCH ConnectionConfig endpoint
- Set disabled_at if disabled is updated
- Skip running a collection and return the default value if its associated connectionconfig is disabled.

* Add tests for disabling collections in progress and skipping collections on restart.

* Remove unused var.

* Add tests for deleting a collection before you make a new request and deleting a collection while the privacy request is in progress.

* Move migration.

* - Remove unused property
- Update copy/pasted docstring
- Add test for deleting collection and then restarting from failure.
- Add new connection config disabled key to api response

* Add missing mocked function.

* Update scopes to match.

* Add tests that execution logs are untouched if a connection config is deleted or disabled for completeness.

* Log request id instead of request object.
Development

Successfully merging this pull request may close these issues.

[Datastore Management] Disable/Delete datastore BACKEND

4 participants