Prototype loading privacy declarations directly from source code #209

@ThomasLaPiana

Description

The separate system declarations are a potential burden for users. A good middle ground between code analysis and what we have now is to co-locate the declarations with the code.

The two implementation methods I can think of for the POC are as follows:

  1. A Python-specific implementation where we ingest the Python code, extract the docstrings, and then extract the system declarations from them
    • A major issue here is that this does not generalize to other languages
  2. A more general approach where we treat each source code file as plain text, use regex to find matching blocks, and attempt to load each match into a system declaration
    • Because we would still expect the declaration to be YAML-like, this would only work in languages with multi-line comments

Option 1: Declaration inside of the docstring

def some_func(some_parameter: str) -> None:
    """
    Do something important with user data.

    system:
      - fides_key: demo_analytics_system
        name: Demo Analytics System
        description: A system used for analyzing customer behaviour.
        system_type: Service
        privacy_declarations:
          - name: Analyze customer behaviour for improvements.
            data_categories:
              - user.provided.identifiable.contact
              - user.derived.identifiable.device.cookie_id
            data_use: improve.system
            data_subjects:
              - customer
            data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
            dataset_references:
              - demo_users_dataset
    """

    user_data = get_user_data(some_parameter)
    advertise_to(user_data)
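
As a rough illustration of how Option 1 might pull a declaration out of a docstring, here is a minimal sketch using only the standard library `ast` module. The function name `extract_declarations` and the rule "everything from the `system:` line to the end of the docstring is the declaration" are assumptions made for this example, not an agreed design. The sketch returns the raw YAML text; loading it into a system declaration with a YAML parser is left out.

```python
import ast

# Assumption: a declaration starts at a docstring line that is exactly "system:"
# and runs to the end of that docstring.
DECLARATION_KEY = "system:"

DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def extract_declarations(source: str) -> list[str]:
    """Return the raw YAML text of each declaration embedded in a docstring."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, DOCSTRING_OWNERS):
            continue
        # get_docstring() strips the common indentation, so the YAML
        # block comes back aligned to column zero.
        doc = ast.get_docstring(node)
        if not doc:
            continue
        lines = doc.splitlines()
        for index, line in enumerate(lines):
            if line.rstrip() == DECLARATION_KEY:
                found.append("\n".join(lines[index:]))
                break
    return found


SAMPLE = '''
def some_func(some_parameter: str) -> None:
    """
    Do something important with user data.

    system:
      - fides_key: demo_analytics_system
        name: Demo Analytics System
    """
'''

declarations = extract_declarations(SAMPLE)
print(declarations[0])
```

Because this relies on Python's own parser, it inherits the limitation noted above: it cannot be reused for other languages.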

Option 2: Declaration as a multi-line comment:

"""
system:
  - fides_key: demo_analytics_system
    name: Demo Analytics System
    description: A system used for analyzing customer behaviour.
    system_type: Service
    privacy_declarations:
      - name: Analyze customer behaviour for improvements.
        data_categories:
          - user.provided.identifiable.contact
          - user.derived.identifiable.device.cookie_id
        data_use: improve.system
        data_subjects:
          - customer
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
        dataset_references:
          - demo_users_dataset
"""

def some_func(some_parameter: str) -> None:
    """
    Do something important with user data.
    """

    user_data = get_user_data(some_parameter)
    advertise_to(user_data)

An additional caveat here is that it would be extremely difficult, if not impossible, for a plugin to help with these annotations, as they're embedded in other source code.
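
The regex-based approach from Option 2 could look roughly like the sketch below. It treats the file as plain text and scans for multi-line comment blocks whose content starts with `system:`. The supported delimiters (`"""` and `/* */`), the function name `find_declarations`, and the "must start with `system:`" rule are all assumptions for illustration. As in the docstring case, only the raw YAML text is returned.

```python
import re
import textwrap

# Assumption: only triple-double-quoted blocks and C-style block comments
# are recognised. Other comment styles would need their own patterns.
COMMENT_BLOCK = re.compile(r'"""(.*?)"""|/\*(.*?)\*/', re.DOTALL)


def find_declarations(text: str) -> list[str]:
    """Return the raw YAML text of each comment block that starts with 'system:'."""
    found = []
    for match in COMMENT_BLOCK.finditer(text):
        body = match.group(1) if match.group(1) is not None else match.group(2)
        # Remove shared indentation and surrounding blank lines.
        body = textwrap.dedent(body).strip()
        if body.startswith("system:"):
            found.append(body)
    return found


PYTHON_SAMPLE = '''
"""
system:
  - fides_key: demo_analytics_system
"""

def some_func(some_parameter: str) -> None:
    """
    Do something important with user data.
    """
'''

JS_SAMPLE = """
/*
system:
  - fides_key: demo_js_system
*/
function someFunc() {}
"""

python_found = find_declarations(PYTHON_SAMPLE)
js_found = find_declarations(JS_SAMPLE)
print(python_found, js_found)
```

This sketch does not handle comment blocks whose lines carry a leading ` * ` prefix, and it would also match string literals that merely look like comments, which is one of the trade-offs of ignoring the language's grammar.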

Additional questions to think about:

  • Do we have the user define a system in a system.yaml file, and then attribute all of the nearby code declarations to it?
  • Or do they need to define a system per declaration? That seems awkward, so the option above seems better.
  • How should this be handled during evaluations? Should it be done at apply/evaluate time, or should there be a separate command that generates a full system.yaml file from the source code declarations?
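
To make the first question more concrete, one possible reading of "nearby" is "the closest system.yaml in the same or a parent directory". The sketch below shows that lookup under that assumption. The function name `nearest_system_file` and the idea of stopping at a project root are hypothetical and only meant to support the discussion.

```python
from pathlib import Path
from typing import Optional

SYSTEM_FILE = "system.yaml"


def nearest_system_file(source_file: Path, root: Path) -> Optional[Path]:
    """Walk upward from source_file's directory toward root.

    Returns the first system.yaml found, or None if none exists
    between the source file and the root (inclusive).
    """
    root = root.resolve()
    current = source_file.resolve().parent
    while True:
        candidate = current / SYSTEM_FILE
        if candidate.is_file():
            return candidate
        # Stop at the project root, or at the filesystem root as a safety net.
        if current == root or current == current.parent:
            return None
        current = current.parent
```

With a rule like this, every declaration found in a source file would be attached to the system returned by `nearest_system_file`, and files with no system.yaml above them could be reported as errors.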

Metadata

Labels: discussion
