Maintaining system declarations separately from the code is a potential burden for users. A good middle ground between code analysis and what we have now is to co-locate the declarations with the code.
The two implementation methods I can think of for the POC are as follows:
- A very Python-specific implementation where we ingest the Python code, extract the docstrings, and then extract the system declarations from them
  - A major issue here is that this is not generalizable to other languages
- A more general approach where we treat each source code file as a plain text file, use regex to look for matching blocks, and attempt to load each match into a system declaration
  - Because we would still expect the declaration to be YAML-like, this would only work in languages with multi-line comments
Option 1: Declaration inside of the docstring
def some_func(some_parameter: str) -> None:
    """
    Do something important with user data.
    system:
      - fides_key: demo_analytics_system
        name: Demo Analytics System
        description: A system used for analyzing customer behaviour.
        system_type: Service
        privacy_declarations:
          - name: Analyze customer behaviour for improvements.
            data_categories:
              - user.provided.identifiable.contact
              - user.derived.identifiable.device.cookie_id
            data_use: improve.system
            data_subjects:
              - customer
            data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
            dataset_references:
              - demo_users_dataset
    """
    user_data = get_user_data(some_parameter)
    advertise_to(user_data)
Option 2: Declaration as a multi-line comment:
"""
system:
  - fides_key: demo_analytics_system
    name: Demo Analytics System
    description: A system used for analyzing customer behaviour.
    system_type: Service
    privacy_declarations:
      - name: Analyze customer behaviour for improvements.
        data_categories:
          - user.provided.identifiable.contact
          - user.derived.identifiable.device.cookie_id
        data_use: improve.system
        data_subjects:
          - customer
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
        dataset_references:
          - demo_users_dataset
"""
def some_func(some_parameter: str) -> None:
    """
    Do something important with user data.
    """
    user_data = get_user_data(some_parameter)
    advertise_to(user_data)
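For the more general, text-based method, a minimal sketch could look like the following. It treats the file as plain text and uses a regex to find multi-line comment blocks whose first line is `system:`. The names `build_pattern` and `find_declarations`, and the rule that the marker must be the first line of the block, are assumptions for illustration only. The comment delimiters are parameters, so the same code could be pointed at `/* ... */` comments in other languages.

```python
import re
import textwrap
from typing import List


def build_pattern(open_delim: str, close_delim: str) -> re.Pattern:
    """Build a regex for a comment block whose first line is 'system:'."""
    return re.compile(
        re.escape(open_delim)
        + r"[ \t]*\n(?P<body>[ \t]*system:[ \t]*\n.*?)"
        + re.escape(close_delim),
        re.DOTALL,  # let '.' span newlines so the whole block is captured
    )


def find_declarations(
    text: str, open_delim: str = '"""', close_delim: str = '"""'
) -> List[str]:
    """Return the raw YAML text of every declaration block found in the text."""
    pattern = build_pattern(open_delim, close_delim)
    return [
        # Remove the shared indentation so the result is plain YAML text.
        textwrap.dedent(match.group("body")).rstrip()
        for match in pattern.finditer(text)
    ]
```

Ordinary docstrings are ignored because they do not start with `system:`. Each returned string would still need to be loaded as YAML and validated before being treated as a system declaration.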
An additional caveat here is that it would be extremely difficult if not impossible for a plugin to help with these annotations, as they're embedded in other source code.
Additional questions to think about:
- Do we have the user define a system in a system.yaml file, and then attribute all of the nearby code declarations to that?
  - Do they need to define a system per declaration? That seems weird, so the option above seems better
- How should this be handled during evaluations? Should it be done at apply/evaluate time, or should there be a separate command that generates a full system.yaml file from the source code declarations?