[MS-408] Fixing the lack of tokenization once the project is refreshed #684
…okenization of all the sensitive data once the project configuration is refreshed. It is necessary so that no data remains untokenized once the tokenization keys become available.
Can we have a ConfigManager that takes care of tokenizing untokenized data once the key is available, but not route all config requests through it? It looks like we are replacing a repository with a manager, which is not the right direction IMO.
The reason the Config Repository is being replaced with the Config Manager is simple: avoiding circular dependency issues. I'm also not in favor of how things are currently being done, but the suggested solution tokenizes all records once the project is refreshed in 100% of cases. When the ConfigManager was removed a couple of months ago, the tokenization part was left out, and the unencrypted records were not tokenized once the tokenization key became available. Ideally, I would like us to refactor the hierarchies in the future to make them consistent across the project, but this needs to be its own separate epic. For now, I find this solution a good compromise. Feel free to share any concerns and we will address them.
@luhmirin-s @BurningAXE @meladRaouf @alex-vt |
Yes, getting the architecture right is a little tricky in this case, but I still think we can achieve it without rerouting all config access. For example, we can have the ConfigRepo send a broadcast or indirectly launch a service when the configuration is refreshed. I'm not insisting we change this, but to me it feels wrong to change all config access to accommodate a rare edge case.
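To make the suggestion above concrete, here is a minimal sketch of the "notify on refresh" idea using a plain listener instead of an Android broadcast or service. All names (`NotifyingConfigRepository`, `ProjectRefreshListener`, `RecordTokenizer`, the shape of `Project`) are hypothetical and not taken from the codebase, and the real refresh is presumably asynchronous rather than a blocking call.

```kotlin
// Hypothetical sketch: every name below is illustrative, not the project's real API.
data class Project(val id: String, val tokenizationKeys: Map<String, String>)

// Callback invoked after a project has been refreshed successfully.
fun interface ProjectRefreshListener {
    fun onProjectRefreshed(project: Project)
}

class NotifyingConfigRepository(private val fetchRemote: (String) -> Project) {
    private val listeners = mutableListOf<ProjectRefreshListener>()

    fun addRefreshListener(listener: ProjectRefreshListener) {
        listeners.add(listener)
    }

    fun refreshProject(projectId: String): Project {
        val project = fetchRemote(projectId)
        // Reached only if the fetch did not throw, so listeners see successful refreshes only.
        listeners.forEach { it.onProjectRefreshed(project) }
        return project
    }
}

// Stands in for the component that would tokenize previously untokenized records.
class RecordTokenizer : ProjectRefreshListener {
    val tokenizedProjectIds = mutableListOf<String>()

    override fun onProjectRefreshed(project: Project) {
        // Tokenization is only possible once keys are available.
        if (project.tokenizationKeys.isNotEmpty()) {
            tokenizedProjectIds.add(project.id)
        }
    }
}

fun main() {
    val tokenizer = RecordTokenizer()
    val repository = NotifyingConfigRepository { id -> Project(id, mapOf("attendantId" to "key")) }
    repository.addRefreshListener(tokenizer)

    repository.refreshProject("project-1")
    println(tokenizer.tokenizedProjectIds)
}
```

With this shape the repository stays the single entry point for config access and knows nothing about tokenization; the trade-off is that something still has to register the listener at startup, which is where dependency cycles could reappear.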
Unfortunately, it is not an edge case. The tokenization of previously unencrypted records was removed entirely from SID. The only place where the The tokenization was initially added in this commit as a part of the CORE-2502 epic. To provide a better understanding of the changes in this PR it may be useful to take a look at the chronological order of the changes to the tokenization and
What confuses me is that we now have a repo and a manager with identical APIs. On top of that, it wouldn't be my first thought to add the "sync" module if I needed to access project config in some new feature. I am not sure how to improve it at the moment, but it made me do a double take while reviewing the code.
// Store Firebase token so it can be used by ConfigManager
authStore.storeFirebaseToken(token)
configRepository.refreshProject(projectId)
configManager.refreshProject(projectId)
Wouldn't it be possible to simply add the tokenizeExistingRecords(project) here similar to the DebugFragment instead of adding back the whole module?
Unfortunately, it will not be enough due to refreshProject being called from the ConfigManager::getProjectConfiguration (previously ConfigRepository::getProjectConfiguration). The ::getProjectConfiguration is called in 100+ places in the project, and we need to ensure that the untokenized records are encrypted any time the project is successfully refreshed.
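As a rough illustration of the point above, the sketch below shows a manager that delegates to the repository and tokenizes after every successful refresh, including the refresh triggered implicitly from `getProjectConfiguration`. The types, signatures, the `getCachedConfiguration` helper, the `currentProjectId` provider and the "refresh on cache miss" rule are all assumptions made for the example; the real code is most likely suspend-based and shaped differently.

```kotlin
// Hypothetical sketch: types and signatures are assumptions, not the project's real API.
data class Project(val id: String)
data class ProjectConfiguration(val projectId: String)

interface ConfigRepository {
    fun getCachedConfiguration(): ProjectConfiguration?
    fun refreshProject(projectId: String): Pair<Project, ProjectConfiguration>
}

class ConfigManager(
    private val repository: ConfigRepository,
    private val currentProjectId: () -> String,
    private val tokenizeExistingRecords: (Project) -> Unit,
) {
    // Every successful refresh is followed by tokenization of existing records.
    fun refreshProject(projectId: String): ProjectConfiguration {
        val (project, configuration) = repository.refreshProject(projectId)
        tokenizeExistingRecords(project)
        return configuration
    }

    // Falls back to a refresh when nothing is cached, so this path tokenizes too.
    fun getProjectConfiguration(): ProjectConfiguration =
        repository.getCachedConfiguration() ?: refreshProject(currentProjectId())
}

// In-memory stand-in for the real repository, used only to exercise the sketch.
class FakeRepository : ConfigRepository {
    private var cached: ProjectConfiguration? = null

    override fun getCachedConfiguration() = cached

    override fun refreshProject(projectId: String): Pair<Project, ProjectConfiguration> {
        val configuration = ProjectConfiguration(projectId)
        cached = configuration
        return Project(projectId) to configuration
    }
}

fun main() {
    val tokenized = mutableListOf<String>()
    val manager = ConfigManager(FakeRepository(), { "project-1" }, { tokenized.add(it.id) })

    manager.getProjectConfiguration() // cache miss: refresh, then tokenize
    manager.getProjectConfiguration() // cache hit: no refresh, no tokenization
    println(tokenized)
}
```

Adding `tokenizeExistingRecords(project)` only at the login call site would cover the explicit `refreshProject` call but not the implicit one inside `getProjectConfiguration`, which is the gap described above.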

That's the reason I've started a Slack thread regarding the dependency hierarchy for data sources, repos, managers, etc. Ideally, all top-level data access should go through the managers, which can handle both the data access and the data logic (i.e. refresh if not found, etc.), but this is something we should refactor later on its own. Yes, that might lead to a bit of delegation duplication (when
I was thinking about the bug that prompted this PR (the first session after login being untokenized) but did not consider the upgrade case. Am I getting it right that with this solution we bet on the configuration being updated (and consequently records being tokenized) before they are uploaded, but that is not guaranteed?
Sorry, I'm confused by the "bet on the configuration being updated" part. In general, this PR reverts the state of the tokenization logic to what it used to be prior to the removal of the
If I got it right even with the fix in this PR the following scenario is possible:
Long story short: that won't happen as there are guardrails preventing this exact scenario.
At this point the
You can see this in the BFSID does not treat them as tokenized
Fortunately, this is not happening because the tokenized fields are always explicitly specified.
…er-nor-module-when-login-before-enroll-in-workflow-with-identification-pool-type-user-or-module
cbd7ab9 to 85a1689



Adding back the Config Manager. Its main responsibility is tokenization of all the sensitive data once the project configuration is refreshed. It is necessary so that no data remains untokenized once the tokenization keys become available.
The ConfigManager was removed in this PR, but the tokenization during the project refresh was overlooked. As a result, untokenized data isn't tokenized once the project is refreshed. This PR addresses this issue.