Skip to content

[MS-408] Fixing the lack of tokenization once the project is refreshed #684

Merged
alexandr-simprints merged 10 commits into
release/2024.1.0from
MS-408-enroll-not-affected-to-user-nor-module-when-login-before-enroll-in-workflow-with-identification-pool-type-user-or-module
May 13, 2024
Merged

[MS-408] Fixing the lack of tokenization once the project is refreshed #684
alexandr-simprints merged 10 commits into
release/2024.1.0from
MS-408-enroll-not-affected-to-user-nor-module-when-login-before-enroll-in-workflow-with-identification-pool-type-user-or-module

Conversation

@alexandr-simprints
Copy link
Copy Markdown
Contributor

@alexandr-simprints alexandr-simprints commented Apr 18, 2024

Adding back the Config Manager. Its main responsibility is tokenization of all the sensitive data once the project configuration is refreshed. It is necessary so that no data remains untokenized once the tokenization keys become available.

The ConfigManager was removed in this PR, but the tokenization during the project refresh was overlooked. This leads to the issue when the untokenized data isn't tokenized once the project is refreshed. This PR addresses this issue

…okenization of all the sensitive data once the project configuration is refreshed. It is necessary so that no data remains untokenized once the tokenization keys become available.
@BurningAXE
Copy link
Copy Markdown
Contributor

BurningAXE commented Apr 22, 2024

Can we have a ConfigManager that takes care of tokenizing untokenized data once the key is available but not route all config requests through it? It looks like we are replacing a repository with a manager which is not the right direction IMO.

@alexandr-simprints alexandr-simprints marked this pull request as ready for review April 22, 2024 12:01
@alexandr-simprints
Copy link
Copy Markdown
Contributor Author

Can we have a ConfigManager that takes care of tokenizing untokenized data once the key is available but not route all config requests through it? It looks like we are replacing a repository with a manager which is not the right direction IMO.

The reason the Config Repository is being substituted for the Config Manager is simple - avoiding the circular dependency issues.

I'm also not in favor of how things are currently being done, but the suggested solution tokenizes all records once the project is refreshed in 100% of cases. When the ConfigManager was removed a couple of months ago, the tokenization part was left out, and the unencrypted records were not tokenized once the tokenization key became available.

Ideally, I would like us to refactor the hierarchies in the future to make them consistent across the project, but this needs to be its own separate epic. For now, I find this solution a good compromise. Feel free to share any concerns and we will address them

@alexandr-simprints
Copy link
Copy Markdown
Contributor Author

alexandr-simprints commented Apr 22, 2024

@luhmirin-s @BurningAXE @meladRaouf @alex-vt
To save you some time reviewing the 100+ files in the PR I'm asking you to take a closer look at only two major changes:

  1. The ConfigManager is added (which was previously removed). Link. Its purpose is to tokenize the unencrypted records every time the project is refreshed.
  2. The ConfigRepository which functionality was simplified in order not to duplicate the ConfigManager functionality. Link

@BurningAXE
Copy link
Copy Markdown
Contributor

Can we have a ConfigManager that takes care of tokenizing untokenized data once the key is available but not route all config requests through it? It looks like we are replacing a repository with a manager which is not the right direction IMO.

The reason the Config Repository is being substituted for the Config Manager is simple - avoiding the circular dependency issues.

I'm also not in favor of how things are currently being done, but the suggested solution tokenizes all records once the project is refreshed in 100% of cases. When the ConfigManager was removed a couple of months ago, the tokenization part was left out, and the unencrypted records were not tokenized once the tokenization key became available.

Ideally, I would like us to refactor the hierarchies in the future to make them consistent across the project, but this needs to be its own separate epic. For now, I find this solution a good compromise. Feel free to share any concerns and we will address them

Yes, getting the architecture right is a little tricky in this case but I still think we can achieve it without rerouting all config access. For example we can have the ConfigRepo send a broadcast or indirectly launch a service when configuration is refreshed. I'm not insisting we change this but to me it feels wrong to change all config access to accommodate a rare edge case.

@alexandr-simprints
Copy link
Copy Markdown
Contributor Author

Yes, getting the architecture right is a little tricky in this case but I still think we can achieve it without rerouting all config access. For example we can have the ConfigRepo send a broadcast or indirectly launch a service when configuration is refreshed. I'm not insisting we change this but to me it feels wrong to change all config access to accommodate a rare edge case.

Unfortunately, it is not an edge case. The tokenization of previously unencrypted records was removed entirely from SID. The only place where the EnrolmentRecordRepository::tokenizeExistingRecords is currently invoked is the DebugFragment. This means that when any device updates from the SID version that doesn't have the tokenization functionality, it will send the corrupted data to the BFSID since we treat the records as tokenized if the tokenization keys are available in the project. Once the keys become available (i.e., project refresh), we assume all previous records become tokenized.

The tokenization was initially added in this commit as a part of the CORE-2502 epic. To provide a better understanding of the changes in this PR it may be useful to take a look at the chronological order of the changes to the tokenization and ConfigManager:

  • November: ConfigManager utilizes the ConfigRepository and EnrolmentRecordRepository, and is now responsible for the tokenization of the existing records. Link.
  • January: The ConfigManager is removed in favor of ConfigRepository. ConfigRepostiory is now responsible for all of the manager's functionality, but the tokenization part is missing. Link
  • Now: ConfigManager is restored since without it, it becomes impossible to delegate the tokenization without having the circular dependency issue between the app's modules.

@luhmirin-s
Copy link
Copy Markdown
Contributor

What confuses me is that we now have a repo and manager with identical API. On top of that - it wouldn't be my first thought to add "sync" module if I needed to access project config in some new feature. I am not sure how to improve it at the moment, bit it made me to do a double take while reviewing the code.

// Store Firebase token so it can be used by ConfigManager
authStore.storeFirebaseToken(token)
configRepository.refreshProject(projectId)
configManager.refreshProject(projectId)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be possible to simply add the tokenizeExistingRecords(project) here similar to the DebugFragment instead of adding back the whole module?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it will not be enough due to refreshProject being called from the ConfigManager::getProjectConfiguration (previously ConfigRepository::getProjectConfiguration). The ::getProjectConfiguration is called in 100+ places in the project, and we need to ensure that the untokenized records are encrypted any time the project is successfully refreshed.
image

@alexandr-simprints
Copy link
Copy Markdown
Contributor Author

What confuses me is that we now have a repo and manager with identical API. On top of that - it wouldn't be my first thought to add "sync" module if I needed to access project config in some new feature. I am not sure how to improve it at the moment, bit it made me to do a double take while reviewing the code.

That's the reason I've started a slack on thread regarding the dependency hierarchy for data-soruces, repos, managers etc. Ideally, all top-level data access should go through the managers who can handle both the data access and the data logic (i.e. refresh if not found, etc.) but this is something we should refactoring later on its own. Yes, that might lead to a bit of delegation duplication (when manager.something() only invokes repo.something()) what we partially see here, but is a tradeoff for the managers' flexibility.

@BurningAXE
Copy link
Copy Markdown
Contributor

This means that when any device updates from the SID version that doesn't have the tokenization functionality, it will send the corrupted data to the BFSID since we treat the records as tokenized if the tokenization keys are available in the project.

I was thinking about the bug that prompted this PR (the first session after login being untokenized) but did not consider the upgrade case. Am I getting it right that with this solution we bet on the configuration being updated (and consequently records being tokenized) before they are uploaded but that is not guaranteed?

@alexandr-simprints
Copy link
Copy Markdown
Contributor Author

This means that when any device updates from the SID version that doesn't have the tokenization functionality, it will send the corrupted data to the BFSID since we treat the records as tokenized if the tokenization keys are available in the project.

I was thinking about the bug that prompted this PR (the first session after login being untokenized) but did not consider the upgrade case. Am I getting it right that with this solution we bet on the configuration being updated (and consequently records being tokenized) before they are uploaded but that is not guaranteed?

Sorry, I'm confused with the "bet on the configuration being updated" part.

In general, this PR reverts the state of the tokenization logic to what it used to be prior the removal of the ConfigManager. Every time the project is refreshed, the previously unencrypted records must be tokenized.

@BurningAXE
Copy link
Copy Markdown
Contributor

Sorry, I'm confused with the "bet on the configuration being updated" part.

If I got it right even with the fix in this PR the following scenario is possible:

  1. Old version of SID has unsynced (and untokenized) recrods
  2. SID is upgraded to 2024.1.0
  3. SID upsyncs the unsynced and untokenized records (which BFSID treats as tokenized)
  4. SID updates configuration and tries to tokenize existing records but it's already too late

1 similar comment
@BurningAXE
Copy link
Copy Markdown
Contributor

Sorry, I'm confused with the "bet on the configuration being updated" part.

If I got it right even with the fix in this PR the following scenario is possible:

  1. Old version of SID has unsynced (and untokenized) recrods
  2. SID is upgraded to 2024.1.0
  3. SID upsyncs the unsynced and untokenized records (which BFSID treats as tokenized)
  4. SID updates configuration and tries to tokenize existing records but it's already too late

@alexandr-simprints
Copy link
Copy Markdown
Contributor Author

alexandr-simprints commented Apr 24, 2024

If I got it right even with the fix in this PR the following scenario is possible:

  1. Old version of SID has unsynced (and untokenized) recrods
  2. SID is upgraded to 2024.1.0
  3. SID upsyncs the unsynced and untokenized records (which BFSID treats as tokenized)
  4. SID updates configuration and tries to tokenize existing records but it's already too late

Long story short: that won't happen as there are guardrails preventing this exact scenario.

  1. Old version of SID has unsynced (and untokenized) records

At this point the userId, attendantId and moduleId are saved as String in the database

  1. SID is upgraded to 2024.1.0

EventMigration9to10 is launched and the above mentioned fields are mapped from String to Tokenized.Raw

  1. SID upsyncs the unsynced and untokenized records (which BFSID treats as tokenized)

ApiEventclass has the val tokenizedFields: List<String> parameter that explicitly specifies what fields in the val payload: ApiEventPayload are tokenized (encrypted). If the all the fields in the ApiEventPayload are unencrypted as in the case above, then the tokenizedFields will be empty.

You can see this in the ApiEvent.kt file

internal fun Event.fromDomainToApi(): ApiEvent {
    // Key Types are just references to the attendantId, userId, moduleId field.
    // Taking them only if they are Tokenized
    val tokenizedKeyTypes =
        getTokenizedFields().filter { it.value is TokenizableString.Tokenized }.keys.toList()

    [...]

    // Tokenized Fields list is only populated with the payload field references that are encrypted
    val tokenizedFields = tokenizedKeyTypes.mapNotNull(apiPayload::getTokenizedFieldJsonPath)

    [...]
}

BFSID does not treat them as tokenized

  1. SID updates configuration and tries to tokenize existing records but it's already too late

Fortunately, this is not happening because the tokenized fields are always explicitly specified.

BurningAXE and others added 2 commits May 9, 2024 20:47
…er-nor-module-when-login-before-enroll-in-workflow-with-identification-pool-type-user-or-module
@BurningAXE BurningAXE force-pushed the MS-408-enroll-not-affected-to-user-nor-module-when-login-before-enroll-in-workflow-with-identification-pool-type-user-or-module branch from cbd7ab9 to 85a1689 Compare May 9, 2024 18:40
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 9, 2024

@alexandr-simprints alexandr-simprints merged commit 3670b0c into release/2024.1.0 May 13, 2024
@alexandr-simprints alexandr-simprints deleted the MS-408-enroll-not-affected-to-user-nor-module-when-login-before-enroll-in-workflow-with-identification-pool-type-user-or-module branch May 13, 2024 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants