[MS-408] Fixing the lack of tokenization once the project is refreshed by alexandr-simprints · Pull Request #684 · Simprints/Android-Simprints-ID

alexandr-simprints · 2024-04-18T15:16:41Z

Adding back the Config Manager. Its main responsibility is tokenization of all the sensitive data once the project configuration is refreshed. It is necessary so that no data remains untokenized once the tokenization keys become available.

The ConfigManager was removed in this PR, but the tokenization during the project refresh was overlooked. This leads to the issue when the untokenized data isn't tokenized once the project is refreshed. This PR addresses this issue

…okenization of all the sensitive data once the project configuration is refreshed. It is necessary so that no data remains untokenized once the tokenization keys become available.

…nfigRepository

BurningAXE · 2024-04-22T11:16:17Z

Can we have a ConfigManager that takes care of tokenizing untokenized data once the key is available but not route all config requests through it? It looks like we are replacing a repository with a manager which is not the right direction IMO.

alexandr-simprints · 2024-04-22T12:04:00Z

Can we have a ConfigManager that takes care of tokenizing untokenized data once the key is available but not route all config requests through it? It looks like we are replacing a repository with a manager which is not the right direction IMO.

The reason the Config Repository is being substituted for the Config Manager is simple - avoiding the circular dependency issues.

I'm also not in favor of how things are currently being done, but the suggested solution tokenizes all records once the project is refreshed in 100% of cases. When the ConfigManager was removed a couple of months ago, the tokenization part was left out, and the unencrypted records were not tokenized once the tokenization key became available.

Ideally, I would like us to refactor the hierarchies in the future to make them consistent across the project, but this needs to be its own separate epic. For now, I find this solution a good compromise. Feel free to share any concerns and we will address them

alexandr-simprints · 2024-04-22T12:07:45Z

@luhmirin-s @BurningAXE @meladRaouf @alex-vt
To save you some time reviewing the 100+ files in the PR I'm asking you to take a closer look at only two major changes:

The ConfigManager is added (which was previously removed). Link. Its purpose is to tokenize the unencrypted records every time the project is refreshed.
The ConfigRepository which functionality was simplified in order not to duplicate the ConfigManager functionality. Link

BurningAXE · 2024-04-22T12:29:28Z

Can we have a ConfigManager that takes care of tokenizing untokenized data once the key is available but not route all config requests through it? It looks like we are replacing a repository with a manager which is not the right direction IMO.

The reason the Config Repository is being substituted for the Config Manager is simple - avoiding the circular dependency issues.

I'm also not in favor of how things are currently being done, but the suggested solution tokenizes all records once the project is refreshed in 100% of cases. When the ConfigManager was removed a couple of months ago, the tokenization part was left out, and the unencrypted records were not tokenized once the tokenization key became available.

Ideally, I would like us to refactor the hierarchies in the future to make them consistent across the project, but this needs to be its own separate epic. For now, I find this solution a good compromise. Feel free to share any concerns and we will address them

Yes, getting the architecture right is a little tricky in this case but I still think we can achieve it without rerouting all config access. For example we can have the ConfigRepo send a broadcast or indirectly launch a service when configuration is refreshed. I'm not insisting we change this but to me it feels wrong to change all config access to accommodate a rare edge case.

alexandr-simprints · 2024-04-22T13:08:47Z

Yes, getting the architecture right is a little tricky in this case but I still think we can achieve it without rerouting all config access. For example we can have the ConfigRepo send a broadcast or indirectly launch a service when configuration is refreshed. I'm not insisting we change this but to me it feels wrong to change all config access to accommodate a rare edge case.

Unfortunately, it is not an edge case. The tokenization of previously unencrypted records was removed entirely from SID. The only place where the EnrolmentRecordRepository::tokenizeExistingRecords is currently invoked is the DebugFragment. This means that when any device updates from the SID version that doesn't have the tokenization functionality, it will send the corrupted data to the BFSID since we treat the records as tokenized if the tokenization keys are available in the project. Once the keys become available (i.e., project refresh), we assume all previous records become tokenized.

The tokenization was initially added in this commit as a part of the CORE-2502 epic. To provide a better understanding of the changes in this PR it may be useful to take a look at the chronological order of the changes to the tokenization and ConfigManager:

November: ConfigManager utilizes the ConfigRepository and EnrolmentRecordRepository, and is now responsible for the tokenization of the existing records. Link.
January: The ConfigManager is removed in favor of ConfigRepository. ConfigRepostiory is now responsible for all of the manager's functionality, but the tokenization part is missing. Link
Now: ConfigManager is restored since without it, it becomes impossible to delegate the tokenization without having the circular dependency issue between the app's modules.

luhmirin-s · 2024-04-22T13:30:50Z

What confuses me is that we now have a repo and manager with identical API. On top of that - it wouldn't be my first thought to add "sync" module if I needed to access project config in some new feature. I am not sure how to improve it at the moment, bit it made me to do a double take while reviewing the code.

luhmirin-s · 2024-04-22T13:33:14Z

            // Store Firebase token so it can be used by ConfigManager
            authStore.storeFirebaseToken(token)
-            configRepository.refreshProject(projectId)
+            configManager.refreshProject(projectId)


Wouldn't it be possible to simply add the tokenizeExistingRecords(project) here similar to the DebugFragment instead of adding back the whole module?

Unfortunately, it will not be enough due to refreshProject being called from the ConfigManager::getProjectConfiguration (previously ConfigRepository::getProjectConfiguration). The ::getProjectConfiguration is called in 100+ places in the project, and we need to ensure that the untokenized records are encrypted any time the project is successfully refreshed.

alexandr-simprints · 2024-04-22T13:54:47Z

What confuses me is that we now have a repo and manager with identical API. On top of that - it wouldn't be my first thought to add "sync" module if I needed to access project config in some new feature. I am not sure how to improve it at the moment, bit it made me to do a double take while reviewing the code.

That's the reason I've started a slack on thread regarding the dependency hierarchy for data-soruces, repos, managers etc. Ideally, all top-level data access should go through the managers who can handle both the data access and the data logic (i.e. refresh if not found, etc.) but this is something we should refactoring later on its own. Yes, that might lead to a bit of delegation duplication (when manager.something() only invokes repo.something()) what we partially see here, but is a tradeoff for the managers' flexibility.

BurningAXE · 2024-04-22T14:24:11Z

This means that when any device updates from the SID version that doesn't have the tokenization functionality, it will send the corrupted data to the BFSID since we treat the records as tokenized if the tokenization keys are available in the project.

I was thinking about the bug that prompted this PR (the first session after login being untokenized) but did not consider the upgrade case. Am I getting it right that with this solution we bet on the configuration being updated (and consequently records being tokenized) before they are uploaded but that is not guaranteed?

alexandr-simprints · 2024-04-23T04:32:07Z

This means that when any device updates from the SID version that doesn't have the tokenization functionality, it will send the corrupted data to the BFSID since we treat the records as tokenized if the tokenization keys are available in the project.

I was thinking about the bug that prompted this PR (the first session after login being untokenized) but did not consider the upgrade case. Am I getting it right that with this solution we bet on the configuration being updated (and consequently records being tokenized) before they are uploaded but that is not guaranteed?

Sorry, I'm confused with the "bet on the configuration being updated" part.

In general, this PR reverts the state of the tokenization logic to what it used to be prior the removal of the ConfigManager. Every time the project is refreshed, the previously unencrypted records must be tokenized.

BurningAXE · 2024-04-23T10:54:16Z

Sorry, I'm confused with the "bet on the configuration being updated" part.

If I got it right even with the fix in this PR the following scenario is possible:

Old version of SID has unsynced (and untokenized) recrods
SID is upgraded to 2024.1.0
SID upsyncs the unsynced and untokenized records (which BFSID treats as tokenized)
SID updates configuration and tries to tokenize existing records but it's already too late

BurningAXE · 2024-04-23T10:54:23Z

Sorry, I'm confused with the "bet on the configuration being updated" part.

If I got it right even with the fix in this PR the following scenario is possible:

Old version of SID has unsynced (and untokenized) recrods
SID is upgraded to 2024.1.0
SID upsyncs the unsynced and untokenized records (which BFSID treats as tokenized)
SID updates configuration and tries to tokenize existing records but it's already too late

alexandr-simprints · 2024-04-24T04:22:58Z

If I got it right even with the fix in this PR the following scenario is possible:

Old version of SID has unsynced (and untokenized) recrods

SID is upgraded to 2024.1.0

SID upsyncs the unsynced and untokenized records (which BFSID treats as tokenized)

SID updates configuration and tries to tokenize existing records but it's already too late

Long story short: that won't happen as there are guardrails preventing this exact scenario.

Old version of SID has unsynced (and untokenized) records

At this point the userId, attendantId and moduleId are saved as String in the database

SID is upgraded to 2024.1.0

EventMigration9to10 is launched and the above mentioned fields are mapped from String to Tokenized.Raw

SID upsyncs the unsynced and untokenized records (which BFSID treats as tokenized)

ApiEventclass has the val tokenizedFields: List<String> parameter that explicitly specifies what fields in the val payload: ApiEventPayload are tokenized (encrypted). If the all the fields in the ApiEventPayload are unencrypted as in the case above, then the tokenizedFields will be empty.

You can see this in the ApiEvent.kt file

internal fun Event.fromDomainToApi(): ApiEvent {
    // Key Types are just references to the attendantId, userId, moduleId field.
    // Taking them only if they are Tokenized
    val tokenizedKeyTypes =
        getTokenizedFields().filter { it.value is TokenizableString.Tokenized }.keys.toList()

    [...]

    // Tokenized Fields list is only populated with the payload field references that are encrypted
    val tokenizedFields = tokenizedKeyTypes.mapNotNull(apiPayload::getTokenizedFieldJsonPath)

    [...]
}

BFSID does not treat them as tokenized

SID updates configuration and tries to tokenize existing records but it's already too late

Fortunately, this is not happening because the tokenized fields are always explicitly specified.

…er-nor-module-when-login-before-enroll-in-workflow-with-identification-pool-type-user-or-module

sonarqubecloud · 2024-05-09T18:51:40Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
81.5% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

alexandr-simprints added 2 commits April 18, 2024 18:38

[MS-408] Adding back the Config Manager. Its main responsibility is t…

c41feb0

…okenization of all the sensitive data once the project configuration is refreshed. It is necessary so that no data remains untokenized once the tokenization keys become available.

[MS-405] Merging with release/2024.1.0

4dbff95

alexandr-simprints requested review from BurningAXE, alex-vt, luhmirin-s and meladRaouf April 18, 2024 15:16

cla-bot Bot added the ... label Apr 18, 2024

alexandr-simprints added 3 commits April 22, 2024 14:39

[MS-405] Refactoring tests to utilize ConfigManager instead of the Co…

d2aedbd

…nfigRepository

[MS-405] Fixing ConfigRepositoryImplTest

288c14b

[MS-405] Fixing tests

93d3edd

alexandr-simprints added 2 commits April 22, 2024 15:37

[MS-405] Fixing dashboard tests

01b4ef8

[MS-405] Adding tests for the ConfigManager

2e63d30

alexandr-simprints marked this pull request as ready for review April 22, 2024 12:01

[MS-405] Adding config-sync module to the pr-checks.yml

f268ec1

luhmirin-s reviewed Apr 22, 2024

View reviewed changes

luhmirin-s approved these changes Apr 22, 2024

View reviewed changes

BurningAXE and others added 2 commits May 9, 2024 20:47

Merge branch 'release/2024.1.0' into MS-408-enroll-not-affected-to-us…

a26af97

…er-nor-module-when-login-before-enroll-in-workflow-with-identification-pool-type-user-or-module

Resolve ConfigManager conflicts

85a1689

BurningAXE force-pushed the MS-408-enroll-not-affected-to-user-nor-module-when-login-before-enroll-in-workflow-with-identification-pool-type-user-or-module branch from cbd7ab9 to 85a1689 Compare May 9, 2024 18:40

alexandr-simprints merged commit 3670b0c into release/2024.1.0 May 13, 2024

alexandr-simprints deleted the MS-408-enroll-not-affected-to-user-nor-module-when-login-before-enroll-in-workflow-with-identification-pool-type-user-or-module branch May 13, 2024 14:16

Conversation

alexandr-simprints commented Apr 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BurningAXE commented Apr 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alexandr-simprints commented Apr 22, 2024

Uh oh!

alexandr-simprints commented Apr 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BurningAXE commented Apr 22, 2024

Uh oh!

alexandr-simprints commented Apr 22, 2024

Uh oh!

luhmirin-s commented Apr 22, 2024

Uh oh!

luhmirin-s Apr 22, 2024

Choose a reason for hiding this comment

Uh oh!

alexandr-simprints Apr 22, 2024

Choose a reason for hiding this comment

Uh oh!

alexandr-simprints commented Apr 22, 2024

Uh oh!

BurningAXE commented Apr 22, 2024

Uh oh!

alexandr-simprints commented Apr 23, 2024

Uh oh!

BurningAXE commented Apr 23, 2024

Uh oh!

BurningAXE commented Apr 23, 2024

Uh oh!

alexandr-simprints commented Apr 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 9, 2024

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

alexandr-simprints commented Apr 18, 2024 •

edited

Loading

BurningAXE commented Apr 22, 2024 •

edited

Loading

alexandr-simprints commented Apr 22, 2024 •

edited

Loading

alexandr-simprints commented Apr 24, 2024 •

edited

Loading