MS-1135 Combine per-modality duplication code #1466

Merged: luhmirin-s merged 6 commits into main from feature/MS-1135-combine-per-modality-duplication on Nov 20, 2025

Conversation

@luhmirin-s
Contributor

JIRA ticket
Will be released in: 2026.1.0

Notable changes

  • Combine most of the instances of parallel "face/fingerprint" parameters.
  • Implement a way to use the SDK configuration in a more generic way.
  • Simplify the local record repository interface to have a single method.

Testing guidance

  • Run all kinds of regular flows. Everything should work exactly as before.

Additional work checklist

  • Effect on other features and security has been considered
  • Design document marked as "In development" (if applicable)
  • External (Gitbook) and internal (Confluence) Documentation is up to date (or ticket created)
  • Test cases in Testiny are up to date (or ticket created)
  • Other teams notified about the changes (if applicable)

@cla-bot cla-bot Bot added the ... label Nov 13, 2025
@luhmirin-s luhmirin-s changed the title Feature/ms 1135 combine per modality duplication MS-1135 Combine per-modality duplication code Nov 13, 2025
@luhmirin-s luhmirin-s requested review from a team, BurningAXE, TristramN, alex-vt, alexandr-simprints, meladRaouf and ybourgery and removed request for a team November 13, 2025 15:15
@luhmirin-s luhmirin-s marked this pull request as ready for review November 13, 2025 15:15
Contributor

@BurningAXE BurningAXE left a comment

Nice! Much simpler this way!

  val probeReferenceId: String?,
- val faceSamples: List<CaptureSample>,
- val fingerprintSamples: List<CaptureSample>,
+ val samples: Map<Modality, List<CaptureSample>>,

Contributor

Would it be better if we had a dedicated model ModalitySamples with explicit properties instead of a Map?

Contributor Author

We would lose the ability to look up by modality key, which is used in several places.
I cannot think of a benefit other than a shorter declaration.
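
A minimal sketch of the trade-off discussed here, not code from the PR. `Modality` and `CaptureSample` are simplified stand-ins (the real classes are not shown in this thread), and `ModalitySamples` and both `samplesFor` overloads are hypothetical. It shows that the `Map` supports a single generic lookup by key, while explicit properties need a branch per modality.

```kotlin
// Simplified stand-ins; the real Modality and CaptureSample are not shown in this thread.
enum class Modality { FACE, FINGERPRINT }

data class CaptureSample(val format: String, val template: String)

// Hypothetical dedicated model with explicit properties, as suggested in the review.
data class ModalitySamples(
    val face: List<CaptureSample> = emptyList(),
    val fingerprint: List<CaptureSample> = emptyList(),
)

// Map-based lookup: one generic expression works for every modality.
fun samplesFor(samples: Map<Modality, List<CaptureSample>>, modality: Modality): List<CaptureSample> =
    samples[modality].orEmpty()

// Explicit-property lookup: needs a branch per modality,
// so adding a modality means editing this function as well.
fun samplesFor(samples: ModalitySamples, modality: Modality): List<CaptureSample> =
    when (modality) {
        Modality.FACE -> samples.face
        Modality.FINGERPRINT -> samples.fingerprint
    }
```

With the map, call sites that already have a `Modality` value in hand (for example from the SDK configuration) can pass it straight through instead of branching.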

JsonSubTypes.Type(value = BiometricDataSource.CommCare::class, name = "BiometricDataSource.CommCare"),
JsonSubTypes.Type(value = BiometricDataSource.Simprints::class, name = "BiometricDataSource.Simprints"),
JsonSubTypes.Type(value = SubjectQuery::class, name = "SubjectQuery"),
JsonSubTypes.Type(value = AgeGroup::class, name = "AgeGroup"),

Contributor

Hm, changes in this file seem suspicious - were the added ones missed previously? And the deleted ones unneeded?

Contributor Author

This part was done in parallel with similar fixes in the release branch and later main, so it is messy. IIRC, all remaining records are required to pass the cache integration tests.

* Combines all of the matching results per SDK and returns up to [maxNbOfReturnedCandidates] results from the SDK with
* the highest overall score in descending order. Credential matches take precedence over direct matches.
*
* If there are any matches of [AppMatchConfidence.HIGH], only those will be returned,

Contributor

of = above?

Contributor Author

AppMatchConfidence is the enum of confidence bands; there is no band above HIGH.

  // require format to be set for biometric templates query
- val format = query.fingerprintSampleFormat ?: query.faceSampleFormat
- require(format != null) {
+ require(query.format != null) {

Contributor

Here but also in lots of other places - it was more work to maintain hardcoded modalities than to have it abstracted!

samples = (
dbSubject.faceSamples.map { it.toDomain() } +
dbSubject.fingerprintSamples.map { it.toDomain() }
).filter { it.format == query.format },

Contributor

Why move the filter last? Not that it really matters in practice but it's less efficient this way.

Contributor Author

Tbh, don't remember. Most likely just to do it once on the combined domain samples list.
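
A minimal sketch of the two orderings being compared, not code from the PR. `DbSample`, `DomainSample`, and both function names are hypothetical, and it assumes the database sample exposes the same `format` value as the domain sample. Both variants return the same list; the filter-first one just avoids converting samples that are discarded afterwards.

```kotlin
// Hypothetical stand-ins for the database and domain sample types.
data class DbSample(val format: String, val template: String)
data class DomainSample(val format: String, val template: String)

fun DbSample.toDomain() = DomainSample(format, template)

// Ordering from the snippet above: convert everything, then filter the combined list once.
fun mapThenFilter(face: List<DbSample>, fingerprint: List<DbSample>, format: String): List<DomainSample> =
    (face.map { it.toDomain() } + fingerprint.map { it.toDomain() })
        .filter { it.format == format }

// Filter-first ordering: skips converting samples that would be dropped anyway.
fun filterThenMap(face: List<DbSample>, fingerprint: List<DbSample>, format: String): List<DomainSample> =
    (face.asSequence() + fingerprint.asSequence())
        .filter { it.format == format }
        .map { it.toDomain() }
        .toList()
```

For the handful of samples a single subject holds, the difference is unlikely to be measurable, which matches the "not that it really matters in practice" remark.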

@luhmirin-s luhmirin-s force-pushed the feature/MS-1135-simplifying-module-api branch from 59b38bf to 1650d12 Compare November 19, 2025 08:44
@luhmirin-s luhmirin-s force-pushed the feature/MS-1135-combine-per-modality-duplication branch from 1466cce to 6460be2 Compare November 19, 2025 09:05
@luhmirin-s luhmirin-s force-pushed the feature/MS-1135-simplifying-module-api branch from 1650d12 to b303c4d Compare November 19, 2025 09:14
@luhmirin-s luhmirin-s force-pushed the feature/MS-1135-combine-per-modality-duplication branch 2 times, most recently from 9648303 to 3363a56 Compare November 19, 2025 09:26
scope.launch(dispatcher) {
ranges
.map { range ->
async { semaphore.withPermit { channel.send(load(range)) } }

Contributor

@alexandr-simprints alexandr-simprints Nov 19, 2025

In this implementation, the semaphore permit is acquired first, and the coroutine then potentially blocks on channel.send() while still holding it.

If we have more ranges than available processor slots (e.g., 10 ranges and 4 permits), then all 10 async tasks are launched immediately. The first 4 acquire semaphore permits, and if the channel buffer fills, they block on send() while holding the permit. It seems to me that this defeats the purpose of the semaphore for concurrency control.

Maybe we should change it to this:

async {
    val result = semaphore.withPermit {
        // Only hold the permit during the actual work
        load(range)
    }
    // The permit is already released at this point, before sending
    channel.send(result)
}

Contributor

I don't really see the issue - you'd have 10 coroutines launched "simultaneously". 4 of them would acquire permits, load and then send (the Channel also has a capacity of 4), then those 4 permits would be released and taken by the next 4 coroutines. Am I missing anything?

Contributor Author

Please see the comment above.

Contributor

From memory, channel.send() can suspend when the buffer is full. If it suspends, it is still holding the semaphore permit (since it's inside withPermit).

Looking at the 10 ranges and 4 semaphore permits scenario, with channel capacity of 4:

  1. First 4 coroutines acquire permits, load, then hit channel.send() - the sends succeed and the buffer fills
  2. Those 4 release their permits once send() returns
  3. Next 4 coroutines get permits, load, and block on send() if the consumer has not drained the buffer yet
  4. Now all permits are held by coroutines waiting on channel I/O, not doing actual work

That's why it seems better for the semaphore to surround only the expensive load() operation, and not the channel communication:

async {
    val result = semaphore.withPermit { load(range) }
    channel.send(result)  // Can block, but permit already released
}

With this modification, the permits are released immediately after loading, and it allows the next batch to start work (rather than being stuck behind channel operations).

HOWEVER, according to @luhmirin-s:

While this shows up as "added" code, it is exactly the same as the pre-existing loadIdentities methods. The only change from my side is the renaming and duplication removal.

So it's up to you to change the code or keep it this way.
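
A self-contained sketch of the alternative proposed in this comment, not the code that was merged. `loadAll`, its parameters, and the `Int` result type are hypothetical; the sketch only relies on the `Semaphore`, `withPermit`, and `Channel` APIs from kotlinx.coroutines, and assumes a single consumer draining the channel.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit

// Hypothetical helper: loads every range with at most `permits` loads running at once
// and returns the loaded values in completion order (not range order).
suspend fun loadAll(
    ranges: List<IntRange>,
    permits: Int,
    load: suspend (IntRange) -> Int,
): List<Int> = coroutineScope {
    val semaphore = Semaphore(permits)
    val channel = Channel<Int>(capacity = permits)

    // Producer: one coroutine per range, as in the snippet under review.
    launch {
        ranges
            .map { range ->
                async {
                    // Hold the permit only while loading...
                    val result = semaphore.withPermit { load(range) }
                    // ...so suspending here on a full buffer does not keep a permit.
                    channel.send(result)
                }
            }.awaitAll()
        channel.close() // Lets the consumer loop below finish.
    }

    // Consumer: drain the channel until it is closed.
    val results = mutableListOf<Int>()
    for (value in channel) results += value
    results
}
```

One trade-off of this variant: because loading no longer waits for the channel, up to one loaded result per range can sit in memory waiting to be sent when the consumer is slow, whereas holding the permit during `send()` also acts as backpressure on loading.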

@luhmirin-s luhmirin-s force-pushed the feature/MS-1135-combine-per-modality-duplication branch from 3363a56 to e4f718a Compare November 20, 2025 08:11
@luhmirin-s luhmirin-s changed the base branch from feature/MS-1135-simplifying-module-api to main November 20, 2025 08:12
@luhmirin-s luhmirin-s force-pushed the feature/MS-1135-combine-per-modality-duplication branch from e4f718a to f628d6d Compare November 20, 2025 08:24
@luhmirin-s luhmirin-s merged commit ab02f16 into main Nov 20, 2025
13 checks passed
@luhmirin-s luhmirin-s deleted the feature/MS-1135-combine-per-modality-duplication branch November 20, 2025 08:39
4 participants