MS-1135 Combine per-modality duplication code#1466
Conversation
BurningAXE
left a comment
There was a problem hiding this comment.
Nice! Much simpler this way!
| val probeReferenceId: String?, | ||
| val faceSamples: List<CaptureSample>, | ||
| val fingerprintSamples: List<CaptureSample>, | ||
| val samples: Map<Modality, List<CaptureSample>>, |
There was a problem hiding this comment.
Would it be better if we had a dedicated model ModalitySamples with explicit properties instead of a Map?
There was a problem hiding this comment.
We would lose the ability to look up by modality key, which is used in several places.
I cannot think of a benefit other than a shorter declaration.
| JsonSubTypes.Type(value = BiometricDataSource.CommCare::class, name = "BiometricDataSource.CommCare"), | ||
| JsonSubTypes.Type(value = BiometricDataSource.Simprints::class, name = "BiometricDataSource.Simprints"), | ||
| JsonSubTypes.Type(value = SubjectQuery::class, name = "SubjectQuery"), | ||
| JsonSubTypes.Type(value = AgeGroup::class, name = "AgeGroup"), |
There was a problem hiding this comment.
Hm, changes in this file seem suspicious - were the added ones missed previously? And the deleted ones unneeded?
There was a problem hiding this comment.
This part was done in parallel with similar fixes in the release branch and later main, so it is messy. IIRC, all remaining records are required to pass the cache integration tests.
| * Combines all of the matching results per SDK and returns up to [maxNbOfReturnedCandidates] results from the SDK with | ||
| * the highest overall score in descending order. Credential matches take precedence over direct matches. | ||
| * | ||
| * If there are any matches of [AppMatchConfidence.HIGH], only those will be returned, |
There was a problem hiding this comment.
AppMatchConfidence is the enum of confidence bands, there is no band above HIGH.
| // require format to be set for biometric templates query | ||
| val format = query.fingerprintSampleFormat ?: query.faceSampleFormat | ||
| require(format != null) { | ||
| require(query.format != null) { |
There was a problem hiding this comment.
Here but also in lots of other places - it was more work to maintain hardcoded modalities than to have it abstracted!
| samples = ( | ||
| dbSubject.faceSamples.map { it.toDomain() } + | ||
| dbSubject.fingerprintSamples.map { it.toDomain() } | ||
| ).filter { it.format == query.format }, |
There was a problem hiding this comment.
Why move the filter last? Not that it really matters in practice but it's less efficient this way.
There was a problem hiding this comment.
Tbh, don't remember. Most likely just to do it once on the combined domain samples list.
59b38bf to
1650d12
Compare
1466cce to
6460be2
Compare
1650d12 to
b303c4d
Compare
9648303 to
3363a56
Compare
| scope.launch(dispatcher) { | ||
| ranges | ||
| .map { range -> | ||
| async { semaphore.withPermit { channel.send(load(range)) } } |
There was a problem hiding this comment.
In this implementation, the semaphore is acquired first, and then then potentially blocks on channel.send().
If we have more ranges than available processor slots (i.e., 10 ranges and 4 permits), then all 10 async tasks are launched immediately. The first 4 acquire semaphore permits, and if the channel buffer fills, they block on send() while holding the permission. It seems to me that this defeats the purpose of the semaphore for concurrency control
Maybe we should change it to this:
async {
val result = semaphore.withPermit {
// Only hold semaphore during actual work
load(range)
}
// Release semaphore
channel.send(result)
}
There was a problem hiding this comment.
I don't really see the issue - you'd have 10 coroutines launched "simultaneously". 4 of them would acquire semaphores, load and then send (the Channel also has capacity of 4), then those 4 semaphores would be released and taken by next 4 coroutines. Am I missing anything?
There was a problem hiding this comment.
Please see the comment above.
There was a problem hiding this comment.
From memory, channel.send() can suspend when the buffer is full. If it suspends, it is still holding the semaphore permit (since it's inside withPermit).
Looking at the 10 ranges and 4 semaphore permits scenario, with channel capacity of 4:
- First 4 coroutines acquire permits, load, then hit
channel.send()- buffer fills - Those 4 are now blocked on
send()while holding their permits - Next 4 coroutines get permits, load, and also blocks on
send() - Now all permits are held by coroutines waiting on channel I/O, not doing actual work
That's why it seems that it's better for semaphore to surround only the expensive load() operation, and not the channel 'communication':
async {
val result = semaphore.withPermit { load(range) }
channel.send(result) // Can block, but permit already released
}With this modification, the permits are released immediately after loading, and it allows the next batch to start work (rather than being stuck behind channel operations).
HOWEVER, according to @luhmirin-s:
While this shows up as "added" code, it is exactly the same as the pre-existing loadIdentities methods. The only change from my side is the renaming and duplication removal.
So it's up to you to change the code or keep it this way.
3363a56 to
e4f718a
Compare
e4f718a to
f628d6d
Compare
|



JIRA ticket
Will be released in: 2026.1.0
Notable changes
Testing guidance
Additional work checklist