[MS-949] Improve concurrency in matching#1194

Merged
BurningAXE merged 6 commits into main from feature/matching-optmization-2 on Jun 4, 2025

Conversation

@BurningAXE (Contributor) commented May 20, 2025

JIRA ticket
Will be released in: 2025.2.0

Notable changes

This PR builds upon #1169 by making the following changes:

  • Ranges are created so that their count is a multiple of the CPU core count. This way all ranges can be processed in parallel at all times, minimizing CPU idling and maximizing efficiency. For example, on a CPU with 4 cores, totalCount will be divided into 4, 8, 12, etc. ranges depending on the size of totalCount. A maximum range size of 2000 is imposed.
  • MatchResultSet was refactored to handle concurrency, as the previous implementation led to crashes when modified concurrently.
  • The loadedCandidates counter was also made thread-safe, as it previously behaved inconsistently.
  • Flows in the face/fingerprint matcher use cases are started on Dispatchers.IO instead of the caller's Dispatchers.Main.
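
The range-splitting strategy described above can be sketched as follows. This is a hypothetical reconstruction, not the actual CreateRangesUseCase code; the function name, parameters, and rounding details are assumptions:

```kotlin
// Sketch: pick the smallest multiple of the core count such that no range
// exceeds maxRangeSize, then split totalCount as evenly as possible.
fun createRanges(totalCount: Int, availableProcessors: Int, maxRangeSize: Int = 2000): List<IntRange> {
    if (totalCount <= 0) return emptyList()
    var rangeCount = availableProcessors
    // Grow the range count in core-count multiples until every range fits the cap.
    while ((totalCount + rangeCount - 1) / rangeCount > maxRangeSize) {
        rangeCount += availableProcessors
    }
    val baseSize = totalCount / rangeCount
    val remainder = totalCount % rangeCount
    val ranges = mutableListOf<IntRange>()
    var start = 0
    repeat(rangeCount) { i ->
        // Spread the remainder over the first few ranges so sizes differ by at most 1.
        val size = baseSize + if (i < remainder) 1 else 0
        if (size > 0) {
            ranges += start until start + size
            start += size
        }
    }
    return ranges
}
```

Under these assumptions, 8000 candidates on 4 cores become 4 ranges of 2000, while 8001 candidates force 8 ranges of roughly 1000 each.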

Testing guidance

I've tested this extensively with CoSync, but I would like someone to double-check it with the internal DB. Trying out different totalCount values would also help ensure no edge cases are left out.

Additional work checklist

  • Effect on other features and security has been considered
  • Design document marked as "In development" (if applicable)
  • External (Gitbook) and internal (Confluence) Documentation is up to date (or ticket created)
  • Test cases in Testiny are up to date (or ticket created)
  • Other teams notified about the changes (if applicable)

@cla-bot cla-bot Bot added the ... label May 20, 2025
@BurningAXE BurningAXE requested a review from Copilot May 20, 2025 17:14

Copilot AI left a comment


Pull Request Overview

This PR enhances parallel matching by introducing more concurrent data structures and improving range generation for batch processing. Key changes include refining the range creation algorithm, updating match result storage to be thread-safe, and modifying use cases to track loaded candidates safely.

  • Swapped TreeSet for ConcurrentSkipListSet and used an AtomicReference in MatchResultSet for thread safety.
  • Replaced non-thread-safe counters with AtomicInteger in matcher use cases and updated range creation to distribute work evenly.
  • Expanded and updated tests for CreateRangesUseCase to cover various edge cases.
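
The counter change can be illustrated with a minimal, self-contained sketch (plain threads stand in for the matcher coroutines; all names here are illustrative, not the project's code):

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Illustrative only: an AtomicInteger avoids lost updates when several workers
// bump the loaded-candidates count concurrently, unlike a plain `var count++`.
fun countLoadedCandidates(workers: Int, perWorker: Int): Int {
    val loadedCandidates = AtomicInteger(0)
    val threads = (1..workers).map {
        Thread { repeat(perWorker) { loadedCandidates.incrementAndGet() } }
    }
    threads.forEach { it.start() }
    threads.forEach { it.join() }
    // get() reads the final value; a non-atomic counter would typically come up short here.
    return loadedCandidates.get()
}
```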

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:
  • CreateRangesUseCase.kt: Redesigned batch range logic to account for processor count and max size
  • MatchResultSet.kt: Switched to ConcurrentSkipListSet and an atomic reference for the lowest confidence
  • FingerprintMatcherUseCase.kt: Replaced a shared var with AtomicInteger for loaded candidates
  • FaceMatcherUseCase.kt: Applied the same AtomicInteger pattern for concurrency
  • CreateRangesUseCaseTest.kt: Added comprehensive tests for the new range logic
Comments suppressed due to low confidence (2)

feature/matcher/src/main/java/com/simprints/matcher/usecases/FingerprintMatcherUseCase.kt:94

  • The log message interpolates the AtomicInteger directly, which will print its object representation instead of the count value; use loadedCandidates.get() to retrieve the actual count.
Simber.i("Matched $loadedCandidates candidates", tag = crashReportTag)

feature/matcher/src/main/java/com/simprints/matcher/usecases/CreateRangesUseCase.kt:13

  • [nitpick] The KDoc for the invoke function does not document the availableProcessors parameter or describe its behavior; consider updating the comment to include this parameter and its purpose.
operator fun invoke(

Comment thread feature/matcher/src/main/java/com/simprints/matcher/usecases/MatchResultSet.kt Outdated
@BurningAXE BurningAXE marked this pull request as ready for review May 21, 2025 15:37
@BurningAXE BurningAXE requested review from a team, TristramN, alex-vt, alexandr-simprints, luhmirin-s, meladRaouf and ybourgery and removed request for a team May 21, 2025 15:37
Comment thread feature/matcher/src/main/java/com/simprints/matcher/usecases/MatchResultSet.kt Outdated
@BurningAXE BurningAXE force-pushed the feature/matching-optmization-2 branch from 24607d2 to 0124ec4 Compare May 26, 2025 09:52
@BurningAXE BurningAXE changed the base branch from feature/matching-optmization to main May 28, 2025 09:07
-        send(MatcherState.Success(resultSet.toList(), loadedCandidates, bioSdk.matcherName))
-    }
+        send(MatcherState.Success(resultSet.toList(), loadedCandidates.get(), bioSdk.matcherName))
+    }.flowOn(dispatcherIO)
Collaborator


I don't think we need .flowOn(dispatcherIO), or at most we should use .flowOn(dispatcherBG). All read operations already run on the IO dispatcher using withContext(dispatcherIO), and all matching logic is executed within launch(dispatcherBG) { ... }.

Contributor Author

@BurningAXE BurningAXE May 29, 2025


All read operations already run on the IO dispatcher using withContext(dispatcherIO)

That's the problem - they don't! Reading was happening on the main thread because it used the scope passed from the ViewModel! This is why the UI wasn't updating counts until all reading was done.

Collaborator


I think the flow should run on a background dispatcher, with only the database read operations running on the I/O dispatcher.

Contributor Author


That would be the official recommendation. However, in my testing IO provided better performance. I have no explanation why, but I don't see any harm either 🤷
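
The behaviour being debated can be sketched as follows (hypothetical names; matcherFlow is not the project's actual code). Without flowOn, a channelFlow body runs in the collector's coroutine context, which is Dispatchers.Main when collected from a ViewModel scope; flowOn(Dispatchers.IO) moves the upstream work off the main thread while emissions still reach the collector's context:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.channelFlow
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.flow.flowOn
import kotlinx.coroutines.runBlocking

// Sketch: the flow reports which thread its body actually ran on.
fun matcherFlow(): Flow<String> = channelFlow {
    // Heavy candidate loading and matching would happen here.
    send("emitted from ${Thread.currentThread().name}")
}.flowOn(Dispatchers.IO) // upstream runs on IO regardless of the collector's dispatcher

// Collect from an arbitrary context; the body still ran on an IO worker thread.
fun firstEmission(): String = runBlocking { matcherFlow().first() }
```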

@BurningAXE BurningAXE force-pushed the feature/matching-optmization-2 branch from 0124ec4 to f9f7b73 Compare May 29, 2025 16:09
@BurningAXE BurningAXE force-pushed the feature/matching-optmization-2 branch from f9f7b73 to ef741aa Compare May 29, 2025 16:31
@sonarqubecloud

// Use a lock to ensure thread safety during the entire add operation
lock.withLock {
// Only perform this optimization when we know the set is at max capacity
if (skipListSet.size >= maxSize && lowestConfidence.get() > element.confidence) {
Contributor


Are the atomic reference for "lowestConfidence" and the concurrent collection still required if the whole block is synchronised?

Contributor Author


For how we use it currently, no. But they should not lead to any further slowdown, either.

Contributor


Likely splitting hairs, but concurrent data structures are typically much slower.

Contributor Author


I'm tempted to dismiss this concern for structures with max 10 results. However, if you feel strongly about this, we can do a quick benchmark and verify TreeSet indeed works fine in real concurrent situations (CoSync reading with N threads)!?
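
The point that the lock makes the concurrent structures redundant can be shown with a lock-only sketch (all names hypothetical; the real MatchResultSet stores match results, not bare confidence values):

```kotlin
import java.util.TreeSet
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Sketch: once every mutation is guarded by a single lock, a plain TreeSet and
// a plain field suffice; ConcurrentSkipListSet/AtomicReference would only add
// an extra safety margin, at some cost.
class BoundedResultSet(private val maxSize: Int) {
    private val lock = ReentrantLock()
    private val results = TreeSet<Double>(compareByDescending<Double> { it })
    private var lowestConfidence = 0.0 // placeholder; only consulted once the set is full

    fun add(confidence: Double) {
        lock.withLock {
            // A full set cannot accept a candidate weaker than its current worst.
            if (results.size >= maxSize && confidence <= lowestConfidence) return
            results.add(confidence)
            if (results.size > maxSize) results.pollLast() // evict the weakest match
            lowestConfidence = results.last() // last() is the smallest under descending order
        }
    }

    fun snapshot(): List<Double> = lock.withLock { results.toList() }
}
```

Note that a TreeSet deduplicates equal confidences; the actual result set keys on full match items, so this is only an illustration of the locking trade-off.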

@BurningAXE BurningAXE merged commit b7de4f8 into main Jun 4, 2025
12 checks passed
