[MS-949] Improve concurrency in matching #1194
Conversation
Pull Request Overview
This PR enhances parallel matching by introducing more concurrent data structures and improving range generation for batch processing. Key changes include refining the range creation algorithm, updating match result storage to be thread-safe, and modifying use cases to track loaded candidates safely.
- Swapped `TreeSet` for `ConcurrentSkipListSet` and used an `AtomicReference` in `MatchResultSet` for thread safety.
- Replaced non-thread-safe counters with `AtomicInteger` in matcher use cases and updated range creation to distribute work evenly.
- Expanded and updated tests for `CreateRangesUseCase` to cover various edge cases.
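The counter change can be sketched roughly as follows. This is an illustrative stand-alone example, not the PR's exact code: plain threads are used here for simplicity, while the matcher use cases run coroutines, and the variable name `loadedCandidates` mirrors the PR's naming.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

fun main() {
    // Before this PR, a plain `var loadedCandidates = 0` was incremented from
    // several workers, so concurrent increments could be lost. AtomicInteger
    // makes each increment atomic.
    val loadedCandidates = AtomicInteger(0)

    val workers = List(4) {
        Thread { repeat(1_000) { loadedCandidates.incrementAndGet() } }
    }
    workers.forEach { it.start() }
    workers.forEach { it.join() }

    // Read the value with get() when reporting the final count.
    println("Matched ${loadedCandidates.get()} candidates") // Matched 4000 candidates
}
```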
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| CreateRangesUseCase.kt | Redesigned batch range logic to account for processor count and max size |
| MatchResultSet.kt | Switched to ConcurrentSkipListSet and atomic reference for lowest confidence |
| FingerprintMatcherUseCase.kt | Replaced shared var with AtomicInteger for loaded candidates |
| FaceMatcherUseCase.kt | Applied same AtomicInteger pattern for concurrency |
| CreateRangesUseCaseTest.kt | Added comprehensive tests for new range logic |
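One plausible shape for the redesigned range logic in `CreateRangesUseCase.kt` is sketched below. The function name `createRanges` and the batch-size formula are assumptions for illustration; the real implementation may differ.

```kotlin
// Hypothetical sketch: distribute totalCount items into roughly even ranges,
// aiming for one batch per processor while capping each batch at maxSize.
fun createRanges(totalCount: Int, availableProcessors: Int, maxSize: Int): List<IntRange> {
    if (totalCount <= 0) return emptyList()
    // Ceiling division so every processor gets work, then cap by maxSize.
    val batchSize = minOf(maxSize, maxOf(1, (totalCount + availableProcessors - 1) / availableProcessors))
    return (0 until totalCount step batchSize)
        .map { start -> start until minOf(start + batchSize, totalCount) }
}

fun main() {
    println(createRanges(totalCount = 10, availableProcessors = 4, maxSize = 100))
    // [0..2, 3..5, 6..8, 9..9]
}
```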
Comments suppressed due to low confidence (2)
feature/matcher/src/main/java/com/simprints/matcher/usecases/FingerprintMatcherUseCase.kt:94
- The log message interpolates the AtomicInteger directly, which will print its object representation instead of the count value; use loadedCandidates.get() to retrieve the actual count.
```kotlin
Simber.i("Matched $loadedCandidates candidates", tag = crashReportTag)
```
feature/matcher/src/main/java/com/simprints/matcher/usecases/CreateRangesUseCase.kt:13
- [nitpick] The KDoc for the invoke function does not document the `availableProcessors` parameter or describe its behavior; consider updating the comment to include this parameter and its purpose.
```kotlin
operator fun invoke(
```
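A hedged sketch of what the requested KDoc could look like; everything beyond the `availableProcessors` name is invented for illustration and may not match the actual signature:

```kotlin
/**
 * Splits the candidate pool into ranges for parallel matching.
 *
 * @param totalCount total number of candidates to match (illustrative name).
 * @param availableProcessors number of workers the ranges should be
 *   distributed across; used to size batches so that all processors
 *   receive a roughly even share of the work.
 */
operator fun invoke(totalCount: Int, availableProcessors: Int): List<IntRange> { /* ... */ }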
Force-pushed 24607d2 to 0124ec4
```diff
-        send(MatcherState.Success(resultSet.toList(), loadedCandidates, bioSdk.matcherName))
-    }
+        send(MatcherState.Success(resultSet.toList(), loadedCandidates.get(), bioSdk.matcherName))
+    }.flowOn(dispatcherIO)
```
I don't think we need .flowOn(dispatcherIO), or at most we should use .flowOn(dispatcherBG). All read operations already run on the IO dispatcher using withContext(dispatcherIO), and all matching logic is executed within launch(dispatcherBG) { ... }.
> All read operations already run on the IO dispatcher using withContext(dispatcherIO)
That's the problem - they don't! Reading was happening on the main thread because it used the scope passed from the ViewModel! This is why the UI wasn't updating counts until all reading was done.
I think the flow should run on a background dispatcher, with only the database read operations running on the I/O dispatcher.
That would be the official recommendation. However, in my testing IO provided better performance. I have no explanation why, but I don't see any harm, either 🤷
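The two options being debated could be sketched like this. The dispatcher names mirror the discussion, but the flow body and `candidatesFlow` name are illustrative, not the PR's code:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.flowOn
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Stand-ins for the project's injected dispatchers.
val dispatcherBG = Dispatchers.Default
val dispatcherIO = Dispatchers.IO

fun candidatesFlow(): Flow<String> = flow {
    // Reviewer's suggestion: run the flow on a background dispatcher and hop
    // to IO only for the database read itself. Emission happens back in the
    // flow's own context, so the flow invariant is preserved.
    val batch = withContext(dispatcherIO) { listOf("a", "b") } // simulated DB read
    batch.forEach { emit(it) }
}.flowOn(dispatcherBG)
// The PR as written uses .flowOn(dispatcherIO) instead, which the author
// found faster in testing.

fun main() = runBlocking {
    println(candidatesFlow().toList()) // [a, b]
}
```

Either way, the key point from the thread is that the flow must not inherit the caller's `Dispatchers.Main` scope, since that blocked UI updates during reading.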
…rsed to make it more reactive
Force-pushed 0124ec4 to f9f7b73
Force-pushed f9f7b73 to ef741aa
```kotlin
// Use a lock to ensure thread safety during the entire add operation
lock.withLock {
    // Only perform this optimization when we know the set is at max capacity
    if (skipListSet.size >= maxSize && lowestConfidence.get() > element.confidence) {
```
Are the atomic reference for "lowestConfidence" and the concurrent collection required if the whole block is synchronised?
For how we use it currently - no. But they should not lead to any further slowdown, either.
Likely splitting hairs, but concurrent data structures are typically much slower than their plain counterparts.
I'm tempted to dismiss this concern for structures holding at most 10 results. However, if you feel strongly about this, we can do a quick benchmark and verify that TreeSet indeed works fine in real concurrent situations (CoSync reading with N threads)?
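The point under discussion, sketched as a stand-alone example: when every access already happens under one lock, a plain `TreeSet` behaves the same as a concurrent collection. The class name `TopMatches`, the `Match` type, and the capacity logic are illustrative assumptions, not the PR's `MatchResultSet`:

```kotlin
import java.util.TreeSet
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

data class Match(val id: String, val confidence: Float)

// Keeps only the top-N matches by confidence. A plain TreeSet suffices here
// because every read and write is guarded by the same lock.
class TopMatches(private val maxSize: Int) {
    private val lock = ReentrantLock()
    private val set = TreeSet<Match>(compareByDescending<Match> { it.confidence })

    fun add(element: Match) = lock.withLock {
        // Skip elements that cannot displace the current lowest confidence.
        if (set.size >= maxSize && set.last().confidence >= element.confidence) return@withLock
        set.add(element)
        if (set.size > maxSize) set.pollLast() // drop the lowest-confidence match
    }

    fun snapshot(): List<Match> = lock.withLock { set.toList() }
}

fun main() {
    val top = TopMatches(maxSize = 2)
    top.add(Match("a", 0.9f))
    top.add(Match("b", 0.5f))
    top.add(Match("c", 0.7f)) // displaces "b"
    println(top.snapshot().map { it.id }) // [a, c]
}
```

A benchmark comparing this against `ConcurrentSkipListSet` under N reader threads would settle the question either way.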



JIRA ticket
Will be released in: 2025.2.0
Notable changes
This PR builds upon #1169 by making the following changes:
- `Dispatchers.IO` instead of the caller's `Dispatchers.Main`

Testing guidance
I've tested this extensively with CoSync but would like someone to double-check it with the internal DB. Trying out different totalCount values would also be nice, to ensure no edge cases are left out.
Additional work checklist