fix: login/logout race causes subsequent calls to target previous user#2618
Open
fix: login/logout race causes subsequent calls to target previous user#2618
Conversation
added 5 commits
April 9, 2026 13:37
- Compare base.sha...head.sha on pull_request so the target branch matches the PR base. - Enforce >= threshold on aggregate touched executable lines; keep per-file detail informational. - Fail when main-source files in the diff cannot be mapped to JaCoCo. - Rename workflow step for clarity. Made-with: Cursor
- GET /apps/:app_id/sdk/features/android/:sdk_version with validated version label - FeatureFlagsJsonParser (kotlinx.serialization), metadata as sibling JsonObjects - FeatureFlagsRefreshService with ~10 min foreground polling; ConfigModelChangeTags.REMOTE_FEATURE_FLAGS - FeatureManager merges remote flags; IFeatureManager.remoteFeatureFlagMetadata() - SDK_BACKGROUND_THREADING / sdk_background_threading; CoreModule wiring and tests Made-with: Cursor
- Annotate pollJob with @volatile for cross-thread visibility - Remove duplicate onFocus from start(); ApplicationService already fires on subscribe - Apply remote flags via in-place model property updates (REMOTE_FEATURE_FLAGS tag) to avoid clobbering concurrent params hydration with a stale full-model replace - Use OneSignalDispatchers.launchOnIO so CancellationException is not logged as an error - Compare enabled key sets (order-insensitive) before writing Made-with: Cursor
10725ac to
2d9f1f3
Compare
Contributor
Author
|
@claude review |
264a52a to
6c5f33c
Compare
…y casing - RemoteFeatureFlagsFetchOutcome Success/Unavailable; parseSuccessful distinguishes valid Turbine JSON (including empty features) from parse/contract failure; no cache wipe on Unavailable - Polling loop: rethrow CancellationException; catch Exception only for fetch errors - FeatureManager: drop dead REMOTE_FEATURE_FLAGS onModelReplaced branch; canonicalize flag keys - ConfigModelChangeTags + FeatureFlag KDoc/comment fixes - Tests: parseSuccessful, case-insensitive remote key Made-with: Cursor
Contributor
Author
|
@claude review |
added 2 commits
April 21, 2026 13:44
Keep the "next fetch in …ms" and before/after diff debug logs in FeatureFlagsRefreshService, and keep mavenLocal() in the demo's allprojects repositories; just remove the TODO(revert-before-merge) markers now that these are staying. Made-with: Cursor
99334e8 to
d79186a
Compare
Contributor
Author
|
@claude review |
Contributor
Author
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
Contributor
Author
|
@claude review |
sherwinski
reviewed
Apr 23, 2026
Comment on lines
+189
to
+193
| if (queueItem.waiter != null && existing.waiter == null) { | ||
| existing.waiter = queueItem.waiter | ||
| } else { | ||
| queueItem.waiter?.wake(true) | ||
| } |
Contributor
There was a problem hiding this comment.
Might be worth writing a test for this (and L36) that confirms the waiters get switched.
Contributor
There was a problem hiding this comment.
Something like: seed the queue with an OperationQueueItem(queuedOp, waiter = seededWaiter, bucket = 0), then call enqueueAndWait for the same
onesignalId, let the executor run the queued op (mock it to return SUCCESS), and assert both the seeded waiter and the enqueueAndWait caller wake with true.
33b73fc to
2f615fc
Compare
441c409 to
68f2df3
Compare
6a69a4b to
3c9b55d
Compare
…ract-violating responses
ConfigModelStoreListener.fetchParams now re-sources sdkRemoteFeatureFlags{,Metadata}
from the live model immediately before replace(), so concurrent in-place writes
from FeatureFlagsRefreshService survive Model.initializeFromModel's data.clear()+putAll().
FeatureFlagsJsonParser.parseRootStrict now distinguishes an authoritative empty
features array from one that filtered down to empty due to element-type violations;
the latter returns null -> Unavailable so callers preserve the cached list rather
than overwriting it with [].
Adds regression tests for both paths. Addresses PR #2612 review comments.
Made-with: Cursor
`ConfigModelStoreListener.fetchParams` replaces the live `ConfigModel` with the HYDRATE tag during init, which fired `onModelReplaced` on `FeatureFlagsRefreshService` and triggered a second Turbine GET right after the focus-driven first one, even though the appId hadn't changed. Track the appId currently being polled and treat a re-trigger for the same appId as a no-op; genuine appId changes still cancel and restart. `onUnfocused` clears the cached appId so the next focus event re-arms polling. Pull the foreground cadence off the constructor and onto an `internal var refreshIntervalMs` (default 8 min, test-only override before `start`). The IoC reflection in `ServiceRegistrationReflection.resolve` cannot resolve a non-service `Long` parameter and was throwing "Could not instantiate service: ServiceRegistrationReflection@..." out of `StartupService.scheduleStart` once the service was registered. Made-with: Cursor
OneSignal.login() scheduled the user switch asynchronously via suspendifyOnIO, so the API call returned before the identity model was updated. Calls made right after login (e.g. addTag) applied to the OLD user because the switch hadn't happened yet. Split login into two phases: - switchUser(): synchronous under the login/logout lock. Swaps the local identity/properties/subscription models immediately so subsequent SDK calls see the new user. - enqueueLogin(): async. Enqueues the LoginUserOperation and awaits completion of the backend user creation. Because switchUser returns before enqueueLogin runs, there's a small window where RecoverFromDroppedLoginBug can observe the state (new externalId, local onesignalId, no LoginUserOperation in queue) and enqueue its own recovery login. When the real enqueueLogin then adds another login op, the executor crashes on a grouped batch containing two LoginUserOperations. Add a dedupe check in internalEnqueue: if a LoginUserOperation for the same onesignalId is already queued, drop the incoming one and wake its waiter optimistically with true so loginSuspend callers don't hang on enqueueAndWait. If the duplicate came from loadSavedOperations (addToStore = false), also remove it from the operation model store so we don't leave an orphan entry that lingers until the next cold start. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When dedupe fires, the existing queued op came from RecoverFromDroppedLoginBug (always has existingOnesignalId=null) and the incoming op from enqueueLogin may carry the anonymous user's UUID needed to merge their tags/subscriptions into the newly- identified user. Previously the dedupe kept whichever op was first, silently dropping the merge info when the recovery op won the race. Merge the incoming op's existingOnesignalId into the queued op in place before dropping the duplicate. LoginUserOperation.existingOnesignalId setter relaxed from private to internal to allow the in-place update. Also initialize the LoginUserOperations in the loadSavedOperations test properly — the old dedupe only compared ids so uninitialized ops worked incidentally; the new dedupe reads onesignalId. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Change OperationQueueItem.waiter to var. On LoginUserOperation dedupe, transfer the incoming op's waiter to the already-queued item so the enqueueAndWait caller (loginSuspend) receives the real execution result instead of an optimistic true. Fall back to optimistic wake only in the unlikely case where both items already have waiters. - Add OperationRepoTests coverage for the dedupe+merge path with matching onesignalId and differing existingOnesignalId — the previous test setup used different onesignalIds so the merge branch had no regression guard.
When an operation execution returns FAIL_PAUSE_OPREPO, the repo is paused and the ops are re-queued with their original waiters still attached. enqueueAndWait callers (loginSuspend) would suspend on the waiter forever — the op stays in the queue but never executes while paused, and the waiter is never woken. Wake the waiter with false and re-queue with waiter=null, matching the pattern from #2600. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the ad-hoc runBlocking { delay(100) } in the new
LoginUserOperation dedupe test with the same waitForInternalEnqueue
helper every other test in the file uses. Deterministic instead of
a fixed 100ms race against CI.
Mirrors the login race fix for the logout flow. OneSignal.logout() scheduled the user switch asynchronously via suspendifyOnIO, so the API call returned before the identity/properties/subscription models were updated. Calls made right after logout() (e.g. addTag) targeted the pre-logout user. Split LogoutHelper.logout() into switchUser() (sync, under loginLogoutLock) and enqueueLogout() (async). OneSignalImp.logout() and logoutSuspend() call switchUser() synchronously, then launch enqueueLogout() in the coroutine scope — mirroring LoginHelper. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Guard the LoginUserOperation dedup+merge path with !IDManager.isLocalId. When upgrading from 5.0.0-5.1.7 with a dropped anon-user create, switchUser returns a local- existingOneSignalId. If RecoverFromDroppedLoginBug wins the race and its null-existingOnesignalId op is queued first, merging the local id flips canStartExecute to false and strands the op — the waiter (now transferred) never wakes, and the stuck op blocks future recovery.
When the queued LoginUserOperation carries a waiter, dedupe wakes the incoming enqueueAndWait caller immediately with true and the queued op's waiter receives the real execution result on SUCCESS. Existing dedupe coverage only exercised the waiter-transfer branch where the queued op had no waiter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3c9b55d to
dce3184
Compare
Base automatically changed from
sdk-4363-android-sdk-turbine-feature-flags
to
main
April 28, 2026 19:07
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
One Line Summary
(Not JWT related)
Implementation of #2600
Fix a race in
OneSignal.login()andOneSignal.logout()where SDK calls made immediately after either operation (e.g.addTag) landed on the previous user's identity.Details
Motivation
Both
OneSignal.login()andOneSignal.logout()scheduled the local user switch asynchronously viasuspendifyOnIO, so the API call returned before the identity/properties/subscription models were updated. Calls made right after targeted the OLD user.Example reproduction:
Scope
Split both
LoginHelper.login()andLogoutHelper.logout()into two phases each:switchUser()— synchronous under the login/logout lock. Swaps the local identity/properties/subscription models immediately so subsequent SDK calls see the new user.enqueueLogin()/enqueueLogout()— async. Enqueues theLoginUserOperation(and for login, awaits completion of the backend user creation).OneSignalImp.login(),loginSuspend(),logout(), andlogoutSuspend()all callswitchUser()synchronously, then launch the enqueue in the coroutine scope.Because
switchUserreturns before the enqueue runs, there's a brief window whereRecoverFromDroppedLoginBugcan observe the intermediate state (newexternalId, localonesignalId, noLoginUserOperationin queue) and enqueue its own recovery login. When the real enqueue then adds another login op, the executor crashes on a grouped batch containing twoLoginUserOperations with "Unrecognized operation".To handle this:
OperationRepo.internalEnqueuededupesLoginUserOperationbyonesignalId. If a queued op already exists for the same user, the incoming op is dropped. The incoming op'sexistingOnesignalIdis merged into the queued op first so the anonymous-user conversion link isn't lost (the recovery op always hasexistingOnesignalId = null, the real login op may carry the anon UUID needed for the backend merge).enqueueAndWaitcallers (e.g.loginSuspend) receive the real execution result.LoginUserOperation.existingOnesignalIdsetter relaxed fromprivatetointernalto allow the in-place merge.Also applies the
FAIL_PAUSE_OPREPOwaker fix from #2600: ops re-queued on pause detach their waiters (wake withfalse) soenqueueAndWaitcallers don't hang on a paused queue.Related: #2600 made the same split on
feat/identity_verification_5.8, but that branch hasn't merged. This port brings the fix tomainso all users get the race fix in the next release, and extends the pattern symmetrically to logout.Testing
Unit testing
LoginHelperTeststo use the newswitchUser()+enqueueLogin()split; existing tests pass.LogoutHelperTeststo use the newswitchUser()+enqueueLogout()split; existing tests pass.OperationRepoTestscase for the dedupe+merge path: seed aLoginUserOperation(existingOnesignalId=null)in the queue (simulatesRecoverFromDroppedLoginBug), enqueue an incoming op with the sameonesignalIdand a non-nullexistingOnesignalId, verify the queued op is merged in place and the incoming op is dropped. UseswaitForInternalEnqueuehelper for determinism.loadSavedOperationstest to constructLoginUserOperationwith full args (the previous no-args construction stopped working once the dedupe readsonesignalId).Manual testing
login()+addTag()calls — confirmed each tag landed on the wrong user before the fix.logout()+addTag()now attaches the tag to the new anonymous user instead of the previously logged-in one.login()call — confirmed the executor no longer crashes with "Unrecognized operation" and the anonymous-user conversion link is preserved.Affected code checklist
Checklist
Overview
Testing
Final pass