fix: [SDK-4388] subscription state permanently stuck at "Never Subscribed" when login() is called before requestPermission()#2627
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes SDK-4388 where calling OneSignal.login() before requestPermission() could leave the backend push subscription permanently stuck in a “Never Subscribed / Permission Not Granted” state by ensuring “create with non-local subscriptionId” is executed as an update (PATCH) rather than a create (POST).
Changes:
- Route
CreateSubscriptionOperationwith a non-localsubscriptionIdthrough a new “update existing subscription” path (PATCH) instead of POSTing/subscriptions. - Stop including
subscriptionIdin the create-subscription request payload (always POST withid = null) for the local-id create path. - Update/extend unit tests to assert PATCH behavior for non-local create, including the grouped create+update repro.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
OneSignalSDK/onesignal/core/src/main/java/com/onesignal/user/internal/operations/impl/executors/SubscriptionOperationExecutor.kt |
Adds branching + helper to PATCH when CreateSubscriptionOperation carries a non-local subscription id; ensures POST create path doesn’t send an id. |
OneSignalSDK/onesignal/core/src/test/java/com/onesignal/user/internal/operations/SubscriptionOperationExecutorTests.kt |
Updates existing test expectations and adds a repro test for grouped create+update to ensure a single PATCH and no POST. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return if (startingOp is CreateSubscriptionOperation) { | ||
| createSubscription(startingOp, operations) | ||
| // SDK-4388: if the subscription already exists on the backend (non-local id), | ||
| // POSTing to /subscriptions with that id is silently treated as a no-op and | ||
| // drops the merged enabled/status payload. Dispatch as an update instead. | ||
| if (!IDManager.isLocalId(startingOp.subscriptionId)) { | ||
| updateExistingSubscriptionFromCreate(startingOp, operations) | ||
| } else { | ||
| createSubscription(startingOp, operations) | ||
| } |
Adds three tests covering branches in `updateExistingSubscriptionFromCreate` that the SDK-4388 fix introduced but didn't fully exercise: 1. Create + Delete with non-local id is a successful no-op (locks in the delete short-circuit so a future refactor can't accidentally route to PATCH and resurrect a deleted subscription). 2. Network-retryable backend errors on the PATCH path propagate as FAIL_RETRY with retryAfterSeconds (matches the local-id Create behavior and ensures errors aren't swallowed). 3. Multiple batched updates with a non-local-id Create collapse into a single PATCH carrying the *last* update's values (locks in last-wins semantics; mirrors the existing local-id POST test). These complement the two existing non-local-id tests and bring coverage for the new path to parity with the local-id Create path. Made-with: Cursor
| private suspend fun updateExistingSubscriptionFromCreate( | ||
| createOperation: CreateSubscriptionOperation, | ||
| operations: List<Operation>, | ||
| ): ExecutionResponse { | ||
| // if there are any deletes all operations should be tossed, nothing to do. | ||
| if (operations.any { it is DeleteSubscriptionOperation }) { | ||
| return ExecutionResponse(ExecutionResult.SUCCESS) | ||
| } | ||
|
|
||
| val lastUpdateOperation = operations.lastOrNull { it is UpdateSubscriptionOperation } as UpdateSubscriptionOperation? | ||
| val patchOp = | ||
| UpdateSubscriptionOperation( | ||
| createOperation.appId, | ||
| createOperation.onesignalId, | ||
| createOperation.subscriptionId, | ||
| createOperation.type, | ||
| lastUpdateOperation?.enabled ?: createOperation.enabled, | ||
| lastUpdateOperation?.address ?: createOperation.address, | ||
| lastUpdateOperation?.status ?: createOperation.status, | ||
| ) | ||
| return updateSubscription(patchOp, listOf(patchOp)) |
There was a problem hiding this comment.
🔴 Routing every CreateSubscriptionOperation with a non-local subscriptionId through PATCH breaks the existing 404/MISSING recovery flow in updateSubscription(): outside the missing-retry window, updateSubscription returns FAIL_NORETRY with a follow-up CreateSubscriptionOperation that re-uses the same non-local subscriptionId, which is now re-routed back through updateExistingSubscriptionFromCreate -> PATCH -> 404 -> ..., looping indefinitely with no escape. Suggested fix: catch BackendException(404) inside updateExistingSubscriptionFromCreate and fall back to createSubscription (POST) so the rebuild-user / FAIL_NORETRY recovery in createSubscription can terminate the cycle (or null out the id on the recovery op so it forces POST).
Extended reasoning...
Bug
The new updateExistingSubscriptionFromCreate helper unconditionally dispatches every CreateSubscriptionOperation whose subscriptionId is non-local through updateSubscription (PATCH /subscriptions/{id}). When that PATCH returns 404 outside the missing-retry window, updateSubscription (lines 235-260) returns FAIL_NORETRY with a recovery op:
ExecutionResponse(
ExecutionResult.FAIL_NORETRY,
operations = listOf(
CreateSubscriptionOperation(
lastOperation.appId,
lastOperation.onesignalId,
lastOperation.subscriptionId, // <-- same non-local id
...
),
),
)OperationRepo.executeOperations (OperationRepo.kt:314-322) enqueues response.operations at the front of the queue regardless of the FAIL_NORETRY result. On the next pass, startingOp is CreateSubscriptionOperation and IDManager.isLocalId(...) is still false, so it re-enters updateExistingSubscriptionFromCreate -> PATCH -> 404 -> another CreateSubscriptionOperation with the same id... loop.
Step-by-step proof
- Operation queue contains
CreateSubscriptionOperation(subscriptionId = "remote-uuid", ...)(e.g. server-side deletion happened after the local model cached the server uuid). execute()(line 56) seesstartingOp is CreateSubscriptionOperationand!IDManager.isLocalId("remote-uuid")is true -> callsupdateExistingSubscriptionFromCreate.- The helper builds a
patchOpand callsupdateSubscription(patchOp, listOf(patchOp)). - Backend returns 404.
_newRecordState.isInMissingRetryWindow(...)is false (window expired or never opened).updateSubscriptionenters theMISSINGbranch and returnsFAIL_NORETRY+ aCreateSubscriptionOperationre-usinglastOperation.subscriptionId = "remote-uuid". OperationRepo.executeOperationsremoves the original op (FAIL_NORETRY clearsretries) andif (response.operations != null)enqueues the newCreateSubscriptionOperation("remote-uuid")at the front.- Loop repeats from step 2. Each iteration produces a fresh
CreateSubscriptionOperationUUID but the same non-localsubscriptionId, andretriesnever accumulates because each cycle isFAIL_NORETRY. Backoff is whateverdelayBeforeNextExecution(0, null)gives - effectively nothing.
Why pre-PR code did not have this problem
Pre-PR, the recovery CreateSubscriptionOperation went through createSubscription (POST). On 404, createSubscription's MISSING handler calls _buildUserService.getRebuildOperationsIfCurrentUser(...), which either bootstraps a rebuild (terminating the create cycle for this id) or returns null -> terminating FAIL_NORETRY. Either way the cycle ends. The new helper has no such fallback and no 404 handling at all - it just delegates to updateSubscription and lets the recovery op come back to the same branch.
Impact
Trigger scenarios are uncommon but plausible: server-side subscription deletion (admin/GDPR cleanup, manual REST delete, server lifecycle), or any case where the cached server uuid no longer exists on the backend. Consequence is unbounded queue churn - the SDK will hammer PATCH /subscriptions/{id} indefinitely with no client-side termination, draining battery and traffic until the process dies. There is no test exercising this path.
Fix
Either:
- Catch
BackendExceptionwith 404 inupdateExistingSubscriptionFromCreateand fall through tocreateSubscription(POST), restoring the rebuild-user recovery path; or - In
updateSubscription's 404 handler, setsubscriptionId = nullon the rebuiltCreateSubscriptionOperationso it is forced through the local-id POST branch on re-execution.
A unit test exercising PATCH-404-outside-missing-retry-window -> recovery should be added.
🔬 also observed by copilot-pr-reviewer
| val subscription = | ||
| SubscriptionObject( | ||
| id = subId, | ||
| id = null, |
Summary
Fixes SDK-4388. On a fresh install, when
OneSignal.login(externalId)is called afterinitWithContextbut beforerequestPermission(), the server-side subscription gets permanently stuck atenabled:false, notification_types:0("Never Subscribed / Permission Not Granted") even though the device is fully subscribed locally (optedIn == true).Cherry-picked from @fadii925's
fadi/sdk-4388-...branch (which is based on5.7-main) ontomain. Authorship preserved.Repro (from the ticket)
initWithContextruns, SDK POSTs/apps/.../users→ server responds 201 with a server-assignedsubscriptionId.update-subscription(NO_PERMISSION).OneSignal.login(externalId)→ SDK runscreateAndSwitchToNewUser(), which enqueuescreate-subscriptionreusing the server-assignedsubscriptionId+login-userwithexistingOnesignalId.requestPermission()→ user grants → SDK enqueuesupdate-subscription(SUBSCRIBED).update-subscription(NO_PERMISSION)→PATCH /subscriptions/{id}→ 200. Server:enabled:false, notification_types:0.login-user→PATCH .../identity→SUCCESS_STARTING_ONLY. Remaining ops translated from local user onto server user.create-subscription+update-subscription(SUBSCRIBED)batched into onePOST .../subscriptionswith{ id:"<server-uuid>", enabled:true, notification_types:1 }→200 {}(no-op, subscription already exists). Theenabled:trueis silently discarded.enabled:false, notification_types:0permanently.Root cause
SubscriptionOperationExecutor.createSubscriptionwas always issuing aPOST /subscriptionsforCreateSubscriptionOperation, regardless of whether the carriedsubscriptionIdwas local or already a server-assigned UUID. When the id is already known to the backend, POST is silently treated as a no-op and the mergedenabled/address/statuspayload is dropped.This happens specifically on
login()becauseUserSwitcher.createAndSwitchToNewUserreuses the existing push subscription's id under the new local user, andSubscriptionModelStoreListener.getAddOperationmechanically converts the resulting model add into aCreateSubscriptionOperationcarrying that real id. Combined withGroupComparisonType.ALTERgrouping it with the follow-upupdate-subscription(SUBSCRIBED), the executor ends up POSTing a doomed batch.Fix
In
SubscriptionOperationExecutor.execute(), branch onIDManager.isLocalId(startingOp.subscriptionId)forCreateSubscriptionOperation:createSubscriptionpath →POST /subscriptions.updateExistingSubscriptionFromCreatehelper which collapses the merged final state (enabled/address/statusfrom the lastUpdateSubscriptionOperationif present, otherwise from the create op) into anUpdateSubscriptionOperationand dispatches viaupdateSubscription→PATCH /subscriptions/{id}.This implements option (a) from the ticket. Option (b) (drop the create op) was rejected because it would no-op when there's no follow-up
UpdateSubscriptionOperationin the batch.The delete short-circuit (
if (operations.any { it is DeleteSubscriptionOperation }) return SUCCESS) is preserved in the new helper.Customer impact
e198f6f8-f233-4073-8380-4a3c79ae64e7requestPermission()beforelogin(), or REST-API correct stuck subscriptions viaPATCH /apps/{app_id}/subscriptions/{subscription_id}with{"subscription": {"notification_types": 1}}.Test plan
"create subscription with non-local id PATCHes instead of POSTing when grouped with update"reproduces the SDK-4388 scenario (Create(NO_PERMISSION) + Update(SUBSCRIBED) with non-local id) and asserts a single PATCH withenabled:true, notificationTypes=SUBSCRIBED, zero POSTs."create subscription includes the subscription ID if non-local"rewritten to assert PATCH (the previous "POST with existing id" behavior is no longer correct)../gradlew :OneSignal:core:testReleaseUnitTestgreen.enabled:true, notification_types:1.Followups (not in this PR)
The root modeling issue is that
UserSwitcher.createAndSwitchToNewUserre-attaches an existing server-assigned subscription model under a new local user, which the model-store listener interprets as a brand-new subscription needing creation. A follow-up should consider either:TransferSubscriptionOperation(or no-op) instead ofCreateSubscriptionOperationin the listener whenIDManager.isLocalId(model.id)is false, orNO_PROPOGATEincreateAndSwitchToNewUserand explicitly enqueuing a transfer op.There is also a sibling code path in
LoginUserOperationExecutor.createSubscriptionsFromOperation(CreateSubscriptionOperation, ...)that does the sameif (!isLocalId) operation.subscriptionId else nulltrick when sending to/users(the non-SUCCESS_STARTING_ONLYpath), and is not addressed here.Backport
This fix should also land on
5.7-mainfor the next 5.7.x patch release. @fadii925's original branch is already based on5.7-mainand can be PR'd there directly: https://github.com/OneSignal/OneSignal-Android-SDK/tree/fadi/sdk-4388-android-sdk-subscription-state-permanently-stuck-at-never