
fix: [SDK-4388] subscription state permanently stuck at "Never Subscribed" when login() is called before requestPermission()#2627

Open
abdulraqeeb33 wants to merge 3 commits into main from ar/sdk-4388-android-sdk-subscription-state-permanently-stuck-at-never




abdulraqeeb33 commented Apr 27, 2026

Summary

Fixes SDK-4388. On a fresh install, when OneSignal.login(externalId) is called after initWithContext but before requestPermission(), the server-side subscription gets permanently stuck at enabled:false, notification_types:0 ("Never Subscribed / Permission Not Granted") even though the device is fully subscribed locally (optedIn == true).

Cherry-picked from @fadii925's fadi/sdk-4388-... branch (which is based on 5.7-main) onto main. Authorship preserved.

Repro (from the ticket)

  1. Fresh install → initWithContext runs, SDK POSTs /apps/.../users → server responds 201 with a server-assigned subscriptionId.
  2. FCM returns a push token → SDK enqueues update-subscription(NO_PERMISSION).
  3. App calls OneSignal.login(externalId) → SDK runs createAndSwitchToNewUser(), which enqueues create-subscription reusing the server-assigned subscriptionId + login-user with existingOnesignalId.
  4. App calls requestPermission() → user grants → SDK enqueues update-subscription(SUBSCRIBED).
  5. Op queue flushes:
    • update-subscription(NO_PERMISSION) → PATCH /subscriptions/{id} → 200. Server: enabled:false, notification_types:0.
    • login-user → PATCH .../identity → SUCCESS_STARTING_ONLY. Remaining ops translated from local user onto server user.
    • create-subscription + update-subscription(SUBSCRIBED) batched into one POST .../subscriptions with { id:"<server-uuid>", enabled:true, notification_types:1 } → 200 {} (no-op, subscription already exists). The enabled:true is silently discarded.
  6. No further PATCH is issued → server stays at enabled:false, notification_types:0 permanently.
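The effect of the step 5 flush can be replayed against a toy in-memory backend. This is a minimal sketch, not SDK code: ServerSub, FakeBackend, and replayFlush are hypothetical names, the initial server state from step 1 is an assumption, and the model simply encodes the behavior the ticket describes (POST with an already-known id returns 200 {} and changes nothing; PATCH overwrites).

```kotlin
// Hypothetical model of the server-side subscription record (not a OneSignal class).
data class ServerSub(var enabled: Boolean, var notificationTypes: Int)

// Toy backend encoding the behavior described in the ticket.
class FakeBackend {
    val subs = mutableMapOf<String, ServerSub>()

    // POST /subscriptions: an id the server already knows is a silent no-op (200 {}).
    fun post(id: String, enabled: Boolean, notificationTypes: Int) {
        if (id !in subs) subs[id] = ServerSub(enabled, notificationTypes)
    }

    // PATCH /subscriptions/{id}: overwrites the stored state of an existing record.
    fun patch(id: String, enabled: Boolean, notificationTypes: Int) {
        val sub = subs[id] ?: return
        sub.enabled = enabled
        sub.notificationTypes = notificationTypes
    }
}

// Replays step 1 and the step 5 flush. patchBatchedCreate = true models this PR's behavior.
fun replayFlush(patchBatchedCreate: Boolean): ServerSub {
    val backend = FakeBackend()
    val id = "server-uuid"
    // Step 1: the subscription is created server-side (initial values are an assumption).
    backend.post(id, enabled = false, notificationTypes = 0)
    // Step 5a: update-subscription(NO_PERMISSION) -> PATCH.
    backend.patch(id, enabled = false, notificationTypes = 0)
    // Step 5b: login-user only touches identity, so it is not modeled here.
    // Step 5c: batched create-subscription + update-subscription(SUBSCRIBED).
    if (patchBatchedCreate) {
        backend.patch(id, enabled = true, notificationTypes = 1)
    } else {
        backend.post(id, enabled = true, notificationTypes = 1)
    }
    return backend.subs.getValue(id)
}
```

replayFlush(false) ends at enabled = false, notificationTypes = 0, matching step 6; replayFlush(true) models sending the batch as a PATCH and ends at enabled = true, notificationTypes = 1.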

Root cause

SubscriptionOperationExecutor.createSubscription was always issuing a POST /subscriptions for CreateSubscriptionOperation, regardless of whether the carried subscriptionId was local or already a server-assigned UUID. When the id is already known to the backend, POST is silently treated as a no-op and the merged enabled/address/status payload is dropped.

This happens specifically on login() because UserSwitcher.createAndSwitchToNewUser reuses the existing push subscription's id under the new local user, and SubscriptionModelStoreListener.getAddOperation mechanically converts the resulting model add into a CreateSubscriptionOperation carrying that real id. Combined with GroupComparisonType.ALTER grouping it with the follow-up update-subscription(SUBSCRIBED), the executor ends up POSTing a doomed batch.

Fix

In SubscriptionOperationExecutor.execute(), branch on IDManager.isLocalId(startingOp.subscriptionId) for CreateSubscriptionOperation:

  • Local id (genuinely new subscription): keep existing createSubscription path → POST /subscriptions.
  • Non-local id (subscription already on backend): route through the new updateExistingSubscriptionFromCreate helper, which collapses the merged final state (enabled/address/status from the last UpdateSubscriptionOperation if present, otherwise from the create op) into an UpdateSubscriptionOperation and dispatches it via updateSubscription → PATCH /subscriptions/{id}.

This implements option (a) from the ticket. Option (b) (drop the create op) was rejected because it would no-op when there's no follow-up UpdateSubscriptionOperation in the batch.

The delete short-circuit (if (operations.any { it is DeleteSubscriptionOperation }) return SUCCESS) is preserved in the new helper.
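The dispatch rule and the merge it performs can be sketched with simplified stand-in types. CreateOp, UpdateOp, DeleteOp, the Request types, and dispatchCreate are hypothetical; the "local-" prefix check is an assumption standing in for IDManager.isLocalId, and status is reduced to an Int (1 for SUBSCRIBED, 0 for NO_PERMISSION, following the notification_types values in the repro).

```kotlin
// Hypothetical, simplified operation shapes; the SDK's real classes carry more fields.
sealed class Op {
    abstract val subscriptionId: String
}
data class CreateOp(override val subscriptionId: String, val enabled: Boolean, val status: Int) : Op()
data class UpdateOp(override val subscriptionId: String, val enabled: Boolean, val status: Int) : Op()
data class DeleteOp(override val subscriptionId: String) : Op()

// What the executor would send for a batch that starts with a CreateOp.
sealed class Request
data class Post(val enabled: Boolean, val status: Int) : Request()
data class Patch(val id: String, val enabled: Boolean, val status: Int) : Request()
object NoRequest : Request()

// Assumption: local ids are prefixed with "local-" (stand-in for IDManager.isLocalId).
fun isLocalId(id: String): Boolean = id.startsWith("local-")

fun dispatchCreate(ops: List<Op>): Request {
    val create = ops.first() as CreateOp
    // A delete anywhere in the batch means there is nothing to send.
    if (ops.any { it is DeleteOp }) return NoRequest
    // Last update wins; without one, fall back to the create op's own values.
    val last = ops.filterIsInstance<UpdateOp>().lastOrNull()
    val enabled = last?.enabled ?: create.enabled
    val status = last?.status ?: create.status
    return if (isLocalId(create.subscriptionId)) {
        Post(enabled, status) // genuinely new: POST without an id
    } else {
        Patch(create.subscriptionId, enabled, status) // already on the backend: PATCH by id
    }
}
```

For the SDK-4388 batch, Create(NO_PERMISSION) + Update(SUBSCRIBED) on a server id yields a single Patch carrying enabled = true and status 1, and no Post.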

Customer impact

  • App ID: e198f6f8-f233-4073-8380-4a3c79ae64e7
  • Affected SDK: Android Native 5.7.7 and earlier (likely all 5.x versions with this code path)
  • Customer workaround: call requestPermission() before login(), or correct already-stuck subscriptions through the REST API with PATCH /apps/{app_id}/subscriptions/{subscription_id} and the body {"subscription": {"notification_types": 1}}.
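The REST workaround can be expressed with the JDK's java.net.http client (Java 11+). This sketch only builds the request; buildUnstickRequest is a hypothetical helper, and the https://api.onesignal.com base URL and the caller-supplied Authorization value are assumptions to check against the OneSignal REST API documentation.

```kotlin
import java.net.URI
import java.net.http.HttpRequest

// Hypothetical helper: builds (but does not send) the workaround PATCH.
// Assumptions: the https://api.onesignal.com base URL and the caller-supplied
// Authorization header value; verify both against the REST API docs.
fun buildUnstickRequest(appId: String, subscriptionId: String, authorization: String): HttpRequest {
    // Sets notification_types back to 1 (subscribed), as in the workaround.
    val body = """{"subscription": {"notification_types": 1}}"""
    val uri = URI.create("https://api.onesignal.com/apps/$appId/subscriptions/$subscriptionId")
    return HttpRequest.newBuilder(uri)
        .header("Content-Type", "application/json")
        .header("Authorization", authorization)
        .method("PATCH", HttpRequest.BodyPublishers.ofString(body))
        .build()
}
```

Sending it is then HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), once per stuck subscription id.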

Test plan

  • New unit test "create subscription with non-local id PATCHes instead of POSTing when grouped with update" reproduces the SDK-4388 scenario (Create(NO_PERMISSION) + Update(SUBSCRIBED) with non-local id) and asserts a single PATCH with enabled:true, notificationTypes=SUBSCRIBED, zero POSTs.
  • Existing test "create subscription includes the subscription ID if non-local" rewritten to assert PATCH (the previous "POST with existing id" behavior is no longer correct).
  • Pre-existing test coverage for create-subscription with local id, transfer, delete, and update paths preserved unchanged.
  • CI: ./gradlew :OneSignal:core:testReleaseUnitTest green.
  • Manual: repro the ticket flow on a fresh install (initWithContext → login → requestPermission) and verify the server-side subscription ends at enabled:true, notification_types:1.

Followups (not in this PR)

The root modeling issue is that UserSwitcher.createAndSwitchToNewUser re-attaches an existing server-assigned subscription model under a new local user, which the model-store listener interprets as a brand-new subscription needing creation. A follow-up should consider either:

  1. Emitting a TransferSubscriptionOperation (or no-op) instead of CreateSubscriptionOperation in the listener when IDManager.isLocalId(model.id) is false, or
  2. Handling the re-attach with NO_PROPOGATE in createAndSwitchToNewUser and explicitly enqueuing a transfer op.

There is also a sibling code path in LoginUserOperationExecutor.createSubscriptionsFromOperation(CreateSubscriptionOperation, ...) that does the same if (!isLocalId) operation.subscriptionId else null trick when sending to /users (the non-SUCCESS_STARTING_ONLY path), and is not addressed here.
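Follow-up option 1 amounts to a single branch in the listener. The types and the addOperationFor function below are hypothetical simplifications of SubscriptionModelStoreListener.getAddOperation and the SDK's operation classes, and the "local-" prefix check is an assumed stand-in for IDManager.isLocalId.

```kotlin
// Hypothetical, simplified shapes; not the SDK's real model or operation classes.
data class SubscriptionModel(val id: String, val onesignalId: String)

sealed class Op
data class CreateSubscriptionOp(val subscriptionId: String, val onesignalId: String) : Op()
data class TransferSubscriptionOp(val subscriptionId: String, val onesignalId: String) : Op()

// Assumption: local ids are prefixed with "local-" (stand-in for IDManager.isLocalId).
fun isLocalId(id: String): Boolean = id.startsWith("local-")

// Follow-up option 1: only a local id means "create"; a server-assigned id that is
// re-attached under a new user becomes a transfer instead.
fun addOperationFor(model: SubscriptionModel): Op =
    if (isLocalId(model.id)) {
        CreateSubscriptionOp(model.id, model.onesignalId)
    } else {
        TransferSubscriptionOp(model.id, model.onesignalId)
    }
```

With this branch the executor would never see a create carrying a server id, so the PATCH routing in this PR would become a defensive fallback rather than the primary path.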

Backport

This fix should also land on 5.7-main for the next 5.7.x patch release. @fadii925's original branch is already based on 5.7-main and can be PR'd there directly: https://github.com/OneSignal/OneSignal-Android-SDK/tree/fadi/sdk-4388-android-sdk-subscription-state-permanently-stuck-at-never

Copilot AI review requested due to automatic review settings April 27, 2026 18:35

Copilot AI left a comment


Pull request overview

Fixes SDK-4388 where calling OneSignal.login() before requestPermission() could leave the backend push subscription permanently stuck in a “Never Subscribed / Permission Not Granted” state by ensuring “create with non-local subscriptionId” is executed as an update (PATCH) rather than a create (POST).

Changes:

  • Route CreateSubscriptionOperation with a non-local subscriptionId through a new “update existing subscription” path (PATCH) instead of POSTing /subscriptions.
  • Stop including subscriptionId in the create-subscription request payload (always POST with id = null) for the local-id create path.
  • Update/extend unit tests to assert PATCH behavior for non-local create, including the grouped create+update repro.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Files changed:

  • OneSignalSDK/onesignal/core/src/main/java/com/onesignal/user/internal/operations/impl/executors/SubscriptionOperationExecutor.kt: Adds branching + helper to PATCH when CreateSubscriptionOperation carries a non-local subscription id; ensures POST create path doesn’t send an id.
  • OneSignalSDK/onesignal/core/src/test/java/com/onesignal/user/internal/operations/SubscriptionOperationExecutorTests.kt: Updates existing test expectations and adds a repro test for grouped create+update to ensure a single PATCH and no POST.


Comment on lines 56 to +64
  return if (startingOp is CreateSubscriptionOperation) {
-     createSubscription(startingOp, operations)
+     // SDK-4388: if the subscription already exists on the backend (non-local id),
+     // POSTing to /subscriptions with that id is silently treated as a no-op and
+     // drops the merged enabled/status payload. Dispatch as an update instead.
+     if (!IDManager.isLocalId(startingOp.subscriptionId)) {
+         updateExistingSubscriptionFromCreate(startingOp, operations)
+     } else {
+         createSubscription(startingOp, operations)
+     }
Adds three tests covering branches in `updateExistingSubscriptionFromCreate`
that the SDK-4388 fix introduced but didn't fully exercise:

1. Create + Delete with non-local id is a successful no-op (locks in the
   delete short-circuit so a future refactor can't accidentally route to
   PATCH and resurrect a deleted subscription).
2. Network-retryable backend errors on the PATCH path propagate as
   FAIL_RETRY with retryAfterSeconds (matches the local-id Create behavior
   and ensures errors aren't swallowed).
3. Multiple batched updates with a non-local-id Create collapse into a
   single PATCH carrying the *last* update's values (locks in last-wins
   semantics; mirrors the existing local-id POST test).

These complement the two existing non-local-id tests and bring coverage
for the new path to parity with the local-id Create path.

Made-with: Cursor
Comment on lines +83 to +103
private suspend fun updateExistingSubscriptionFromCreate(
    createOperation: CreateSubscriptionOperation,
    operations: List<Operation>,
): ExecutionResponse {
    // if there are any deletes all operations should be tossed, nothing to do.
    if (operations.any { it is DeleteSubscriptionOperation }) {
        return ExecutionResponse(ExecutionResult.SUCCESS)
    }

    val lastUpdateOperation = operations.lastOrNull { it is UpdateSubscriptionOperation } as UpdateSubscriptionOperation?
    val patchOp =
        UpdateSubscriptionOperation(
            createOperation.appId,
            createOperation.onesignalId,
            createOperation.subscriptionId,
            createOperation.type,
            lastUpdateOperation?.enabled ?: createOperation.enabled,
            lastUpdateOperation?.address ?: createOperation.address,
            lastUpdateOperation?.status ?: createOperation.status,
        )
    return updateSubscription(patchOp, listOf(patchOp))

🔴 Routing every CreateSubscriptionOperation with a non-local subscriptionId through PATCH breaks the existing 404/MISSING recovery flow in updateSubscription(): outside the missing-retry window, updateSubscription returns FAIL_NORETRY with a follow-up CreateSubscriptionOperation that re-uses the same non-local subscriptionId, which is now re-routed back through updateExistingSubscriptionFromCreate -> PATCH -> 404 -> ..., looping indefinitely with no escape. Suggested fix: catch BackendException(404) inside updateExistingSubscriptionFromCreate and fall back to createSubscription (POST) so the rebuild-user / FAIL_NORETRY recovery in createSubscription can terminate the cycle (or null out the id on the recovery op so it forces POST).

Extended reasoning...

Bug

The new updateExistingSubscriptionFromCreate helper unconditionally dispatches every CreateSubscriptionOperation whose subscriptionId is non-local through updateSubscription (PATCH /subscriptions/{id}). When that PATCH returns 404 outside the missing-retry window, updateSubscription (lines 235-260) returns FAIL_NORETRY with a recovery op:

ExecutionResponse(
    ExecutionResult.FAIL_NORETRY,
    operations = listOf(
        CreateSubscriptionOperation(
            lastOperation.appId,
            lastOperation.onesignalId,
            lastOperation.subscriptionId, // <-- same non-local id
            ...
        ),
    ),
)

OperationRepo.executeOperations (OperationRepo.kt:314-322) enqueues response.operations at the front of the queue regardless of the FAIL_NORETRY result. On the next pass, startingOp is CreateSubscriptionOperation and IDManager.isLocalId(...) is still false, so it re-enters updateExistingSubscriptionFromCreate -> PATCH -> 404 -> another CreateSubscriptionOperation with the same id... loop.

Step-by-step proof

  1. Operation queue contains CreateSubscriptionOperation(subscriptionId = "remote-uuid", ...) (e.g. server-side deletion happened after the local model cached the server uuid).
  2. execute() (line 56) sees startingOp is CreateSubscriptionOperation and !IDManager.isLocalId("remote-uuid") is true -> calls updateExistingSubscriptionFromCreate.
  3. The helper builds a patchOp and calls updateSubscription(patchOp, listOf(patchOp)).
  4. Backend returns 404. _newRecordState.isInMissingRetryWindow(...) is false (window expired or never opened). updateSubscription enters the MISSING branch and returns FAIL_NORETRY + a CreateSubscriptionOperation re-using lastOperation.subscriptionId = "remote-uuid".
  5. OperationRepo.executeOperations removes the original op (FAIL_NORETRY clears retries) and if (response.operations != null) enqueues the new CreateSubscriptionOperation("remote-uuid") at the front.
  6. The loop repeats from step 2. Each iteration produces a fresh CreateSubscriptionOperation (new operation UUID) but with the same non-local subscriptionId, and retries never accumulates because each cycle is FAIL_NORETRY. Backoff is whatever delayBeforeNextExecution(0, null) gives, which is effectively nothing.
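The cycle can be reproduced with a small queue simulation. It is a sketch under stated assumptions: CreateOp, executePostPr, and drain are hypothetical; every PATCH is assumed to hit the 404-outside-the-missing-retry-window branch; and the step cap exists only so the demonstration itself terminates.

```kotlin
// Hypothetical stand-in for CreateSubscriptionOperation, reduced to its id.
data class CreateOp(val subscriptionId: String)

// Assumption: local ids are prefixed with "local-" (stand-in for IDManager.isLocalId).
fun isLocalId(id: String): Boolean = id.startsWith("local-")

// Post-PR execute() for a create whose subscription no longer exists on the backend.
// Records the request it would send and returns the follow-up ops the repo would enqueue.
fun executePostPr(op: CreateOp, requests: MutableList<String>): List<CreateOp> =
    if (!isLocalId(op.subscriptionId)) {
        requests += "PATCH /subscriptions/${op.subscriptionId}"
        // 404 outside the missing-retry window: FAIL_NORETRY plus a recovery create
        // that reuses the same non-local id.
        listOf(CreateOp(op.subscriptionId))
    } else {
        requests += "POST /subscriptions"
        emptyList() // local-id path modeled as terminating
    }

// Runs the queue like OperationRepo: follow-ups go to the front. maxSteps bounds the demo.
fun drain(start: CreateOp, maxSteps: Int): List<String> {
    val queue = ArrayDeque(listOf(start))
    val requests = mutableListOf<String>()
    var steps = 0
    while (queue.isNotEmpty() && steps < maxSteps) {
        val followUps = executePostPr(queue.removeFirst(), requests)
        followUps.asReversed().forEach { queue.addFirst(it) }
        steps++
    }
    return requests
}
```

drain(CreateOp("remote-uuid"), 50) records 50 identical PATCH requests and would keep going without the cap, which is the unbounded churn described under Impact.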

Why pre-PR code did not have this problem

Pre-PR, the recovery CreateSubscriptionOperation went through createSubscription (POST). On 404, createSubscription's MISSING handler calls _buildUserService.getRebuildOperationsIfCurrentUser(...), which either bootstraps a rebuild (terminating the create cycle for this id) or returns null -> terminating FAIL_NORETRY. Either way the cycle ends. The new helper has no such fallback and no 404 handling at all: it just delegates to updateSubscription and lets the recovery op come back to the same branch.

Impact

Trigger scenarios are uncommon but plausible: server-side subscription deletion (admin/GDPR cleanup, manual REST delete, server lifecycle), or any case where the cached server uuid no longer exists on the backend. The consequence is unbounded queue churn: the SDK will hammer PATCH /subscriptions/{id} indefinitely with no client-side termination, draining battery and traffic until the process dies. There is no test exercising this path.

Fix

Either:

  • Catch BackendException with 404 in updateExistingSubscriptionFromCreate and fall through to createSubscription (POST), restoring the rebuild-user recovery path; or
  • In updateSubscription's 404 handler, set subscriptionId = null on the rebuilt CreateSubscriptionOperation so it is forced through the local-id POST branch on re-execution.

A unit test exercising PATCH-404-outside-missing-retry-window -> recovery should be added.
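Fix option 1 can be sketched as a try/catch around the PATCH. The BackendException class and the injected patch/post lambdas are hypothetical stand-ins (the real helper would call the backend service and build ExecutionResponse values); the point is the control flow: only a 404 falls back to the POST path, where createSubscription's MISSING handling can end the cycle, and any other status is rethrown.

```kotlin
// Hypothetical stand-in for the SDK's backend exception type.
class BackendException(val statusCode: Int) : Exception("HTTP $statusCode")

// Sketch of fix option 1. The backend calls are injected as lambdas so the control
// flow can be exercised without a network; the return value names the path taken.
fun updateExistingOrRecreate(
    subscriptionId: String,
    patch: (String) -> Unit,
    post: () -> Unit,
): String =
    try {
        patch(subscriptionId) // PATCH /subscriptions/{id}
        "PATCH"
    } catch (e: BackendException) {
        // Only a 404 falls back; other failures keep their normal handling.
        if (e.statusCode != 404) throw e
        post() // createSubscription path: its MISSING handler can end the cycle
        "POST"
    }
```

A unit test along these lines would stub the PATCH to fail with 404 and assert that exactly one POST follows, rather than another create op carrying the same id.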

🔬 also observed by copilot-pr-reviewer


nan-li left a comment


await review

  val subscription =
      SubscriptionObject(
-         id = subId,
+         id = null,

4 participants