
[Improvement] make part of operationrepo initialization async #2068

Merged
jinliu9508 merged 5 commits into main from operationrepo-init-async on May 2, 2024

Contributor

@jinliu9508 jinliu9508 commented Apr 29, 2024

Description

One Line Summary

Make part of the initialization of OperationRepo asynchronous so that previously saved operations can be added asynchronously, preventing long-loading operations from blocking the main thread.

Details

Motivation

We have observed numerous ANRs during the initialization phase, with OperationRepo.init being the top cause. The issue does not occur consistently, and we suspect it is related to the device's state or to a problem accessing the device's disk. To address this, we plan to make the initialization process in OperationRepo asynchronous. By moving the loading part to a background thread, we aim to prevent the main thread from being blocked when initialization unexpectedly takes a long time.

Scope

Saved operations from the previous session will not be executed until they have been loaded successfully. Their order relative to new operations may be incorrect, depending on when loading completes. This change tries to insert saved operations at the beginning of the queue, and any operation added later goes to the end of the queue.
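
The queueing behavior described in this section can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Op`, `SketchOperationQueue`, and `readSavedFromDisk` are hypothetical names, and a plain background thread is assumed in place of whatever threading mechanism the SDK actually uses.

```kotlin
import java.util.concurrent.CountDownLatch
import kotlin.concurrent.thread

// Hypothetical stand-in for the SDK's operation type.
data class Op(val name: String)

// Hypothetical queue; readSavedFromDisk simulates the slow read of saved operations.
class SketchOperationQueue(private val readSavedFromDisk: () -> List<Op>) {
    private val queue = mutableListOf<Op>()
    private val loaded = CountDownLatch(1)

    // Returns immediately; the slow read runs on a background thread.
    fun start() {
        thread(name = "op-queue-loader") {
            val saved = readSavedFromDisk() // slow I/O, no lock held
            synchronized(queue) {
                // Insert saved operations at the front, keeping their saved order.
                queue.addAll(0, saved)
            }
            loaded.countDown()
        }
    }

    // Operations enqueued during or after loading always go to the end.
    fun enqueue(op: Op) {
        synchronized(queue) { queue.add(op) }
    }

    // Blocks the caller until the saved operations have been inserted.
    fun awaitLoaded() = loaded.await()

    fun snapshot(): List<Op> = synchronized(queue) { queue.toList() }
}
```

In this sketch a new operation enqueued before loading finishes still ends up behind the saved ones. The ordering caveat remains for anything that is taken off the queue and executed before loading completes.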

Testing

Manual testing

The manual test I ran to ensure the SDK loads saved operations correctly:

  1. Turn off wifi
  2. Login user and add a tag
  3. Kill the app
  4. Re-enter the app with wifi on, and observe that the saved operations are successfully loaded and executed shortly after.

Affected code checklist

  • Notifications
    • Display
    • Open
    • Push Processing
    • Confirm Deliveries
  • Outcomes
  • Sessions
  • In-App Messaging
  • REST API requests
  • Public API changes

Checklist

Overview

  • I have filled out all REQUIRED sections above
  • PR does one thing
    • If it is hard to explain how any code changes are related to each other then it most likely needs to be more than one PR
  • Any Public API changes are explained in the PR details and conform to existing APIs

Testing

  • I have included test coverage for these changes, or explained why they are not needed
  • All automated tests pass, or I explained why that is not possible
  • I have personally tested this on my device, or explained why that is not possible

Final pass

  • Code is as readable as possible.
    • Simplify with less code, followed by splitting up code into well named functions and variables, followed by adding comments to the code.
  • I have reviewed this PR myself, ensuring it meets each checklist item
    • WIP (Work In Progress) is ok, but explain what is still in progress and what you would like feedback on. Start the PR title with "WIP" to indicate this.

Member

@jkasten2 jkasten2 left a comment

We need to delay OperationModelStore.load() as well, as this is what does the disk read. See this ANR stack trace:

       at com.onesignal.common.modeling.Model.initializeFromJson(Model.kt:98)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.create(OperationModelStore.kt:68)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.create(OperationModelStore.kt:30)
       at com.onesignal.common.modeling.ModelStore.load(ModelStore.kt:162)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.<init>(OperationModelStore.kt:32)
       at java.lang.reflect.Constructor.newInstance0(Native method)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:343)
       at com.onesignal.common.services.ServiceRegistrationReflection.resolve(ServiceRegistration.kt:89)
       at com.onesignal.common.services.ServiceProvider.getServiceOrNull(ServiceProvider.kt:79)
       at com.onesignal.common.services.ServiceProvider.getService(ServiceProvider.kt:67)
       at com.onesignal.common.services.ServiceRegistrationReflection.resolve(ServiceRegistration.kt:82)
       at com.onesignal.common.services.ServiceProvider.getServiceOrNull(ServiceProvider.kt:79)
       at com.onesignal.common.services.ServiceProvider.getService(ServiceProvider.kt:67)
       at com.onesignal.internal.OneSignalImp.initWithContext(OneSignalImp.kt:510)
       at com.onesignal.OneSignal.initWithContext(OneSignal.kt:135)

ServiceProvider creates instances depth first: it resolves the deepest dependency, then works its way back up. In this case, since OperationRepo requires an instance of ConfigModelStore as part of its constructor, an instance of ConfigModelStore is created before OperationRepo.
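
The depth-first resolution described above can be illustrated with a tiny container. Everything here is hypothetical (`MiniProvider`, `SketchModelStore`, `SketchDependentRepo`) and is not the SDK's ServiceProvider. It only shows why work done in a dependency's constructor runs on whichever thread first asks for the dependent service.

```kotlin
// Hypothetical dependency whose constructor does work; it records the constructing thread.
class SketchModelStore {
    val constructedOn: String = Thread.currentThread().name
}

// Hypothetical service that needs the store as a constructor argument.
class SketchDependentRepo(val store: SketchModelStore)

// Hypothetical miniature container that creates instances lazily on first request.
class MiniProvider {
    private val factories = mutableMapOf<String, (MiniProvider) -> Any>()
    private val instances = mutableMapOf<String, Any>()
    val creationOrder = mutableListOf<String>()

    fun register(name: String, factory: (MiniProvider) -> Any) {
        factories[name] = factory
    }

    fun resolve(name: String): Any {
        instances[name]?.let { return it }
        // The factory may call resolve() for its own dependencies first,
        // so dependencies finish construction before the dependent does.
        val instance = factories.getValue(name)(this)
        instances[name] = instance
        creationOrder.add(name)
        return instance
    }
}
```

Resolving the repo constructs the store first, on the calling thread, which matches the shape of the stack trace: the disk read in the store's constructor ends up under initWithContext.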

@jinliu9508
Contributor Author

Since load() is a generic function from ModelStore, should we delay all model stores or limit the change to OperationModelStore only?

Also, both load() and persist() may be locking the models for longer than needed, especially since they access the preference service inside the synchronized block. Do you think we can also introduce a small optimization for this as part of this issue?

@jkasten2
Member

jkasten2 commented Apr 30, 2024

> Since load() is a generic function from ModelStore, should we delay all model stores or limit the change to OperationModelStore only?

Longer term we probably want to change ModelStore so that none of the models read from disk in the constructor, or ensure we never create these instances on the main thread. In the short term, to get a quick fix out, scoping it to only OperationModelStore is probably what we should do for now.

> Also, both load() and persist() may be locking the models for longer than needed, especially since they access the preference service inside the synchronized block. Do you think we can also introduce a small optimization for this as part of this issue?

Ya we could make those changes in this PR as well.
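
One way to read the locking suggestion is to keep the storage access outside the synchronized block and hold the lock only while touching the in-memory list. The sketch below assumes that reading; `SketchLockedStore`, `readAll`, and `writeAll` are hypothetical and stand in for ModelStore and the preference service. It is not the change made in this PR.

```kotlin
// Hypothetical store; readAll/writeAll stand in for slow preference-service access.
class SketchLockedStore(
    private val readAll: () -> List<String>,
    private val writeAll: (List<String>) -> Unit,
) {
    private val models = mutableListOf<String>()

    fun load() {
        val fromDisk = readAll() // slow I/O, no lock held
        synchronized(models) {
            // Persisted items go first; anything added meanwhile stays after them.
            models.addAll(0, fromDisk)
        }
    }

    fun add(model: String) {
        synchronized(models) { models.add(model) }
    }

    fun persist() {
        // Short critical section: copy the list, then release the lock.
        val snapshot = synchronized(models) { models.toList() }
        writeAll(snapshot) // slow I/O, no lock held
    }

    fun list(): List<String> = synchronized(models) { models.toList() }
}
```

The trade-off is that two concurrent persist() calls could write their snapshots out of order, so a real implementation would need to serialize the writes in some other way.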

@jinliu9508 jinliu9508 changed the title from "WIP: [Improvement] make part of operationrepo initialization async" to "[Improvement] make part of operationrepo initialization async" May 2, 2024

Member

@jkasten2 jkasten2 left a comment

All the logic changes look good to me.

Just needs some tests and some fixes to the linting and comments.

@jinliu9508 jinliu9508 requested a review from jkasten2 May 2, 2024 19:07

Member

@jkasten2 jkasten2 left a comment

Fix ups look good, but ktlint is still failing on CI. You can run it locally with the following command:
./gradlew ktlintCheck --console=plain

Also using the ktlint plugin in Android Studio helps make sure you follow lint rules.

@jinliu9508 jinliu9508 merged commit fda7c3e into main May 2, 2024
@jinliu9508 jinliu9508 deleted the operationrepo-init-async branch May 2, 2024 20:08
@jkasten2
Member

jkasten2 commented May 2, 2024

@jinliu9508 I believe this PR will break RecoverFromDroppedLoginBug.kt: when it calls OperationRepo.containsInstanceOf(), it assumes that all saved operations have already been loaded from disk. Can you address this in a follow-up PR?
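
One possible shape for such a follow-up is to make the query wait until loading has finished. This is only a sketch under that assumption. `SketchRepo`, `SketchOperation`, `LoginOp`, and `TagOp` are hypothetical, and the signature differs from the real OperationRepo.containsInstanceOf(). The fix actually chosen is not described in this thread.

```kotlin
import java.util.concurrent.CountDownLatch
import kotlin.concurrent.thread

// Hypothetical operation types for illustration only.
open class SketchOperation
class LoginOp : SketchOperation()
class TagOp : SketchOperation()

// Hypothetical repo whose saved operations are loaded on a background thread.
class SketchRepo(private val readSaved: () -> List<SketchOperation>) {
    private val queue = mutableListOf<SketchOperation>()
    private val loaded = CountDownLatch(1)

    fun start() {
        thread(name = "repo-loader") {
            val saved = readSaved() // slow I/O, no lock held
            synchronized(queue) { queue.addAll(0, saved) }
            loaded.countDown()
        }
    }

    // Waits for the saved operations before answering, so the result
    // reflects what was on disk. Blocking: call it off the main thread.
    fun containsInstanceOf(type: Class<out SketchOperation>): Boolean {
        loaded.await()
        return synchronized(queue) { queue.any { type.isInstance(it) } }
    }
}
```

Because the call blocks until loading completes, a caller such as a startup check would have to run on a background thread to avoid reintroducing the ANR.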
