[Fix] Prevent operations added to queue while it's processing from being executed without delay #2053

Merged
jkasten2 merged 5 commits into main from
fix/op-repo-add-bucketing-to-limit-network-calls-further
Apr 15, 2024

Conversation

@jkasten2
Member

@jkasten2 jkasten2 commented Apr 12, 2024

Description

One Line Summary

Fixes an issue where new operations added to the OperationRepo's queue while it is executing skip the opRepoExecutionInterval delay.

Details

Motivation

Prevents a misbehaving app that adds operations (such as addTag()) in a tight loop from causing a number of back-to-back network calls to OneSignal's backend.

Scope

Only affects OperationRepo's batching logic.

New OperationRepo Concept - Buckets

   /** *** Buckets ***
     * Purpose: Bucketing is a pattern we are using to help save network
     * calls. It works together with opRepoExecutionInterval to define
     * a time window in which operations can be added to the bucket.
     *
     * When enqueue() is called it creates a new OperationQueueItem with its
     * bucket = enqueueIntoBucket. Just before we start processing a bucket we
     * enqueueIntoBucket++; this ensures anything new that comes in while
     * executing doesn't skip the opRepoExecutionInterval delay.
     *
     * NOTE: Bucketing only affects the starting operation we grab.
     *       The reason is we still want getGroupableOperations() to find
     *       other operations it can execute in one go (same network call).
     *       It's more efficient overall, as it lowers the total number of
     *       network calls.
     */
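The bucketing rule above can be sketched with a small simulation. This is an illustrative model only, not the SDK's actual implementation; the class and member names (`BucketedQueue`, `startProcessingCycle`, etc.) are hypothetical, while `enqueueIntoBucket` mirrors the field named in the doc comment:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Minimal simulation of the bucketing idea: items enqueued after a
// processing cycle begins land in the *next* bucket, so they must wait
// for the next opRepoExecutionInterval delay instead of running immediately.
class BucketedQueue {
    static class QueueItem {
        final String op;
        final int bucket;
        QueueItem(String op, int bucket) { this.op = op; this.bucket = bucket; }
    }

    private final Queue<QueueItem> queue = new ArrayDeque<>();
    private int enqueueIntoBucket = 0;

    void enqueue(String op) {
        // New items are stamped with the current bucket number.
        queue.add(new QueueItem(op, enqueueIntoBucket));
    }

    // Returns the ops eligible for this cycle. Anything enqueued after this
    // call starts is stamped with a later bucket and is NOT drained here.
    List<String> startProcessingCycle() {
        int processingBucket = enqueueIntoBucket;
        enqueueIntoBucket++; // new arrivals now wait for the next delay window
        List<String> batch = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().bucket <= processingBucket) {
            batch.add(queue.poll().op);
        }
        return batch;
    }
}
```

Anything a misbehaving app enqueues between two `startProcessingCycle()` calls carries a higher bucket number, so it can only start executing in a later cycle, after the delay.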

Future Considerations

There is still a problem if someone rapidly adds login operations that switch users (such as calling login() with different external_ids, or calling login() and logout() over and over). They would all be in the same bucket, each would be its own network call, and we would make those calls back-to-back without a delay. This isn't a problem with things like addTag, since those can be grouped into one network call.
Ideas to address this:

  1. Maybe we always wait after creating a user then?
    • We probably already need to do this to account for the 404 problem.
      • 404 problem meaning the backend bug where it can return 404 if doing an operation on something you just created
  2. maxCount - We only process x number of operations back-to-back in a bucket. Then we enforce the delay.
  3. rateLimit - Only make x number of network calls per y amount of time.
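Idea 3 could look something like the sliding-window limiter below. This is a hypothetical sketch to make the idea concrete, not code from this PR; the class name and API are invented for illustration:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of the rateLimit idea: allow at most maxCalls
// network calls per windowMillis, regardless of bucket contents.
class NetworkRateLimiter {
    private final int maxCalls;
    private final long windowMillis;
    private final Deque<Long> callTimes = new ArrayDeque<>();

    NetworkRateLimiter(int maxCalls, long windowMillis) {
        this.maxCalls = maxCalls;
        this.windowMillis = windowMillis;
    }

    // Returns true if a call is permitted at nowMillis and records it;
    // returns false if the window is already full.
    synchronized boolean tryAcquire(long nowMillis) {
        // Drop timestamps that have aged out of the window.
        while (!callTimes.isEmpty() && nowMillis - callTimes.peekFirst() >= windowMillis) {
            callTimes.pollFirst();
        }
        if (callTimes.size() >= maxCalls) return false;
        callTimes.addLast(nowMillis);
        return true;
    }
}
```

A limiter like this would cap back-to-back login/logout calls even within a single bucket, complementing the bucketing fix.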

Related

This improves on PR #2033, which fixed the opRepoExecutionInterval delay being skipped whenever two operations were enqueued back-to-back.

Testing

Unit testing

  1. Commit 0988977 with failing test to prove the batching issue
  2. Commit 0f718b7 adding bucketing feature
  3. Additional tests covering the changed lines (these lines had no existing test coverage)

Manual testing

Manually tested on an Android 14 emulator: verified that prompting for notifications works and updates the subscription correctly, and that logging into a new user works.

Also verified that running the following code no longer reproduces the original issue:

    private void reproducesOperationRepoLoopingTooQuickly() {
        new Thread(() -> {
            while (true) {
                OneSignal.getUser().addTag("test", "Test");
                OneSignal.getUser().removeTag("test");
                OneSignal.getUser().addTag("test", "Test");
                OneSignal.getUser().removeTag("test");
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
        }).start();
    }

Affected code checklist

  • Notifications
    • Display
    • Open
    • Push Processing
    • Confirm Deliveries
  • Outcomes
  • Sessions
  • In-App Messaging
  • REST API requests
  • Public API changes

Checklist

Overview

  • I have filled out all REQUIRED sections above
  • PR does one thing
    • If it is hard to explain how the code changes are related to each other, it most likely needs to be more than one PR
  • Any Public API changes are explained in the PR details and conform to existing APIs

Testing

  • I have included test coverage for these changes, or explained why they are not needed
  • All automated tests pass, or I explained why that is not possible
  • I have personally tested this on my device, or explained why that is not possible

Final pass

  • Code is as readable as possible.
    • Simplify with less code, followed by splitting up code into well named functions and variables, followed by adding comments to the code.
  • I have reviewed this PR myself, ensuring it meets each checklist item
    • WIP (Work In Progress) is ok, but explain what is still in progress and what you would like feedback on. Start the PR title with "WIP" to indicate this.

This change is Reviewable

This commit simply adds a test to prove the following issue with
the current implementation of OperationRepo.

Once processQueueForever starts processing operations it will execute
them back-to-back until it can't process any more. This is done
intentionally, as the idea is to sync all changes to the backend quickly,
after waiting a set amount of time for batching. However, nothing is in
place to account for something continually adding new operations to the
repo while it's in this mode. A misbehaving app could be adding
operations in a tight loop as fast as or faster than the queue can
execute them.
@jkasten2 jkasten2 force-pushed the fix/op-repo-add-bucketing-to-limit-network-calls-further branch from 7788f3d to 0988977 Compare April 12, 2024 22:45
Buckets
Purpose: Bucketing is a pattern we are using to help save network
calls. It works together with opRepoExecutionInterval to define
a time window in which operations can be added to the bucket.

When enqueue() is called it creates a new OperationQueueItem with its
bucket = enqueueIntoBucket. Just before we start processing a bucket we
enqueueIntoBucket++; this ensures anything new that comes in while
executing doesn't skip the opRepoExecutionInterval delay.

NOTE: Bucketing only affects the starting operation we grab.
The reason is we still want getGroupableOperations() to find
other operations it can execute in one go (same network call).
It's more efficient overall, as it lowers the total number of
network calls.

This addresses the failing test "operations enqueued while repo is
executing should be executed only after the next
opRepoExecutionInterval" added in commit
0988977
@jkasten2 jkasten2 force-pushed the fix/op-repo-add-bucketing-to-limit-network-calls-further branch from 5c3f1b7 to 0f718b7 Compare April 12, 2024 23:26
This ensures we don't have an off-by-one error. We want to ensure we
resume with the operations we didn't process the last time the app was
running, and in an efficient way.
@jkasten2 jkasten2 changed the title [WIP] [Fix] Prevent operations added to queue while it's processing from being executed without delay [Fix] Prevent operations added to queue while it's processing from being executed without delay Apr 13, 2024
Contributor

@nan-li nan-li left a comment


Tested manually by enqueuing lots of addTags while operations are executing; before this PR, requests were made every second or less. After the PR changes, spam enqueuing makes requests only every 5 seconds.

Also tested with the device offline: cached operations and any cached operation items initialize with bucket = 0, and the behavior appears correct.

Contributor

@jinliu9508 jinliu9508 left a comment


Manually tested the bucket increment to verify the logic works correctly. Also tested other normal operations, including login, push, and logout, to make sure it doesn't break existing features.

@jkasten2
Member Author

@nan-li @jinliu9508 appreciate both of you testing as well, and for the code review!

@jkasten2 jkasten2 merged commit 5d20fa8 into main Apr 15, 2024
@jkasten2 jkasten2 deleted the fix/op-repo-add-bucketing-to-limit-network-calls-further branch April 15, 2024 20:31
@jkasten2 jkasten2 mentioned this pull request Apr 15, 2024