[Fix] Prevent operations added to queue while it's processing from being executed without delay #2053

Merged
jkasten2 merged 5 commits into main from
fix/op-repo-add-bucketing-to-limit-network-calls-further
Apr 15, 2024

Conversation

@jkasten2
Member

@jkasten2 jkasten2 commented Apr 12, 2024

Description

One Line Summary

Fixes an issue where new operations added to the OperationRepo's queue while it is executing skip the opRepoExecutionInterval delay.

Details

Motivation

Prevents a misbehaving app that adds operations (such as addTag()) in a tight loop from causing a number of back-to-back network calls to OneSignal's backend.

Scope

Only affects OperationRepo's batching logic.

New OperationRepo Concept - Buckets

   /** *** Buckets ***
     * Purpose: Bucketing is a pattern we are using to help save network
     * calls. It works together with opRepoExecutionInterval to define
     * a time window in which operations can be added to the bucket.
     *
     * When enqueue() is called it creates a new OperationQueueItem with its
     * bucket = enqueueIntoBucket. Just before we start processing a bucket we
     * enqueueIntoBucket++; this ensures anything new that comes in while
     * executing doesn't skip the opRepoExecutionInterval delay.
     *
     * NOTE: Bucketing only affects the starting operation we grab.
     *       The reason is we still want getGroupableOperations() to find
     *       other operations it can execute in one go (same network call).
     *       It's more efficient overall, as it lowers the total number of
     *       network calls.
     */
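The bucketing rule above can be sketched with a small simulation. This is an illustrative model only, not the SDK's actual implementation; the class and member names (`BucketedQueue`, `startProcessingCycle`, etc.) are hypothetical, while `enqueueIntoBucket` mirrors the field named in the doc comment:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Minimal simulation of the bucketing idea: items enqueued after a
// processing cycle begins land in the *next* bucket, so they must wait
// for the next opRepoExecutionInterval delay instead of running immediately.
class BucketedQueue {
    static class QueueItem {
        final String op;
        final int bucket;
        QueueItem(String op, int bucket) { this.op = op; this.bucket = bucket; }
    }

    private final Queue<QueueItem> queue = new ArrayDeque<>();
    private int enqueueIntoBucket = 0;

    void enqueue(String op) {
        // New items are stamped with the current bucket number.
        queue.add(new QueueItem(op, enqueueIntoBucket));
    }

    // Returns the ops eligible for this cycle. Anything enqueued after this
    // call starts is stamped with a later bucket and is NOT drained here.
    List<String> startProcessingCycle() {
        int processingBucket = enqueueIntoBucket;
        enqueueIntoBucket++; // new arrivals now wait for the next delay window
        List<String> batch = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().bucket <= processingBucket) {
            batch.add(queue.poll().op);
        }
        return batch;
    }
}
```

Anything a misbehaving app enqueues between two `startProcessingCycle()` calls carries a higher bucket number, so it can only start executing in a later cycle, after the delay.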

Future Considerations

There is still a problem if someone rapidly adds login operations that switch users (such as calling login() with different external_ids, or calling login() and logout() over and over). They would all be in the same bucket, each would be its own network call, and we would make those calls back-to-back without a delay. This isn't a problem with things like addTag, since those can be grouped into one network call.
Ideas to address this:

  1. Maybe we always wait after creating a user then?
    • We probably already need to do this to account for the 404 problem.
      • 404 problem meaning the backend bug where it can return 404 if doing an operation on something you just created
  2. maxCount - We only process x number of operations back-to-back in a bucket. Then we enforce the delay.
  3. rateLimit - Only make x number of network calls per y amount of time.
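Idea 3 could look something like the sliding-window limiter below. This is a hypothetical sketch to make the idea concrete, not code from this PR; the class name and API are invented for illustration:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of the rateLimit idea: allow at most maxCalls
// network calls per windowMillis, regardless of bucket contents.
class NetworkRateLimiter {
    private final int maxCalls;
    private final long windowMillis;
    private final Deque<Long> callTimes = new ArrayDeque<>();

    NetworkRateLimiter(int maxCalls, long windowMillis) {
        this.maxCalls = maxCalls;
        this.windowMillis = windowMillis;
    }

    // Returns true if a call is permitted at nowMillis and records it;
    // returns false if the window is already full.
    synchronized boolean tryAcquire(long nowMillis) {
        // Drop timestamps that have aged out of the window.
        while (!callTimes.isEmpty() && nowMillis - callTimes.peekFirst() >= windowMillis) {
            callTimes.pollFirst();
        }
        if (callTimes.size() >= maxCalls) return false;
        callTimes.addLast(nowMillis);
        return true;
    }
}
```

A limiter like this would cap back-to-back login/logout calls even within a single bucket, complementing the bucketing fix.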

Related

This improves on PR #2033, which fixed the opRepoExecutionInterval delay being skipped whenever two operations were enqueued back-to-back.

Testing

Unit testing

  1. Commit 0988977 with failing test to prove the batching issue
  2. Commit 0f718b7 adding bucketing feature
  3. Additional tests covering the changed lines (these lines had no existing test coverage)

Manual testing

Manually tested on an Android 14 emulator: verified that prompting for notifications works and updates the subscription correctly, and that logging into a new user works.

Also verified that running the following code no longer reproduces the original issue:

    private void reproducesOperationRepoLoopingTooQuickly() {
        new Thread(() -> {
            while (true) {
                OneSignal.getUser().addTag("test", "Test");
                OneSignal.getUser().removeTag("test");
                OneSignal.getUser().addTag("test", "Test");
                OneSignal.getUser().removeTag("test");
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
        }).start();
    }

Affected code checklist

  • Notifications
    • Display
    • Open
    • Push Processing
    • Confirm Deliveries
  • Outcomes
  • Sessions
  • In-App Messaging
  • REST API requests
  • Public API changes

Checklist

Overview

  • I have filled out all REQUIRED sections above
  • PR does one thing
    • If it is hard to explain how the code changes are related to each other, it most likely needs to be more than one PR
  • Any Public API changes are explained in the PR details and conform to existing APIs

Testing

  • I have included test coverage for these changes, or explained why they are not needed
  • All automated tests pass, or I explained why that is not possible
  • I have personally tested this on my device, or explained why that is not possible

Final pass

  • Code is as readable as possible.
    • Simplify with less code, followed by splitting up code into well named functions and variables, followed by adding comments to the code.
  • I have reviewed this PR myself, ensuring it meets each checklist item
    • WIP (Work In Progress) is ok, but explain what is still in progress and what you would like feedback on. Start the PR title with "WIP" to indicate this.

This change is Reviewable

This commit simply adds a test to prove the following issue with
the current implementation of OperationRepo.

Once processQueueForever starts processing operations it will execute
them back-to-back until it can't process any more. This is done
intentionally, as the idea is to sync all changes to the backend quickly,
after waiting a set amount of time for batching. However, nothing is in
place to account for something continually adding new operations to the
repo while it's in this mode. A misbehaving app could be adding
operations in a tight loop as fast as or faster than the queue can
execute them.
@jkasten2 jkasten2 force-pushed the fix/op-repo-add-bucketing-to-limit-network-calls-further branch from 7788f3d to 0988977 Compare April 12, 2024 22:45
Buckets
Purpose: Bucketing is a pattern we are using to help save network
calls. It works together with opRepoExecutionInterval to define
a time window in which operations can be added to the bucket.

When enqueue() is called it creates a new OperationQueueItem with its
bucket = enqueueIntoBucket. Just before we start processing a bucket we
enqueueIntoBucket++; this ensures anything new that comes in while
executing doesn't skip the opRepoExecutionInterval delay.

NOTE: Bucketing only affects the starting operation we grab.
The reason is we still want getGroupableOperations() to find
other operations it can execute in one go (same network call).
It's more efficient overall, as it lowers the total number of
network calls.

This addresses the failing test "operations enqueued while repo is
executing should be executed only after the next
opRepoExecutionInterval" added in commit
0988977
@jkasten2 jkasten2 force-pushed the fix/op-repo-add-bucketing-to-limit-network-calls-further branch from 5c3f1b7 to 0f718b7 Compare April 12, 2024 23:26
This ensures we don't have an off-by-one error. We want to ensure we
resume with the operations we didn't process the last time the app was
running, and in an efficient way.
@jkasten2 jkasten2 changed the title [WIP] [Fix] Prevent operations added to queue while it's processing from being executed without delay [Fix] Prevent operations added to queue while it's processing from being executed without delay Apr 13, 2024
Contributor

@nan-li nan-li left a comment


Tested manually by enqueuing lots of addTags while operations are executing; before this PR, requests were made every second or less. After the PR changes, spam enqueuing makes requests only every 5 seconds.

Also tested with the device offline: cached operations and any cached operation items initialize with bucket = 0, and the behavior appears correct.

Contributor

@jinliu9508 jinliu9508 left a comment


Manually tested the bucket increment to verify the logic works correctly. Also tested other normal operations, including login, push, and logout, to make sure it doesn't break existing features.

@jkasten2
Member Author

@nan-li @jinliu9508 appreciate both of you testing as well, and for the code review!

@jkasten2 jkasten2 merged commit 5d20fa8 into main Apr 15, 2024
@jkasten2 jkasten2 deleted the fix/op-repo-add-bucketing-to-limit-network-calls-further branch April 15, 2024 20:31
@jkasten2 jkasten2 mentioned this pull request Apr 15, 2024