ArturT commented Sep 10, 2020

problem

Before this PR, the knapsack_pro gem makes 3 attempts to connect to the Knapsack Pro API before it starts running tests in Fallback Mode.

The problem described below is relevant only when running knapsack_pro in Regular Mode.

Example scenarios

Assume:

  • there are 2 parallel CI nodes.
  • the user uses knapsack_pro gem in Regular Mode.

Here you can learn about the difference between Regular Mode and Queue Mode

Scenario when Knapsack Pro API is not available at all (no bug - successful scenario)

  • CI node index 0 can't connect to the API so it starts running tests in Fallback Mode.
  • CI node index 1 can't connect to the API so it starts running tests in Fallback Mode.

The whole test suite is executed across parallel CI nodes. Everything works fine.

Scenario when Knapsack Pro API was not available only for one of the parallel CI nodes (bug exists - buggy scenario)

  • CI node index 0 can't connect to the API so it starts running tests in Fallback Mode
  • CI node index 1 can connect to the API so it starts running tests based on the list of tests from the API

There is a risk that tests that were supposed to run on CI node index 0, but were never fetched from the API, won't run at all, because Fallback Mode can select a different set of tests for CI node index 0.
CI node index 1 was able to connect to the API, so it received a different set of tests than it would have run if it had also started in Fallback Mode.

Problem:

  • A) Not all test files from the test suite may be run as part of the CI build. This is a problem.
  • B) Some of the test files run in Fallback Mode on CI node index 0 can be the same as test files fetched from the API on CI node index 1. Running some tests more than once is not a problem in itself. The bigger issue is that some test files can be skipped (issue A).
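
Issues A and B can be illustrated with a small sketch. Everything in it is hypothetical: the six file names, the round-robin `fallback_split` (one simple way a deterministic fallback split could work, not necessarily how the gem does it), and the hard-coded `API_SPLIT` standing in for whatever split the API would return.

```ruby
# Hypothetical test suite of six files.
TEST_FILES = %w[a_spec.rb b_spec.rb c_spec.rb d_spec.rb e_spec.rb f_spec.rb].freeze

# Assumed fallback strategy: sort files by name and deal them
# round-robin across the CI nodes. Illustrative only.
def fallback_split(files, node_total)
  nodes = Array.new(node_total) { [] }
  files.sort.each_with_index { |file, i| nodes[i % node_total] << file }
  nodes
end

# Hypothetical split returned by the API (for example, balanced by timing).
API_SPLIT = [
  %w[a_spec.rb b_spec.rb f_spec.rb], # CI node index 0
  %w[c_spec.rb d_spec.rb e_spec.rb]  # CI node index 1
].freeze

# Reports which files were never executed and which ran more than once.
def coverage(node0_files, node1_files, all_files)
  executed = node0_files + node1_files
  {
    skipped: all_files - executed,
    duplicated: executed.select { |file| executed.count(file) > 1 }.uniq
  }
end

fallback = fallback_split(TEST_FILES, 2)

# Both nodes in Fallback Mode: the two subsets are complementary.
both_fallback = coverage(fallback[0], fallback[1], TEST_FILES)

# Node 0 in Fallback Mode, node 1 using the API split: subsets overlap
# and leave gaps.
mixed = coverage(fallback[0], API_SPLIT[1], TEST_FILES)

puts "both fallback: #{both_fallback}"
puts "mixed:         #{mixed}"
```

In the mixed case, the files that appear only in node 0's API subset or only in node 1's fallback subset are never executed (issue A), while the files shared by node 0's fallback subset and node 1's API subset run twice (issue B).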

Important: when the API cannot be reached, the knapsack_pro gem in Regular Mode can guarantee that each test file from the test suite is executed only if Fallback Mode starts on all parallel CI nodes.

what we aim for

  • We should aim for the scenario in which all parallel CI nodes can connect to the Knapsack Pro API.
  • Or we should aim for the scenario in which all parallel CI nodes run in Fallback Mode.

Only those 2 scenarios guarantee that all test files from the test suite will be run as part of the CI build.

  • We should avoid the scenario in which some CI nodes connect to the Knapsack Pro API and other CI nodes use Fallback Mode. This can lead to some tests being skipped in the CI build.

solution

We can increase the maximum number of request attempts made to the Knapsack Pro API.

The knapsack_pro gem in Regular Mode will make 6 attempts to connect to the API before starting Fallback Mode.
The 6 attempts will be spread across 2 minutes. This should be plenty of time for the Knapsack Pro API to auto-scale and bring new servers online to handle the increased traffic.

If the knapsack_pro gem can't connect to the API within 2 minutes, there is most likely a serious problem with API availability. In that case the API should be unavailable to all parallel CI nodes, so all CI nodes should start in Fallback Mode.

Of course, there is still a risk of an edge case in which some of the parallel CI nodes connect to the API while others exhaust all 6 request attempts and start Fallback Mode.
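
A rough sketch of how 6 attempts could be spread across about 2 minutes is shown below. The linearly growing wait with an 8-second step is an assumption, chosen because the waits between attempts (8 + 16 + 24 + 32 + 40) add up to 120 seconds. The names `with_retries`, `ApiUnavailable`, and the `sleeper` parameter are hypothetical and are not part of the gem.

```ruby
# Hypothetical error standing in for any API connection failure.
class ApiUnavailable < StandardError; end

MAX_ATTEMPTS = 6 # number of attempts described in this PR
WAIT_STEP = 8    # seconds; assumed so the waits total 120 seconds

# Yields the attempt number to the block. Returns the block's value on
# success, or :fallback_mode once every attempt has failed.
# `sleeper` is injectable so the sketch can run without real waiting.
def with_retries(max_attempts: MAX_ATTEMPTS, wait_step: WAIT_STEP,
                 sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue ApiUnavailable
    return :fallback_mode if attempt >= max_attempts

    sleeper.call(attempt * wait_step) # wait grows linearly per attempt
    retry
  end
end

# Simulate an API that never responds, recording waits instead of sleeping.
waits = []
result = with_retries(sleeper: ->(seconds) { waits << seconds }) do |_attempt|
  raise ApiUnavailable
end

puts "result: #{result}, waits: #{waits.inspect}, total: #{waits.sum}s"
```

No wait follows the final failed attempt, so a node that cannot reach the API switches to Fallback Mode roughly 2 minutes after its first request, plus however long the requests themselves take.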

recommendations

  • If you would like to be 100% sure you never have this edge case then you can completely disable Fallback Mode with the env variable KNAPSACK_PRO_FALLBACK_MODE_ENABLED=false.

  • If you want to set the number of request attempts yourself, you can do it with the env variable KNAPSACK_PRO_MAX_REQUEST_RETRIES, for example KNAPSACK_PRO_MAX_REQUEST_RETRIES=7.
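
To make the two settings concrete, here is a minimal sketch of how such env variables could be interpreted. `retry_settings` is a hypothetical helper, not the gem's code. The defaults are assumptions based on this description: Fallback Mode enabled unless explicitly set to false, and 6 attempts in Regular Mode.

```ruby
# Hypothetical helper: reads the two env variables named above.
# Accepts any Hash-like object so it can be exercised without touching ENV.
def retry_settings(env = ENV)
  {
    # Assumed default: Fallback Mode stays enabled unless set to "false".
    fallback_mode_enabled: env.fetch('KNAPSACK_PRO_FALLBACK_MODE_ENABLED', 'true') != 'false',
    # Assumed default: 6 attempts, as described for Regular Mode.
    max_request_retries: Integer(env.fetch('KNAPSACK_PRO_MAX_REQUEST_RETRIES', '6'))
  }
end

defaults = retry_settings({})
custom = retry_settings(
  'KNAPSACK_PRO_FALLBACK_MODE_ENABLED' => 'false',
  'KNAPSACK_PRO_MAX_REQUEST_RETRIES' => '7'
)

puts "defaults: #{defaults}"
puts "custom:   #{custom}"
```

With Fallback Mode disabled, a node that exhausts its attempts cannot silently switch to a different split, which is what rules out the mixed scenario.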
