intermittent failure in hubspot integration test #992
Bug Description
As noted in this comment:
unsafe CI checks succeeded after 2 failed attempts, without any code changes, so there's something fishy going on with the unsafe CI test.
In this case, both failures were on the same assertion in the HubSpot erasure task test -- the assertion that verifies the results of the access request run against the erasure seed data. The same assertion failure occurred on this workflow run yesterday.
I've also noted this nondeterministic behavior locally when executing pytest tests/ops/integration_tests/saas/test_hubspot_task.py within my server shell.
Steps to Reproduce
Given what we know at this point, this seems to occur "randomly" when running the external integration tests. Once we narrow down the issue further, we should be able to provide more precise repro steps/scenarios :)
After ensuring you have Vault access or the correct HubSpot credentials in your local env, you can try executing pytest tests/ops/integration_tests/saas/test_hubspot_task.py within your local server shell. That said, I've only seen the failure once locally, after executing the test ~5 times.
In CI, re-triggering the unsafe CI checks action enough times should eventually reproduce the error, if this truly is occurring at random.
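To make the local repro less tedious, something like this rough sketch can hammer the test and count failures. The run count is arbitrary, and it assumes HubSpot credentials are already set in the server shell's environment:

```python
# Hypothetical flake-hunting helper: runs the HubSpot integration test in a
# fresh pytest process several times and reports how often it fails.
import subprocess
import sys

TEST_PATH = "tests/ops/integration_tests/saas/test_hubspot_task.py"
RUNS = 10  # arbitrary; locally the failure showed up roughly once in ~5 runs

failures = 0
for i in range(1, RUNS + 1):
    result = subprocess.run([sys.executable, "-m", "pytest", TEST_PATH, "-q"])
    if result.returncode != 0:
        failures += 1
    print(f"run {i}/{RUNS}: {'FAILED' if result.returncode else 'passed'}")

print(f"{failures}/{RUNS} runs failed")
```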
Expected behavior
Unsafe CI checks should reliably pass
Environment
This nondeterministic behavior seems to occur both:
- in CI (unsafe PR checks)
- locally
Additional context
Here are some very rough thoughts based on initial investigation:
The failing test is just an assertion that the initial test data is seeded in the remote system. The fixture responsible for that seeding explicitly confirms the data is present in the remote system, and the fixture itself isn't what fails. What fails is the subsequent access request executed against that data, which re-confirms that the same data is present in the remote system. I can't see why the access request would produce nondeterministic results if the fixture has already confirmed the remote data is there.
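For anyone not familiar with the test, here's a purely illustrative skeleton of the shape I'm describing. The names are made up (the real fixture and test live under tests/ops/integration_tests/saas/), but the point is that steps (2) and (3) verify the same remote data through two separate code paths:

```python
# Illustrative only -- fixture/test names and helpers are made up.
import pytest

@pytest.fixture
def hubspot_erasure_identity_email(hubspot_api):
    # (1) seed a contact in HubSpot for the erasure test
    email = "erasure-test@example.com"
    contact_id = hubspot_api.create_contact(email)
    # (2) the fixture explicitly confirms the seeded data is visible remotely
    assert hubspot_api.contact_exists(contact_id)
    yield email
    hubspot_api.delete_contact(contact_id)

def test_hubspot_erasure_task(hubspot_erasure_identity_email, run_access_request):
    # (3) the access request re-reads that same data via the connector's own
    # request path; this is the assertion that intermittently fails
    results = run_access_request(identity=hubspot_erasure_identity_email)
    assert results["hubspot_instance:contacts"], "expected seeded contact in access results"
```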
The two main ideas I have are:
- The check the fixture does for the remote data may execute a slightly different request against the remote system than the access request ultimately makes under the hood (see the sketch after this list). At first glance everything looks consistent to me, and there aren't many variables.
- Something more internal to the access request, e.g. the graph traversal isn't identifying the right nodes. I noticed that this recent PR did make some high-touch changes that could maybe be impacting things here? That's a total shot in the dark, though.
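To make the first idea concrete, here's a hypothetical example of how the two checks could differ in a way that would only fail intermittently. The endpoints, env var name, and request shapes below are assumptions based on my reading of HubSpot's public CRM v3 API, not on what our connector / saas config actually does, so treat this as a direction to investigate rather than a diagnosis:

```python
# Hypothetical: the fixture's existence check vs. the connector's query may
# not hit HubSpot the same way, even though both "look for the same contact".
import os
import requests

BASE = "https://api.hubapi.com"
HEADERS = {"Authorization": f"Bearer {os.environ['HUBSPOT_TOKEN']}"}  # env var name is made up

def fixture_style_check(contact_id: str) -> bool:
    # Direct lookup by id: reflects the write as soon as the contact exists.
    resp = requests.get(f"{BASE}/crm/v3/objects/contacts/{contact_id}", headers=HEADERS)
    return resp.status_code == 200

def access_request_style_check(email: str) -> bool:
    # Search by email property: served by HubSpot's search index, which (as I
    # understand it) can lag slightly behind writes, so this could transiently
    # return zero results even when the lookup above already succeeds.
    body = {
        "filterGroups": [
            {"filters": [{"propertyName": "email", "operator": "EQ", "value": email}]}
        ]
    }
    resp = requests.post(f"{BASE}/crm/v3/objects/contacts/search", headers=HEADERS, json=body)
    return resp.json().get("total", 0) > 0
```

If the fixture's confirmation takes something like the first path while the connector's access request takes something like the second, that alone could explain the intermittent assertion failure. Worth comparing the actual requests made by the fixture and by the saas config side by side.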