Skip to content

Edpalenc/sac generator fix#52

Merged
Edilmo merged 12 commits intoreleases/0.8.6from
edpalenc/sac-generator-fix
Jan 28, 2021
Merged

Edpalenc/sac generator fix#52
Edilmo merged 12 commits intoreleases/0.8.6from
edpalenc/sac-generator-fix

Conversation

@Edilmo
Copy link

@Edilmo Edilmo commented Jan 26, 2021

Why are these changes needed?

There are some generators swallowing StopIteration exceptions in SAC execution plan.
The fix take advantage of PEP 479 to address the problem.
For more details about this PEP, here is the CPython PR.

Related issue number

None

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@Edilmo Edilmo merged commit 5f53f6b into releases/0.8.6 Jan 28, 2021
@Edilmo Edilmo deleted the edpalenc/sac-generator-fix branch January 28, 2021 22:10

thread_local = threading.local()

UNION_MAX_PULL = 1000
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this value hardcoded and not accesible, with a value of 100.
The value controls how many pull operations are going to be tried before stoping (StopIteration) an UNION of PARALLEL-ITERATORS.
Also, before, all pull reties over an UNION of PARALLEL-ITERATORS were made one after the other without any waiting time. This was a too optimistic approach that could lead to a lot of false-negative when the source of one of the iterators in the UNION is slow (or even worst if all iterators are slow).

Here we are not only increasing the amount of retries to 1000 but also introducing an exponential backoff logic in these retries operations.

There are some trade-off with this approach though:

  • FAST sources: the exponential backoff logic could slow down a little the pipeline when some sort of intermittent failure or delays.
  • SLOW sources: the default values could be too small if the sources are extremely slow.

Edilmo added a commit that referenced this pull request Feb 4, 2021
* fixing broken generators

* improving fix for broken generators

* Remove npm authenticate

* Disable Lint Job

* Testing parallel iterators in finite sequences explicitly

* Tuning some tests

* Remove logging

* Tuning timeouts

* More fine tuning

* Improving Parallel Iterators tests

* Increasing time

* Increasing time
# Conflicts:
#	ci/azure_pipelines/templates/gcs-python.yml
#	ci/azure_pipelines/templates/multi-node.yml
#	ci/azure_pipelines/templates/rlib-regression-tf-2.yml
#	python/ray/tests/test_memory_scheduling.py
#	python/ray/util/iter.py
#	python/ray/util/sgd/tests/test_torch.py
#	rllib/execution/concurrency_ops.py
#	rllib/execution/replay_buffer.py
#	rllib/execution/replay_ops.py
#	rllib/tests/test_execution.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant