-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Issue 11485][C++] Connect timer cancellation does not call timeout callback #11486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
merlimat
merged 1 commit into
apache:master
from
Vanlightly:cpp-connect-timeout-cancellation
Jul 30, 2021
Merged
[Issue 11485][C++] Connect timer cancellation does not call timeout callback #11486
merlimat
merged 1 commit into
apache:master
from
Vanlightly:cpp-connect-timeout-cancellation
Jul 30, 2021
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Master issue: apache#11485 When the PeriodicTask timer is cancelled it calls the handler which in turn ends up invoking the callback which closes the socket. This prevents the iteration over multiple endpoints until the connection succeeds. This change checks the error_code in the handler and returns if it is operation_cancelled, avoiding erroneously calling the callback.
BewareMyPower
approved these changes
Jul 29, 2021
sijie
approved these changes
Jul 29, 2021
merlimat
approved these changes
Jul 30, 2021
BewareMyPower
added a commit
that referenced
this pull request
Aug 8, 2021
…11557) Fixes #11551 ### Motivation Currently there're some bugs of C++ client and some tests cannot pass: 1. Introduced from #10601 because it changed the behavior of the admin API to get partition metadata while the C++ implementation relies on the original behavior to create topics automatically. So any test that uses HTTP lookup will fail. - AuthPluginTest.testTlsDetectHttps - AuthPluginToken.testTokenWithHttpUrl - BasicEndToEndTest.testHandlerReconnectionLogic - BasicEndToEndTest.testV2TopicHttp - ClientDeduplicationTest.testProducerDeduplication 2. Introduced from #11029 and #11486 , the implementation will iterate more than once even there's only one valid resolved IP address. - ClientTest.testConnectTimeout In addition, there's an existed flaky test from very early time: ClientTest.testLookupThrottling. Python tests are also broken. Because it must run after all C++ tests passed, they're also not exposed. 1. Some tests in `pulsar_test.py` might encounter `Timeout` error when creating producers or consumers. 2. Some tests in `schema_test.py` failed because some comparisons between two `ComplexRecord`s failed. Since the CI test of C++ client would never fail after #10309 (will be fixed by #11575), all PRs about C++ or Python client are not verified even if CI passed. Before #11575 is merged, we need to fix all existed bugs of C++ client. ### Modifications Corresponding to the above tests group, this PR adds following modifications: 1. Add the `?checkAllowAutoCreation=true` URL suffix to allow HTTP lookup to create topics automatically. 2. When iterating through a resolved IP list, increase the iterator first, then run the connection timer and try to connect the next IP. Regarding to the flaky `testLookupThrottling`, this PR adds a `client.close()` at the end of test and fix the `ClientImpl::close` implementation. Before this PR, if there're no producers or consumers in a client, the `close()` method wouldn't call `shutdown()` to close connection poll and executors. Only after the `Client` instance was destructed would the `shutdown()` method be called. In this case, this PR calls `handleClose` instead of invoking callback directly. In addition, change the log level of this test to debug. This PR also fixes the failed timeout Python tests, some are caused by incorrect import of classes, some are caused by `client` was not closed. Regarding to Python schema tests, in Python2, `self.__ne__(other)` is not equivalent to `not self.__eq__(other)` when the default `__eq__` implementation is overwritten. If a `Record` object has a field whose type is also `Record`, the `Record.__ne__` method will be called, see https://github.com/apache/pulsar/blob/ddb5fb0e062c2fe0967efce2a443a31f9cd12c07/pulsar-client-cpp/python/pulsar/schema/definition.py#L138-L139 but it just uses the default implementation to check whether they're not equal. The custom `__eq__` method won't be called. Therefore, this PR implement `Record.__ne__` explicitly to call `Record.__eq__` so that the comparison will work for Python2. ### Verifying this change We can only check the workflow output to verify this change.
LeBW
pushed a commit
to LeBW/pulsar
that referenced
this pull request
Aug 9, 2021
…pache#11557) Fixes apache#11551 ### Motivation Currently there're some bugs of C++ client and some tests cannot pass: 1. Introduced from apache#10601 because it changed the behavior of the admin API to get partition metadata while the C++ implementation relies on the original behavior to create topics automatically. So any test that uses HTTP lookup will fail. - AuthPluginTest.testTlsDetectHttps - AuthPluginToken.testTokenWithHttpUrl - BasicEndToEndTest.testHandlerReconnectionLogic - BasicEndToEndTest.testV2TopicHttp - ClientDeduplicationTest.testProducerDeduplication 2. Introduced from apache#11029 and apache#11486 , the implementation will iterate more than once even there's only one valid resolved IP address. - ClientTest.testConnectTimeout In addition, there's an existed flaky test from very early time: ClientTest.testLookupThrottling. Python tests are also broken. Because it must run after all C++ tests passed, they're also not exposed. 1. Some tests in `pulsar_test.py` might encounter `Timeout` error when creating producers or consumers. 2. Some tests in `schema_test.py` failed because some comparisons between two `ComplexRecord`s failed. Since the CI test of C++ client would never fail after apache#10309 (will be fixed by apache#11575), all PRs about C++ or Python client are not verified even if CI passed. Before apache#11575 is merged, we need to fix all existed bugs of C++ client. ### Modifications Corresponding to the above tests group, this PR adds following modifications: 1. Add the `?checkAllowAutoCreation=true` URL suffix to allow HTTP lookup to create topics automatically. 2. When iterating through a resolved IP list, increase the iterator first, then run the connection timer and try to connect the next IP. Regarding to the flaky `testLookupThrottling`, this PR adds a `client.close()` at the end of test and fix the `ClientImpl::close` implementation. Before this PR, if there're no producers or consumers in a client, the `close()` method wouldn't call `shutdown()` to close connection poll and executors. Only after the `Client` instance was destructed would the `shutdown()` method be called. In this case, this PR calls `handleClose` instead of invoking callback directly. In addition, change the log level of this test to debug. This PR also fixes the failed timeout Python tests, some are caused by incorrect import of classes, some are caused by `client` was not closed. Regarding to Python schema tests, in Python2, `self.__ne__(other)` is not equivalent to `not self.__eq__(other)` when the default `__eq__` implementation is overwritten. If a `Record` object has a field whose type is also `Record`, the `Record.__ne__` method will be called, see https://github.com/apache/pulsar/blob/ddb5fb0e062c2fe0967efce2a443a31f9cd12c07/pulsar-client-cpp/python/pulsar/schema/definition.py#L138-L139 but it just uses the default implementation to check whether they're not equal. The custom `__eq__` method won't be called. Therefore, this PR implement `Record.__ne__` explicitly to call `Record.__eq__` so that the comparison will work for Python2. ### Verifying this change We can only check the workflow output to verify this change.
hangc0276
pushed a commit
that referenced
this pull request
Aug 12, 2021
…11557) Fixes #11551 ### Motivation Currently there're some bugs of C++ client and some tests cannot pass: 1. Introduced from #10601 because it changed the behavior of the admin API to get partition metadata while the C++ implementation relies on the original behavior to create topics automatically. So any test that uses HTTP lookup will fail. - AuthPluginTest.testTlsDetectHttps - AuthPluginToken.testTokenWithHttpUrl - BasicEndToEndTest.testHandlerReconnectionLogic - BasicEndToEndTest.testV2TopicHttp - ClientDeduplicationTest.testProducerDeduplication 2. Introduced from #11029 and #11486 , the implementation will iterate more than once even there's only one valid resolved IP address. - ClientTest.testConnectTimeout In addition, there's an existed flaky test from very early time: ClientTest.testLookupThrottling. Python tests are also broken. Because it must run after all C++ tests passed, they're also not exposed. 1. Some tests in `pulsar_test.py` might encounter `Timeout` error when creating producers or consumers. 2. Some tests in `schema_test.py` failed because some comparisons between two `ComplexRecord`s failed. Since the CI test of C++ client would never fail after #10309 (will be fixed by #11575), all PRs about C++ or Python client are not verified even if CI passed. Before #11575 is merged, we need to fix all existed bugs of C++ client. ### Modifications Corresponding to the above tests group, this PR adds following modifications: 1. Add the `?checkAllowAutoCreation=true` URL suffix to allow HTTP lookup to create topics automatically. 2. When iterating through a resolved IP list, increase the iterator first, then run the connection timer and try to connect the next IP. Regarding to the flaky `testLookupThrottling`, this PR adds a `client.close()` at the end of test and fix the `ClientImpl::close` implementation. Before this PR, if there're no producers or consumers in a client, the `close()` method wouldn't call `shutdown()` to close connection poll and executors. Only after the `Client` instance was destructed would the `shutdown()` method be called. In this case, this PR calls `handleClose` instead of invoking callback directly. In addition, change the log level of this test to debug. This PR also fixes the failed timeout Python tests, some are caused by incorrect import of classes, some are caused by `client` was not closed. Regarding to Python schema tests, in Python2, `self.__ne__(other)` is not equivalent to `not self.__eq__(other)` when the default `__eq__` implementation is overwritten. If a `Record` object has a field whose type is also `Record`, the `Record.__ne__` method will be called, see https://github.com/apache/pulsar/blob/ddb5fb0e062c2fe0967efce2a443a31f9cd12c07/pulsar-client-cpp/python/pulsar/schema/definition.py#L138-L139 but it just uses the default implementation to check whether they're not equal. The custom `__eq__` method won't be called. Therefore, this PR implement `Record.__ne__` explicitly to call `Record.__eq__` so that the comparison will work for Python2. ### Verifying this change We can only check the workflow output to verify this change. (cherry picked from commit 4919a82)
bharanic-dev
pushed a commit
to bharanic-dev/pulsar
that referenced
this pull request
Mar 18, 2022
…he#11486) Master issue: apache#11485 When the PeriodicTask timer is cancelled it calls the handler which in turn ends up invoking the callback which closes the socket. This prevents the iteration over multiple endpoints until the connection succeeds. This change checks the error_code in the handler and returns if it is operation_cancelled, avoiding erroneously calling the callback. Co-authored-by: Jack Vanlightly <jvanlightly@splunk.com>
bharanic-dev
pushed a commit
to bharanic-dev/pulsar
that referenced
this pull request
Mar 18, 2022
…pache#11557) Fixes apache#11551 ### Motivation Currently there're some bugs of C++ client and some tests cannot pass: 1. Introduced from apache#10601 because it changed the behavior of the admin API to get partition metadata while the C++ implementation relies on the original behavior to create topics automatically. So any test that uses HTTP lookup will fail. - AuthPluginTest.testTlsDetectHttps - AuthPluginToken.testTokenWithHttpUrl - BasicEndToEndTest.testHandlerReconnectionLogic - BasicEndToEndTest.testV2TopicHttp - ClientDeduplicationTest.testProducerDeduplication 2. Introduced from apache#11029 and apache#11486 , the implementation will iterate more than once even there's only one valid resolved IP address. - ClientTest.testConnectTimeout In addition, there's an existed flaky test from very early time: ClientTest.testLookupThrottling. Python tests are also broken. Because it must run after all C++ tests passed, they're also not exposed. 1. Some tests in `pulsar_test.py` might encounter `Timeout` error when creating producers or consumers. 2. Some tests in `schema_test.py` failed because some comparisons between two `ComplexRecord`s failed. Since the CI test of C++ client would never fail after apache#10309 (will be fixed by apache#11575), all PRs about C++ or Python client are not verified even if CI passed. Before apache#11575 is merged, we need to fix all existed bugs of C++ client. ### Modifications Corresponding to the above tests group, this PR adds following modifications: 1. Add the `?checkAllowAutoCreation=true` URL suffix to allow HTTP lookup to create topics automatically. 2. When iterating through a resolved IP list, increase the iterator first, then run the connection timer and try to connect the next IP. Regarding to the flaky `testLookupThrottling`, this PR adds a `client.close()` at the end of test and fix the `ClientImpl::close` implementation. Before this PR, if there're no producers or consumers in a client, the `close()` method wouldn't call `shutdown()` to close connection poll and executors. Only after the `Client` instance was destructed would the `shutdown()` method be called. In this case, this PR calls `handleClose` instead of invoking callback directly. In addition, change the log level of this test to debug. This PR also fixes the failed timeout Python tests, some are caused by incorrect import of classes, some are caused by `client` was not closed. Regarding to Python schema tests, in Python2, `self.__ne__(other)` is not equivalent to `not self.__eq__(other)` when the default `__eq__` implementation is overwritten. If a `Record` object has a field whose type is also `Record`, the `Record.__ne__` method will be called, see https://github.com/apache/pulsar/blob/ddb5fb0e062c2fe0967efce2a443a31f9cd12c07/pulsar-client-cpp/python/pulsar/schema/definition.py#L138-L139 but it just uses the default implementation to check whether they're not equal. The custom `__eq__` method won't be called. Therefore, this PR implement `Record.__ne__` explicitly to call `Record.__eq__` so that the comparison will work for Python2. ### Verifying this change We can only check the workflow output to verify this change.
BewareMyPower
pushed a commit
that referenced
this pull request
Sep 16, 2022
Master issue: #11485 When the PeriodicTask timer is cancelled it calls the handler which in turn ends up invoking the callback which closes the socket. This prevents the iteration over multiple endpoints until the connection succeeds. This change checks the error_code in the handler and returns if it is operation_cancelled, avoiding erroneously calling the callback. Co-authored-by: Jack Vanlightly <jvanlightly@splunk.com> (cherry picked from commit 8070a82)
|
@Vanlightly Please provide a correct documentation label for your PR. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #11485
Motivation
Multiple endpoints, for example "localhost", "127.0.0.1", may need to be iterated over before a connection succeeds. In each iteration the connectTimeoutTask (PeriodicTask) is stopped and started. Stopping it cancels the timer, which fires the handler (handleTimeout) which in turn calls the connect timeout callback which closes the socket. This causes the createProducerAsync to return ResultConnectError. This causes a large number of tests to fail.
Modifications
This change checks the error_code in the PeriodicTask timer handler and returns if it is operation_cancelled, avoiding erroneously calling the callback.
Verifying this change
All existing tests pass.
Does this pull request potentially affect one of the following parts:
None
Documentation
For contributor
For this PR, do we need to update docs?