Fix resolution of domains with only AAAA records (#10694) #10717
DCO (Developer Certificate of Origin: http://developercertificate.org/)

Could you maybe add the following to the PR description? Then the issues will be automatically closed;

I wonder if this also addresses #6895

The tests are failing due to this change, I think something like the following will fix it:

diff --git a/tests/handlers/test_send_email.py b/tests/handlers/test_send_email.py
index 6f77b1237..10a1c4a5f 100644
--- a/tests/handlers/test_send_email.py
+++ b/tests/handlers/test_send_email.py
@@ -71,6 +71,8 @@ class _DummyMessage:
 class SendEmailHandlerTestCase(HomeserverTestCase):
     def test_send_email(self):
         """Happy-path test that we can send email to a non-TLS server."""
+        self.reactor.lookups["localhost"] = "1.2.3.4"
+
         h = self.hs.get_send_email_handler()
         d = ensureDeferred(
             h.send_email(
@@ -82,7 +84,7 @@ class SendEmailHandlerTestCase(HomeserverTestCase):
         (host, port, client_factory, _timeout, _bindAddress) = self.reactor.tcpClients[
             0
         ]
-        self.assertEqual(host, "localhost")
+        self.assertEqual(host, "1.2.3.4")
         self.assertEqual(port, 25)
 
         # wire it up to an SMTP server

I also think we need to
richvdh left a comment:
Thank you for the contribution! Some comments below though.
-            hs.get_reactor().connectTCP(
+            endpoint = HostnameEndpoint(
+                hs.get_reactor(),
                 hs.config.redis.redis_host.encode(),
                 hs.config.redis.redis_port,
-                self._factory,
             )
+            endpoint.connect(self._factory)
         else:
             client_name = hs.get_instance_name()
             self._factory = DirectTcpReplicationClientFactory(hs, client_name, self)
             host = hs.config.worker_replication_host
             port = hs.config.worker_replication_port
-            hs.get_reactor().connectTCP(host.encode(), port, self._factory)
+            endpoint = HostnameEndpoint(hs.get_reactor(), host.encode(), port)
+            endpoint.connect(self._factory)
Unfortunately, I think fixing this is more involved. It's to do with how we handle connection failures, and retries.
DirectTcpReplicationClientFactory and RedisDirectTcpReplicationClientFactory are both based on twisted.internet.protocol.ReconnectingClientFactory. If reactor.connectTCP is unable to connect to the hostname, it will call ReconnectingClientFactory.clientConnectionFailed, which will schedule a retry a few seconds later.
HostnameEndpoint.connect does its own set of retries - one for each IP address returned by the hostname lookup - and then calls reactor.connectTCP for each.
So, if we use HostnameEndpoint and ReconnectingClientFactory, we can end up with two different, conflicting, retry systems going on at the same time, leaving us in a terrible mess.
I gather (h/t @glyph) that ReconnectingClientFactory is softly deprecated, and instead we want to use https://twistedmatrix.com/documents/current/api/twisted.application.internet.ClientService.html, which will compose properly with a HostnameEndpoint.
So, I think what we want to do is create a ClientService in the constructor of ReplicationCommandHandler, and then we can just call ClientService.startService in start_replication.
Now, all of this is getting rather complicated. I think you should split up this PR - let's just fix the SMTP sender for now, and then consider fixing the replication connections in a later PR?
    endpoint = HostnameEndpoint(
        reactor, host.encode(), port, timeout=30, bindAddress=None
    )
    endpoint.connect(factory)
This has the same problem as ReplicationCommandHandler.
    endpoint = HostnameEndpoint(
        reactor, smtphost, smtpport, timeout=30, bindAddress=None
    )
    endpoint.connect(factory)
As @clokep said, I think we need to await this. In fact, for logcontext-related reasons too annoying to explain here, we need to:
-    endpoint.connect(factory)
+    await make_deferred_yieldable(endpoint.connect(factory))

Hello @xfk, just checking in to see whether you're around to tackle the requested changes here?

I made some progress, but didn't get a chance to finish. They are a bit more involved than anticipated.

No worries. Feel free to ask questions if you hit a roadblock. We're happy to help!

ping - what's the best way to move forward with this one?

Hello @xfk! Are you around to continue working on this PR? If fixing the full problem is too daunting for now, you may want to split up the PR into more manageable chunks, landing what's figured out so far for now, as suggested in https://github.com/matrix-org/synapse/pull/10717/files#r699626644?

Sadly it looks like @xfk no longer has time to work on this, so I'm going to close it.

Use Twisted's HostnameEndpoint instead of the reactor's connectTCP method so that domains with only AAAA DNS records resolve successfully.
Fixes #10694
Fixes #7720