Skip to content
This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Fix resolution of domains with only AAAA records (#10694)#10717

Closed
xfk wants to merge 3 commits into
matrix-org:developfrom
xfk:develop
Closed

Fix resolution of domains with only AAAA records (#10694)#10717
xfk wants to merge 3 commits into
matrix-org:developfrom
xfk:develop

Conversation

@xfk
Copy link
Copy Markdown

@xfk xfk commented Aug 30, 2021

Use twisted HostnameEndpoint instead of connectTCP method to successfully resolve domains with only AAAA DNS records.

Fixes #10694
Fixes #7720

@clokep
Copy link
Copy Markdown
Member

clokep commented Aug 30, 2021

@xfk Could you please sign-off on your contribution? You can just add it as a comment on the PR. 👍

@xfk
Copy link
Copy Markdown
Author

xfk commented Aug 30, 2021

DCO (Developer Certificate of Origin: http://developercertificate.org/)
Signed-off-by: Fran Kristo fran.kristo42@gmail.com

@clokep clokep requested a review from a team August 30, 2021 12:51
@ShadowJonathan
Copy link
Copy Markdown
Contributor

Could you maybe add the following to the PR description? Then the issues will be automatically closed;

Fixes #10694
Fixes #7720

@ShadowJonathan
Copy link
Copy Markdown
Contributor

I wonder if this also addresses #6895

@clokep
Copy link
Copy Markdown
Member

clokep commented Aug 30, 2021

The tests are failing due to this change, I think something like the following will fix it:

diff --git a/tests/handlers/test_send_email.py b/tests/handlers/test_send_email.py
index 6f77b1237..10a1c4a5f 100644
--- a/tests/handlers/test_send_email.py
+++ b/tests/handlers/test_send_email.py
@@ -71,6 +71,8 @@ class _DummyMessage:
 class SendEmailHandlerTestCase(HomeserverTestCase):
     def test_send_email(self):
         """Happy-path test that we can send email to a non-TLS server."""
+        self.reactor.lookups["localhost"] = "1.2.3.4"
+
         h = self.hs.get_send_email_handler()
         d = ensureDeferred(
             h.send_email(
@@ -82,7 +84,7 @@ class SendEmailHandlerTestCase(HomeserverTestCase):
         (host, port, client_factory, _timeout, _bindAddress) = self.reactor.tcpClients[
             0
         ]
-        self.assertEqual(host, "localhost")
+        self.assertEqual(host, "1.2.3.4")
         self.assertEqual(port, 25)
 
         # wire it up to an SMTP server

I also think we need to await each connect call since that returns a Deferred?

Copy link
Copy Markdown
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution! Some comments below though.

Comment on lines -317 to +330
hs.get_reactor().connectTCP(
endpoint = HostnameEndpoint(
hs.get_reactor(),
hs.config.redis.redis_host.encode(),
hs.config.redis.redis_port,
self._factory,
)
endpoint.connect(self._factory)
else:
client_name = hs.get_instance_name()
self._factory = DirectTcpReplicationClientFactory(hs, client_name, self)
host = hs.config.worker_replication_host
port = hs.config.worker_replication_port
hs.get_reactor().connectTCP(host.encode(), port, self._factory)
endpoint = HostnameEndpoint(hs.get_reactor(), host.encode(), port)
endpoint.connect(self._factory)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I think fixing this is more involved. It's to do with how we handle connection failures, and retries.

DirectTcpReplicationClientFactory and RedisDirectTcpReplicationClientFactory are both based on twisted.internet.protocol.ReconnectingClientFactory. If reactor.connectTCP is unable to connect to the hostname, it will call ReconnectingClientFactory.clientConnectionFailed, which will schedule a retry a few seconds later.

HostnameEndpoint.connect does its own set of retries - one for each IP address returned by the hostname lookup - and then calls reactor.connectTCP for each.

So, if we use HostnameEndpoint and ReconnectingClientFactory, we can end up with two different, conflicting, retry systems going on at the same time, leaving us in a terrible mess.

I gather (h/t @glyph) that ReconnectingClientFactory is softly deprecated, and instead we want to use https://twistedmatrix.com/documents/current/api/twisted.application.internet.ClientService.html, which will compose properly with a HostnameEndpoint.

So, I think what we want to do is create a ClientService in the constructor of ReplicationCommandHandler, and then we can just call ClientService.startService in start_replication.


Now, all of this is getting rather complicated. I think you should split up this PR - let's just fix the SMTP sender for now, and then consider fixing the replication connections in a later PR?

Comment on lines +368 to +371
endpoint = HostnameEndpoint(
reactor, host.encode(), port, timeout=30, bindAddress=None
)
endpoint.connect(factory)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has the same problem as ReplicationCommandHandler.

endpoint = HostnameEndpoint(
reactor, smtphost, smtpport, timeout=30, bindAddress=None
)
endpoint.connect(factory)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @clokep said, I think we need to await this. In fact, for logcontext-related reasons too annoying to explain here, we need to:

Suggested change
endpoint.connect(factory)
await make_deferred_yieldable(endpoint.connect(factory))

@richvdh richvdh added the X-Awaiting-Changes A contributed PR which needs changes and re-review before it can be merged label Aug 31, 2021
@anoadragon453
Copy link
Copy Markdown
Member

Hello @xfk, just checking in to see whether you're around to tackle the requested changes here?

@xfk
Copy link
Copy Markdown
Author

xfk commented Sep 14, 2021

@anoadragon453

I made some progress, but didn't get a chance to finish. They are a bit more involved than anticipated.

@anoadragon453
Copy link
Copy Markdown
Member

@anoadragon453

I made some progress, but didn't get a chance to finish. They are a bit more involved than anticipated.

No worries. Feel free to ask questions if you hit a roadblock. We're happy to help!

@telmich
Copy link
Copy Markdown

telmich commented Dec 3, 2021

ping - what's the best way to move forward with this one?

@anoadragon453
Copy link
Copy Markdown
Member

Hello @xfk! Are you around to continue working on this PR?

If fixing the full problem is too daunting for now, you may want to split up the PR into more manageable chunks, landing what's figured out so far for now, as suggested in https://github.com/matrix-org/synapse/pull/10717/files#r699626644?

@richvdh
Copy link
Copy Markdown
Member

richvdh commented Apr 21, 2022

Sadly it looks like @xfk no longer has time to work on this, so I'm going to close it.

@richvdh richvdh closed this Apr 21, 2022
@telmich
Copy link
Copy Markdown

telmich commented Apr 21, 2022

@richvdh Can we keep this open? I know that @xfk is not around anymore, but we would be willing to put a bounty out on this one to improve the IPv6 support in synapse.

@richvdh
Copy link
Copy Markdown
Member

richvdh commented Apr 22, 2022

@telmich: the issues #10694 and #7720 are still open. I don't think there's any value in keeping this incomplete PR open.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

X-Awaiting-Changes A contributed PR which needs changes and re-review before it can be merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

In IPv6-only environment, can't connect to Redis due to DNS only looking for A records Synapse can't connect to an IPv6-only mail server via hostname

6 participants