
feat(client): implement MultiResolver with Healthcheck and recovery #64

Open

hiddify-com wants to merge 3 commits into net2share:main from hiddify:multidns

Conversation


@hiddify-com hiddify-com commented Mar 30, 2026

You can now use the -udp, -dot, or -doh flags multiple times.

It still needs more tests.
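For illustration, a hypothetical invocation with repeated flags might look like this (the resolver addresses and the tunnel domain are placeholders, and other required options such as the server public key are omitted):

```shell
# Hypothetical example only: placeholder resolvers and domain.
./dnstt-client \
    -udp 203.0.113.1:53 \
    -dot 203.0.113.2:853 \
    -doh https://doh.example/dns-query \
    t.example.com 127.0.0.1:7000
```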

@selfishblackberry177 (Collaborator)

Thanks for your contribution. Much appreciated.

I had been thinking about this PR for days, but one major issue with the current approach is the use of round-robin: if a single resolver goes down, it can break all connections.

A better design would be to maintain multiple sessions, each tied to a specific resolver, and then distribute streams across those sessions. With that setup, if one resolver fails, only some streams are affected and they can recover with retries, instead of the entire session collapsing.

In the current implementation, a resolver failure effectively kills the whole session. In the alternative approach, failures are more isolated and recoverable.
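The session-per-resolver design suggested above could be sketched roughly as follows. This is a minimal illustration with hypothetical types (`Resolver` and `Session` are stand-ins, not the actual dnstt API): each session is bound to one resolver, and new streams are placed on the least-loaded healthy session, so a resolver failure only affects its own streams.

```go
package main

import "fmt"

// Resolver is a stand-in for one configured DNS resolver endpoint.
type Resolver struct {
	Name    string
	Healthy bool
}

// Session is a stand-in for a tunnel session bound to one resolver.
type Session struct {
	Resolver *Resolver
	Streams  int // number of streams currently multiplexed on this session
}

// pickSession places a new stream on the least-loaded healthy session,
// so a failed resolver only affects the streams already bound to it.
func pickSession(sessions []*Session) *Session {
	var best *Session
	for _, s := range sessions {
		if !s.Resolver.Healthy {
			continue
		}
		if best == nil || s.Streams < best.Streams {
			best = s
		}
	}
	if best != nil {
		best.Streams++
	}
	return best
}

func main() {
	sessions := []*Session{
		{Resolver: &Resolver{Name: "dot:203.0.113.2", Healthy: true}},
		{Resolver: &Resolver{Name: "doh:doh.example", Healthy: false}},
	}
	s := pickSession(sessions)
	fmt.Println(s.Resolver.Name) // the unhealthy resolver is skipped
}
```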

Also, something like the current design could be achieved with a custom DNS proxy. For example, you could run AdGuard DNSProxy and handle load balancing there, with additional features such as least-latency routing and other optimizations.

That would provide a more advanced and flexible version of what is currently implemented, so I would say we should hold multiple sessions over multiple DNS resolvers instead of this approach.

@hiddify-com (Author)

I do not fully agree with your point.

First, we had already implemented multiple sessions over multiple DNS resolvers in the past, but we encountered real operational issues that led us to introduce the multi-resolver approach for a single DNSTT session instead.

  1. Resolver failure does not drop the connection
    The KCP layer is responsible for retransmissions. If one resolver goes down, the connection itself will not be dropped. Packets can be retransmitted through other resolvers, and we can implement even better strategies in the future.

  2. Single resolver per connection limits throughput
    Using a single resolver per connection creates a bandwidth limitation for a single stream, for example an SSH connection. With multiple resolvers per session, we can effectively increase upload/download throughput for a single connection.

  3. Small QNAME length increases request count significantly
    When we use a small qname-len, the number of client requests becomes extremely high. For example, uploading 100 KB may require at least ~2000 DNS requests (ignoring other overhead). Some resolvers may start blocking or rate-limiting these requests, and even if we reconnect using another resolver, the new connection will face the same issue again.

  4. External DNS load balancing is not ideal for this use case
    Using an external DNS server (like AdGuard DNSProxy) for load balancing or least-latency routing adds complexity for users, and those tools are not optimized for our tunneling use case.
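The request-count concern in point 3 can be made concrete with a back-of-envelope calculation. The ~50 usable payload bytes per query is an illustrative assumption (the real capacity depends on the encoding and the tunnel domain length), not a measured dnstt figure:

```go
package main

import "fmt"

// requestsNeeded estimates how many DNS queries an upload requires,
// given the usable payload bytes each query can carry in its QNAME.
func requestsNeeded(uploadBytes, payloadPerQuery int) int {
	// Round up: a final partial chunk still needs its own query.
	return (uploadBytes + payloadPerQuery - 1) / payloadPerQuery
}

func main() {
	// Assume ~50 usable bytes per query with a short qname-len
	// (illustrative assumption; see lead-in above).
	fmt.Println(requestsNeeded(100*1024, 50)) // ≈2048 queries for 100 KB
}
```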

Instead, we can extend the current implementation by collecting statistics per DNS resolver per tunnel. For example, if we track incoming and outgoing packet counts per resolver, we can automatically detect blocked resolvers and stop sending traffic through them for a period.

  5. Least latency is not really the main concern
    Measuring the latency of DNS servers is not a good metric, because their responses are heavily cached across resolvers. We are using DNS as a tunnel, so the delay that matters is the delay to the final server, and the main source of latency is usually the GFW. The latency differences between resolvers are likely not significant.

  6. KCP window and retransmission behavior
    One limitation is that differing resolver latencies can introduce high RTT variance, which may lead to spurious retransmissions and suboptimal congestion/window behavior in KCP. However, tuning KCP parameters can mitigate some of these effects.

  7. Real-world testing and iterative improvement
    That said, this implementation provides an opportunity to test and evaluate different scenarios in real-world conditions and to improve the algorithm iteratively. It would be more valuable to observe its behavior in practice with end users.

@hiddify-com hiddify-com changed the title new: add multi resolver feat(client): implement MultiResolver with Healthcheck and recovery Apr 1, 2026
@hiddify-com hiddify-com marked this pull request as ready for review April 1, 2026 18:06
@hiddify-com (Author)

hiddify-com commented Apr 1, 2026

We have implemented MultiResolver with health checks and automatic recovery.

To monitor the status of resolvers, run the application with:

-log-level trace

This will show the resolver state and switching behavior.

If only a single resolver is configured, the system behaves exactly as the previous implementation did, so the old behavior is preserved.
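A minimal sketch of the healthcheck-and-recovery idea described here (a hypothetical structure, not the PR's actual code): each resolver's probe results update a health table, traffic switches away from a failed resolver, and the resolver is restored as soon as a probe succeeds again.

```go
package main

import (
	"fmt"
	"sync"
)

// resolverState is the health table a MultiResolver-style wrapper
// might maintain from periodic healthcheck probes.
type resolverState struct {
	mu      sync.Mutex
	healthy map[string]bool
}

func newResolverState(names []string) *resolverState {
	h := make(map[string]bool, len(names))
	for _, n := range names {
		h[n] = true // start optimistic; healthchecks will correct this
	}
	return &resolverState{healthy: h}
}

// report records one probe result; a failed resolver is switched away
// from, and it recovers as soon as a later probe succeeds.
func (r *resolverState) report(name string, ok bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.healthy[name] = ok
}

// pick returns any currently-healthy resolver, or "" if none remain.
func (r *resolverState) pick() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	for n, ok := range r.healthy {
		if ok {
			return n
		}
	}
	return ""
}

func main() {
	rs := newResolverState([]string{"udp:203.0.113.1:53"})
	rs.report("udp:203.0.113.1:53", false) // probe failed: marked down
	fmt.Println(rs.pick() == "")           // true: no healthy resolver left
	rs.report("udp:203.0.113.1:53", true)  // probe succeeded: recovered
	fmt.Println(rs.pick())
}
```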
