Skip to content

Conversation

@patterniha
Copy link
Collaborator

similar to #4611 but new-features removed.

so, after this PR merged, i open new PR for new-features.

fixed bugs and refactor and optimization are related and cannot be separated further.

Fixed bugs:

  1. When the cache is disabled (disableCache = true), instead of not using the cache and sending a new IP-query, it both uses the cache and sends new IP-Query!
    This is because the order of the codes in nameserver_xxx-QueryIP function is wrong and the ips, ttl, err := s.findIPsForDomain(fqdn, option) should be after select-case code. link
    also we should not use for loop for this part of code.

  2. Suppose dialing a dns-server is unsuccessful(for tcp/quic base DNS)[for example receiving rst-ack after sending syn or receiving http-error response for doh], Instead of immediately returning an error and trying the next DNS-server-fallback, it waits until the timeout ends and then tries the next fallback!

  3. When ipOption.IPv4Enable and ipOption.IPv6Enable are both true, two IP-Query(A, AAAA) are sent and it waits for both responses to be received, then it merges the responses and returns, but suppose only AAAA-response is received and in the meantime, another request comes, while the first request is still waiting for A-response, For the second request, since AAAA-record is in the cache, it uses the cache and incorrectly only v6-IPs is returned for the second request! while the second request must wait for A-response like the first request.
    also, suppose the A-record expire sooner than AAAA-record, so So until AAAA-record expires we only have IPv6!
    also, suppose we receive only AAAA-response and the A-response dropped for a request,
    so for all subsequent requests and for 600 seconds we only have ipv6! and if our network is IPv4 only, for 600 seconds we cannot access internet!!!
    this problem affect internal usage of built-in-DNS (for example domainStrategy = "UseIP" or domainStrategy = "IPOnDemand" for routing) but not affect client/browser request, because for client/browser IP-request we have two distinct request for IPv4 and IPv6(pass through dns-proxy) but for UseIP/IPOnDemand requests we have one merged request, and this bug only affect merged requests.

  4. when IP-record-until-expire-time is less than 1 seconds and the IP-list is not empty, it returns IPs with TTL = 0 but we shouldn't set record-TTL to 0. The number 0 is not defined in the standard, and it may cause DNS information to be ignored or rejected,
    so after converting to uint32, It should be rounded up, not down. link

  5. the IsOwnLink function in "app > dns > dns.go" not updated after adding tag for each DNS-server.

  6. instead of creating new GeoIPMatcherContainer in "dns.go" we should use GlobalGeoIPContainer to reduce memory usage. GeoIPMatcherContainer is implemented to reduce memory usage, so we should only have one instance of GeoIPMatcherContainer in the entire code.

  7. when no IPs match expectedIPs, we should return ErrEmptyResponse instead of errExpectedIPNonMatch,
    otherwise, DNS-proxy send no response to client/browser DNS-query.

  8. in multi_error.go > AllEqual errors compare with == instead of errors.Is !

  9. CleanUP task should start(check-executed) in addPendingRequest function in nameserver_udp.go

refactor:

Most codes in nameserver files are duplicated, Cleanup, findIPsForDomain, updateIP, ...

These functions control how to read and write in the cache.

so I add CacheController struct to implement these functions in one place, and each tcp/quic/udp/doh-NameServer struct has a CacheController.

So now the codes are standard, and these function are only written once.

optimization:

Also, some optimizations have been made which are clear in the code and do not need to be explained.

for example in

ips, ttl, err := c.server.QueryIP(ctx, domain, c.clientIP, option, disableCache)

you pass c.clientIP and disableCache as function arguments, but these are fixed-options and should not pass as function arguments each time "QueryIP" call.

@RPRX RPRX changed the title DNS: fix bugs, refactor, optimization DNS: Fix some bugs; Refactors; Optimizations Apr 29, 2025
@RPRX RPRX merged commit aa4134f into XTLS:main Apr 29, 2025
35 checks passed
@RPRX
Copy link
Member

RPRX commented Apr 29, 2025

感谢 PR,我对 @patterniha 的能力还是有一些信心的,所以 #4661 (comment) ,今天或明天我发个四月累积更新版本让用户测试

@patterniha
Copy link
Collaborator Author

@RPRX

Thank you very much for trusting me.

@j2rong4cn
Copy link
Contributor

I need finalQuery

@patterniha
Copy link
Collaborator Author

I need finalQuery

#4666

maoxikun added a commit to maoxikun/Xray-core that referenced this pull request Aug 22, 2025
@Meo597
Copy link
Collaborator

Meo597 commented Oct 21, 2025

https://github.com/XTLS/Xray-core/blob/b69a376aa1b61ad94c9c05b2c2cd263567b8cc23/app/dns/cache_controller.go#L59-69

  1. 这里 else 分支明显的无意义操作,不过什么都不影响
  2. 撑死用不了几兆内存,负载高时几乎永不可能被执行;负载小时频繁被执行反而更多 cpu 时间浪费在扩容 map 上。真想优化应该加个尺寸判断应对低负载,并定期拷贝 map 应对高负载
  3. 一分钟清理一次没必要,因为大部分权威 ttl 默认 300

好明显的瑕疵,果然没有人 review 过

@patterniha
Copy link
Collaborator Author

i just put cache-related-codes to "cache_controller.go" file, and just copied the codes.
this else is also found in original v2ray code:
https://github.com/v2fly/v2ray-core/blob/dbf25a4c1cc776251d5d36ffc70c62a881bcb5f5/app/dns/nameserver_doh.go#L151-L153

so your comment is not related to this PR, if you want to fix it you can fix it in one your open PR, or open new PR for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants