@Meo597
Collaborator

@Meo597 Meo597 commented Nov 9, 2025

1. Using geoip:cn and geoip:!cn together no longer takes two copies in memory.


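2. Multiple matchers are now merged intelligently, which gives the internal IPSet a better chance to merge CIDRs and save memory.
The IP database I use is more accurate, and I list countries by continent in a single rule, which used to consume a lot of memory.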

Before

localhost [~]# free -h
              total        used        free      shared  buff/cache   available
Mem:           1.9G      866.2M      938.0M      192.0K      174.1M      964.4M
Swap:        256.0M           0      256.0M

After

localhost [~]# free -h
              total        used        free      shared  buff/cache   available
Mem:           1.9G      599.7M        1.2G      192.0K      175.4M        1.2G
Swap:        256.0M           0      256.0M

3. Startup is faster: startup time is now halved.
The IP database I use is about 60 MB, and startup alone used to take three minutes.


4. Matching is 2x to 10x faster when a single rule contains multiple matchers, faster than fast.
Before

BenchmarkMultiGeoIPMatcher-4   	 2517230	       480.5 ns/op	      28 B/op	       2 allocs/op
PASS
ok  	github.com/xtls/xray-core/app/router	25.065s

After

BenchmarkMultiGeoIPMatcher-4   	 7061422	       190.3 ns/op	      28 B/op	       2 allocs/op
PASS
ok  	github.com/xtls/xray-core/app/router	24.173s

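5. Matching and filtering multiple IPs is faster; the gain depends on how well the DNS results aggregate into /24 (/48) prefixes.
Taking youtube as an example, the improvement is about 20%.
I am never going to write a benchmark and tests for this, not in this lifetime.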

@Meo597
Collaborator Author

Meo597 commented Nov 9, 2025

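While reviewing, I suddenly realized that xray currently does not support multiple inverted geoip entries in a single rule,
because all matchers are ORed together.
ip: [!cn, !us] ends up matching nothing useful.

Now that I have merged them, the forward and inverted matchers are each combined into a single matcher, and the inverted ones are ANDed internally.
This fixes a nasty old bug.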

@Meo597 Meo597 closed this Nov 10, 2025
@Meo597 Meo597 reopened this Nov 10, 2025
@Meo597

This comment was marked as resolved.

@patterniha
Collaborator

patterniha commented Nov 13, 2025

During the review, I suddenly realized that Xray currently does not support writing multiple inverted geoips in a single rule because all matchers are ORed, so for ip: [!cn, !us] the matching is ineffective.

Now, because I merged them, the forward and reverse directions are each combined into a single matcher, and the reverse direction contains an AND relationship. This also fixes a previous, serious bug.

There were no bugs.

"ip": [A, B] means: A U B
"ip": [!A] means: [0.0.0.0/0, ::/0] - A

so "ip": [!A, !B] means: ([0.0.0.0/0, ::/0] - A) U ([0.0.0.0/0, ::/0] - B) = [0.0.0.0/0, ::/0] - (A ∩ B)

or for example "ip": [!A, !B, C] means: ([0.0.0.0/0, ::/0] - A) U ([0.0.0.0/0, ::/0] - B) U C

///

Currently there is no option for intersecting IP sets. If you need one, you can add a new option for it, but you should not break the current logic.

@Meo597
Collaborator Author

Meo597 commented Nov 13, 2025

There were no bugs.

"ip": [A, B] = (ip in A) OR (ip in B) = ip in (A ∪ B)
You are completely wrong, because you skipped the middle step.

@patterniha
Collaborator

patterniha commented Nov 13, 2025

OK, if you still don't understand, here it is with the middle steps:

rule-1: "ip": [A, B] = (ip in A) OR (ip in B) = ip in (A ∪ B)
rule-2: "ip": [!A] = ip in !A = ip not in A = ip in ([0.0.0.0/0, ::/0] - A)

rule-1 and rule-2 -> "ip": [!A, !B] = (ip in !A) OR (ip in !B) = (ip not in A) OR (ip not in B) = (ip in ([0.0.0.0/0, ::/0] - A)) OR (ip in ([0.0.0.0/0, ::/0] - B)) = ip in (([0.0.0.0/0, ::/0] - A) U ([0.0.0.0/0, ::/0] - B)) = ip in ([0.0.0.0/0, ::/0] - (A ∩ B))

///

this is just elementary logic/set-theory.

@Meo597
Collaborator Author

Meo597 commented Nov 13, 2025

That made me laugh.

For GeoIP, A and B do not intersect.
You should have told the AI that key piece of information.

@Meo597
Collaborator Author

Meo597 commented Nov 13, 2025

Also, given that you wrote code like this in #4666,
maybe don't lecture at length on Boolean logic and set theory.

		if isFound && !reverse {
			newIPs = append(newIPs, ip)
			continue
		}
		if !isFound && reverse {
			newIPs = append(newIPs, ip)
			continue
		}

@patterniha
Collaborator

Also, given that you wrote code like this in #4666, maybe don't lecture at length on Boolean logic and set theory.

		if isFound && !reverse {
			newIPs = append(newIPs, ip)
			continue
		}
		if !isFound && reverse {
			newIPs = append(newIPs, ip)
			continue
		}

This is completely correct.
Anyway, I don't have time to argue with you. Unlike your other PRs, this one has a problem and should not be merged.

according to #5289 (comment)
"ip": [!A, !B] should be: [0.0.0.0/0, ::/0] - (A ∩ B). you can understand with a simple Venn diagram
but this PR mistakenly changes it to: [0.0.0.0/0, ::/0] - (A ∪ B)

Anyway, @RPRX can also decide whether [0.0.0.0/0, ::/0] - (A ∩ B) is correct or [0.0.0.0/0, ::/0] - (A ∪ B) is correct for "ip": [!A, !B].

@Meo597
Collaborator Author

Meo597 commented Nov 13, 2025

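You used AI to write a pile of set-operation derivations,
but you forgot to tell the AI the most basic premise:
A and B do not intersect.

Anyone who has been through high school can easily work out that !A or !B yields all IP addresses. Understand?

As for the problem with your #4666 code, anyone who has written code for two days knows it should be isFound != reverse.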

@Fangliding
Member

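Actually, as far as behavior is concerned, @patterniha may be right.
When ordinary people use inverted rules, in a search engine for example, multiple conditions are joined with AND by default, so !keywordA !keywordB excludes anything that contains A or B.
But IP rules are ORed internally. Under normal rule evaluation, (!A) or (!B) does yield "any" if A and B are completely disjoint, and under the traditional geoip: they are indeed disjoint. However, they may come from an extfile, not to mention that the current geoip also seems to contain odd categories besides countries. In short, although the resulting combined logic is confusing, it is not necessarily meaningless, and the change does break the existing semantics.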

@Meo597
Collaborator Author

Meo597 commented Nov 14, 2025

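The key question is whether anyone actually uses the obscure pattern of putting multiple ! entries in one rule.

In mainstream geoip usage nobody would ever use two ! entries together,
even though Loyalsoldier's geoip now contains special categories such as telegram,

so A and B do have a chance to intersect.
I thought about it myself: with two ! entries I can work out the meaning in about ten seconds; with three I can no longer follow it.

Ultimately this is a design question:

  1. Are we willing to give up the tiny number of users who rely on the obscure semantics of two ORed ! entries,
  2. or do we accommodate users who may want multiple ANDed ! entries in the future?

Switching to 2 takes three seconds: I just comment out the neg line, and the only cost is a negligible performance drop for !.

As for @patterniha, never mind.
He does not know set operations at all and failed to notice this deeper issue.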

@Meo597
Collaborator Author

Meo597 commented Nov 14, 2025

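On the original design of ORing matchers:

I am 100% sure the original author never considered the ! case.
OR between matchers was simply slapped on so that forward matches could cover a wider range.

The author did not even think of merging forward matchers, which is faster and saves more memory,
so how could the ! case have been thought through?

Besides, back then it was the v2 geoip, so categories could not intersect.

Only later did Loyalsoldier produce a non-standard geoip,
categories unexpectedly came to intersect,
and two ! entries acquired a meaningful semantics.
But its usefulness is tiny: it is an obscure hack, and people who do not understand it will introduce bugs.

I tried to understand the ! OR usage and could not come up with a single practically meaningful use case.
After struggling for a long time, this is the best I could do:
ip: [geoip:!telegram, geoip:!us]

You cannot even add more ! entries to it.

That is why I say @patterniha, without understanding set operations at all, relied entirely on AI to write a garbage reply.
If I had not raised this issue, he might never have noticed it himself.
He is objecting for the sake of objecting, and not even on the right point.

In any case, I have no use for multiple ! entries myself.
Whether to strictly follow the obscure semantics that arose from an earlier accident, or to switch to a form that matches human intuition,
is up to @Fangliding and @RPRX.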

@Meo597
Collaborator Author

Meo597 commented Nov 14, 2025

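@Fangliding Also, do you think it is necessary to cover the following case: prefixes longer than /24 (/48) fall into different geoip categories, a single DNS response contains addresses from both, and the current heuristic then produces a bug?

I don't think these conditions can realistically stack up.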

@Fangliding
Member

Fangliding commented Nov 14, 2025

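I don't know exactly what bug you mean, but judging from the description, IP filtering only gets about 20% faster, so I think correctness matters more. I once optimized a function that took a few hundred ns purely for fun, and later found that the biggest difference between the two versions came from asking crypto/rand.Read for a few extra bytes, something you normally use without a second thought. That is why I keep stressing that some things are simply unnecessary.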

@Meo597
Collaborator Author

Meo597 commented Nov 14, 2025

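For example:
1.1.1.1 is in geoip:us
1.1.1.2 is in geoip:cn

www.example.com responds with both of these IPs at once.

The current heuristic merges addresses at /24 or longer before doing the binary search.
Once the conditions above are met (extremely rare),

the result is:

  1. DNS IP filtering becomes inaccurate.
    For example, exceptIPs: [us] would also carry the cn IP.

  2. When the routing module matches multiple IPs, the result should be an empty set of IPs,
    but now it can unexpectedly match cn or us.

If this edge case has to be covered, I will implement another matcher.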

@patterniha
Collaborator

patterniha commented Nov 14, 2025

@Meo597

Now, with this way of talking, I'm sure that you...

Have you ever wondered why I added the unexpectedIPs option? because you can't generate all possible set-combinations just by having expectedIPs.

also, for router-rules, you can add an option something like unexpectedIPs to router-rules, so you can generate all set-combinations, but there's no need (because we can add another rule with reverse-ip-list for different outbound).

///

currently, if you need !(A ∪ B) you can do:

"expectedIPs": [!A],
"unexpectedIPs": [B]

or simply:

"unexpectedIPs": [A, B]

there is no other way, you can't violate logic just because of your need.
///

in short, Regardless of the application, the logic must be correct first.

@Meo597
Collaborator Author

Meo597 commented Nov 14, 2025

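@patterniha I am now quite sure that you have no idea what ANDing the ! entries actually means.
You do not even know what happens when multiple, or even just two, ! entries are ORed.

Since you do not understand set operations, I suggest you do it mechanically: plug in an IP and see what happens.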

@marillindie

@Meo597:

@patterniha is logically and theoretically correct, and from my perspective I do believe he knows set theory.

And yes -- you do have a good point that ["geoip:!a", "geoip:!b"] is practically useless when geoip:a and geoip:b are disjoint, which is almost always the case. I suggest instead just making it a warning/error if someone tries to use multiple geoip:!x.

If $\neg (A \cup B)$ is really necessary, I suggest (personally) either use unexpectedIPs-like pattern for all other occurrences of IP matchers, or maybe introduce a new pattern for AND relations, e.g. geoip:!a,!b,c for $\neg A \cap \neg B \cap C$.
(I add c here just for completeness. It should probably not appear in actual configurations.)

Also, I would like to remind everyone here about De Morgan's laws:
$$\neg (A \cap B) = (\neg A) \cup (\neg B)$$
which can be easily extended to
$$\neg \bigcap_{i=1}^n A_i = \bigcup_{i=1}^n (\neg A_i)$$
(i.e. $\neg (A_1 \cap A_2 \cap ... \cap A_n) = \neg A_1 \cup \neg A_2 \cup ... \cup \neg A_n$)
by mathematical induction, so what would happen with multiple !s OR-ed together is not a hard problem.

@Meo597
Collaborator Author

Meo597 commented Nov 14, 2025

@marillindie

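We all agree that ORing ! entries is almost useless.
Moreover, for the modified geoip, at most two subsets can intersect, never more.
And even when just two subsets intersect, ORing them has no practical meaning (from the standpoint of real needs).

Changing the relation between ! entries from OR to AND is only a trivial change in this PR,
and this PR does not intend to implement more complex matching logic.

From the standpoint of real needs, had I not raised the original semantics issue, nobody would have cared about it at all.
Objecting for the sake of objecting is unacceptable.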

I suggest instead just make a warning/error if someone try to use multiple geoip:!x.

In my view, simply ANDing the ! entries is more practical and more intuitive, and it is what search engines do as well.


		if isFound && !reverse {
			newIPs = append(newIPs, ip)
			continue
		}
		if !isFound && reverse {
			newIPs = append(newIPs, ip)
			continue
		}

@patterniha is logically and theoretically correct, and from my perspective I do believe he knows set theory.

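Also, if you knew who wrote this remarkable piece of code, I believe you would not have said that.
It does not even take set theory or Boolean algebra; any programmer with a little training would not...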

@Meo597
Collaborator Author

Meo597 commented Nov 14, 2025

@patterniha

Have you ever wondered why I added the unexpectedIPs option? because you can't generate all possible set-combinations just by having expectedIPs.

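You originally added this option purely so that you could abuse cfworker in Iran to access twitter smoothly.
But you overlooked a piece of basic common knowledge, one that any non-technical webmaster who has run a small site would know:
the nameservers of cloudflare cdn never respond with another cdn's IPs.
twitter's static-asset domains use two CDNs at once by way of a load-balancing CNAME.

As a result, you designed an almost completely useless option.
At least for your own need, it is completely useless.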

@patterniha
Collaborator

patterniha commented Nov 14, 2025

Difference between:

		if isFound && !reverse {
			newIPs = append(newIPs, ip)
			continue
		}
		if !isFound && reverse {
			newIPs = append(newIPs, ip)
			continue
		}

and:

		if isFound != reverse {
			newIPs = append(newIPs, ip)
			continue
		}

is just 1-nanosecond!, because it just checks an extra if with fixed-boolean variables!

this is just a chore-minor-optimization!, there are many worse cases in the code. You can open pr for them.
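
For reference, the behavioral equivalence of the two forms discussed in this thread can be checked exhaustively. This is a small standalone sketch; `keepVerbose` and `keepXor` are hypothetical helper names that isolate only the condition, not the surrounding loop from #4666.

```go
package main

import "fmt"

// keepVerbose mirrors the two-branch condition quoted in the thread.
func keepVerbose(isFound, reverse bool) bool {
	if isFound && !reverse {
		return true
	}
	if !isFound && reverse {
		return true
	}
	return false
}

// keepXor is the single-comparison form: true exactly when the two flags differ.
func keepXor(isFound, reverse bool) bool {
	return isFound != reverse
}

func main() {
	// Walk the full truth table: four combinations of two booleans.
	for _, isFound := range []bool{false, true} {
		for _, reverse := range []bool{false, true} {
			fmt.Printf("isFound=%-5v reverse=%-5v verbose=%-5v xor=%v\n",
				isFound, reverse, keepVerbose(isFound, reverse), keepXor(isFound, reverse))
		}
	}
}
```

Both functions agree on all four inputs, so the disagreement in the thread is about style rather than behavior.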

@Meo597
Collaborator Author

Meo597 commented Nov 14, 2025

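This is not a matter of performance; it is a matter of a programmer's dignity.
Beginners typically write code like this because they have not learned Boolean logic.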

@marillindie

Well, that code is ... at least correct, though suboptimal (maybe isFound != reverse, or isFound == !reverse for (maybe?) easier understanding), and I do know some non-CS students/researchers (even if they major in other science fields and do well there) who write such code.

I suggested making it an error for compatibility concerns: just make it explicit that there is a breaking change (even if it will probably only break < ~10 configs). I do agree that your implementation is intuitive. It's just I think it's better to make all users know that it's being changed.

Now I think maybe it'd be good to add a temporary "ipMatcher": "legacy" (default, with error on multiple negations) | "searchEngineLike" (or any name you like for this strategy) settings somewhere, just like domainMatcher which used to exist (I think?), so that prior users of multiple negations know that it's now broken (instead of migrating to the new strategy unexpectedly), most users get better performance, and those who want to use AND over negations can enable it explicitly and enjoy. And maybe after a few releases remove this settings entry and just default to your strategy.

@Meo597
Collaborator Author

Meo597 commented Nov 14, 2025

Well, that code is ... at least correct

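Generally speaking, anyone who has studied set theory or Boolean algebra, even a junior coder, would not write code like this.

Also, someone who knows set theory would use the proper symbols for the complement and the universal set, rather than frantically writing [0.0.0.0/0, ::/0] - A...
Someone who does not know set theory (even though it is only high-school material) wrote a long prompt for an AI, and the AI, to accommodate the asker, was forced to accept that framing; that is how a reply like this comes about.

And across several replies, even after being reminded, he still failed to notice the most basic issue of disjoint sets and the universal set, which shows he cannot do the simplest set operations.

So there is only one truth...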


Now I think maybe it'd be good to add a temporary...

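As long as someone can give a real example of ! OR and show why it is necessary,
an option along the lines of ipMatcher: legacy is perfectly acceptable.

For me, it is even fine for multiple ! entries to keep the old semantics, which I consider wrong and meaningless.
I only need to change a few lines of code to go back to the old semantics, and I have no use for the ! AND feature anyway.

But rejecting the whole PR with AI-generated garbage replies, while not understanding the matter at all, is unacceptable.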

@Meo597
Collaborator Author

Meo597 commented Nov 17, 2025

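The current heuristic cannot guarantee 100% accuracy when matching or filtering multiple IPs,
although I do not think the IPs in a single DNS response will span multiple countries; if they do, the IP database is usually wrong.

But as @Fangliding said, "correctness matters more".
I have reworked it once more; a little is left, which I will find time to finish today.
Done.

The performance gain of this PR comes mainly from merging matchers, a 2x to 10x speedup.
The extremely troublesome heuristic only adds another 20% to 35% on top of that.

It was actually fast enough already: in my use case one routing rule took 3295 ns.
After all that effort it now takes 133 ns, with no perceptible difference,
unlike those DNS PRs, whose gains are tangible.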

@RPRX
Member

RPRX commented Nov 21, 2025

This is a welcome change. I can't really review it, so I'll just merge it.

A nudge on #4422 (comment)

@RPRX RPRX merged commit fcfb0a3 into XTLS:main Nov 21, 2025
39 checks passed
@Meo597
Collaborator Author

Meo597 commented Nov 21, 2025

I started refactoring geosite to save memory yesterday.
It should be done within a few days.
That pile of code is a bit nasty, to be honest.

@RPRX
Member

RPRX commented Nov 21, 2025

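Finally, everything is merged. Looking forward to the geosite refactor; it should only have taken one copy in memory in the first place.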

@Meo597
Collaborator Author

Meo597 commented Nov 21, 2025

Sure thing.

I was worried about how I would write the geosite work if these PRs were not merged.
That stuff is somewhat coupled with DNS.

@Meo597 Meo597 deleted the perf-geoip-matcher branch November 21, 2025 07:22
@Meo597
Collaborator Author

Meo597 commented Nov 26, 2025

Looking forward to the geosite refactor; it should only have taken one copy in memory in the first place.

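I thought refactoring geosite would be simple.
After a morning on it, I found it is a big pit.

On the dns side, a domain first has to be matched against all rules of all servers so that a server can be chosen to resolve it.
So for performance, all domain/full entries across every dns domain rule are put into one hashmap for lookup:
the key is the domain, and the value is easily converted to a dns server id.

The router, on the other hand, matches rule by rule, and each rule is one matcher.

My earlier plan for saving memory was simply to have each rule, or even each geosite, share matcher instances.
But that degrades performance on the dns side, because it needs multiple hashmap.get calls where there used to be only one.

Any performance regression caused by a refactor is unacceptable.
I tried a string pool, but it brought hardly any improvement,
because a large share of the domains use the domain matcher, which is a hashmap internally.
In addition, the keys on the mphmatcher side differ from the dnsmatcher keys: they carry an extra "."
So the only option is a proper refactor.

Then I came up with a plan: modify the router's mphmatcher so that it stores the categories each domain belongs to.
That way dns and router share a single matcher directly.

dns looks up the categories of a domain, then iterates over the servers and computes intersections.
router looks up the categories in the pre stage and computes the intersection when it reaches a rule.

Intersecting two slices is a step that did not exist before. To close the gap, a []uint64 bitmask with the & operation can be used here, with a fast path of a single uint64 when fewer than 64 categories are of interest.

This neatly solves the problem of multiple hashmap.get calls in dns.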
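
The bitmask idea can be sketched as follows. This is only an illustration under assumptions: `catMask`, `newMask`, `intersects` and `intersects64` are hypothetical names, and category ids are assumed to be small dense integers assigned at load time.

```go
package main

import "fmt"

// catMask is a hypothetical bitmask with one bit per geosite category of interest.
type catMask []uint64

// newMask allocates a mask large enough for n categories.
func newMask(n int) catMask { return make(catMask, (n+63)/64) }

// set marks category id as present.
func (m catMask) set(id int) { m[id/64] |= uint64(1) << (uint(id) % 64) }

// intersects reports whether two masks share at least one category.
func intersects(a, b catMask) bool {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i]&b[i] != 0 {
			return true
		}
	}
	return false
}

// intersects64 is the fast path for fewer than 64 categories: a single AND.
func intersects64(a, b uint64) bool { return a&b != 0 }

func main() {
	domainCats := newMask(130) // categories the looked-up domain belongs to
	ruleCats := newMask(130)   // categories a rule (or dns server) cares about
	domainCats.set(3)
	domainCats.set(129)
	ruleCats.set(129)
	fmt.Println(intersects(domainCats, ruleCats))
	fmt.Println(intersects64(0b0110, 0b1000))
}
```

Each comparison costs one AND per 64 categories, so the added intersection step stays cheap compared with a hashmap lookup.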

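However, this in turn degrades router performance, because there is another big pit: regexp.
The router previously had no need to evaluate every regexp in every geosite category; now it would have to.
This is fairly easy to solve, though: the router can ignore regexp in the pre stage and evaluate it when it reaches a rule.
A regexp rarely maps to more than one geosite category, so this works out cleanly as well.
(This is also a reminder to geosite maintainers: avoid regular expressions, which are matched linearly, wherever possible. full and domain are the fastest, and keyword, backed by an AC automaton, performs acceptably too.)

Then the next problem: mphmatcher needs the whole rule set loaded in one go before it is built.
router and dns cannot be exactly the same, so this is either handled in the main function or the loaded rule sets are cleaned up lazily.

And yet another problem: what about rules added dynamically through api calls? After some thought, I would simply give up on sharing there.
Once a rule is added dynamically, just use two copies of memory and stop sharing with dns.
Alternatively, when an api config block is detected, do not proactively release the original rules, so that a rebuild is still possible when the api adds rules later,
and add a new command that lets the api explicitly release the original rules to save memory.

This is a very large undertaking;
it cannot be finished in two or three days.

There is also a micro-optimization on the regex side:
even DNS should not match every regular expression.
For categories already hit by the hash or the automaton, there is no need to run those categories' regex matching.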
