@cyfung1031 cyfung1031 commented Jan 9, 2026


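  1. Improve charset detection

Charset detection is not easy.
Among the existing JS libraries that work in both Node and the browser, chardet seems to be the only reasonably good option.
On top of it I introduce my own theory, which determines the charset accurately and effectively.

I added tests. With the original chardet detect alone, the correct charset cannot be determined reliably.

I won't explain the principle here. Ask Copilot.
(It is my own theory; AI could not have generated it, but it should be able to understand it.)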


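  2. Add utf-32le and utf-32be support

The native TextDecoder does not support utf-32le or utf-32be.
Decoding is done by manual conversion (LE is converted directly through DataView; BE needs an extra conversion step).
This is integrated into bytesDecode.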
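
For readers who want to see the idea, here is a minimal sketch of manual UTF-32 decoding. It is not the code from this PR: `decodeUTF32Sketch` is a hypothetical name, and unlike the description above it handles both byte orders through the `littleEndian` flag of `DataView.getUint32`. Dropping a leading BOM and replacing invalid values with U+FFFD are assumptions of this sketch.

```typescript
// Hypothetical helper for illustration; the PR's actual decodeUTF32 may differ.
function decodeUTF32Sketch(bytes: Uint8Array, littleEndian: boolean): string {
  // Respect byteOffset so that subarray views decode correctly.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // A trailing partial unit (fewer than 4 bytes) is ignored.
  const unitCount = Math.floor(bytes.byteLength / 4);
  let result = "";
  for (let i = 0; i < unitCount; i++) {
    const codePoint = view.getUint32(i * 4, littleEndian);
    // Drop a leading byte order mark (U+FEFF).
    if (i === 0 && codePoint === 0xfeff) continue;
    // Surrogates and values above U+10FFFF are not valid scalar values.
    const isValid =
      codePoint <= 0x10ffff && (codePoint < 0xd800 || codePoint > 0xdfff);
    result += String.fromCodePoint(isValid ? codePoint : 0xfffd);
  }
  return result;
}
```

A dispatcher such as bytesDecode could route the two UTF-32 labels to a helper like this and hand everything else to TextDecoder.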


  3. Revise the unit tests

What detect actually receives is script code.
Testing with only a few bytes will not reveal anything.
Give it at least a full sentence.

@cyfung1031 force-pushed the cy-charset-enhancement branch 7 times, most recently from 30c3782 to 4085a61, January 10, 2026 00:19
@cyfung1031 marked this pull request as draft, January 10, 2026 02:39
@cyfung1031 force-pushed the cy-charset-enhancement branch from 17943af to fd4748e, January 10, 2026 03:00
@cyfung1031 force-pushed the cy-charset-enhancement branch from fd4748e to ddb8d1b, January 10, 2026 03:01
@cyfung1031 marked this pull request as ready for review, January 10, 2026 03:05
@CodFrm requested a review from Copilot, January 12, 2026 06:08
Copilot AI left a comment

Pull request overview

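This PR improves charset detection, with three main improvements:

Changes:

  • Strengthens the charset detection algorithm, using a text-repetitiveness test to pick the best encoding
  • Adds support for UTF-32LE and UTF-32BE (which the native TextDecoder does not support)
  • Improves the unit tests by using more realistic script code as test data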

Reviewed changes

Copilot reviewed 3 out of 5 changed files in this pull request and generated 1 comment.

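| File | Description |
| --- | --- |
| src/pkg/utils/encoding.ts | Adds the decodeUTF32 and bytesDecode functions; refactors the detectEncoding logic |
| src/pkg/utils/encoding.test.ts | Substantially expands the test cases, using real-world data |
| src/pages/install/App.tsx | Uses the new bytesDecode function in place of TextDecoder |
| package.json | Adds the iconv-lite dependency for tests |
| pnpm-lock.yaml | Dependency lockfile update |

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported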

if (!decodedText) continue;
if (!highestConfidence) {
  highestConfidence = entry.confidence;
  if (highestConfidence > 90) return encoding;
Copilot AI Jan 12, 2026

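The detected encoding should be returned, but not by terminating the loop immediately. The current logic returns as soon as the first decode succeeds, whereas the code that follows suggests it should keep analyzing all candidate encodings and then choose the best one. Consider changing `return encoding;` to `continue;`, or removing this line.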

Suggested change
- if (highestConfidence > 90) return encoding;
+ if (highestConfidence > 90) continue;

Member

An encoding with high confidence can be returned directly.
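
The trade-off in this thread (return early on a high-confidence candidate, otherwise compare all candidates) can be sketched roughly as follows. Everything here is hypothetical: `pickEncodingSketch`, `Candidate`, `Decode` and `Score` are illustration-only names, the candidate shape is assumed to resemble a detector's ranked output, and `score` is a stand-in for whatever quality measure is used. It is not the heuristic from this PR, which the thread does not spell out.

```typescript
// All names here are hypothetical; they only illustrate the control flow.
interface Candidate {
  name: string; // encoding label, e.g. "UTF-8"
  confidence: number; // 0..100, higher means more likely
}
// Returns null when the bytes cannot be decoded with the given encoding.
type Decode = (bytes: Uint8Array, encoding: string) => string | null;
// Stand-in quality measure for decoded text; higher is better.
type Score = (text: string) => number;

function pickEncodingSketch(
  bytes: Uint8Array,
  candidates: Candidate[],
  decode: Decode,
  score: Score,
  threshold = 90
): string | null {
  // Try the most confident candidates first, without mutating the input.
  const sorted = candidates.slice().sort((a, b) => b.confidence - a.confidence);
  let seenDecodable = false;
  let best: string | null = null;
  let bestScore = -Infinity;
  for (const entry of sorted) {
    const text = decode(bytes, entry.name);
    if (text === null) continue; // this candidate cannot decode the input
    if (!seenDecodable) {
      seenDecodable = true;
      // Fast path: the strongest decodable candidate is trusted outright.
      if (entry.confidence > threshold) return entry.name;
    }
    // Slow path: remember the candidate whose decoded text scores best.
    const s = score(text);
    if (s > bestScore) {
      bestScore = s;
      best = entry.name;
    }
  }
  return best;
}
```

With this shape, the suggested `continue` would only skip the fast path; the early return saves the cost of decoding and scoring the remaining candidates when the detector is already very sure.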

CodFrm commented Jan 12, 2026

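I don't quite follow your own theory. Is it handling things based on confidence?

This scenario shouldn't run into any really outlandish encoding problems, but handling it this thoroughly is fine too.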

@CodFrm CodFrm merged commit 2af2845 into scriptscat:release/v1.3 Jan 14, 2026
9 checks passed