🐛 Fix script encoding issues #1115 #1138

CodFrm · 2026-01-08T09:05:40Z

Description

close #1115

Changes

Screenshots

Copilot

Pull request overview

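This PR fixes character encoding detection when installing scripts (issue #1115). The main change replaces encoding detection based on the HTTP Content-Type header with automatic detection via the chardet library, which makes encoding identification more reliable and accurate.

Main changes:

Introduces the chardet library to automatically detect the character encoding of script files
Removes the previous charset parsing logic based on the Content-Type header
Improves error handling: when encoding detection or decoding fails, the code falls back to UTF-8 instead of throwing an exception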
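
The fallback behaviour described in the last item can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `decodeWithFallback` and the `Detector` type are hypothetical names, and the detector is passed in as a parameter so the sketch does not depend on chardet itself (in the PR, chardet plays that role).

```typescript
// Hypothetical detector signature; in the PR this role is played by chardet.
type Detector = (bytes: Uint8Array) => string | null;

// Decode bytes using the detected encoding, falling back to UTF-8 when
// detection fails, returns nothing, or names an unsupported encoding.
function decodeWithFallback(bytes: Uint8Array, detect: Detector): string {
  let encoding = "utf-8";
  try {
    const guess = detect(bytes);
    if (guess) {
      // The TextDecoder constructor throws a RangeError for unknown labels.
      new TextDecoder(guess);
      encoding = guess;
    }
  } catch {
    // Detection threw or the label was unsupported: keep UTF-8.
    encoding = "utf-8";
  }
  // Non-fatal decoding: malformed bytes become U+FFFD instead of throwing.
  return new TextDecoder(encoding).decode(bytes);
}
```

With this shape, a detector failure never surfaces as an install error; the worst case is UTF-8 text containing replacement characters.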

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.

File	Description
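src/pages/install/App.tsx	Changes encoding detection in the script fetch function from Content-Type header parsing to automatic detection with chardet, and improves the error handling logic
package.json	Adds the chardet@^2.1.1 dependency
pnpm-lock.yaml	Adds the chardet dependency and updates the lock file format (libc fields are added automatically to better support platform-specific dependencies)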

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

src/pages/install/App.tsx

CodFrm · 2026-01-08T09:17:21Z

@copilot open a new pull request to apply changes based on the comments in this thread

Copilot · 2026-01-08T09:17:29Z

@CodFrm I've opened a new pull request, #1139, to work on those changes. Once the pull request is ready, I'll request review from you.

* Initial plan
* Optimize encoding detection performance and add full test coverage

Co-authored-by: CodFrm <22783163+CodFrm@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: CodFrm <22783163+CodFrm@users.noreply.github.com>

cyfung1031 · 2026-01-08T16:08:13Z

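Your code looks very good! I have a few small suggestions:

Suggested improvements:

Add more fallback encoding options
Add more detailed logging (optional)
Handle common encoding aliases

Improved version: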

import chardet from "chardet";

/**
 * Parse the charset from the Content-Type header
 */
export const parseCharsetFromContentType = (contentType: string | null): string | null => {
  if (!contentType) return null;

  const match = contentType.match(/charset=([^;]+)/i);
  if (match && match[1]) {
    return match[1].trim().toLowerCase().replace(/['"]/g, "");
  }
  return null;
};

/**
 * Alias map for common encodings (handles the case where chardet returns an alias)
 */
const ENCODING_ALIASES: Record<string, string> = {
  'ascii': 'utf-8',
  'us-ascii': 'utf-8',
  'iso-8859-1': 'windows-1252', // commonly confused
  'gb2312': 'gb18030', // GB2312 is a subset of GB18030
  'cp1252': 'windows-1252',
  'cp1251': 'windows-1251',
  'shift-jis': 'shift_jis',
  'ms932': 'shift_jis',
};

/**
 * Normalize an encoding name
 */
const normalizeEncoding = (encoding: string): string => {
  const normalized = encoding.toLowerCase().trim();
  return ENCODING_ALIASES[normalized] || normalized;
};

/**
 * Check whether the encoding is supported by TextDecoder
 */
const isValidEncoding = (encoding: string): boolean => {
  try {
    new TextDecoder(encoding);
    return true;
  } catch {
    return false;
  }
};

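/**
 * Try a strict decode of a leading sample to check that the encoding fits the data
 */
const testDecode = (data: Uint8Array, encoding: string, sampleSize: number = 1024): boolean => {
  try {
    const sample = data.subarray(0, Math.min(data.length, sampleSize));
    const decoder = new TextDecoder(encoding, { fatal: true });
    // When the sample is only a prefix, decode in streaming mode so that a
    // multi-byte character cut off at the sample boundary is not reported as an error
    decoder.decode(sample, { stream: sample.length < data.length });
    return true;
  } catch {
    return false;
  }
};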

/**
 * Detect the encoding of a byte array
 * Prefers the Content-Type header; falls back to chardet on failure (only the first 16 KB is inspected, for performance)
 */
export const detectEncoding = (
  data: Uint8Array, 
  contentType: string | null,
  options: {
    verbose?: boolean;
    fallbackEncodings?: string[];
  } = {}
): string => {
  const {
    verbose = false,
    fallbackEncodings = ['utf-8', 'windows-1252', 'iso-8859-1']
  } = options;

  // 1. Try the charset from the Content-Type header first
  const headerCharset = parseCharsetFromContentType(contentType);
  if (headerCharset) {
    const normalizedHeaderCharset = normalizeEncoding(headerCharset);
    
    if (isValidEncoding(normalizedHeaderCharset)) {
      if (testDecode(data, normalizedHeaderCharset)) {
        if (verbose) console.log(`Using charset from Content-Type header: ${normalizedHeaderCharset}`);
        return normalizedHeaderCharset;
      } else if (verbose) {
        console.warn(`Charset from header failed to decode: ${normalizedHeaderCharset}`);
      }
    } else if (verbose) {
      console.warn(`Invalid charset from Content-Type header: ${headerCharset} (normalized: ${normalizedHeaderCharset})`);
    }
  }

  // 2. Detect the encoding with chardet
  const sampleSize = Math.min(data.length, 16 * 1024);
  const sample = data.subarray(0, sampleSize);
  
  try {
    const detected = chardet.detect(sample);
    
    if (detected) {
      const detectedEncoding = normalizeEncoding(detected);
      
      if (isValidEncoding(detectedEncoding)) {
        if (testDecode(data, detectedEncoding)) {
          if (verbose) console.log(`Using charset detected by chardet: ${detectedEncoding}`);
          return detectedEncoding;
        } else if (verbose) {
          console.warn(`Charset detected by chardet failed to decode: ${detectedEncoding}`);
        }
      } else if (verbose) {
        console.warn(`Invalid charset detected by chardet: ${detected} (normalized: ${detectedEncoding})`);
      }
    }
  } catch (error: any) {
    if (verbose) console.warn(`chardet detection failed: ${error.message}`);
  }

  // 3. Try the fallback encodings
  for (const fallback of fallbackEncodings) {
    if (isValidEncoding(fallback) && testDecode(data, fallback)) {
      if (verbose) console.log(`Using fallback encoding: ${fallback}`);
      return fallback;
    }
  }

  // 4. Finally fall back to UTF-8 (the caller decodes non-fatally, so errors are recovered)
  if (verbose) console.log(`Falling back to utf-8`);
  return "utf-8";
};

/**
 * Safely decode a byte array to a string
 */
export const decodeBuffer = (
  data: Uint8Array,
  contentType: string | null,
  options?: Parameters<typeof detectEncoding>[2]
): string => {
  const encoding = detectEncoding(data, contentType, options);
  const decoder = new TextDecoder(encoding, { fatal: false });
  return decoder.decode(data);
};
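
The code above relies on the difference between strict (`fatal: true`) and lenient (`fatal: false`) decoding: the strict mode is used to test candidates, the lenient mode for the final decode. A small standalone sketch of that difference, using only the standard `TextDecoder` API (the helper name `strictDecodeSucceeds` is made up for illustration):

```typescript
// Three bytes: "a", a byte that is never valid in UTF-8, "b".
const malformed = new Uint8Array([0x61, 0xff, 0x62]);

// Strict check: returns false as soon as the bytes do not fit the encoding.
function strictDecodeSucceeds(bytes: Uint8Array, encoding: string): boolean {
  try {
    new TextDecoder(encoding, { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// Lenient decode: never throws, substitutes U+FFFD for the invalid byte.
const lenient = new TextDecoder("utf-8", { fatal: false }).decode(malformed);

console.log(strictDecodeSucceeds(malformed, "utf-8")); // false
console.log(lenient === "a\uFFFDb"); // true
```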

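Main improvements:

Encoding alias handling: handles common encoding alias issues
Actual decode test: checks not only that a TextDecoder can be created, but also that the data can actually be decoded
Verbose logging option: adds a verbose option for debugging
Multi-level fallback: multiple fallback encodings can be specified
Utility function: adds a decodeBuffer function for direct use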
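
One caveat with the actual decode test, when it is run on a prefix of the data rather than the whole buffer: the cut can land in the middle of a multi-byte character, and a strict decoder then rejects an encoding that is in fact correct. The sketch below illustrates this with UTF-8 and the standard `stream` option of `TextDecoder.decode`; `prefixDecodes` is a hypothetical helper, not part of the suggested code.

```typescript
// "ab中" in UTF-8 is 61 62 E4 B8 AD; the last character takes three bytes.
const full = new TextEncoder().encode("ab中");
// Keep four bytes, so the final character is cut after two of its three bytes.
const prefix = full.subarray(0, 4);

// Strictly decode a sample. With stream = true the decoder treats the input
// as one chunk of a longer stream and buffers an incomplete trailing sequence
// instead of rejecting it.
function prefixDecodes(sample: Uint8Array, encoding: string, stream: boolean): boolean {
  try {
    new TextDecoder(encoding, { fatal: true }).decode(sample, { stream });
    return true;
  } catch {
    return false;
  }
}

console.log(prefixDecodes(prefix, "utf-8", false)); // false: truncated character is treated as an error
console.log(prefixDecodes(prefix, "utf-8", true)); // true: truncated tail is tolerated
```

Bytes that are invalid anywhere in the sample still fail in streaming mode, so the check keeps its value as a filter.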

Usage examples:

// Basic usage
const encoding = detectEncoding(data, contentType);

// With verbose logging
const encodingVerbose = detectEncoding(data, contentType, { verbose: true });

// Custom fallback encodings
const encodingCustom = detectEncoding(data, contentType, {
  fallbackEncodings: ['utf-8', 'gb18030', 'big5']
});

// Decode directly
const text = decodeBuffer(data, contentType);

Your original code is already good. These improvements are optional; use whichever ones fit your specific needs.

🐛 Fix script encoding issues #1115

876edcb

CodFrm requested a review from Copilot January 8, 2026 09:09

Copilot started reviewing on behalf of CodFrm January 8, 2026 09:10

Copilot AI reviewed Jan 8, 2026

src/pages/install/App.tsx (outdated, resolved)

src/pages/install/App.tsx (outdated, resolved)

Copilot AI mentioned this pull request Jan 8, 2026

Optimize script-install encoding detection performance and add test coverage #1139

Merged

Copilot AI and others added 3 commits January 8, 2026 17:53

Fix lint issues

144a5f0

data.subarray

1419699

CodFrm merged commit a3abaf0 into release/v1.3 Jan 9, 2026
3 checks passed
