
Add document rechunking for knowledge bases #332

Merged
xerrors merged 3 commits into xerrors:main from supreme0597:main on Nov 13, 2025


@supreme0597 (Contributor)

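Adds document rechunking for knowledge bases. The chunking parameters are saved to the document metadata so they can be pre-filled (backfilled) when the document is rechunked later.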
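To illustrate the idea in the description, here is a minimal Python sketch of saving the chunking parameters with a file's metadata and backfilling them on the next rechunk. It assumes one plain dict per file and a hypothetical `processing_params` key; the project's actual metadata schema and function names are not shown in this thread. The parameter names (`chunk_size`, `chunk_overlap`, `use_qa_split`, `qa_separator`) come from the log in this conversation, while the default values are made up for illustration.

```python
from typing import Any, Dict

# Hypothetical defaults for illustration only; not taken from the project.
DEFAULT_PARAMS: Dict[str, Any] = {
    "chunk_size": 500,
    "chunk_overlap": 100,
    "use_qa_split": False,
    "qa_separator": "\n\n\n",
}


def save_chunk_params(file_meta: Dict[str, Any], params: Dict[str, Any]) -> None:
    """Store a copy of the parameters used for the latest chunking run.

    "processing_params" is a hypothetical key name.
    """
    file_meta["processing_params"] = dict(params)


def backfill_params(file_meta: Dict[str, Any],
                    defaults: Dict[str, Any] = DEFAULT_PARAMS) -> Dict[str, Any]:
    """Return the values to pre-fill a rechunk form with.

    Parameters saved from the previous run take precedence; anything
    missing falls back to the defaults. Neither input dict is modified.
    """
    merged = dict(defaults)
    merged.update(file_meta.get("processing_params", {}))
    return merged


if __name__ == "__main__":
    meta: Dict[str, Any] = {"file_id": "file_8c95be"}
    print(backfill_params(meta)["chunk_size"])  # nothing saved yet, so the default is used
    save_chunk_params(meta, {"chunk_size": 1000, "chunk_overlap": 200})
    print(backfill_params(meta))
```

Copying the dict on save keeps later changes to the request object from silently altering the stored record, and merging over defaults lets older files that were chunked before this feature still get a complete set of form values.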

@xerrors xerrors self-requested a review November 12, 2025 16:09
@xerrors xerrors self-assigned this Nov 12, 2025
@xerrors (Owner)

xerrors commented Nov 12, 2025

Hi @supreme0597! Thanks a lot for the PR! It would be even better if you could provide some related test logs and screenshots. I'll review it a bit later!

@supreme0597 (Contributor, Author)

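Sorry, I'm at work right now and, because of security compliance checks, I can't upload images.
I've pasted part of the logs; please take a look: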

11-13 10:00:21 DEBUG knowledge_router.py:387: Rechunks documents for db_id kb_ae60c968c537b239d494d6fb229d8907: ['file_8c95be'] params={'chunk_size': 1000, 'chunk_overlap': 200, 'use_qa_split': False, 'qa_separator': '\n\n\n'}
11-13 10:00:21 INFO tasker.py:138: Enqueued task 3227dd8543d14c0ab9c5c184ed9b8a78 (文档重新分块(kb_ae60c968c537b239d494d6fb229d8907))
11-13 10:00:21 INFO: 127.0.0.1:63441 - "POST /api/knowledge/databases/kb_ae60c968c537b239d494d6fb229d8907/documents/rechunks HTTP/1.1" 200
11-13 10:00:21 INFO: 127.0.0.1:63443 - "GET /api/knowledge/databases/kb_ae60c968c537b239d494d6fb229d8907 HTTP/1.1" 200
2025-11-13 10:00:22 INFO: 5 changes detected
2025-11-13 10:00:22 INFO: 5 changes detected
11-13 10:00:23 DEBUG base.py:371: Added file file_8c95be to processing queue
11-13 10:00:23 DEBUG base.py:626: Saved milvus metadata
2025-11-13 10:00:23 INFO: 6 changes detected
11-13 10:00:23 INFO milvus.py:540: Deleted chunks for file file_8c95be from Milvus
11-13 10:00:23 DEBUG kb_utils.py:101: Successfully split text into 17 chunks using MarkdownTextSplitter
11-13 10:00:23 INFO milvus.py:370: Split 文件_1f7f.html into 17 chunks
11-13 10:00:25 INFO milvus.py:391: Updated file saves\knowledge_base_data\milvus_data\kb_ae60c968c537b239d494d6fb229d8907\uploads\文件_1f7f.html in Milvus. Done.
11-13 10:00:25 DEBUG base.py:626: Saved milvus metadata
11-13 10:00:25 DEBUG base.py:383: Removed file file_8c95be from processing queue
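The log shows the rechunk sequence for one file: it is added to a processing queue, its old chunks are deleted from Milvus, the text is split again (17 chunks via `MarkdownTextSplitter`), the new chunks are written, metadata is saved, and the file is removed from the queue. The Python sketch below mirrors that order with an in-memory dict standing in for Milvus. It is not the project's code: `InMemoryStore`, `rechunk_file`, and `split_text` are hypothetical names, the splitter is a plain fixed-size character window rather than `MarkdownTextSplitter`, and treating `use_qa_split` as "split on `qa_separator`" is an assumption.

```python
from typing import Any, Dict, List, Set


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows.

    Consecutive chunks share `chunk_overlap` characters. This is a simplified
    stand-in, not the MarkdownTextSplitter named in the log.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


class InMemoryStore:
    """Hypothetical stand-in for the vector store (Milvus in the log)."""

    def __init__(self) -> None:
        self.chunks: Dict[str, List[str]] = {}
        self.metadata: Dict[str, Dict[str, Any]] = {}
        self.processing: Set[str] = set()


def rechunk_file(store: InMemoryStore, file_id: str, text: str,
                 params: Dict[str, Any]) -> int:
    """Re-split one file and replace its chunks; returns the new chunk count."""
    store.processing.add(file_id)              # "Added file ... to processing queue"
    try:
        store.chunks.pop(file_id, None)        # "Deleted chunks for file ..."
        if params.get("use_qa_split"):
            # Assumption: QA mode splits on the separator instead of by size.
            parts = text.split(params["qa_separator"])
            chunks = [p for p in parts if p.strip()]
        else:
            chunks = split_text(text, params["chunk_size"], params["chunk_overlap"])
        store.chunks[file_id] = chunks         # "Updated file ... Done."
        # Keep the params so a later rechunk can be pre-filled with them.
        store.metadata.setdefault(file_id, {})["processing_params"] = dict(params)
        return len(chunks)
    finally:
        store.processing.discard(file_id)      # "Removed file ... from processing queue"


if __name__ == "__main__":
    store = InMemoryStore()
    params = {"chunk_size": 1000, "chunk_overlap": 200,
              "use_qa_split": False, "qa_separator": "\n\n\n"}
    print(rechunk_file(store, "file_demo", "x" * 2500, params))
```

Wrapping the work in `try`/`finally` ensures the file always leaves the processing queue, even if splitting fails, which matches the add/remove pairing visible in the log.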

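Also, are there plans to store uploaded files in MinIO and keep the metadata in a database? With the current approach, it seems the project can only be deployed as a single-machine Docker setup and can't be deployed on k8s, and a single-machine deployment is somewhat less stable.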

@xerrors (Owner)

xerrors commented Nov 13, 2025

Amazing! I've never seen such a smooth PR! Thanks again for the PR!

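As for file uploads, I was being lazy earlier and used a local folder instead of uploading to MinIO. Later I may consolidate things so that knowledge base files, attachments, and assets generated at runtime (such as text-to-image outputs and image messages) all go through MinIO.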

@xerrors xerrors merged commit 1700628 into xerrors:main Nov 13, 2025