Feat(ovpack): recursively vectorize all imported docs by sponge225 · Pull Request #1294 · volcengine/OpenViking

sponge225 · 2026-04-08T06:16:34Z

Description

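Fixes and extends the "direct vectorization enqueue" logic that runs after an ovpack import: once the import completes, the whole directory tree is traversed recursively and every subdirectory and contained file is enqueued for vectorization, so content no longer goes missing from searches scoped to a subdirectory. It also uses a fixed worker pool to reduce coroutine creation and scheduling pressure for large packs.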

Related Issue

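Current behavior: import_ovpack(..., vectorize=True) triggers the "post-import vectorization enqueue", but the original logic enqueues only the import root directory, once. As a result:

- Searches restricted with --uri=viking://resources/<import_root>/ often return nothing.
- Imported subdirectories and their files cannot be recalled, because they have no vector index.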

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Performance improvement
Test update

Changes Made

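The vectorization enqueue after import_ovpack(..., vectorize=True) changes from "enqueue the root directory only" to "recursively cover the imported tree (directories and files)", so searches scoped to a subdirectory can hit after import.
Vectorization is aligned with add-resource:
- Directories: read and vectorize the .abstract.md / .overview.md files present in the directory (enqueued via vectorize_directory_meta).
- Files: vectorize the file content via vectorize_file (including the existing max_input_chars truncation and format handling).
Concurrency model: an asyncio.Queue with a fixed worker_count processes the enqueue tasks, which avoids creating a very large number of coroutines in a single gather(N) call and is more stable for large ovpacks.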
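The queue-plus-workers model described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `TREE`, `enqueue_all`, and the `asyncio.sleep(0)` stand-in for vectorize_directory_meta / vectorize_file are hypothetical names introduced here. Note also that the commit list on this page records a later "revert direct vectorization to gather", so the merged code may not use this model.

```python
import asyncio

# Hypothetical in-memory tree: directory URI -> child names (directories end with "/").
TREE = {
    "root/": ["a.md", "sub/"],
    "root/sub/": ["b.md", "deep/"],
    "root/sub/deep/": ["c.md"],
}


async def enqueue_all(root, worker_count=4):
    """Walk the whole tree and process every directory and file with a fixed pool."""
    queue = asyncio.Queue()
    processed = []

    # Producer: iterative depth-first walk that records directories and files.
    stack = [root]
    while stack:
        dir_uri = stack.pop()
        queue.put_nowait(("dir", dir_uri))
        for name in TREE.get(dir_uri, []):
            child = dir_uri + name
            if name.endswith("/"):
                stack.append(child)
            else:
                queue.put_nowait(("file", child))

    async def worker():
        while True:
            kind, uri = await queue.get()
            try:
                # Stand-in for the real vectorization call (assumption).
                await asyncio.sleep(0)
                processed.append((kind, uri))
            finally:
                queue.task_done()

    # Only worker_count tasks exist, no matter how many items are queued.
    workers = [asyncio.create_task(worker()) for _ in range(worker_count)]
    await queue.join()  # wait until every queued item has been processed
    for task in workers:
        task.cancel()  # workers loop forever, so cancel them once the queue drains
    await asyncio.gather(*workers, return_exceptions=True)
    return sorted(processed)


RESULT = asyncio.run(enqueue_all("root/"))
```

The number of live tasks stays at worker_count regardless of tree size, which is the property the description contrasts with a single gather(N) over all entries.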

Testing

I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have tested this on the following platforms:
- Linux
- macOS
- Windows

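Manual verification:

Import an ovpack that contains multiple directory levels (vectorize=True). After the import completes, run:
ov search --uri=viking://resources/<import_root>/ -n 10 "<keyword>"
Expected: before the change, subdirectory hits are empty or very few; after the change, subdirectories produce hits (directory .abstract/.overview as well as the content of files inside).

Extract the root directory, then import

Before the change the pack was imported into imports; after the change it was imported into imports_fix

Verification before the change

Verification after the change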

Checklist

My code follows the project's coding style
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

CLAassistant · 2026-04-08T06:16:44Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

github-actions · 2026-04-08T06:17:22Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🏅 Score: 85
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Silent Exception Swallowing
The code uses a bare `except Exception:` that returns without logging, which can hide real errors when reading directory metadata files (.abstract.md/.overview.md).

    except Exception:
        return

github-actions · 2026-04-08T06:18:07Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Impact: Low
Suggestion: Add logging for directory metadata read errors

Add structured logging when an exception occurs while reading directory metadata files (.abstract.md and .overview.md). Swallowing exceptions without logging makes debugging difficult. Use the existing `logger` to log the error with context (dir_uri).

openviking/storage/local_fs.py [116-124]

     try:
         if await viking_fs.exists(abstract_uri, ctx=ctx):
             content = await viking_fs.read_file(abstract_uri, ctx=ctx)
             abstract = content.decode("utf-8") if isinstance(content, bytes) else content
         if await viking_fs.exists(overview_uri, ctx=ctx):
             content = await viking_fs.read_file(overview_uri, ctx=ctx)
             overview = content.decode("utf-8") if isinstance(content, bytes) else content
    -except Exception:
    +except Exception as e:
    +    logger.warning("Failed to read directory metadata files", dir_uri=dir_uri, error=e)
         return

Suggestion importance [1-10]: 6
Why: Adds structured logging for exceptions when reading directory metadata files, improving debuggability without altering core functionality.

Category: General
Impact: Low
Suggestion: Add worker error handling with logging

Add error handling and logging for worker tasks that fail while indexing files or directories. This ensures we track individual task failures without crashing all workers, improving resilience and observability.

openviking/storage/local_fs.py [142-155]

     async def worker() -> None:
         while True:
             kind, payload = await work_queue.get()
             try:
                 if kind == "stop":
                     return
                 if kind == "dir":
                     (dir_uri,) = payload
                     await index_dir(dir_uri)
                 elif kind == "file":
                     file_uri, parent_uri, file_name = payload
                     await index_file(file_uri, parent_uri, file_name)
    +        except Exception as e:
    +            logger.warning("Worker task failed", kind=kind, payload=payload, error=e)
             finally:
                 work_queue.task_done()

Suggestion importance [1-10]: 6
Why: Adds error handling and logging for worker tasks, enhancing resilience and observability of the indexing process.
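The "log instead of swallow" pattern behind the suggestions above can be tried in isolation. The sketch below is an assumption-laden stand-in: it uses the standard-library `logging` module with %-style arguments rather than the project's own `logger` (whose keyword-argument style is not reproduced here), and `read_dir_meta`, `decode`, and `fake_read` are hypothetical helpers, not functions from openviking/storage/local_fs.py.

```python
import asyncio
import logging

logger = logging.getLogger("ovpack_sketch")


def decode(content):
    """Return text whether the reader produced bytes or str."""
    return content.decode("utf-8") if isinstance(content, bytes) else content


async def read_dir_meta(dir_uri, read_file):
    """Return (abstract, overview); on failure, log a warning and keep what was read."""
    abstract, overview = "", ""
    try:
        abstract = decode(await read_file(dir_uri + ".abstract.md"))
        overview = decode(await read_file(dir_uri + ".overview.md"))
    except Exception as exc:
        # The failure is recorded with its context instead of vanishing silently.
        logger.warning("Failed to read directory metadata for %s: %s", dir_uri, exc)
    return abstract, overview


async def fake_read(uri):
    """Hypothetical reader: the overview file is missing, the abstract exists."""
    if uri.endswith(".overview.md"):
        raise FileNotFoundError(uri)
    return b"summary"


META = asyncio.run(read_dir_meta("viking://resources/demo/", fake_read))
```

Here the abstract survives and the missing overview yields an empty string plus one warning, so a partially described directory is still usable while the error stays visible in the logs.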

qin-ctx · 2026-04-08T06:25:00Z

-
-    embedding_msg = EmbeddingMsgConverter.from_context(resource)
-    await embedding_queue.enqueue(embedding_msg)
+    entries = await viking_fs.tree(uri, output="original", node_limit=100000, level_limit=1000, ctx=ctx)


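Why are all these parameters added here?

The default node_limit is 1000 and the default level_limit is 3, which are fairly small; they were raised so that the traversal is not cut short.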
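To illustrate why small limits can cut a traversal short, here is a toy breadth-first walk with the same two parameter names. It only models the idea: the actual semantics of viking_fs.tree are not shown on this page, and `walk` and `TREE` are hypothetical. The commit list also records a later "chore: drop redundant default tree args", so the arguments in the merged code may differ from the diff above.

```python
def walk(tree, root, node_limit=1000, level_limit=3):
    """List entries breadth-first, stopping at node_limit entries or level_limit depth."""
    out = []
    frontier = [(root, 1)]
    while frontier:
        path, level = frontier.pop(0)
        for name in tree.get(path, []):
            if len(out) >= node_limit:
                return out
            child = path + name
            out.append(child)
            # Directories are expanded only while the depth budget allows it.
            if name.endswith("/") and level < level_limit:
                frontier.append((child, level + 1))
    return out


# Hypothetical tree that is four levels deep.
TREE = {
    "r/": ["d1/"],
    "r/d1/": ["d2/"],
    "r/d1/d2/": ["d3/"],
    "r/d1/d2/d3/": ["leaf.md"],
}

SHALLOW = walk(TREE, "r/")  # toy defaults: the deepest file is never reached
DEEP = walk(TREE, "r/", node_limit=100000, level_limit=1000)
```

With the toy defaults the walk lists the three directories but never reaches leaf.md, while the raised limits return all four entries.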

kaisongli · 2026-04-08T09:55:33Z

https://github.com/volcengine/OpenViking/actions/runs/24125971068/job/70390515057

curl -X GET -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36" -H "Accept-Encoding: gzip, deflate" -H "Accept: */*" -H "Connection: keep-alive" -H "Accept-Language: zh-CN,zh;q=0.9" -H "X-OpenViking-Account: default" -H "X-OpenViking-User: default" -H "Authorization: [REDACTED]" -H "Content-Type: application/json" "http://127.0.0.1:1933/api/v1/fs/ls?uri=viking%3A%2F%2Fresources%2Fupload_2488f2b1a9894a3bb84ee425ae5ca9f9&simple=False&recursive=False"
The API reports an error when run on Windows. Will this affect the normal operation of this endpoint?

* feat(encryption): add encrypt unit test for plaintext shorter than the magic header (volcengine#1217) * fix: PID lock recycle, recall threshold bypass, orphaned compressor refs (volcengine#1211) * fix: PID lock recycle, recall threshold bypass, orphaned compressor refs 1. process_lock: verify PID is actually OpenViking on Linux by checking /proc/{pid}/cmdline. Prevents false DataDirectoryLocked when PIDs are recycled to unrelated processes after crash (Fixes volcengine#1088). 2. memory-ranking: add scoreThreshold param to pickMemoriesForInjection and filter non-leaf items below threshold. Previously low-scoring memories bypassed recallScoreThreshold when supplementing leaves (Fixes volcengine#1106). 3. compressor: catch FileNotFoundError separately in _merge_into_existing, clean up orphaned vector records so they are not retried indefinitely (Fixes volcengine#1048). * style: ruff format process_lock.py --------- Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com> * fix(queuefs): dedupe memory semantic parent enqueues within window (volcengine#769) (volcengine#792) Repeated memory writes enqueue the same parent directory for .overview/.abstract regeneration, causing redundant VLM work. Skip duplicate enqueues for the same (account, user, agent, uri) within 45s; resource/session semantic paths unchanged. Fixes volcengine#769 Made-with: Cursor * docs: add Claude Code Memory Plugin example link and Chinese docs (volcengine#1228) Add README link for Claude Code Memory Plugin example in all language variants (EN, CN, JA) and add Chinese documentation for the plugin. * fix: prevent startup hang from blocking VLM call in redo recovery (volcengine#1226) Move redo recovery to a background task so the server starts without waiting for potentially slow VLM calls. Add 60s timeout to extract_long_term_memories to prevent indefinite hangs on individual redo tasks. Clean up the redo task on stop(). 
Fixes volcengine#1222 Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com> * fix(cli): handle non-UTF-8 filenames in add-resource (volcengine#1224) Replace to_string_lossy() with to_str() to preserve valid UTF-8 filenames (Chinese, Japanese, special characters) instead of silently corrupting them. Non-UTF-8 paths now return a clear Error::InvalidPath instead of garbled output. Fixes volcengine#1018 Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com> * test(crypto): add regression tests for decrypting short plaintext files (volcengine#1223) Add tests covering the fix in PR volcengine#1163 where decrypt() raised 'Ciphertext too short' on plaintext files shorter than 4 bytes. Tests added: - test_decrypt_empty_plaintext: raw b'' returns b'' - test_decrypt_short_plaintext_less_than_4_bytes: b'X', b'AB', b'ABC' return as-is - test_decrypt_magic_prefix_without_full_header: b'OVE1' raises CorruptedCiphertextError Co-authored-by: yc111233 <yc111233@users.noreply.github.com> * Implement request-scoped wait for write APIs (volcengine#1212) fix: request wait telemetry id fix: register request wait before enqueue add log * reorg: Rewrite agfs to ragfs with rust (volcengine#1221) * reorg: rewrite agfs with rust, and named with ragfs, keep License * reorg: rewrite agfs with rust, and named with ragfs, keep License * reorg: rewrite agfs with rust, and named with ragfs, keep License * reorg: rewrite agfs with rust, and named with ragfs, keep License * reorg: rewrite agfs with rust, and named with ragfs, keep License * reorg: rewrite agfs with rust, and named with ragfs, keep License * reorg: rewrite agfs with rust, and named with ragfs, keep License * reorg: rewrite agfs with rust, and named with ragfs, keep License * fix: grep level limit * fix: grep root * fix: import error * fix: rust code optimazation * fix: CI error * fix: CI go mod cache * fix: grep level limit * fix: CI --------- Co-authored-by: openviking <openviking@example.com> * 
fix: update main when release, and add docker hub push (volcengine#1229) * fix security: feat(resources): harden HTTP resource ingestion against private-network SSRF (volcengine#1133) * Harden HTTP resource ingestion against private-network SSRF * chore(ci): retrigger checks * style: fix resource service import order * fix(session): add timeout to _wait_for_previous_archive_done to prevent infinite hang (volcengine#1235) When Phase 2 of a previous archive crashes after writing messages but before writing .done or .failed.json, the next archive's memory extraction enters an infinite polling loop with no exit condition. Add a 5-minute deadline. On timeout, return False (treated as failure) so the caller writes .failed.json and the session is no longer stuck. Co-authored-by: yc111233 <yc111233@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix: volcengine#1238 and volcengine#1242 and volcengine#1232 (volcengine#1243) * fix(bot): respect OPENVIKING_CONFIG_FILE even when file doesn't exist Previously, bot's config path resolution would fallback to ~/.openviking/ov.conf when OPENVIKING_CONFIG_FILE was set but the file didn't exist. This was inconsistent with server's behavior, which treats a missing env-specified config as an error. In container deployments with OPENVIKING_CONFIG_FILE=/app/ov.conf, this caused bot to potentially write auto-generated config to /root/.openviking/ov.conf instead of the intended /app/ov.conf, leading to config file path mismatches. Now bot respects the environment variable unconditionally, matching server's behavior and ensuring both components use the same config path. 
Fixes volcengine#1242 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(embedder): support dimension truncation for OpenAI-compatible models Previously, when users configured a dimension (e.g., 1024) for OpenAI-compatible embedding models that don't support the 'dimensions' API parameter, the system would fail with "dimensions is currently not supported" error. Additionally, when dimension was not configured, the config layer would return a hardcoded fallback of 2048, but the actual model might return a different dimension (e.g., 1024), causing dimension validation failures. This fix implements vector truncation in OpenAIDenseEmbedder: - Removes the 'dimensions' parameter from API calls (not supported by all models) - If user configures dimension=1024 and model returns 2048, truncates to 1024 - If no dimension is configured, uses model's native dimension without truncation - Applies truncation to both single and batch embedding operations This allows users to: 1. Use OpenAI-compatible models with custom dimensions via truncation 2. Control vector dimensions for storage optimization 3. Avoid dimension mismatch errors between config and actual embeddings Fixes volcengine#1238 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: volcengine#1232 --------- Co-authored-by: openviking <openviking@example.com> Co-authored-by: Claude <noreply@anthropic.com> * Feature/memory opt (volcengine#1159) * fix(plugin): add skills to autoRecall search scope (volcengine#1225) Include viking://agent/skills in the autoRecall search alongside user memories and agent memories. Skills stored in OpenViking are now auto-injected into context when relevant to the query. 
Fixes volcengine#1089 Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com> * Revert "fix(session): add timeout to _wait_for_previous_archive_done to preve…" (volcengine#1265) This reverts commit dc15daf. * channel mention (volcengine#1272) * feat(storage): volcengine vector db support sts token (volcengine#1268) * feat(storage): volcengine vector db support sts token https://www.volcengine.com/docs/7139/1302258?lang=zh * feat(storage): volcengine vector db support sts token https://www.volcengine.com/docs/7139/1302258?lang=zh * feat(server): support host all to use dual stack netstat (volcengine#1273) * feat(auth): Restrict trusted mode without API key to localhost (volcengine#1279) * Improve memory v2 lock retry controls in compressor (volcengine#1275) * fix(security): configurable embedding circuit breaker & log suppression (volcengine#1277) * update wechat group qrcode (volcengine#1282) replace the old one. * ci: optimize API test matrix from 5 to 3 channels (volcengine#1281) - Reduce OS matrix from 5 channels to 3 channels - Use ubuntu-latest, macos-latest, windows-latest instead of specific versions - Remove redundant ubuntu-arm and intel macos variants * feat: add MiniMax-M2.7 and MiniMax-M2.7-highspeed provider support (volcengine#1284) - Update provider registry to document MiniMax-M2.7 and MiniMax-M2.7-highspeed as the recommended models (replacing the outdated M2.1 example comment) - Add explicit note that MiniMax does not support system messages (they are merged into the first user message automatically) - Update README to advertise MiniMax-M2.7 and MiniMax-M2.7-highspeed as the recommended MiniMax models with configuration instructions - Add comprehensive unit tests (17 tests) covering: - Registry keyword matching for MiniMax-M2.7 and MiniMax-M2.7-highspeed - Model prefix resolution (minimax/MiniMax-M2.7) - System message merging for both LiteLLMProvider and OpenAICompatibleProvider - Edge cases: multiple system messages, no system 
messages, non-MiniMax models - MINIMAX_API_KEY environment variable and international API base URL Co-authored-by: octo-patch <octo-patch@github.com> * feat(eval): add openclaw eval sh (volcengine#1287) * 增加完整的一键评测脚本 * 增加完整的一键评测脚本 * add create time in add_message (volcengine#1288) * fix(lark): add lark-oapi (volcengine#1285) * docs: add prompt guides in zh and en (volcengine#1292) * benchmark: add LoCoMo evaluation scripts for mem0 (volcengine#1290) Implements a two-phase benchmark pipeline for evaluating mem0 on the LoCoMo long-term conversation dataset (10 samples, 1540 non-adversarial QA pairs). - ingest.py: imports LoCoMo conversation sessions into mem0, using sample_id as the userId namespace. All messages use "user" role with [SpeakerName]: prefix to preserve two-person dialogue structure. Temporal context is added via a [System] prefix on each session. - eval.py: sends QA questions to an OpenClaw agent backed by the openclaw-mem0 plugin. Restarts the gateway per sample to switch the active userId, verifies the correct user is loaded before running questions, then parallelizes questions within each sample using unique session keys. Parses session jsonl to collect accurate per-turn token usage. Optionally judges answers with a Volcengine ARK LLM. - delete_user.py: utility to clear mem0 memories for given user_ids. - README.md: documents prerequisites, ingest/eval parameters, output format, and per-sample run commands. * fix(docker): restore venv tooling for ragfs build (volcengine#1295) fix(docker): preserve ragfs wheel extraction script indentation fix(docker): keep heredoc cleanup inside run shell fix(docker): terminate ragfs extraction heredoc cleanly fix(ci): split docker manifest digests by registry * Include gemini optional dependency in Docker image (volcengine#1254) The Dockerfile only includes the `bot` extra when installing dependencies, which means the `google-genai` package is missing at runtime. 
This causes the server to crash with `TypeError: 'NoneType' object is not callable` when users configure the Gemini embedding provider. Add `--extra gemini` to the `uv sync` commands so the Docker image ships with Gemini support out of the box. Fixes volcengine#1253 * ci: add timeout and SMTP failure notification (volcengine#1293) - Add 50 minutes timeout for api_test and oc2ov_test workflows - Add SMTP email notification on workflow failure - Email will be sent to likaisong@bytedance.com Required secrets: - SMTP_USERNAME: SMTP email username - SMTP_PASSWORD: SMTP email password/app password * fix(openclaw): sanitize and cap recall queries (volcengine#1297) Recall should use cleaned user text and avoid sending oversized prompts to retrieval. Add unit coverage for sanitization and truncation behavior. * fix(ci): repair reusable build workflow yaml blocks (volcengine#1300) * feat(ast): add Lua parser support and extractor wiring (volcengine#1286) * fix(ci): remove disallowed notify-failure action (volcengine#1302) - Remove dawidd6/action-send-mail@v4 which is not in allowed list - Keep timeout-minutes: 50 configuration - Use GitHub built-in notifications instead * fix(embedder): reduce async contention in session flows (volcengine#1301) Introduce native async embedding paths across providers, switch async retrieval/session hotspots to use them, and add a standalone mixed-load benchmark plus before/after benchmark evidence for the regression. 
* Feat(ovpack): recursively vectorize all imported docs (volcengine#1294) * feat(ovpack): recursively enqueue directory vectorization * feat(ovpack): recursively vectorize all imported docs * refactor(ovpack): use fixed workers for direct vectorization * refactor(ovpack): revert direct vectorization to gather * chore: drop redundant default tree args * feat: 巻加测试优化功能 (volcengine#1280) - Session ID 自动管理 - 智能等待策略 - 重试机制 - 测试数据管理 * fix(eval): OpenClaw eval, import to ov use default user (volcengine#1305) * 增加完整的一键评测脚本 * 增加完整的一键评测脚本 * 完善评测脚本 * 完善评测脚本 * 完善评测脚本 * fix(memory): batch semantic processing in _process_memory_directory to prevent CPU spikes (volcengine#1245) (volcengine#1304) * Fix ci (volcengine#1307) * fix(ci): fix build ci * build: move ragfs-python packaging into setup.py * fix bug (volcengine#1317) * fix: 优化测试关键词匹配和移除 Release Approval Gate (volcengine#1313) - 添加中英文关键词匹配：删除、无、deleted、expired、no longer - 移除 workflow 中的 release-approval job * fix: remove debug print statement in bot health check endpoint (volcengine#1310) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add scenario-based API tests (volcengine#1303) * feat: add scenario-based API tests - Add scenario test framework with proper categorization - Add tests for resources_retrieval, sessions, and stability_error scenarios - Add get_task() and wait_for_task() methods to API client for async operations - Add get_session_context() method for session context retrieval - Update API test workflow name from '03' to '06' - All 14 scenario tests pass with proper business logic validation * fix: skip scenarios tests when no VLM/Embedding secrets Scenarios tests require VLM for session archival summaries and Embedding for semantic search. Skip them in basic test mode. 
* fix(security): remove leaked token from settings.py (volcengine#1319) - Remove tests/oc2ov_test/config/settings.py containing exposed auth token - Add settings.py to .gitignore to prevent future leaks - Users should copy settings.example.py to settings.py and fill in their own tokens * fix(resources): implement trailing slash semantics for resource URIs (volcengine#1321) add support for trailing slash rules in resource URIs to control file/directory placement update CLI, API, and documentation to reflect new URI handling semantics add comprehensive tests for all URI semantics cases * Revert "fix(resources): implement trailing slash semantics for resource URIs …" (volcengine#1322) This reverts commit dec57bd. * fix litellm embedding dimension issues (volcengine#1323) Co-authored-by: openviking <openviking@example.com> * ci: optimize runner usage with conditional OS matrix and parallel limit (volcengine#1327) - Only run full OS matrix (ubuntu-24.04, macos-14, windows-latest) on main branch pushes - PR branches use ubuntu-24.04 only to reduce runner wait time - Add max-parallel: 2 to limit concurrent job execution * fix(bot):Response language, Multi user memory commit (volcengine#1329) * 修复语种问题 * 多user提交ov * ci: remove lite and full test workflows (volcengine#1331) * docs: fix docker deployment (volcengine#1332) Co-authored-by: openviking <openviking@example.com> * docs(openclaw-plugin): add health check tools guide (volcengine#1326) Add solution for health check tools report http 404 error * fix(queue): expose re-enqueue counts in queue status (volcengine#1337) Track semantic and embedding re-enqueues as first-class queue metrics so observer output, wait_processed payloads, and telemetry summaries make retry loops visible before they escalate into hard errors. 
* feat(s3fs): add disable_batch_delete option for OSS compatibility (volcengine#1330) (volcengine#1333) Co-authored-by: yuanqianhe <yuanqianhe@lattebank.com> * Fix/add resource cover (volcengine#1338) * fix(resources): improve handling of resource imports and naming - Add source_name to file upload requests for preserving original filenames - Handle single-directory zip files by using their root directory directly - Support viking://resources as parent directory for imports - Split summarization for resources root imports into individual child items - Add tests for new resource import behaviors * style(tests): format test files with consistent line breaks Improve readability by applying consistent line breaks in test file patches and removing trailing whitespace * afterTurn: store messages with actual roles and skip heartbeat messages (volcengine#1340) * fix: fall back to prefix filters for volcengine path scope (volcengine#1342) Co-authored-by: haosenwang1018 <haosenwang1018@users.noreply.github.com> * fix(session): auto-create missing sessions on first add (volcengine#1348) Ensure the add-message API materializes a missing session before loading it so plugins can append the first turn without an explicit create_session call. * Fix/api test issues (volcengine#1341) * fix: make api_test more robust for CI environments - Add @pytest.hookimpl(optionalhook=True) for pytest-html hooks to fix compatibility issues - test_fs_read: skip test when AGFS service is not available - test_get_overview: skip test when overview file does not exist These changes ensure tests pass gracefully on CI servers where AGFS service may not be available or files may not exist. * ci: reduce max-parallel to 1 for better resource availability Reduce max-parallel from 2 to 1 to avoid waiting for multiple runners when GitHub-hosted runners are limited. 
* fix: derive context_type from URI in index_resource (volcengine#1346) index_resource() always passed the default context_type="resource" to vectorize_directory_meta() and vectorize_file(), even when indexing memory directories (URIs containing /memories/). This caused all records created via the /api/v1/content/reindex endpoint to be tagged as "resource" instead of "memory", breaking stats aggregation (which filters on context_type="memory") and search scoping. Use the existing get_context_type_for_uri() helper to derive the correct context_type from the URI and pass it through to both vectorize_directory_meta() and vectorize_file(). Co-authored-by: yc111233 <yc111233@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * reorg: remove golang depends (volcengine#1339) * docs: fix docker deployment * reorg: remove third_party/agfs * feat(s3fs): add disable_batch_delete option for OSS compatibility Port of PR volcengine#1333 from Go version to Rust: - Add disable_batch_delete config option to S3Client - When enabled, use sequential single-object deletes instead of DeleteObjects - This is for S3-compatible services like Alibaba Cloud OSS that require Content-MD5 for DeleteObjects but AWS SDK v2 does not send it by default - Add documentation and config example for OSS * fix(s3fs): pass disable_batch_delete config from Python to Rust Add disable_batch_delete to the s3_plugin_config dict in _generate_plugin_config so that the Python config can properly control the Rust S3FS plugin's behavior. 
* reorg: remove third_party/agfs * reorg: remove third_party/agfs * change some docs * change some docs --------- Co-authored-by: openviking <openviking@example.com> * Feat/mem opt (volcengine#1349) * fix(memory): handle string response from VLM when tools disabled - Fix AttributeError when VLM returns string instead of VLMResponse - Fix tuple creation bug with trailing comma causing double-nested tools array Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * update * chore: replace LGBTQ example with book club in entities.yaml Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(memory): disable tools on extended iteration to prevent infinite loop When max_iterations is extended due to tool calls, disable tools for the additional iteration to ensure the extract loop terminates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(memory): handle out-of-bounds range from LLM extraction Clamp range values to valid message indices instead of skipping, to handle cases where LLM extracts incorrect ranges. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix: openai like embedding models fix, no more matryoshka error (volcengine#1350) Co-authored-by: openviking <openviking@example.com> * 增加关闭ov的配置 (volcengine#1352) * feat: 新前端项目初始化 * style: 切换到vega风格 * feat: 新增定制化的 openapi codegen * chore: 把openapi生成的ov-client移到src/gen底下 * feat: 生成 openviking-server 的 client * feat: 搭建 openviking client 的适配层 * docs: 新增 ov-client 适配层描述 * chore: 清理项目模板 * clean: 清理模板样式 * deps: 新增 tanstack query 依赖 * style: 调整 shadcn 风格 * feat: 导入基础的shadcn组件 * feat: 复刻旧的 webconsole * fix: 修复legacy代码commit不完整的问题 * docs: readme * feat: 允许不显示传递自定义client * dev: add development environment script * feat: support tags for resource management and retrieval This commit introduces a native `tags` parameter across the entire stack (Core, API, SDK, CLI) to easily tag resources and filter them during semantic search. 
Changes include: - **Core & Storage**: - Write `tags` to the resource root's `.meta.json` during `add_resource`. - Read tags from `.meta.json` during async semantic processing and hoist them to the VectorDB context for indexing. - Enrich directory stats/entries in `VikingFS` (`ls`, `stat`) with `tags`. - **API & Service**: Add `tags` field to resource creation and search routes. - **Python SDK**: - Add `tags` parameter to `add_resource`. - Add `tags` shortcut parameter to `find` and `search` methods, which automatically constructs the underlying `contains` metadata filters. - **Rust CLI**: Add `--tags` flag to `ov add-resource`, `ov find`, and `ov search` commands. - **Docs**: Update English and Chinese documentation for Resources, Retrieval, and Filesystem APIs to reflect the new `tags` parameters and structures. - **Tests**: Add unit tests for resource processor meta merging, VikingFS tag reading, and HTTP client tag filtering logic. * feat(search): support tags filtering and return tags in retrieval results * refactor(tags): extract build_tags_filter helper and improve validation * chore(scripts): add fast mode toggle to bootstrap menu * fix(tags): sanitize once and isolate semantic tags context * feat: 三段式布局 + 功能页拆分 - 新增顶栏(全宽) + 左侧边栏 + 右侧内容区的三段式布局 - 将 Data 大页面拆分为独立路由: FileSystem、Find、Add Memory - 将 Access 页面迁移为独立的 Settings 页面 - 侧边栏按 Data/Ops/Access 分组显示导航项 - 提取共享工具函数到 data-utils.ts - 修复 .gitignore 中 data/ 规则误匹配子目录的问题 * feat(web-studio): add i18n support and dark/light theme toggle - Add i18next with browser language detection (zh-CN / en) - Add language switcher dropdown in header bar - Add dark/light theme toggle with animated Sun/Moon icons - Wire up next-themes ThemeProvider with class-based dark mode - Replace shadcn dark theme with default Zinc (neutral gray) - Connect sidebar labels to i18n translation keys - Fix dropdown menu forced dark styling --------- Co-authored-by: baojun-zhang <zhangbaojun.1@bytedance.com> Co-authored-by: Jason 
<101583541+JasonOA888@users.noreply.github.com> Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com> Co-authored-by: Protocol Zero <257158451+Protocol-zero-0@users.noreply.github.com> Co-authored-by: 灿烂甜菜 <731426007@qq.com> Co-authored-by: Matt Van Horn <mvanhorn@users.noreply.github.com> Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com> Co-authored-by: yc111233 <109650216+yc111233@users.noreply.github.com> Co-authored-by: yc111233 <yc111233@users.noreply.github.com> Co-authored-by: Jiahui Zhou <zhoujiahui.01@bytedance.com> Co-authored-by: MaojiaSheng <shengmaojia@bytedance.com> Co-authored-by: openviking <openviking@example.com> Co-authored-by: 13ernkastel <LennonCMJ@live.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: chenjw <chenjunwen@bytedance.com> Co-authored-by: yeshion23333 <dutao.1786@bytedance.com> Co-authored-by: heaoxiang-ai <heaoxiang@bytedance.com> Co-authored-by: Yaoyao <yuyao.yoyo@bytedance.com> Co-authored-by: kaisongli <likaisong@bytedance.com> Co-authored-by: Octopus <liyuan851277048@icloud.com> Co-authored-by: octo-patch <octo-patch@github.com> Co-authored-by: AutoCoder <wulf234@163.com> Co-authored-by: yangxinxin-7 <yangxinxin.24@bytedance.com> Co-authored-by: Yang Zhi <yangzhi.see@gmail.com> Co-authored-by: Qin Haojie <qinhaojie.exe@bytedance.com> Co-authored-by: Shawn-o <64448366+Shawn-cf-o@users.noreply.github.com> Co-authored-by: zgy <1670519171@qq.com> Co-authored-by: chuanbao666 <sunchuanbao@bytedance.com> Co-authored-by: 7. 
Sun <jhao.sun@gmail.com> Co-authored-by: yepper <yangshunyao@bytedance.com> Co-authored-by: mrj666 <33885009+mrj666@users.noreply.github.com> Co-authored-by: yuan7he <yuan7he@gmail.com> Co-authored-by: yuanqianhe <yuanqianhe@lattebank.com> Co-authored-by: Sense_wang <167664334+haosenwang1018@users.noreply.github.com> Co-authored-by: haosenwang1018 <haosenwang1018@users.noreply.github.com> Co-authored-by: ifeichuan <feichuan05@gmail.com> Co-authored-by: Jye10032 <736891807@qq.com>

sponge225 added 3 commits April 7, 2026 19:38

feat(ovpack): recursively enqueue directory vectorization

4e94282

feat(ovpack): recursively vectorize all imported docs

7585a4a

refactor(ovpack): use fixed workers for direct vectorization

3e672e9

github-project-automation bot added this to OpenViking project Apr 8, 2026

github-project-automation bot moved this to Backlog in OpenViking project Apr 8, 2026

qin-ctx reviewed Apr 8, 2026

Comment thread openviking/storage/local_fs.py Outdated

qin-ctx reviewed Apr 8, 2026

sponge225 added 2 commits April 8, 2026 16:01

refactor(ovpack): revert direct vectorization to gather

3878789

chore: drop redundant default tree args

fe099f8

qin-ctx approved these changes Apr 8, 2026

qin-ctx merged commit 62ceedc into volcengine:main Apr 8, 2026
5 of 9 checks passed

github-project-automation bot moved this from Backlog to Done in OpenViking project Apr 8, 2026

qin-ctx merged 5 commits into volcengine:main from sponge225:feature/ovpack-recursive-vectorization
4 participants