Skip to content

Implement comprehensive document metadata support in Knowledge Base workflow node#10

Open
jaffrey-deepsource wants to merge 24 commits into
mainfrom
feat-rag-pipline-metadata
Open

Implement comprehensive document metadata support in Knowledge Base workflow node#10
jaffrey-deepsource wants to merge 24 commits into
mainfrom
feat-rag-pipline-metadata

Conversation

@jaffrey-deepsource
Copy link
Copy Markdown

No description provided.

ZeroZ-lab and others added 24 commits January 23, 2026 14:36
…e node

Implement comprehensive document metadata support in Knowledge Base workflow node, allowing users to configure metadata values through both constants and variables.

Backend changes:
- Add DocMetadata model with support for constant values and variable selectors
- Implement metadata processing in KnowledgeIndexNode with variable resolution
- Add batch query optimization to prevent N+1 queries
- Implement metadata validation and binding creation
- Add comprehensive unit tests for node and service layers

Frontend changes:
- Add MetadataSection component with type-aware input controls
  - String type: text input
  - Number type: number input with validation
  - Time type: date picker (Unix timestamp)
- Implement variable filtering based on metadata data type
  - String metadata: only string variables
  - Number metadata: only number/integer variables
  - Time metadata: only time-related number variables (timestamp, *time*, *date*, *at*)
- Add VarReferencePicker with 360px min-width for better UX
- Standardize font size to text-[13px] across all inputs
- Add i18n support for all user-facing strings

Technical improvements:
- Use SQLAlchemy attributes.flag_modified() for JSON field updates
- Optimize logging to follow project standards (warnings and exceptions only)
- Add type safety with proper TypeScript definitions
- Implement proper error handling with user-friendly messages

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix test_delete_metadata_not_found: use pytest.raises() since service
  raises ValueError instead of returning None
- Fix test_run_with_custom_metadata: remove mock of _invoke_knowledge_index
  that bypassed metadata processing logic, add missing BATCH/ORIGINAL_DOCUMENT_ID
  variable mocks, remove redundant side_effect code
- Fix test_save_document_with_dataset_id_ignores_lock_not_owned: add missing
  enable_built_in_metadata and doc_metadata attributes to knowledge_config
- Fix test_save_document_with_metadata: add .filter().all() mock chain for
  DatasetMetadata query
- Enhance check_metadata_used_in_pipeline to check both draft and current
  published workflows (via pipeline.workflow_id) to prevent deletion of
  metadata actively used in production

Co-Authored-By: Claude <noreply@anthropic.com>
Add proper input styling (border, rounded corners, background) to the
metadata value container in Knowledge Base node for visual consistency
with other form inputs.

Co-Authored-By: Claude <noreply@anthropic.com>
Rewrite workflow query conditions to avoid type checking error with
list.append() having inconsistent types.

Co-Authored-By: Claude <noreply@anthropic.com>
Mock attributes.flag_modified to avoid SQLAlchemy internal state
requirements. Update assertion to verify flag_modified call instead
of db.session.add call.
Resolve conflicts between doc_metadata feature and summary_index feature:
- api/core/workflow/nodes/knowledge_index/entities.py: merge both fields
- api/core/workflow/nodes/knowledge_index/knowledge_index_node.py: merge imports
- web/app/components/workflow/nodes/knowledge-base/hooks/use-config.ts: merge handlers
- web/app/components/workflow/nodes/knowledge-base/types.ts: merge type definitions
- web/app/components/workflow/nodes/knowledge-base/panel.tsx: merge handler usage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add system-xs-regular and truncate to date-picker for consistent font size
- Fix value container alignment with w-0 grow overflow-hidden pattern
- Change delete button from × to trash icon with destructive hover
- Align condition-date layout with other condition components

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions github-actions Bot added the web label Feb 12, 2026
@deepsource-development
Copy link
Copy Markdown

deepsource-development Bot commented Feb 12, 2026

DeepSource Code Review

DeepSource reviewed changes in the commit range b76c8fa..a4cd1fb on this pull request. Below is the summary for the review, and you can see the individual issues we found as review comments.

For detailed review results, please see the PR on DeepSource ↗

Important

Some issues found as part of this review are outside of the diff in this pull request and aren't shown in the inline review comments due to GitHub's API limitations. Please see the DeepSource dashboard for this PR to view those issues.

PR Report Card

Security × 5 issues Overall PR Quality   

Focus Area: Reliability

Guidance
Fix the context manager 'NoneType' issues in api/services/dataset_service.py, appearing four times.

Grade capped at D due to multiple critical issues
Reliability × 13 issues
Complexity × 26 issues
Hygiene × 21 issues

Code Review Summary

Analyzer Status Summary Details
Python 68 new issues detected. 2 existing issues fixed. Review ↗
Secrets No new issues detected. Review ↗
How are these analyzer statuses calculated?

Administrators can configure which issue categories are reported and cause analysis to be marked as failed when detected. This helps prevent bad and insecure code from being introduced in the codebase. If you're an administrator, you can modify this in the repository's settings.


💡 If you're a repository administrator, you can configure the quality gates from the settings.


# doc_metadata variables
if node_data_obj.doc_metadata:
for item in node_data_obj.doc_metadata:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`node_data_obj.doc_metadata` used non-iterably in `for` loop


The code attempts to iterate over node_data_obj.doc_metadata with a for loop, assuming it is iterable. If doc_metadata is None or another non-iterable type, this causes a TypeError that crashes execution.

Ensure doc_metadata is always set to an iterable type like a list before the loop. Alternatively, add a check to confirm iterability or provide a default empty iterable to avoid runtime errors.

)

# Execute
result = node._run()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable `result` wastes memory and confuses


Assigning the output of node._run() to result without using it wastes memory and may confuse developers about its purpose. This unused variable has no effect on program behavior but adds clutter.
Remove the result variable or replace it with _ if the value is intentionally unused to indicate its irrelevance to the program flow.

patch("services.dataset_service.redis_client") as mock_redis,
patch("services.dataset_service.DocumentService.build_document") as mock_build_document,
patch("services.dataset_service.current_user") as mock_current_user,
patch("services.dataset_service.DocumentIndexingTaskProxy") as mock_indexing_task,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable `mock_indexing_task` adds confusion


The variable mock_indexing_task is assigned in the code but never used, which wastes space and can mislead developers about its purpose. This unused variable adds unnecessary noise and maintenance burden.
Rename it to start with _ or _unused, or remove it if not needed to clarify intent and clean the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants