Skip to content

Add comprehensive document metadata support in Knowledge Base workflow node#5

Closed
jaffrey-deepsource wants to merge 24 commits into
mainfrom
feat-rag-pipline-metadata
Closed

Add comprehensive document metadata support in Knowledge Base workflow node#5
jaffrey-deepsource wants to merge 24 commits into
mainfrom
feat-rag-pipline-metadata

Conversation

@jaffrey-deepsource
Copy link
Copy Markdown

No description provided.

ZeroZ-lab and others added 24 commits January 23, 2026 14:36
…e node

Implement comprehensive document metadata support in Knowledge Base workflow node, allowing users to configure metadata values through both constants and variables.

Backend changes:
- Add DocMetadata model with support for constant values and variable selectors
- Implement metadata processing in KnowledgeIndexNode with variable resolution
- Add batch query optimization to prevent N+1 queries
- Implement metadata validation and binding creation
- Add comprehensive unit tests for node and service layers

Frontend changes:
- Add MetadataSection component with type-aware input controls
  - String type: text input
  - Number type: number input with validation
  - Time type: date picker (Unix timestamp)
- Implement variable filtering based on metadata data type
  - String metadata: only string variables
  - Number metadata: only number/integer variables
  - Time metadata: only time-related number variables (timestamp, *time*, *date*, *at*)
- Add VarReferencePicker with 360px min-width for better UX
- Standardize font size to text-[13px] across all inputs
- Add i18n support for all user-facing strings

Technical improvements:
- Use SQLAlchemy attributes.flag_modified() for JSON field updates
- Optimize logging to follow project standards (warnings and exceptions only)
- Add type safety with proper TypeScript definitions
- Implement proper error handling with user-friendly messages

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix test_delete_metadata_not_found: use pytest.raises() since service
  raises ValueError instead of returning None
- Fix test_run_with_custom_metadata: remove mock of _invoke_knowledge_index
  that bypassed metadata processing logic, add missing BATCH/ORIGINAL_DOCUMENT_ID
  variable mocks, remove redundant side_effect code
- Fix test_save_document_with_dataset_id_ignores_lock_not_owned: add missing
  enable_built_in_metadata and doc_metadata attributes to knowledge_config
- Fix test_save_document_with_metadata: add .filter().all() mock chain for
  DatasetMetadata query
- Enhance check_metadata_used_in_pipeline to check both draft and current
  published workflows (via pipeline.workflow_id) to prevent deletion of
  metadata actively used in production

Co-Authored-By: Claude <noreply@anthropic.com>
Add proper input styling (border, rounded corners, background) to the
metadata value container in Knowledge Base node for visual consistency
with other form inputs.

Co-Authored-By: Claude <noreply@anthropic.com>
Rewrite workflow query conditions to avoid type checking error with
list.append() having inconsistent types.

Co-Authored-By: Claude <noreply@anthropic.com>
Mock attributes.flag_modified to avoid SQLAlchemy internal state
requirements. Update assertion to verify flag_modified call instead
of db.session.add call.
Resolve conflicts between doc_metadata feature and summary_index feature:
- api/core/workflow/nodes/knowledge_index/entities.py: merge both fields
- api/core/workflow/nodes/knowledge_index/knowledge_index_node.py: merge imports
- web/app/components/workflow/nodes/knowledge-base/hooks/use-config.ts: merge handlers
- web/app/components/workflow/nodes/knowledge-base/types.ts: merge type definitions
- web/app/components/workflow/nodes/knowledge-base/panel.tsx: merge handler usage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add system-xs-regular and truncate to date-picker for consistent font size
- Fix value container alignment with w-0 grow overflow-hidden pattern
- Change delete button from × to trash icon with destructive hover
- Align condition-date layout with other condition components

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions github-actions Bot added the web label Feb 12, 2026
@deepsource-development
Copy link
Copy Markdown

deepsource-development Bot commented Feb 12, 2026

DeepSource Code Review

DeepSource reviewed changes in the commit range b76c8fa..a4cd1fb on this pull request. Below is the summary for the review, and you can see the individual issues we found as review comments.

For detailed review results, please see the PR on DeepSource ↗

Important

Some issues found as part of this review are outside of the diff in this pull request and aren't shown in the inline review comments due to GitHub's API limitations. Please see the DeepSource dashboard for this PR to view those issues.

PR Report Card

Security × 5 issues Overall PR Quality   

Focus Area: Reliability

Guidance
Address the critical issue of using a non-iterable `node_data_obj.doc_metadata` in a `for` loop in `api/core/workflow/nodes/knowledge_index/knowledge_index_node.py`.

Grade capped at D due to multiple critical issues
Reliability × 15 issues
Complexity × 26 issues
Hygiene × 20 issues

Code Review Summary

Analyzer Status Summary Details
JavaScript 1 new issue detected. Review ↗
Python 68 new issues detected. 2 existing issues fixed. Review ↗
Secrets No new issues detected. Review ↗
How are these analyzer statuses calculated?

Administrators can configure which issue categories are reported and cause analysis to be marked as failed when detected. This helps prevent bad and insecure code from being introduced in the codebase. If you're an administrator, you can modify this in the repository's settings.


💡 If you're a repository administrator, you can configure the quality gates from the settings.

@jaffrey-deepsource jaffrey-deepsource changed the title Feat rag pipline metadata Add comprehensive document metadata support in Knowledge Base workflow node Feb 12, 2026
<Datepicker
className="h-full w-full"
value={item.value as number}
onChange={value => handleDocMetadataValueChange(index, value || 0)}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`null` date from `Datepicker` is coerced to `0`


The onChange handler for the Datepicker component coerces a null or undefined value into 0. This results in storing the Unix epoch timestamp (1970-01-01T00:00:00Z) when a user clears the date field, which is semantically incorrect and leads to storing misleading data.

The component state and backend should ideally support null to represent an unset date. Modify the handler to pass value || null instead of value || 0, and adjust the DocMetadataItem type if necessary to allow null values.


# doc_metadata variables
if node_data_obj.doc_metadata:
for item in node_data_obj.doc_metadata:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-iterable `node_data_obj.doc_metadata` used in `for` loop


The code attempts to iterate over node_data_obj.doc_metadata assuming it is iterable. If doc_metadata is None or not an iterable, the for loop will raise a TypeError, breaking the program execution.

Ensure node_data_obj.doc_metadata is always an iterable before the loop, or add a conditional check or default to an empty iterable to prevent this error.

patch("services.dataset_service.redis_client") as mock_redis,
patch("services.dataset_service.DocumentService.build_document") as mock_build_document,
patch("services.dataset_service.current_user") as mock_current_user,
patch("services.dataset_service.DocumentIndexingTaskProxy") as mock_indexing_task,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable `mock_indexing_task` clutters code


The variable mock_indexing_task is created by patching "services.dataset_service.DocumentIndexingTaskProxy" but is not used anywhere in the code block. This adds clutter and can mislead maintainers or reviewers about its intent.
Rename the variable to _ or _unused_mock_indexing_task, or remove the assignment entirely if it's unnecessary to signify that the variable is intentionally unused.

Comment on lines +2520 to +2521
if custom_metadata:
doc_metadata.update(custom_metadata)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`dict.update` allows user metadata to overwrite system-generated fields


The custom_metadata provided by the user is merged into the system-generated doc_metadata dictionary. This can lead to a situation where a user-defined metadata field with a name like source overwrites a critical, system-managed built-in field, leading to data corruption and unexpected behavior.

To prevent this, check for key collisions before merging. You can iterate through the custom metadata keys and raise an error if a key is already present in the built-in metadata, or alternatively, ensure built-in fields are not overwritten.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants