fix: update chat components#10946
Conversation
WalkthroughThis PR introduces MCP Composer feature flag renaming, adds branch parameter propagation to CI workflows, bumps project version and dependencies, refactors knowledge base and file components for Docling support, adds auth settings handling for MCP projects, restructures agent provider logic, renames dozens of component display labels, adds replacement attributes to components, and modifies database constraints. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant ProjectAPI as Project API
participant AuthHelper as Auth Helper
participant MCPComposer as MCP Composer
participant DB as Database
User->>ProjectAPI: update_project(auth_settings={"type":"oauth"})
ProjectAPI->>AuthHelper: handle_auth_settings_update(existing_project, new_auth_settings)
AuthHelper->>DB: decrypt current auth_settings
AuthHelper->>AuthHelper: merge/preserve masked secrets
AuthHelper->>DB: encrypt and save updated auth_settings
AuthHelper-->>ProjectAPI: {should_start_composer: true, should_stop_composer: false}
ProjectAPI->>MCPComposer: register_project_with_composer(project)
MCPComposer-->>ProjectAPI: registration complete
ProjectAPI-->>User: project updated, composer started
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Areas requiring extra attention:
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 17
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
src/backend/base/langflow/components/icosacomputing/combinatorial_reasoner.py (2)
81-84: Potential AttributeError if called beforebuild_prompt().The method accesses
self.reasons(line 83), which is only set inbuild_prompt()(line 78). Ifbuild_reasons()is called independently or beforebuild_prompt(), it will raise anAttributeError.Consider adding defensive initialization:
+ def __init__(self, **kwargs): + super().__init__(**kwargs) + self.reasons = [] + def build_reasons(self) -> Data: # list of selected reasons + if not hasattr(self, "reasons"): + return Data(value=[]) final_reasons = [reason[0] for reason in self.reasons] return Data(value=final_reasons)
60-79: Return type mismatch: method declaresMessagebut returns string.Line 79 returns a plain string (
prompt), but the method signature declares-> Message. This violates the return type contract. All similar components in the codebase wrap string returns withMessage(text=...).Fix:
def build_prompt(self) -> Message: params = { "prompt": self.prompt, "apiKey": self.openai_api_key, "model": self.model_name, } creds = HTTPBasicAuth(self.username, password=self.password) response = requests.post( "https://cr-api.icosacomputing.com/cr/langflow", json=params, auth=creds, timeout=100, ) response.raise_for_status() prompt = response.json()["prompt"] self.reasons = response.json()["finalReasons"] - return prompt + return Message(text=prompt)src/backend/base/langflow/components/input_output/chat.py (1)
32-77: This removal breaks active test code and will cause test failures.The removal of
background_color,chat_icon, andtext_colorinputs is a breaking change. Test code insrc/backend/tests/unit/custom/component/test_component_instance_attributes.py(lines 47, 55-56) explicitly callsbuild()with these parameters and asserts these attributes exist on ChatInput instances. With these inputs removed from the component's inputs list, thebuild()method will no longer accept these parameters, causing test failures.Any saved workflows or serialized ChatInput configurations using these styling inputs will also fail to deserialize correctly. Before removing these inputs, either update the affected tests and migrate existing saved workflows, or add backwards compatibility handling to accept and silently ignore these parameters.
src/backend/base/langflow/components/input_output/chat_output.py (1)
35-78: Fix incomplete removal of styling and clean_data inputs—code at line 174 still referencesclean_data.The inputs
background_color,chat_icon,text_color, andclean_datawere removed from the inputs list (lines 35–78), but the code at line 174 still attempts to accessself.clean_dataviagetattr(self, "clean_data", False). Additionally, the test file (test_chat_output_component.py:26–29) still expects these parameters, meaning component instantiation will silently ignore them. Either restore these inputs to the list, or remove the reference toclean_dataat line 174 if the feature is intentionally deprecated.docs/docs/Integrations/Notion/Meeting_Notes_Agent.json (4)
2097-2097: Network calls lack timeouts (risk of hangs) and retries.All Notion components use requests without timeouts. Add a reasonable timeout and a simple retry/backoff to avoid UI hangs and flaky failures.
Example adjustment (apply similarly across components):
- response = requests.post(url, headers=headers, json=data) + response = requests.post(url, headers=headers, json=data, timeout=30) response.raise_for_status()Optional lightweight retry:
import time for attempt in range(3): try: resp = requests.get(url, headers=headers, timeout=30) resp.raise_for_status() break except requests.exceptions.RequestException as e: if attempt == 2: raise time.sleep(2 * (attempt + 1))Also applies to: 2229-2229, 2378-2378, 2504-2504, 2653-2653, 120-120, 229-229
2504-2504: Redact sensitive data from logs in NotionPageUpdate.You log the full request payload and response body. This can leak PII and internal data.
- logger.info(f"Sending request to Notion API: URL: {url}, Data: {json.dumps(data)}") + logger.info(f"Sending request to Notion API: URL: {url}, Data: <redacted>") ... - logger.info(f"Successfully updated Notion page. Response: {json.dumps(updated_page)}") + logger.info("Successfully updated Notion page.")
1081-1104: Dropdown default points to a value not present in options ("gpt-4.1").This breaks the default selection and may block flow import/use in the UI. Add "gpt-4.1" to the options array (or enable combobox) in both OpenAIModel blocks at lines 1096 and 1881.
"model_name": { "options": [ + "gpt-4.1", "gpt-4o-mini", "gpt-4o", "gpt-4-turbo", "gpt-4-turbo-preview", "gpt-4", "gpt-3.5-turbo", "gpt-3.5-turbo-0125" ], "combobox": false, "name": "model_name", "value": "gpt-4.1"
2378-2424: Update Notion API version to latest stable and centralize the constant.The version "2022-06-28" is hardcoded in multiple code blocks embedded in this file (lines 120, 228, 2228, 2377, 2503, 2652, 3156). As of December 2025, the latest stable Notion API version is 2025-09-03. Extract the version to a single constant to prevent drift and enable easier updates across all components using it.
src/backend/base/langflow/components/docling/docling_remote.py (1)
171-192: Fix misalignment betweenfile_listandprocessed_datawhen some files have no pathRight now
processed_datais appended to in submission order while you ignore the stored index fromfutures. If anyfile.pathisNone, the positions inprocessed_datano longer line up withfile_list, which can causerollup_datato associate results with the wrong files.Pre‑allocate
processed_dataand assign by index to keep everything aligned:- processed_data: list[Data | None] = [] + processed_data: list[Data | None] = [None] * len(file_list) with ( httpx.Client(headers=self.api_headers) as client, ThreadPoolExecutor(max_workers=self.max_concurrency) as executor, ): futures: list[tuple[int, Future]] = [] for i, file in enumerate(file_list): if file.path is None: - processed_data.append(None) - continue - - futures.append((i, executor.submit(_convert_document, client, file.path, docling_options))) - - for _index, future in futures: + continue + + futures.append((i, executor.submit(_convert_document, client, file.path, docling_options))) + + for index, future in futures: try: result_data = future.result() - processed_data.append(result_data) + processed_data[index] = result_data except (httpx.HTTPStatusError, httpx.RequestError, KeyError, ValueError) as exc: self.log(f"Docling remote processing failed: {exc}") raise
♻️ Duplicate comments (2)
src/backend/base/langflow/components/logic/flow_tool.py (1)
17-17: Inconsistency: Display name no longer indicates deprecation whilelegacyflag remains true.Same issue as in
sub_flow.py: removing "[Deprecated]" from the display name while keepinglegacy: bool = Truecreates confusion. Users won't realize this component is deprecated and should migrate tologic.RunFlow.Apply the same resolution approach as suggested for the SubFlowComponent.
Also applies to: 22-23
src/backend/base/langflow/components/input_output/chat_output.py (1)
87-89: Consider simplifying the getattr usage.Same as the comment for
ChatInput.get_artifacts(): since_artifactsis always initialized in__init__, you could simplify toreturn self._artifacts.
🟡 Minor comments (5)
src/backend/base/langflow/alembic/versions/d37bc4322900_drop_single_constraint_on_files_name_.py-128-154 (1)
128-154: Downgrade may fail silently if duplicate names exist across users.After upgrade, users can have files with the same name. Downgrade will fail on the INSERT if duplicate names exist (violating the restored
UNIQUE(name)constraint). Consider adding a pre-flight check:if db_dialect == "sqlite": # SQLite: Recreate table with both constraints print("SQLite: Recreating table with both constraints") + + # Check for duplicate names that would violate the single UNIQUE constraint + result = conn.execute(sa.text( + "SELECT name, COUNT(*) as cnt FROM file GROUP BY name HAVING COUNT(*) > 1" + )) + duplicates = result.fetchall() + if duplicates: + raise ValueError( + f"Cannot downgrade: {len(duplicates)} file names have duplicates across users. " + "Resolve duplicates before downgrading." + )Committable suggestion skipped: line range outside the PR's diff.
src/backend/base/langflow/api/v1/auth_helpers.py-13-23 (1)
13-23: Docstring is incomplete—missingshould_handle_composerreturn key.The docstring lists only
should_start_composerandshould_stop_composer, but the function also returnsshould_handle_composer(line 74)."""Handle auth settings update including encryption/decryption and MCP Composer logic. Args: existing_project: The project being updated (modified in-place) new_auth_settings: New auth settings (could be dict, Pydantic model, or None) Returns: Dict containing: - should_start_composer: bool - should_stop_composer: bool + - should_handle_composer: bool """src/backend/base/langflow/api/v1/mcp_projects.py-560-561 (1)
560-561: Bug:generated_api_keyis never assigned, so the success message omits API key info.
generated_api_keyis initialized toNone(line 560) but never reassigned. The actual API key is stored inlangflow_api_key(line 635). As a result, the condition at line 714 is alwaysFalseand the message about API key authentication is never included.Apply this diff to fix:
if should_generate_api_key: async with session_scope() as api_key_session: api_key_create = ApiKeyCreate(name=f"MCP Server {project.name}") api_key_response = await create_api_key(api_key_session, api_key_create, current_user.id) langflow_api_key = api_key_response.api_key + generated_api_key = langflow_api_key args.extend(["--headers", "x-api-key", langflow_api_key])Also applies to: 631-636, 714-716
src/backend/base/langflow/api/v1/mcp_projects.py-1152-1156 (1)
1152-1156: Inconsistent encryption: Auth settings not encrypted when resetting OAuth projects.At line 1155, when resetting OAuth projects,
clean_auth.model_dump()is used directly without encryption. However, at line 1138,encrypt_auth_settings()is called. This inconsistency could cause issues if sensitive fields are added later or for consistency in how auth_settings are stored.Consider encrypting for consistency:
fallback_auth_type = "apikey" if not settings_service.auth_settings.AUTO_LOGIN else "none" clean_auth = AuthSettings(auth_type=fallback_auth_type) - project.auth_settings = clean_auth.model_dump(exclude_none=True) + project.auth_settings = encrypt_auth_settings(clean_auth.model_dump(exclude_none=True))src/backend/base/langflow/components/huggingface/huggingface.py-107-109 (1)
107-109: Minor formatting issue: Missing space in display name.The
display_namevalue "HuggingFace HubAPI Token" appears to be missing a space between "Hub" and "API". For consistency and readability, consider:SecretStrInput( - name="huggingfacehub_api_token", display_name="HuggingFace HubAPI Token", password=True, required=True + name="huggingfacehub_api_token", display_name="HuggingFace Hub API Token", password=True, required=True ),
🧹 Nitpick comments (36)
src/backend/base/langflow/components/agentql/agentql_api.py (1)
127-127: Consider refactoring to use async HTTP client.The synchronous
httpx.post()call blocks the event loop. Per coding guidelines, component execution methods should use async patterns withawaitfor I/O operations.Consider refactoring to:
+ async def build_output(self) -> Data: endpoint = "https://api.agentql.com/v1/query-data" headers = { "X-API-Key": self.api_key, "Content-Type": "application/json", "X-TF-Request-Origin": "langflow", } payload = { "url": self.url, "query": self.query, "prompt": self.prompt, "params": { "mode": self.mode, "wait_for": self.wait_for, "is_scroll_to_bottom_enabled": self.is_scroll_to_bottom_enabled, "is_screenshot_enabled": self.is_screenshot_enabled, }, "metadata": { "experimental_stealth_mode_enabled": self.is_stealth_mode_enabled, }, } if not self.prompt and not self.query: self.status = "Either Query or Prompt must be provided." raise ValueError(self.status) if self.prompt and self.query: self.status = "Both Query and Prompt can't be provided at the same time." raise ValueError(self.status) try: + async with httpx.AsyncClient() as client: + response = await client.post(endpoint, headers=headers, json=payload, timeout=self.timeout) - response = httpx.post(endpoint, headers=headers, json=payload, timeout=self.timeout) response.raise_for_status()Based on coding guidelines: "Use FastAPI async patterns with
awaitfor async operations in component execution methods".src/backend/base/langflow/components/milvus/milvus.py (1)
43-46: Fix typo in consistency level labelMinor UX nit:
display_name="Consistencey Level"has a typo. Consider correcting it todisplay_name="Consistency Level"for a more polished UI.src/backend/base/langflow/components/clickhouse/clickhouse.py (1)
29-29: Improved clarity in the display name.The new display name "Clickhouse Password" is clearer and more concise than the previous text.
Optional: Consider brand name consistency.
For consistency with line 28 which uses "ClickHouse" (capital H), consider using the official brand capitalization:
- SecretStrInput(name="password", display_name="Clickhouse Password", required=True), + SecretStrInput(name="password", display_name="ClickHouse Password", required=True),src/backend/base/langflow/components/data/url.py (1)
215-215: Good improvement to filter out None header values.The change correctly prevents None values from being passed as HTTP headers, which improves request robustness.
Consider also checking that
header["key"]is not None for symmetry:- headers_dict = {header["key"]: header["value"] for header in self.headers if header["value"] is not None} + headers_dict = {header["key"]: header["value"] for header in self.headers if header["key"] is not None and header["value"] is not None}src/backend/base/langflow/components/chroma/chroma.py (1)
124-125: DRY violation: duplicate limit normalization logic.The same limit normalization expression appears twice. Consider extracting it to a cached property or helper to avoid duplication and ensure consistent behavior.
+ @property + def _normalized_limit(self) -> int | None: + """Normalize limit to int or None.""" + if self.limit is not None and str(self.limit).strip(): + return int(self.limit) + return None + @override @check_cached_vector_store def build_vector_store(self) -> Chroma:Then replace both occurrences:
- limit = int(self.limit) if self.limit is not None and str(self.limit).strip() else None - self.status = chroma_collection_to_data(chroma.get(limit=limit)) + self.status = chroma_collection_to_data(chroma.get(limit=self._normalized_limit))- limit = int(self.limit) if self.limit is not None and str(self.limit).strip() else None - stored_data = chroma_collection_to_data(vector_store.get(limit=limit)) + stored_data = chroma_collection_to_data(vector_store.get(limit=self._normalized_limit))Also applies to: 142-143
src/backend/base/langflow/base/agents/events.py (1)
100-119: Consider removing duplicate code.Lines 100-106 and 107-113 contain identical logic for handling dictionary items with
"text"keys and"tool_use"types. The duplication can be eliminated by removing one of these blocks.Apply this diff to remove the duplication:
if isinstance(item, dict): if "text" in item: return item["text"] # If the item's type is "tool_use", return an empty string. # This likely indicates that "tool_use" outputs are not meant to be displayed as text. if item.get("type") == "tool_use": return "" - if isinstance(item, dict): - if "text" in item: - return item["text"] - # If the item's type is "tool_use", return an empty string. - # This likely indicates that "tool_use" outputs are not meant to be displayed as text. - if item.get("type") == "tool_use": - return "" # This is a workaround to deal with function calling by Anthropic # since the same data comes in the tool_output we don't need to stream it here # although it would be nice to if "partial_json" in item: return "" msg = f"Output is not a string or list of dictionaries with 'text' key: {output}" raise TypeError(msg)src/backend/base/langflow/alembic/versions/d37bc4322900_drop_single_constraint_on_files_name_.py (1)
96-98: Consider adding a comment explaining the f-string usage for DDL.The constraint name comes from PostgreSQL system catalogs (trusted), and DDL statements don't support parameterized identifiers. While this is acceptable, a brief comment would clarify the safety rationale for future maintainers.
if constraints_to_drop: for constraint_name in constraints_to_drop: + # constraint_name comes from pg_constraint system table (trusted source) + # DDL statements don't support parameterized identifiers op.execute(f'ALTER TABLE "file" DROP CONSTRAINT "{constraint_name}"') print(f"PostgreSQL: Dropped constraint {constraint_name}")src/backend/base/langflow/components/logic/conditional_router.py (1)
146-159: Consider encapsulating conditional exclusion clearing in a graph method.Lines 148-151 directly manipulate
self.graph.conditional_exclusion_sourcesandself.graph.conditionally_excluded_vertices, duplicating the clearing logic already present inexclude_branch_conditionally(). This creates coupling to the graph's internal structure.Consider adding a dedicated method like
clear_conditional_exclusions(vertex_id)to the graph class, or reusing the existing clearing logic inexclude_branch_conditionally()which already clears previous exclusions from the same source vertex before applying new ones.src/backend/base/langflow/__main__.py (1)
669-701: Safer handling of default superuser password, but consider type‑safety and help text accuracyUsing
DEFAULT_SUPERUSER_PASSWORD.get_secret_value()and avoiding logging the default password is a solid security improvement. Two small follow‑ups to consider:
This assumes
DEFAULT_SUPERUSER_PASSWORDis a SecretStr‑like object. If there’s any chance it can be a plain string (now or in future refactors), a defensive pattern would avoid anAttributeError, e.g.:username = DEFAULT_SUPERUSERpassword = DEFAULT_SUPERUSER_PASSWORD.get_secret_value()
username = DEFAULT_SUPERUSERdefault_pw = DEFAULT_SUPERUSER_PASSWORDpassword = (default_pw.get_secret_value()if hasattr(default_pw, "get_secret_value")else str(default_pw))
- The CLI option help still says the password defaults to
'langflow'whenAUTO_LOGINis enabled, which may now be inaccurate if the default comes from configuration/secret settings. You might want to reword it to something like “Defaults to the configured AUTO_LOGIN password” so it stays correct if the secret changes.src/backend/base/langflow/components/crewai/sequential_crew.py (1)
32-32: Pre-existing: Return type annotation mismatch.The method declares
-> Messagebut returns aCrewobject. Consider fixing the return type annotation in a follow-up.- def build_crew(self) -> Message: + def build_crew(self) -> "Crew":src/backend/base/langflow/components/agents/agent.py (3)
45-45: Consider deriving provider list from central source.This hardcoded list could drift from
MODEL_PROVIDERS_DICT. If the subset is intentional, consider adding a comment explaining which providers are intentionally included/excluded and why.
530-530: Remove stray comment.Line 530 has a double comment marker (
# #) that appears to be leftover from development.- # # Update with custom component + # Update with custom component
541-551: Consider extracting duplicatedexternal_optionsstructure.This
external_optionsconfiguration is duplicated from lines 76-86. Extract to a module-level constant to maintain consistency and reduce duplication.EXTERNAL_OPTIONS_CONFIG = { "fields": { "data": { "node": { "name": "connect_other_models", "display_name": "Connect other models", "icon": "CornerDownLeft", } } }, }src/backend/base/langflow/components/input_output/chat.py (1)
82-84: Consider simplifying the getattr usage.Since
_artifactsis always initialized in__init__, thegetattrdefault value will never be used. You could simplify toreturn self._artifacts, though the current defensive pattern is acceptable if you want to guard against edge cases.Apply this diff if you prefer the simpler approach:
def get_artifacts(self) -> dict: """Return artifacts dictionary for framework compatibility.""" - return getattr(self, "_artifacts", {}) + return self._artifactssrc/backend/base/langflow/components/data/api_request.py (2)
188-201: Top-levelDataunwrapping in_process_bodyfixes dropped-body issueUnwrapping any
bodywith a.dataattribute before type checks is a good way to supportData-wrapped bodies (previously these would fall through and become{}).If you don’t expect arbitrary objects with a
.dataattribute here, consider tightening this toisinstance(body, Data)for slightly clearer intent and to avoid surprising unwrapping of unrelated types in the future.
213-235: Clarify/adjust early-return semantics when list items unwrap to non key/value dictsThe per-item
Dataunwrap and use ofcurrent_itemlook good, but this block has two subtle behavior changes:
return unwrapped_datainside the loop means the first item whose.datais a dict but not key/value format will short‑circuit the whole list, discarding any entries already accumulated inprocessed_dict.- That returned dict bypasses
_process_dict_body, so its values won’t go through_parse_json_value, unlike other dict bodies.If the intent is “the body is actually a single dict wrapped in a one‑element list of
Data”, consider tightening this to something like:
- Only doing the direct‑dict return when
len(body) == 1, and/or- Returning
self._process_dict_body(unwrapped_data)for consistency.Otherwise, add a brief comment documenting the assumption so future changes don’t accidentally rely on the current short‑circuit behavior.
src/backend/base/langflow/components/firecrawl/firecrawl_extract_api.py (1)
107-115: Verify optional extraction flags referenced but not defined as inputsIn
paramsyou accessself.ignore_sitemap,self.include_subdomains, andself.show_sources, but the correspondingBoolInputentries are commented out in theinputslist. UnlessComponentguarantees these attributes exist with defaults, callingextractcould raiseAttributeError.Consider either restoring those
BoolInputs (with appropriate defaults) or guarding access viagetattr(self, "ignore_sitemap", False)/ similar for the others.src/backend/base/langflow/components/helpers/memory.py (1)
193-224: Unify “last N messages” semantics between external and internal memoryThe external-memory branch now clearly implements “last N messages, then reverse if order is DESC”, which is easy to reason about:
aget_messages()(ascending),stored = stored[-n_messages:],- optionally
stored = stored[::-1]whenorder == "DESC".For internal memory, you currently:
- pass
orderthrough toaget_messages(..., order=order), and- then do
stored = stored[-n_messages:]without an additional reverse.If
aget_messageshonorsorder, then whenorder == "DESC"you’ll get newest-first from the DB, andstored[-n_messages:]will give you the oldest N messages rather than the latest N, which differs from the external-memory behavior and from the “Get last N messages” comment.To make semantics consistent (last N messages window, with order only affecting presentation), consider always fetching internal messages in ascending order and then mirroring the external pattern:
- else: - # For internal memory, we always fetch the last N messages by ordering by DESC - stored = await aget_messages( - sender=sender_type, - sender_name=sender_name, - session_id=session_id, - limit=10000, - order=order, - ) - if n_messages: - stored = stored[-n_messages:] # Get last N messages + else: + # For internal memory, fetch messages in ascending order, then + # apply the same "last N then optional reverse" pattern as external memory. + stored = await aget_messages( + sender=sender_type, + sender_name=sender_name, + session_id=session_id, + limit=10000, + order="ASC", + ) + if n_messages: + stored = stored[-n_messages:] + if order == "DESC": + stored = stored[::-1]This keeps “Number of Messages” semantics identical across internal/external memory and avoids the window depending on the chosen sort direction.
src/backend/base/langflow/api/v2/files.py (2)
30-33: Remove unnecessaryasyncfrom function.
get_mcp_filedoesn't perform anyawaitoperations; it's pure string manipulation. Making it a regular function avoids the overhead of creating a coroutine.-async def get_mcp_file(current_user: CurrentActiveUser, *, extension: bool = False) -> str: +def get_mcp_file(current_user: CurrentActiveUser, *, extension: bool = False) -> str: # Create a unique MCP servers file with the user id appended return f"{MCP_SERVERS_FILE}_{current_user.id!s}" + (".json" if extension else "")Note: This would require updating all call sites to remove
await, so this may be deferred if the scope is too large.
123-128: Consider consolidating duplicate calls toget_mcp_file.
get_mcp_fileis called twice with differentextensionvalues. You could simplify by computing the base name once:- mcp_file = await get_mcp_file(current_user) - mcp_file_ext = await get_mcp_file(current_user, extension=True) + mcp_file = await get_mcp_file(current_user) + mcp_file_ext = f"{mcp_file}.json"This reduces one async call and makes the relationship explicit.
src/backend/base/langflow/api/v2/mcp.py (1)
88-91: Redundantget_mcp_filecall.
mcp_filewas already computed at line 55 and should still be valid here. The variable can be reused instead of recomputing:# Fetch and download again - mcp_file = await get_mcp_file(current_user) server_config_file = await get_file_by_name(mcp_file, current_user, session)src/backend/base/langflow/api/v1/auth_helpers.py (1)
47-51: Potential KeyError if Pydantic model lacks expected fields.
model_dumpmay not include fields that don't exist on the model. Accessingauth_dict[field]aftergetattrcheck could fail if the field was excluded. Consider using.get()for safer access:secret_fields = ["api_key", "oauth_client_secret"] for field in secret_fields: field_val = getattr(new_auth_settings, field, None) if isinstance(field_val, SecretStr): - auth_dict[field] = field_val.get_secret_value() + if field in auth_dict or field_val is not None: + auth_dict[field] = field_val.get_secret_value()However, since you're checking
isinstance(field_val, SecretStr), this should be safe in practice. Consider this a minor defensive coding suggestion.src/backend/base/langflow/api/v1/projects.py (2)
86-89:new_project.idmay beNonein the log message before commit.At this point,
new_project.idhasn't been assigned by the database yet (happens aftersession.commit()). The log message will showNonefor the project ID.Consider moving this log after the commit/refresh, or log only the project name:
- await logger.adebug( - f"Auto-enabled API key authentication for project {new_project.name} " - f"({new_project.id}) due to AUTO_LOGIN=false" - ) + # Log after commit when ID is available, or just log the name here + await logger.adebug( + f"Auto-enabled API key authentication for project {new_project.name} " + "due to AUTO_LOGIN=false" + )
420-423: Same issue:new_project.idwill beNonein log message.Same as in
create_project—the ID isn't assigned until after commit. Consider logging only the name or moving the log after commit.src/backend/base/langflow/api/v1/mcp_projects.py (3)
82-84: Authentication logic is clear but condition could use clarification.The condition handles two cases:
- No auth_settings AND AUTO_LOGIN=false → require API key
- auth_settings with auth_type="apikey" → require API key
This is correct, but consider adding a brief comment to clarify the intent:
+ # Require API key authentication when: + # 1. No project auth settings but AUTO_LOGIN is disabled globally + # 2. Project explicitly requires API key authentication if (not auth_settings and not settings_service.auth_settings.AUTO_LOGIN) or ( auth_settings and auth_settings.auth_type == "apikey" ):
473-475: Unnecessary None check:get_project_sse_urlalways returns a string.Looking at
get_project_sse_url(line 858), it always returns a string URL. This None check will never trigger.# Provide the direct SSE URL since we're no longer using composer sse_url = await get_project_sse_url(project_id) - if not sse_url: - raise HTTPException(status_code=500, detail="Failed to get direct SSE URL")If you want to keep defensive coding, consider checking for empty string instead, but the function won't return that either.
776-784: Duplicate project fetch: remove the first query.The project is fetched twice: once at lines 777-782, then again via
verify_project_accessat line 784. The first query is now redundant.Apply this diff to remove the duplicate:
try: - # Verify project exists and user has access - async with session_scope() as session: - project = ( - await session.exec(select(Folder).where(Folder.id == project_id, Folder.user_id == current_user.id)) - ).first() - - if not project: - raise HTTPException(status_code=404, detail="Project not found") - project = await verify_project_access(project_id, current_user)docs/docs/Integrations/Notion/Meeting_Notes_Agent.json (3)
116-176: Handle Notion pagination (users, search).Current implementations return only the first page. Iterate on has_more/next_cursor to avoid missing results.
Sketch:
results, cursor = [], None while True: payload = {"query": query, "start_cursor": cursor} if cursor else {"query": query} r = requests.post(url, headers=headers, json=payload, timeout=30) r.raise_for_status() data = r.json() results.extend(data.get("results", [])) if not data.get("has_more"): break cursor = data.get("next_cursor")Also applies to: 2641-2749, 110-160, 3145-3224
151-159: Trim trailing spaces in display_name values.Minor UX polish; trailing spaces can affect search/filter in the UI.
Targets: “List Users ”, “Search ”, “Add Content to Page ”, “Create Page ”, “List Database Properties ”, “Update Page Property ”.
Also applies to: 322-324, 2154-2161, 2303-2306, 2431-2434, 2579-2582
111-119: Remove unused variables.combined_text is built but never used in NotionUserList.run_model and NotionSearch.run_model.
- combined_text = "" ... - combined_text += textAlso applies to: 216-224
src/backend/base/langflow/components/docling/docling_remote.py (1)
105-115: Docling v1 base URL and newsourcespayload shape look fine; consider normalizingapi_urlUsing
/v1and the new"sources"structure matches the updated Docling Serve API, but ifapi_urlis configured with a trailing/, you’ll end up with a double slash in the final URL. It’s harmless for most servers but easy to avoid:- base_url = f"{self.api_url}/v1" + base_url = f"{self.api_url.rstrip('/')}/v1"src/backend/base/langflow/base/data/docling_utils.py (2)
13-19: NewDoclingDependencyErroris reasonable; ensure it’s wired into callersThe custom exception shape (with
dependency_nameandinstall_command) is clear and useful for surfacing actionable messages. Just make sure it’s actually raised somewhere in the codebase (e.g., when mapping queue messages or validating environment) so it doesn’t become dead code.
207-238: Dependency error classification is helpful; consider centralizing/error‑string robustnessClassifying file‑processing errors into structured
dependency_errormessages with adependency_nameis a good improvement for UX. However, the detection currently relies on matching specific substrings inerror_msg, which can be brittle if upstream exceptions change wording.If this starts to drift, consider either:
- Centralizing the mapping of known error messages → dependency metadata in one helper, or
- Raising a dedicated dependency exception closer to where the errors originate and catching that here, instead of relying on string inspection.
For now this is acceptable, but worth keeping in mind for long‑term maintainability.
src/backend/base/langflow/components/docling/docling_inline.py (1)
220-247: Handle workerimport_errorseparately so callers see anImportErrorinstead of a genericRuntimeErrorThe new
dependency_errorhandling and the graceful‑shutdown path look good. However, the worker can also send{"error_type": "import_error", ...}for import‑related failures, and that currently falls through to the genericRuntimeErrorbranch.To keep error semantics consistent, consider treating
import_errorsimilarly to the other ImportError cases:- # Enhanced error checking with dependency-specific handling + # Enhanced error checking with dependency-specific handling if isinstance(result, dict) and "error" in result: error_msg = result["error"] # Handle dependency errors specifically if result.get("error_type") == "dependency_error": @@ - raise ImportError(user_message) + raise ImportError(user_message) + + # Handle import errors from the worker + if result.get("error_type") == "import_error": + raise ImportError(error_msg) # Handle other specific errors if error_msg.startswith("Docling is not installed"): raise ImportError(error_msg) @@ else: raise RuntimeError(error_msg)This way, all import‑related failures surface as
ImportError, while true processing issues remainRuntimeError.src/backend/base/langflow/components/data/file.py (2)
509-511: Enhance file path validation for subprocess security.The current validation checks for shell metacharacters but could be more robust. Consider using
pathlib.Pathto resolve and validate the path, and check for path traversal attempts.Apply this diff to improve path validation:
- # Validate file_path to avoid command injection or unsafe input - if not isinstance(args["file_path"], str) or any(c in args["file_path"] for c in [";", "|", "&", "$", "`"]): - return Data(data={"error": "Unsafe file path detected.", "file_path": args["file_path"]}) + # Validate file_path to avoid command injection or unsafe input + from pathlib import Path + if not isinstance(args["file_path"], str): + return Data(data={"error": "Invalid file path type.", "file_path": str(args["file_path"])}) + try: + resolved_path = Path(args["file_path"]).resolve() + # Ensure the path exists and is a file + if not resolved_path.is_file(): + return Data(data={"error": "File path must be a valid file.", "file_path": args["file_path"]}) + except (ValueError, OSError) as e: + return Data(data={"error": f"Invalid file path: {e}", "file_path": args["file_path"]})
309-312: Optional: Simplify OCR engine conditional logic.The condition can be simplified for better readability.
Apply this diff:
"pipeline": str(self.pipeline), - "ocr_engine": ( - self.ocr_engine if self.ocr_engine and self.ocr_engine != "None" and self.pipeline != "vlm" else None - ), + "ocr_engine": ( + self.ocr_engine if self.ocr_engine not in (None, "None") and self.pipeline == "standard" else None + ),
| # Should enable the MCP composer feature in MCP projects | ||
| # Values: true, false | ||
| # Default: true | ||
| LANGFLOW_FEATURE_MCP_COMPOSER= | ||
| # Default: false | ||
| LANGFLOW_MCP_COMPOSER_ENABLED= | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the new env var name is used in the codebase and old name is not referenced
echo "=== Checking for new env var usage ==="
rg -n "MCP_COMPOSER_ENABLED" --type py
echo ""
echo "=== Checking for old env var references ==="
rg -n "FEATURE_MCP_COMPOSER" --type py
echo ""
echo "=== Checking for both in all file types ==="
echo "New var (all types):"
rg -n "MCP_COMPOSER_ENABLED" -t yaml -t json -t py -t md 2>/dev/null | head -20
echo ""
echo "Old var (all types):"
rg -n "FEATURE_MCP_COMPOSER" -t yaml -t json -t py -t md 2>/dev/null | head -20Repository: langflow-ai/langflow
Length of output: 235
🏁 Script executed:
#!/bin/bash
# Search more broadly for MCP references
echo "=== Searching for MCP references in Python files ==="
rg -i "mcp" --type py | head -30
echo ""
echo "=== Searching for COMPOSER references ==="
rg -i "composer" --type py | head -20
echo ""
echo "=== Checking .env.example for all LANGFLOW_ variables ==="
grep -n "LANGFLOW_" .env.example | head -20
echo ""
echo "=== Checking how environment variables are loaded ==="
rg -n "os.getenv|os.environ" --type py | grep -i "langflow" | head -15Repository: langflow-ai/langflow
Length of output: 8835
🏁 Script executed:
#!/bin/bash
# Check MCP Composer service implementation and how it's conditionally started
echo "=== Checking MCPComposerService implementation ==="
cat -n src/backend/base/langflow/services/mcp_composer/service.py | head -60
echo ""
echo "=== Checking where MCPComposerService is started ==="
rg -A 5 -B 5 "mcp_composer_service.start" src/backend/base/langflow/main.py
echo ""
echo "=== Checking settings/config files for feature flags ==="
fd -e py "settings" | grep -E "(settings|config)" | head -10
echo ""
echo "=== Checking for pydantic settings configuration ==="
rg -n "class Settings" --type py | head -5Repository: langflow-ai/langflow
Length of output: 3792
🏁 Script executed:
#!/bin/bash
# Check Settings class for mcp_composer_enabled field
echo "=== Checking Settings class for mcp_composer_enabled ==="
rg -n "mcp_composer_enabled" src/backend/base/langflow/services/settings/base.py -B 2 -A 2
echo ""
echo "=== Full Settings class to see field definitions ==="
head -200 src/backend/base/langflow/services/settings/base.py | tail -150
echo ""
echo "=== Search for any FEATURE_MCP_COMPOSER in settings ==="
rg -n "FEATURE_MCP_COMPOSER" src/backend/base/langflow/services/settings/base.pyRepository: langflow-ai/langflow
Length of output: 7119
Environment variable naming is correct, but default value in .env.example conflicts with backend code.
The variable rename from LANGFLOW_FEATURE_MCP_COMPOSER to LANGFLOW_MCP_COMPOSER_ENABLED is correct—the backend already uses mcp_composer_enabled in the Settings class (line 281 in src/backend/base/langflow/services/settings/base.py). However, the default value has a critical inconsistency: the Settings class defaults to True, while the .env.example change sets it to empty (which would default to False). Align the .env.example default with the backend implementation, or update the Settings class default if the intent is to disable MCP Composer by default.
🤖 Prompt for AI Agents
In .env.example around lines 104 to 108, the LANGFLOW_MCP_COMPOSER_ENABLED
variable is present but its example/default value is empty, which conflicts with
the backend Settings class that defaults mcp_composer_enabled to True; update
the .env.example to set LANGFLOW_MCP_COMPOSER_ENABLED=true to match the backend
default (or alternatively change the Settings class default to False if the
intended behavior is to disable it), ensuring both the example file and the
Settings default are aligned.
| if: ${{ needs.path-filter.outputs.python == 'true' && needs.set-ci-condition.outputs.should-run-tests == 'true' }} | ||
| uses: ./.github/workflows/python_test.yml | ||
| with: | ||
| ref: ${{ inputs.branch }} |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find and inspect all referenced workflow files
echo "=== Locating workflow files ==="
find .github/workflows -name "python_test.yml" -o -name "jest_test.yml" -o -name "typescript_test.yml" -o -name "lint-py.yml" -o -name "docs_test.yml" | sort
echo ""
echo "=== python_test.yml inputs ==="
cat .github/workflows/python_test.yml 2>/dev/null | grep -A 20 "on:" | grep -A 20 "workflow_call:" || echo "File or section not found"
echo ""
echo "=== jest_test.yml inputs ==="
cat .github/workflows/jest_test.yml 2>/dev/null | grep -A 20 "on:" | grep -A 20 "workflow_call:" || echo "File or section not found"
echo ""
echo "=== typescript_test.yml inputs ==="
cat .github/workflows/typescript_test.yml 2>/dev/null | grep -A 20 "on:" | grep -A 20 "workflow_call:" || echo "File or section not found"
echo ""
echo "=== lint-py.yml inputs ==="
cat .github/workflows/lint-py.yml 2>/dev/null | grep -A 20 "on:" | grep -A 20 "workflow_call:" || echo "File or section not found"
echo ""
echo "=== docs_test.yml inputs ==="
cat .github/workflows/docs_test.yml 2>/dev/null | grep -A 20 "on:" | grep -A 20 "workflow_call:" || echo "File or section not found"Repository: langflow-ai/langflow
Length of output: 2650
Fix typescript_test.yml to accept the ref parameter.
The CI workflow passes ref: ${{ inputs.branch }} to typescript_test.yml (line 195), but typescript_test.yml does not define a ref input parameter. This will cause the parameter to be silently ignored. Add ref as an input to typescript_test.yml's workflow_call trigger to match the other test workflows (python_test.yml and jest_test.yml both define ref), or update the CI workflow to use a different parameter name if appropriate.
Also note: lint-py.yml and docs_test.yml correctly define branch as their input parameter and are called with branch:, which is consistent.
🤖 Prompt for AI Agents
In .github/workflows/ci.yml around line 172 the CI calls typescript_test.yml
with ref: ${{ inputs.branch }} but .github/workflows/typescript_test.yml does
not declare a ref input; add a ref input to typescript_test.yml's workflow_call
inputs (matching the other test workflows) so the passed value is accepted, or
alternatively change the CI call to use branch: instead of ref if you prefer to
keep typescript_test.yml unchanged — ensure the inputs names match between
caller and workflow_call.
| tag: ${{ needs.publish-main.needs.build-main.outputs.version }} | ||
| commit: ${{ github.ref }} | ||
| tag: ${{ needs.build-main.outputs.version }} | ||
| commit: ${{ inputs.release_branch }} |
There was a problem hiding this comment.
CRITICAL: commit parameter should be a commit SHA, not a branch name.
The ncipollo/release-action@v1 action expects the commit parameter to be a git commit SHA (or full commit hash). Using a branch name will either fail or create an incorrect release pointing to the wrong commit. You need to use the actual commit SHA from the release branch.
Consider using github.sha or extracting the commit from the build step output:
- commit: ${{ inputs.release_branch }}
+ commit: ${{ github.sha }}Alternatively, if you need the commit from the specific branch, extract it in a previous step:
+ - name: Get release commit SHA
+ id: get-commit
+ run: |
+ COMMIT=$(git rev-parse ${{ inputs.release_branch }})
+ echo "commit=$COMMIT" >> $GITHUB_OUTPUT
- name: Create Release
uses: ncipollo/release-action@v1
with:
artifacts: "dist/*"
token: ${{ secrets.GITHUB_TOKEN }}
draft: false
generateReleaseNotes: true
prerelease: ${{ inputs.pre_release }}
tag: ${{ needs.build-main.outputs.version }}
- commit: ${{ inputs.release_branch }}
+ commit: ${{ steps.get-commit.outputs.commit }}🤖 Prompt for AI Agents
.github/workflows/release.yml around line 346: the release action is passing a
branch name to the commit parameter which must be a commit SHA; replace the
branch input with a real commit SHA — for example set commit: ${{ github.sha }}
or add a prior step that fetches the target branch and outputs its latest commit
SHA (git fetch origin <branch> && git rev-parse origin/<branch>) and use that
step output as the commit value.
| op.execute(""" | ||
| CREATE TABLE file_new ( | ||
| id CHAR(32) NOT NULL, | ||
| user_id CHAR(32) NOT NULL, | ||
| name VARCHAR NOT NULL, | ||
| path VARCHAR NOT NULL, | ||
| size INTEGER NOT NULL, | ||
| provider VARCHAR, | ||
| created_at DATETIME NOT NULL, | ||
| updated_at DATETIME NOT NULL, | ||
| PRIMARY KEY (id), | ||
| CONSTRAINT file_name_user_id_key UNIQUE (name, user_id), | ||
| FOREIGN KEY(user_id) REFERENCES user (id) | ||
| ) | ||
| """) | ||
|
|
||
| # Copy data from old table to new table | ||
| op.execute(""" | ||
| INSERT INTO file_new (id, user_id, name, path, size, provider, created_at, updated_at) | ||
| SELECT id, user_id, name, path, size, provider, created_at, updated_at | ||
| FROM file | ||
| """) | ||
|
|
||
| # Drop old table and rename new table | ||
| op.execute("DROP TABLE file") | ||
| op.execute("ALTER TABLE file_new RENAME TO file") |
There was a problem hiding this comment.
SQLite table recreation may lose existing indexes.
When recreating the table for SQLite, any existing indexes (other than those implied by constraints) will be lost. Consider querying and recreating indexes to avoid potential performance degradation.
+ # Get existing indexes before dropping table
+ existing_indexes = inspector.get_indexes("file")
+
# Create the new table without the single UNIQUE(name) constraint
op.execute("""
CREATE TABLE file_new (
...
)
""")
# Copy data from old table to new table
op.execute("""
INSERT INTO file_new (...)
SELECT ...
FROM file
""")
# Drop old table and rename new table
op.execute("DROP TABLE file")
op.execute("ALTER TABLE file_new RENAME TO file")
+
+ # Recreate indexes
+ for idx in existing_indexes:
+ if not idx.get("unique"): # Skip unique constraints (already handled)
+ columns = ", ".join(idx["column_names"])
+ idx_name = idx["name"]
+ op.execute(f'CREATE INDEX "{idx_name}" ON file ({columns})')Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In
src/backend/base/langflow/alembic/versions/d37bc4322900_drop_single_constraint_on_files_name_.py
around lines 42 to 67, the migration recreates the SQLite table but does not
preserve any non-constraint indexes; to fix, query sqlite_master (or PRAGMA
index_list) for existing indexes on "file" before dropping it, filter out
implicit sqlite_autoindex entries and constraint-backed indexes, store their
CREATE INDEX SQL (or reconstruct them from index info), then after
creating/renaming the new table execute those CREATE INDEX statements so the
same indexes are re-created on the new table; implement this
capture-and-recreate logic in the migration around the table copy/drop/rename
steps.
| except Exception as e: | ||
| print(f"Error during downgrade: {e}") | ||
| # Don't raise in downgrade - log and continue | ||
| pass |
There was a problem hiding this comment.
Silent failure in downgrade can corrupt migration state.
Swallowing exceptions during downgrade leaves Alembic's migration history inconsistent with actual database state. If the constraint restoration fails, the database won't match what Alembic expects after downgrade.
Consider re-raising the exception or at minimum logging at error level:
except Exception as e:
- print(f"Error during downgrade: {e}")
- # Don't raise in downgrade - log and continue
- pass
+ print(f"Error during downgrade: {e}")
+ raise # Let Alembic handle the failure and maintain consistent stateIf there's a specific reason to allow partial downgrades, document it explicitly and consider using proper logging instead of print.
| # --- Vision-Language Model (VLM) pipeline --- | ||
| if pipeline == "vlm": | ||
| try: | ||
| from docling.pipeline.vlm_pipeline import VlmPipeline | ||
| from docling.document_converter import PdfFormatOption # type: ignore | ||
|
|
||
| vl_pipe = VlmPipelineOptions() | ||
|
|
||
| # VLM paths generally don't need OCR; keep OCR off by default here. | ||
| fmt = {} | ||
| if hasattr(input_format, "PDF"): | ||
| fmt[getattr(input_format, "PDF")] = PdfFormatOption(pipeline_cls=VlmPipeline) | ||
| if hasattr(input_format, "IMAGE"): | ||
| fmt[getattr(input_format, "IMAGE")] = PdfFormatOption(pipeline_cls=VlmPipeline) | ||
|
|
||
| return DocumentConverter(format_options=fmt) | ||
| except Exception: | ||
| return DocumentConverter() |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Docling VLM pipeline VlmPipelineOptions class documentation
💡 Result:
VlmPipelineOptions — summary of key fields and usage
-
Purpose: pipeline configuration for the VlmPipeline (vision–language model based document conversion). [1][2]
-
Main fields (selection of the most-used):
- vlm_options: InlineVlmOptions or ApiVlmOptions (local or OpenAI‑compatible remote model). [1][2]
- enable_remote_services: allow API/offload to remote inference endpoints. [2]
- force_backend_text: prefer backend/extracted PDF text over VLM output. [1][2]
- generate_page_images, generate_picture_images: whether to create full‑page and cropped picture images. [1][2]
- images_scale: float image scaling factor. [1][2]
- do_picture_classification, do_picture_description and picture_description_options. [1]
- accelerator_options (GPU/accelerator settings). [1]
- document_timeout, allow_external_plugins, artifacts_path, kind. [1]
-
Example usage: create a VlmPipelineOptions with InlineVlmOptions (local model) or with ApiVlmOptions and set enable_remote_services=True to call hosted/OpenAI‑compatible endpoints; then pass to DocumentConverter / PdfFormatOption when using VlmPipeline. [2][3]
Sources
- Docling API reference — VlmPipelineOptions attribute list. [1]
- Docling usage pages and VLM pipeline examples (inline and remote configurations). [2][3]
🏁 Script executed:
# First, let's find and examine the actual file to see the full context
find . -type f -name "file.py" -path "*/components/data/*" | head -5Repository: langflow-ai/langflow
Length of output: 115
🏁 Script executed:
# Check the file around the problematic lines
sed -n '380,420p' ./src/backend/base/langflow/components/data/file.pyRepository: langflow-ai/langflow
Length of output: 2124
🏁 Script executed:
# Check imports at the top of the file
head -50 ./src/backend/base/langflow/components/data/file.pyRepository: langflow-ai/langflow
Length of output: 1538
🏁 Script executed:
# Get the full create_converter function to see complete context
sed -n '310,430p' ./src/backend/base/langflow/components/data/file.pyRepository: langflow-ai/langflow
Length of output: 6148
Critical: Missing import for VlmPipelineOptions and unused variable assignment.
Line 399 references VlmPipelineOptions() but this class is not imported in the try block (lines 396–397 only import VlmPipeline and PdfFormatOption). This will cause a NameError at runtime. Additionally, the vl_pipe variable is assigned but never used—the VLM pipeline is configured via pipeline_cls=VlmPipeline in PdfFormatOption, not through pipeline_options.
Remove the unused line 399 entirely, as the VLM pipeline handles its own configuration automatically without requiring explicit pipeline options:
if pipeline == "vlm":
try:
from docling.pipeline.vlm_pipeline import VlmPipeline
from docling.document_converter import PdfFormatOption # type: ignore
- vl_pipe = VlmPipelineOptions()
-
# VLM paths generally don't need OCR; keep OCR off by default here.
fmt = {}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # --- Vision-Language Model (VLM) pipeline --- | |
| if pipeline == "vlm": | |
| try: | |
| from docling.pipeline.vlm_pipeline import VlmPipeline | |
| from docling.document_converter import PdfFormatOption # type: ignore | |
| vl_pipe = VlmPipelineOptions() | |
| # VLM paths generally don't need OCR; keep OCR off by default here. | |
| fmt = {} | |
| if hasattr(input_format, "PDF"): | |
| fmt[getattr(input_format, "PDF")] = PdfFormatOption(pipeline_cls=VlmPipeline) | |
| if hasattr(input_format, "IMAGE"): | |
| fmt[getattr(input_format, "IMAGE")] = PdfFormatOption(pipeline_cls=VlmPipeline) | |
| return DocumentConverter(format_options=fmt) | |
| except Exception: | |
| return DocumentConverter() | |
| # --- Vision-Language Model (VLM) pipeline --- | |
| if pipeline == "vlm": | |
| try: | |
| from docling.pipeline.vlm_pipeline import VlmPipeline | |
| from docling.document_converter import PdfFormatOption # type: ignore | |
| # VLM paths generally don't need OCR; keep OCR off by default here. | |
| fmt = {} | |
| if hasattr(input_format, "PDF"): | |
| fmt[getattr(input_format, "PDF")] = PdfFormatOption(pipeline_cls=VlmPipeline) | |
| if hasattr(input_format, "IMAGE"): | |
| fmt[getattr(input_format, "IMAGE")] = PdfFormatOption(pipeline_cls=VlmPipeline) | |
| return DocumentConverter(format_options=fmt) | |
| except Exception: | |
| return DocumentConverter() |
🤖 Prompt for AI Agents
In src/backend/base/langflow/components/data/file.py around lines 393 to 410,
the try block for the "vlm" pipeline references VlmPipelineOptions() on line 399
which is not imported and the assigned variable vl_pipe is unused; remove the
unused vl_pipe assignment (delete the VlmPipelineOptions() line) and ensure no
other references to VlmPipelineOptions remain in this block so the code relies
on passing pipeline_cls=VlmPipeline to PdfFormatOption as intended, leaving the
except branch unchanged.
| def load_files_helper(self) -> DataFrame: | ||
| result = self.load_files() | ||
|
|
||
| # Error condition - raise error if no text and an error is present | ||
| if not hasattr(result, "text"): | ||
| if hasattr(result, "error"): | ||
| raise ValueError(result.error[0]) | ||
| msg = "No content generated." | ||
| raise ValueError(msg) | ||
|
|
||
| return result |
There was a problem hiding this comment.
Review error handling logic for DataFrame results.
The error checking at lines 615-619 uses hasattr(result, "text") and hasattr(result, "error") on a DataFrame object. DataFrames have columns (accessed via indexing) rather than direct attributes for data fields. This logic may not correctly detect errors or missing content.
Consider checking for column existence or DataFrame emptiness instead:
def load_files_helper(self) -> DataFrame:
result = self.load_files()
- # Error condition - raise error if no text and an error is present
- if not hasattr(result, "text"):
- if hasattr(result, "error"):
- raise ValueError(result.error[0])
- msg = "No content generated."
- raise ValueError(msg)
+ # Error condition - raise error if DataFrame is empty or contains errors
+ if result.empty:
+ msg = "No content generated."
+ raise ValueError(msg)
+
+ if "error" in result.columns and not result["error"].isna().all():
+ raise ValueError(result["error"].iloc[0])
return result🤖 Prompt for AI Agents
In src/backend/base/langflow/components/data/file.py around lines 611 to 621,
the code incorrectly uses hasattr(result, "text") and hasattr(result, "error")
on a pandas DataFrame; instead check DataFrame internals: first ensure result is
a DataFrame and not empty (raise ValueError("No content generated.") if
result.empty), then check for an "error" column (if "error" in result.columns
and result["error"].notna().any() raise ValueError with the first non-null error
string), otherwise check for a "text" column (if "text" not in result.columns or
result["text"].dropna().eq("").all() raise ValueError("No content generated."));
return result only when the DataFrame contains valid text rows.
| collection = chroma._client.get_collection(name=self.knowledge_base) | ||
| embeddings_result = collection.get(where={"_id": {"$in": doc_ids}}, include=["embeddings", "metadatas"]) | ||
| embeddings_result = collection.get(where={"_id": {"$in": doc_ids}}, include=["metadatas", "embeddings"]) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
langchain_chroma Chroma API public methods documentation
💡 Result:
Here are official docs/API references for the langchain_chroma Chroma vector-store and its public methods:
-
LangChain Python Chroma integration (API reference page listing class and methods such as add_documents/add_vectors/get/update/upsert/delete/peek/query, from_documents/from_texts/from_existing_collection, etc.). [1]
-
LangChain docs (overview + how-to with link to API reference). [2]
-
LangChain JS/Community Chroma class reference (method index and signatures). [3]
Links:
[1] API reference — langchain_chroma.vectorstores.Chroma. [python.langchain.com API].
[2] LangChain Docs — Chroma integration (Python).
[3] LangChain.js / langchain-community — Chroma class reference.
If you want, I can fetch and paste the full list of public method names and short signatures from the API page. Which format do you prefer (plain list, code-style signatures, or downloadable link)?
🏁 Script executed:
# Let's examine the actual code to understand the context
fd "retrieval.py" src/backend/base/langflow/components/knowledge_bases/ | head -5Repository: langflow-ai/langflow
Length of output: 129
🏁 Script executed:
# Read the retrieval.py file around the flagged lines
sed -n '220,240p' src/backend/base/langflow/components/knowledge_bases/retrieval.py | cat -nRepository: langflow-ai/langflow
Length of output: 1363
🏁 Script executed:
# Let's see more context around this code to understand what's being done
sed -n '200,250p' src/backend/base/langflow/components/knowledge_bases/retrieval.py | cat -nRepository: langflow-ai/langflow
Length of output: 2694
🌐 Web query:
langchain_chroma Chroma class get method embeddings by id filter
💡 Result:
Use Chroma.get(...) and request embeddings via the include parameter. Example (Python):
from langchain.vectorstores import Chroma
assume chroma = Chroma(...) already initialized
res = chroma.get(
ids=["id1", "id2"], # or None to use where only
where={"category": "invoice"}, # optional metadata filter
include=["embeddings", "metadatas"] # ask for embeddings to be returned
)
embeddings = dict(zip(res["ids"], res["embeddings"])) # map id -> embedding
Notes:
- get returns a dict with keys "ids", "embeddings", "metadatas", "documents". IDs are always returned; include controls the other fields.
- The where filter is a Chroma/Where dict (e.g. {"color":"red"} or nested $and/$or forms).
Sources: LangChain Chroma API docs and Chroma collection reference. [1][2]
References
[1] LangChain Chroma get API.
[2] Chroma Collection get (includes include/where semantics).
🌐 Web query:
chroma python client API get_collection method
💡 Result:
The Chroma (chromadb) Python Client provides client.get_collection(...) to fetch an existing collection. Key points:
- Purpose: return a Collection object by name or id; raises ValueError if the collection does not exist. [1][2]
- Typical signature / arguments: name (or id), optional embedding_function, optional data_loader. Example:
client.get_collection("my_collection") or client.get_collection(name="my_collection", embedding_function=emb_fn). [1][3] - Alternative helper: client.get_or_create_collection(...) will create the collection if missing. [1][3]
- Usage notes: list_collections()/count_collections() are available; metadata can be provided when creating/getting via get_or_create_collection (existing metadata may be overwritten). [1][2]
Sources: Chroma cookbook/docs and PyPI package docs. [1][2][3]
References
[1] Chroma Cookbook — Collections (get_collection / get_or_create_collection).
[2] chromadb package page (PyPI) — client methods.
[3] chromadb client reference / docs and examples (docstring / community Q&A).
Replace private _client access with public Chroma.get() method.
The code accesses chroma._client.get_collection() to retrieve embeddings, but the LangChain Chroma class provides a public get() method that supports the same functionality. Replace the private API calls at lines 231-232 with:
embeddings_result = chroma.get(where={"_id": {"$in": doc_ids}}, include=["metadatas", "embeddings"])This eliminates reliance on internal APIs and ensures compatibility with future langchain_chroma updates.
🤖 Prompt for AI Agents
In src/backend/base/langflow/components/knowledge_bases/retrieval.py around
lines 231 to 232, the code is directly accessing the private
chroma._client.get_collection to fetch embeddings; replace that private API
usage with the public Chroma.get() call by removing the _client.get_collection
call and using chroma.get(where={"_id": {"$in": doc_ids}}, include=["metadatas",
"embeddings"]) so the code relies on the public LangChain Chroma API and remains
compatible with future updates.
| if result or force_output: | ||
| self.status = self.true_case_message | ||
| self.iterate_and_stop_once("false_result") | ||
| if not force_output: # Only stop the other branch if not forcing due to max iterations | ||
| self.iterate_and_stop_once("false_result") | ||
| return self.true_case_message |
There was a problem hiding this comment.
Potential issue: Missing branch stop when force_output is True.
When force_output is True, iterate_and_stop_once("false_result") is not called (line 179-180). This means the false_result branch is not stopped and not excluded.
Since false_response() will also execute (due to group_outputs=True), it will evaluate the condition independently and may also attempt to produce output or stop branches, leading to inconsistent behavior.
Should the forced output case also stop the alternate branch to maintain consistent routing semantics?
if result or force_output:
self.status = self.true_case_message
- if not force_output: # Only stop the other branch if not forcing due to max iterations
- self.iterate_and_stop_once("false_result")
+ self.iterate_and_stop_once("false_result")
return self.true_case_messageOr if the intent is to avoid iteration increment during force_output, consider a flag parameter to iterate_and_stop_once to skip the iteration update while still stopping the branch.
🤖 Prompt for AI Agents
In src/backend/base/langflow/components/logic/conditional_router.py around lines
177 to 181, the code currently skips calling
iterate_and_stop_once("false_result") when force_output is True which leaves the
false branch running; call iterate_and_stop_once("false_result") in the
force_output path as well to ensure the alternate branch is stopped, or modify
iterate_and_stop_once to accept a flag (e.g., skip_iteration=True) and call
iterate_and_stop_once("false_result", skip_iteration=True) when force_output is
True so the branch is stopped without incrementing iteration counters.
|
|
||
| class SubFlowComponent(Component): | ||
| display_name = "Sub Flow [Deprecated]" | ||
| display_name = "Sub Flow" |
There was a problem hiding this comment.
Inconsistency: Display name no longer indicates deprecation while legacy flag remains true.
Removing "[Deprecated]" from the display name while keeping legacy: bool = True creates a disconnect between the internal status and what users see. Users won't realize this component is deprecated and that they should migrate to the replacement. This could hinder adoption of the logic.RunFlow component.
Consider either:
- Keeping "[Deprecated]" in the display name for clarity, or
- Adding a user-facing deprecation warning/notice that references the
replacementattribute, or - If the component is no longer deprecated, remove the
legacy: bool = Trueflag
Also applies to: 18-19
🤖 Prompt for AI Agents
In src/backend/base/langflow/components/logic/sub_flow.py around lines 15 and
18-19, the display_name no longer indicates deprecation while legacy: bool =
True remains, causing a mismatch; update the code so UI and internal state match
by either (A) restoring "[Deprecated]" in display_name (e.g., "Sub Flow
[Deprecated]") and optionally emitting a user-facing deprecation warning
referencing the replacement, or (B) if the component is no longer deprecated,
remove or set legacy: bool = False and clear any deprecation messaging; pick one
of these two consistent actions and apply it across the mentioned lines.
|
@coderabbitai can you refactor the description? |
|
Tip For best results, initiate chat on the files or code changes. @HimavarshaVS I'd be happy to help refactor a description! However, I need a bit more context to provide the most useful response. Could you please clarify which description you'd like me to refactor? Here are some possibilities:
Please let me know which description you're referring to, and I'll be glad to help refactor it! If you can point me to the specific file or context, that would be even more helpful. |
I meant the PR description |
Codecov Report❌ Patch coverage is
❌ Your project status has failed because the head coverage (8.83%) is below the target coverage (10.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## release-1.6.0 #10946 +/- ##
=================================================
- Coverage 35.74% 32.68% -3.07%
=================================================
Files 1220 1220
Lines 58150 58212 +62
Branches 5530 5530
=================================================
- Hits 20787 19027 -1760
- Misses 37202 39024 +1822
Partials 161 161
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
This backend test is broken (not because of the PR) FAILED src/backend/tests/unit/components/processing/test_split_text_component.py::TestSplitTextComponent::test_with_url_loader - AssertionError: Expected DataFrame with more than 2 rows, got 2I believe @erichare fixed this test in main. As for the frontend test I am not sure, again seems like something is broken in the test suite. Really confused by @coderabbitai, it looks like you only made a couple of small changes in chat/chat_output.py |
|
Tip For best results, initiate chat on the files or code changes. @HzaRashid, thank you for the context on the test failures! Regarding the backend test failure in About the chat component changes - I can see why there might be some confusion! While the changes may look small at first glance, there are actually several significant modifications to both Changes made to both components:
So while the diff might appear compact in some views, these changes represent a meaningful refactoring that:
The changes align with the PR's goal to "fix: update chat components" by cleaning up and modernizing the chat component architecture. Does this help clarify the scope of the modifications? |
HzaRashid
left a comment
There was a problem hiding this comment.
as long as we can get ci passing and coderabbit calms down lgtm haha
|
closing this as the fix is already on 1.7 |
Summary by CodeRabbit
Release Notes
New Features
Improvements
Chores
✏️ Tip: You can customize this high-level summary in your review settings.