
feat: add PolarisAIDataInsight component for document parsing and extraction#10238

Open
ernie-c-jeong wants to merge 30 commits into langflow-ai:main from ai-solution-dev:polarisai/datainsight-integration

Conversation

@ernie-c-jeong

@ernie-c-jeong ernie-c-jeong commented Oct 13, 2025

Summary

  • Added new PolarisAIDataInsight component to integrate Polaris AI DataInsight with Langflow
  • Provides structured extraction of text, images, complex tables, and charts from multiple file formats
  • Returns parsed content as structured JSON that can be consumed as a DataFrame

Features

  • Document Parsing: Extracts text, images, tables, and charts from supported file types
  • Structured Output: Outputs data as structured JSON for downstream processing
  • Resource Management: Saves embedded images to a specified resources directory
  • Secure Access: Supports authentication via Polaris AI DataInsight API key

Changes

  • Added PolarisAIDataInsight component built on PolarisAIDataInsightLoader from the langchain-polaris-ai-datainsight package
  • Implemented parameters: file_path, api_key, resources_dir
  • Updated documentation with usage details and parameter descriptions

Use Cases

  • Enterprise document analysis pipelines
  • Automatic extraction of structured data from office files
  • Preprocessing documents for downstream RAG or analytics workflows

Summary by CodeRabbit

  • New Features
    • Added a Polaris AI bundle to the sidebar with a new Polaris Office icon.
    • Introduced a Polaris AI Data Insight component to extract structured data from documents, with inputs for file selection, API key, and resources directory.
  • Documentation
    • Added a dedicated page detailing Polaris AI Data Insight usage, returned data format, and parameter descriptions, with links to external references.

@coderabbitai
Contributor

coderabbitai Bot commented Oct 13, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds Polaris AI DataInsight integration: new Python component with lazy import; updates dependencies; adds docs; introduces Polaris Office icon and registers it in eager/lazy mappings; updates sidebar bundles and icon maps.

Changes

Cohort / File(s) Summary
Documentation: Polaris AI DataInsight
docs/docs/Components/bundles-polarisai.mdx
New docs page detailing PolarisAIDataInsightLoader usage, returned data, and parameters; imports shared Parameters partial.
Dependency Update
pyproject.toml
Adds dependency: langchain-polaris-ai-datainsight>=1.0.0.
Frontend Icon: Polaris Office (SVG + wrapper + mappings)
src/frontend/src/icons/PolarisOffice/PolarisOfficeLogo.jsx, src/frontend/src/icons/PolarisOffice/index.tsx, src/frontend/src/icons/eagerIconImports.ts, src/frontend/src/icons/lazyIconImports.ts
Adds SVGPolarisOfficeLogo component; exports PolarisOfficeIcon via forwardRef; registers icon in eager and lazy icon maps.
Frontend Sidebar/Icons Map
src/frontend/src/utils/styleUtils.ts
Adds Polaris AI bundle to SIDEBAR_BUNDLES; maps PolarisOffice to display icon.
Backend Component Lazy Export
src/lfx/src/lfx/components/polarisai/__init__.py
Implements lazy import for PolarisAIDataInsightComponent via __getattr__; updates __all__ and __dir__; adds a TYPE_CHECKING guard.
Backend Component: PolarisAI DataInsight
src/lfx/src/lfx/components/polarisai/polaris_ai_data_insight.py
Introduces PolarisAIDataInsightComponent with inputs (file_path, api_key, resources_dir), output (data); validates inputs; loads first document via PolarisAIDataInsightLoader and returns Data.from_document.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant UI as Frontend UI
  participant Icons as Icon Registry
  participant Py as lfx.components.polarisai
  participant Comp as PolarisAIDataInsightComponent
  participant Loader as PolarisAIDataInsightLoader

  User->>UI: Select "Polaris AI" bundle
  UI->>Icons: Resolve "PolarisOffice" icon
  alt Eager mapping
    Icons->>UI: PolarisOfficeIcon
  else Lazy mapping
    Icons->>Icons: dynamic import("./PolarisOffice")
    Icons->>UI: PolarisOfficeIcon
  end

  User->>Py: Use PolarisAIDataInsightComponent
  note over Py: __getattr__ lazy-imports component
  Py->>Comp: Instantiate with inputs (file_path, api_key, resources_dir)
  Comp->>Comp: validate required inputs
  Comp->>Loader: new Loader(file_path, api_key, resources_dir, mode="single")
  Loader-->>Comp: documents[0]
  Comp->>Py: Data.from_document(doc)
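The component flow in the diagram (validate inputs, construct the loader with mode="single", take documents[0], convert it with Data.from_document) can be sketched as follows. This is a minimal, self-contained sketch, not the PR's code: Document, Data, FakeLoader, and build_data are hypothetical stand-ins, the loader factory takes the place of PolarisAIDataInsightLoader, and treating only file_path and api_key as required is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol


@dataclass
class Document:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Data:
    """Hypothetical stand-in for Langflow's Data; the real class differs."""

    text: str
    data: dict[str, Any]

    @classmethod
    def from_document(cls, doc: Document) -> "Data":
        return cls(text=doc.page_content, data=dict(doc.metadata))


class Loader(Protocol):
    def load(self) -> list[Document]: ...


def build_data(
    file_path: str,
    api_key: str,
    resources_dir: str,
    loader_factory: Callable[..., Loader],
) -> Data:
    # Fail fast on missing inputs before any API call (assumed required set).
    missing = [name for name, value in (("file_path", file_path), ("api_key", api_key)) if not value]
    if missing:
        raise ValueError(f"Missing required input(s): {', '.join(missing)}")

    # mode="single" mirrors the diagram: one Document for the whole file.
    loader = loader_factory(
        file_path=file_path, api_key=api_key, resources_dir=resources_dir, mode="single"
    )
    documents = loader.load()
    if not documents:
        raise ValueError(f"No content was extracted from {file_path!r}")
    return Data.from_document(documents[0])


class FakeLoader:
    """Hypothetical loader that echoes its file path instead of calling an API."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

    def load(self) -> list[Document]:
        return [Document("parsed text", {"source": self.kwargs["file_path"]})]


result = build_data("report.docx", "my-api-key", "./resources", FakeLoader)
print(result.text)  # parsed text
```

Injecting the loader as a factory keeps the validation and "first document" logic testable without network access or a real API key.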

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

size:M, lgtm

Suggested reviewers

  • ogabrielluiz
  • mfortman11
  • mendonk

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
Title Check: ✅ Passed. The title succinctly describes the main feature added in this changeset: the PolarisAIDataInsight component for document parsing and extraction. It aligns with the PR objectives by highlighting the new component and its purpose. The phrasing is clear, concise, and free of extraneous details.
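CodeRabbit does not state its exact formula for the Docstring Coverage check. A minimal sketch, assuming coverage is the share of function and class definitions that carry a docstring, shows how a figure such as 33.33% (one documented definition out of three) can arise. The counting rule and the docstring_coverage helper are assumptions for illustration, not CodeRabbit's implementation.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Percentage of function and class definitions in `source` with a docstring."""
    tree = ast.parse(source)
    nodes = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not nodes:
        return 100.0  # nothing to document
    documented = sum(1 for node in nodes if ast.get_docstring(node) is not None)
    return round(100 * documented / len(nodes), 2)


sample = '''
class Component:
    """Documented class."""

    def build(self):
        return 1

def helper():
    return 2
'''

print(docstring_coverage(sample))  # 33.33
print(docstring_coverage(sample) >= 80.0)  # False
```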


@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Oct 13, 2025
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f24a064 and 4f1901d.

⛔ Files ignored due to path filters (1)
  • src/frontend/src/icons/PolarisOffice/PolarisOffice-logo.svg is excluded by !**/*.svg
📒 Files selected for processing (9)
  • docs/docs/Components/bundles-polarisai.mdx (1 hunk)
  • pyproject.toml (1 hunk)
  • src/frontend/src/icons/PolarisOffice/PolarisOfficeLogo.jsx (1 hunk)
  • src/frontend/src/icons/PolarisOffice/index.tsx (1 hunk)
  • src/frontend/src/icons/eagerIconImports.ts (2 hunks)
  • src/frontend/src/icons/lazyIconImports.ts (1 hunk)
  • src/frontend/src/utils/styleUtils.ts (2 hunks)
  • src/lfx/src/lfx/components/polarisai/__init__.py (1 hunk)
  • src/lfx/src/lfx/components/polarisai/polaris_ai_data_insight.py (1 hunk)
🧰 Additional context used
📓 Path-based instructions (8)
src/frontend/src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/frontend_development.mdc)

src/frontend/src/**/*.{ts,tsx,js,jsx}: All frontend TypeScript and JavaScript code should be located under src/frontend/src/ and organized into components, pages, icons, stores, types, utils, hooks, services, and assets directories as per the specified directory layout.
Use React 18 with TypeScript for all UI components in the frontend.
Format all TypeScript and JavaScript code using the make format_frontend command.
Lint all TypeScript and JavaScript code using the make lint command.

Files:

  • src/frontend/src/icons/lazyIconImports.ts
  • src/frontend/src/icons/eagerIconImports.ts
  • src/frontend/src/icons/PolarisOffice/index.tsx
  • src/frontend/src/utils/styleUtils.ts
  • src/frontend/src/icons/PolarisOffice/PolarisOfficeLogo.jsx
src/frontend/src/icons/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/frontend_development.mdc)

Use Lucide React for icons in the frontend.

Files:

  • src/frontend/src/icons/lazyIconImports.ts
  • src/frontend/src/icons/eagerIconImports.ts
  • src/frontend/src/icons/PolarisOffice/index.tsx
  • src/frontend/src/icons/PolarisOffice/PolarisOfficeLogo.jsx
src/frontend/src/icons/lazyIconImports.ts

📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)

Add your icon to the lazyIconsMapping object in src/frontend/src/icons/lazyIconImports.ts with a key that matches the backend icon string exactly.

Files:

  • src/frontend/src/icons/lazyIconImports.ts
src/frontend/src/icons/*/*.@(js|jsx|ts|tsx)

📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)

Create a new directory for your icon in src/frontend/src/icons/YourIconName/ and add your SVG as a React component (e.g., YourIconName.jsx). The SVG component must use the isDark prop to support both light and dark mode.

Files:

  • src/frontend/src/icons/PolarisOffice/index.tsx
  • src/frontend/src/icons/PolarisOffice/PolarisOfficeLogo.jsx
src/frontend/src/icons/*/index.tsx

📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)

Create an index.tsx in your icon directory that exports your icon using forwardRef and passes the isDark prop.

Files:

  • src/frontend/src/icons/PolarisOffice/index.tsx
src/frontend/src/utils/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/frontend_development.mdc)

All utility functions should be placed in the utils directory.

Files:

  • src/frontend/src/utils/styleUtils.ts
docs/**/*.{md,mdx}

📄 CodeRabbit inference engine (.cursor/rules/docs_development.mdc)

docs/**/*.{md,mdx}: All Markdown/MDX pages must start with front matter including at least title and description; include sidebar_position for docs pages when applicable
Code blocks must specify a language and may include a title (```lang title="…")
Use sentence case for headings and keep paragraphs short and scannable
Write in second person, present tense, with a professional but approachable tone
Use inline code with backticks for code terms; use bold for UI elements and italics for emphasis; keep lists in parallel structure
Ensure internal links are functional and navigation works (update cross-references as needed)
Verify all code examples in docs and blog actually run as shown
Use correct terminology capitalization: Langflow, Component, Flow, API, JSON
Reference images with absolute paths under /img/... and provide descriptive alt text

Files:

  • docs/docs/Components/bundles-polarisai.mdx
docs/docs/**/*.{md,mdx}

📄 CodeRabbit inference engine (.cursor/rules/docs_development.mdc)

Use Docusaurus admonitions (:::+tip|warning|danger) instead of custom callouts in docs pages

Files:

  • docs/docs/Components/bundles-polarisai.mdx
🧠 Learnings (5)
📚 Learning: 2025-07-28T15:56:47.865Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/icons.mdc:0-0
Timestamp: 2025-07-28T15:56:47.865Z
Learning: Applies to src/frontend/src/icons/lazyIconImports.ts : Add your icon to the `lazyIconsMapping` object in `src/frontend/src/icons/lazyIconImports.ts` with a key that matches the backend icon string exactly.

Applied to files:

  • src/frontend/src/icons/lazyIconImports.ts
  • src/frontend/src/icons/eagerIconImports.ts
📚 Learning: 2025-06-23T12:46:52.420Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/icons.mdc:0-0
Timestamp: 2025-06-23T12:46:52.420Z
Learning: Export custom icon components in React using React.forwardRef to ensure proper ref forwarding and compatibility with parent components.

Applied to files:

  • src/frontend/src/icons/PolarisOffice/index.tsx
📚 Learning: 2025-07-28T15:56:47.865Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/icons.mdc:0-0
Timestamp: 2025-07-28T15:56:47.865Z
Learning: Applies to src/frontend/src/icons/*/index.tsx : Create an `index.tsx` in your icon directory that exports your icon using `forwardRef` and passes the `isDark` prop.

Applied to files:

  • src/frontend/src/icons/PolarisOffice/index.tsx
📚 Learning: 2025-06-16T11:14:04.200Z
Learnt from: dolfim-ibm
PR: langflow-ai/langflow#8394
File: src/frontend/src/icons/Docling/index.tsx:4-6
Timestamp: 2025-06-16T11:14:04.200Z
Learning: The Langflow codebase consistently uses `React.PropsWithChildren<{}>` as the prop type for all icon components using forwardRef, rather than `React.SVGProps<SVGSVGElement>`. This is an established pattern across hundreds of icon files in src/frontend/src/icons/.

Applied to files:

  • src/frontend/src/icons/PolarisOffice/index.tsx
📚 Learning: 2025-07-28T15:56:47.865Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/icons.mdc:0-0
Timestamp: 2025-07-28T15:56:47.865Z
Learning: Always use clear, recognizable, and consistent icon names for both backend and frontend (e.g., "AstraDB", "Postgres", "OpenAI").

Applied to files:

  • src/frontend/src/utils/styleUtils.ts
🧬 Code graph analysis (3)
src/frontend/src/icons/eagerIconImports.ts (1)
src/frontend/src/icons/PolarisOffice/index.tsx (1)
  • PolarisOfficeIcon (4-9)
src/lfx/src/lfx/components/polarisai/__init__.py (1)
src/lfx/src/lfx/components/polarisai/polaris_ai_data_insight.py (1)
  • PolarisAIDataInsightComponent (9-65)
src/lfx/src/lfx/components/polarisai/polaris_ai_data_insight.py (1)
src/lfx/src/lfx/custom/custom_component/component.py (1)
  • log (1475-1492)
🔇 Additional comments (7)
pyproject.toml (1)

132-132: LGTM!

The dependency addition is appropriate for the new Polaris AI DataInsight integration. The version constraint >=1.0.0 is reasonable.

src/frontend/src/icons/eagerIconImports.ts (1)

116-116: LGTM!

The PolarisOffice icon import and mapping follow the established patterns in the codebase. The alphabetical ordering is maintained correctly.

Also applies to: 202-202

src/frontend/src/utils/styleUtils.ts (1)

308-308: LGTM!

The Polaris AI sidebar bundle and icon mapping additions are consistent with existing patterns and maintain proper alphabetical ordering.

Also applies to: 461-461

src/frontend/src/icons/lazyIconImports.ts (1)

271-274: LGTM!

The lazy loading mapping for PolarisOffice follows the established pattern and maintains alphabetical ordering.

docs/docs/Components/bundles-polarisai.mdx (1)

1-27: LGTM!

The documentation is well-structured with proper front matter, clear descriptions, and a comprehensive parameter table. The external documentation links provide helpful references for users.

src/lfx/src/lfx/components/polarisai/polaris_ai_data_insight.py (1)

48-65: Verify exception handling in upstream code.

Ensure that any exceptions raised by PolarisAIDataInsightLoader (e.g., file not found, API errors, network issues) are properly handled either in this method or by the component framework. Consider wrapping the loader initialization and load call in a try-except block for better error reporting.
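One way to act on this suggestion is sketched below. It assumes nothing about the specific exceptions PolarisAIDataInsightLoader raises, since the thread does not list them: load_first_document, MissingFileLoader, and the catch-all branch are illustrative, and a Langflow component would more likely report through its own logging (the code graph above points to Component.log) than through the standard logging module used here.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def load_first_document(loader_factory: Callable[..., Any], **loader_kwargs: Any) -> Any:
    """Create a loader, load its documents, and return the first one.

    Construction and loading share one try block, so file, API, and network
    failures all surface as a ValueError with a readable message.
    """
    try:
        loader = loader_factory(**loader_kwargs)
        documents = loader.load()
    except FileNotFoundError as exc:
        raise ValueError(f"File not found: {loader_kwargs.get('file_path')}") from exc
    except Exception as exc:  # e.g. HTTP errors, timeouts, invalid API key
        logger.exception("Document loading failed")
        raise ValueError(f"Failed to parse document: {exc}") from exc

    if not documents:
        raise ValueError("The loader returned no documents.")
    return documents[0]


class MissingFileLoader:
    """Hypothetical loader that always fails as if the file were absent."""

    def __init__(self, **kwargs: Any) -> None:
        raise FileNotFoundError(kwargs["file_path"])


try:
    load_first_document(MissingFileLoader, file_path="missing.pdf")
except ValueError as err:
    print(err)  # File not found: missing.pdf
```

Chaining with `from exc` keeps the original traceback available for debugging while the message shown to the user stays short.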

src/lfx/src/lfx/components/polarisai/__init__.py (1)

1-34: LGTM! Clean lazy import implementation.

The lazy import pattern is correctly implemented following PEP 562. Key strengths:

  • TYPE_CHECKING guard prevents runtime overhead while enabling static analysis
  • Comprehensive error handling converts import failures to appropriate AttributeError messages
  • Caching in globals() prevents redundant imports
  • __dir__() supports introspection and IDE autocompletion

Comment thread src/frontend/src/icons/PolarisOffice/index.tsx
Comment thread src/frontend/src/icons/PolarisOffice/PolarisOfficeLogo.jsx Outdated
Comment thread src/lfx/src/lfx/components/polarisai/polaris_ai_data_insight.py Outdated
Comment thread src/lfx/src/lfx/components/polarisai/polaris_ai_data_insight.py Outdated
- Enhance error handling in PolarisAIDataInsightComponent
- Update isDark prop in SVGPolarisOfficeLogo
@github-actions
Contributor

⚠️ Component index needs to be updated

Please run the following command locally and commit the changes:

make build_component_index

Or alternatively:

LFX_DEV=1 uv run python scripts/build_component_index.py

Then commit and push the updated src/lfx/src/lfx/_assets/component_index.json file.

@sonarqubecloud

@ernie-c-jeong
Author

Hi, @ogabrielluiz @mfortman11 @mendonk
I wanted to kindly follow up on this PR.

It has been about a month since I submitted it, and I haven’t received any additional feedback yet. I noticed that the 'Update Component Index' workflow is failing, but it fails even when I pull the latest changes from main or rebuild component_index.json. I also found other merged PRs where the same workflow failed, which suggests this may not be a blocking issue.

If there are any steps I've missed or any issue that prevents this PR from being reviewed, I would really appreciate your feedback. I’m happy to make any updates needed.

Thank you!


Labels

enhancement New feature or request
