feat: Consolidate Web Search, News Search, and RSS Reader into unified component #9975

Merged: erichare merged 5 commits into main from feat/unified-web-search on Sep 30, 2025

Conversation

@rodrigosnader
Contributor

@rodrigosnader rodrigosnader commented Sep 25, 2025

Summary

  • Merged three separate components (Web Search, News Search, RSS Reader) into a single unified Web Search component
  • Added tab-based mode selection for intuitive switching between Web, News, and RSS functionality
  • Reduced code duplication and maintenance overhead

Changes

  • Unified Web Search Component: Single component with three modes:
    • Web mode (default): DuckDuckGo search functionality
    • News mode: Google News RSS feed search with topic/location support
    • RSS mode: Generic RSS feed reader
  • Removed obsolete components: Deleted separate News Search and RSS Reader components
  • Comprehensive test coverage: Added 25 tests covering all three modes and edge cases

Benefits

  • Improved user experience: Single component for all search-related tasks
  • Reduced complexity: Less code to maintain, fewer components to manage
  • Consistent interface: Unified API and behavior across all search modes
  • Better discoverability: Users can easily find all search options in one place

Test Results

All 25 tests passing:

  • URL validation and sanitization
  • Web search with DuckDuckGo
  • News search with Google News RSS
  • RSS feed parsing
  • Error handling for network issues, invalid URLs, empty results
  • Mode switching and configuration updates

Breaking Changes

  • NewsSearchComponent removed - use WebSearchComponent with search_mode="News"
  • RSSReaderComponent removed - use WebSearchComponent with search_mode="RSS"
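
For flows that still reference the removed components, the migration amounts to swapping the component name and adding a search_mode value. A minimal sketch, assuming component settings are held in a plain dict; the `migrate_component` helper and the dict layout are hypothetical and not part of Langflow, and only the component names and the search_mode values come from the list above:

```python
# Hypothetical helper: maps removed component names to the unified component.
# Only the names and search_mode values are taken from the PR description.
REPLACEMENTS = {
    "NewsSearchComponent": ("WebSearchComponent", "News"),
    "RSSReaderComponent": ("WebSearchComponent", "RSS"),
}


def migrate_component(name: str, settings: dict) -> tuple[str, dict]:
    """Return the replacement component name and a copy of the updated settings."""
    if name not in REPLACEMENTS:
        return name, dict(settings)  # nothing to migrate
    new_name, mode = REPLACEMENTS[name]
    return new_name, {**settings, "search_mode": mode}
```

Components that were already WebSearchComponent pass through unchanged, so the helper can be applied to every node in a saved flow.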

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Unified Web Search component with selectable modes: Web, News, and RSS.
    • Added mode-specific inputs (topic, location, language/region settings) and timeout.
    • Improved URL validation/normalization and HTML cleaning for clearer results.
    • Output label updated to “Results.”
  • Refactor

    • Consolidated functionality; removed separate News and RSS components from the catalog.
  • Tests

    • Added comprehensive tests for mode routing, URL handling, input sanitization, HTML cleaning, and success/error/empty-result scenarios for Web, News, and RSS searches.
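
The URL normalization and HTML cleaning mentioned in this summary could look roughly like the sketch below. The function names `ensure_url` and `clean_html` match helpers listed later in this review, but the bodies here are assumptions written with the standard library only, not the component's actual code:

```python
import re
from html import unescape
from urllib.parse import urlparse


def ensure_url(url: str) -> str:
    """Add https:// when no scheme is given; reject anything that is not http(s)."""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid URL: {url}")
    return url


def clean_html(text: str) -> str:
    """Strip tags with a simple regex, decode entities, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", unescape(no_tags)).strip()
```

A regex is enough for short feed snippets; a real HTML parser would be the safer choice for full pages.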

…d component

Merge three separate components into a single Web Search component with tab-based mode selection:
- Web mode: DuckDuckGo search functionality (default)
- News mode: Google News RSS feed search with topic/location support
- RSS mode: Generic RSS feed reader

This reduces code duplication and provides a more intuitive user experience with a single component that handles all search-related functionality.
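
As an illustration of what the RSS mode has to do, here is a minimal sketch that pulls items out of an RSS 2.0 document. It uses the standard library's `xml.etree.ElementTree` rather than whatever parser the component actually uses, and the output keys (`title`, `link`, `published`, `summary`) are assumptions:

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text: str) -> list[dict]:
    """Extract basic fields from every <item> element of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "published": item.findtext("pubDate", default=""),
                "summary": item.findtext("description", default=""),
            }
        )
    return items


SAMPLE = """<rss version="2.0"><channel><title>Feed</title>
<item><title>First</title><link>https://example.com/1</link></item>
</channel></rss>"""

print(parse_rss_items(SAMPLE))
```

Missing child elements fall back to an empty string, so every row has the same keys regardless of how complete the feed is.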

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Sep 25, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Removes NewsSearch and RSS components and their tests; updates public exports accordingly. Introduces a unified WebSearchComponent supporting Web, News, and RSS modes with routing, URL handling, sanitization, and RSS parsing. Adds comprehensive tests for the new unified component and its behaviors.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Component removals (News/RSS): src/lfx/src/lfx/components/data/news_search.py, src/lfx/src/lfx/components/data/rss.py, src/lfx/src/lfx/components/data/__init__.py | Deleted NewsSearchComponent and RSSReaderComponent; removed their exports and dynamic imports from __init__.py. |
| Unified WebSearch enhancement: src/lfx/src/lfx/components/data/web_search.py | Expanded WebSearchComponent to support Web/News/RSS modes; added routing (perform_search), helpers (update_build_config, URL validation/normalization, query sanitization, HTML cleaning), and per-mode handlers with HTTP/RSS parsing and error handling; updated outputs label. |
| Removed legacy tests: src/backend/tests/unit/components/data/test_news_search.py, src/backend/tests/unit/components/data/test_rss.py | Deleted unit tests covering the removed News and RSS components. |
| Added/expanded unified tests: src/backend/tests/unit/components/data/test_web_search.py | Added extensive tests for URL handling, sanitization, build-config updates, and per-mode behaviors (Web/News/RSS), including success, empty, and error scenarios, with HTTP mocking. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant W as WebSearchComponent
  participant R as Router (perform_search)
  participant H as HTTP Client
  participant P as Parser (HTML/XML)
  participant DF as DataFrame

  User->>W: Trigger results
  W->>R: perform_search(mode, inputs)
  alt Mode: Web
    R->>W: perform_web_search()
    W->>H: GET DuckDuckGo results
    H-->>W: HTML
    W->>P: Parse/clean links & snippets
    P-->>W: Items
    W->>H: GET each result (optional)
    H-->>W: Content/Errors
    W->>DF: Build results
    DF-->>User: DataFrame
  else Mode: News
    R->>W: perform_news_search()
    W->>H: GET Google News RSS
    H-->>W: XML or Error
    W->>P: Parse RSS items
    P-->>W: Articles or Empty/Error
    W->>DF: Build results
    DF-->>User: DataFrame
  else Mode: RSS
    R->>W: perform_rss_read()
    W->>H: GET RSS URL
    H-->>W: XML or Error
    W->>P: Validate/parse
    P-->>W: Items or Empty/Error
    W->>DF: Build results
    DF-->>User: DataFrame
  end
  note over W,DF: Errors produce consistent DataFrame rows with status messages.
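
The routing shown in the diagram can be sketched as a small dispatch table. This is a simplified illustration, not the component's code: the handlers are stubs, plain lists of dicts stand in for the DataFrame, and the column names are assumptions. Only the three mode names and the idea that errors become regular result rows come from the diagram:

```python
# Column names and stub handlers are illustrative, not taken from the PR's code.
def error_row(message: str) -> list[dict]:
    """Represent a failure as a normal row so callers always get the same shape."""
    return [{"title": "Error", "link": "", "snippet": message}]


def web_search(query: str) -> list[dict]:
    return [{"title": f"Web: {query}", "link": "", "snippet": ""}]


def news_search(query: str) -> list[dict]:
    return [{"title": f"News: {query}", "link": "", "snippet": ""}]


def rss_read(url: str) -> list[dict]:
    if not url:
        raise ValueError("RSS mode needs a feed URL")
    return [{"title": "Feed item", "link": url, "snippet": ""}]


HANDLERS = {"Web": web_search, "News": news_search, "RSS": rss_read}


def perform_search(mode: str, value: str) -> list[dict]:
    """Route to the handler for the selected mode; turn failures into error rows."""
    handler = HANDLERS.get(mode)
    if handler is None:
        return error_row(f"Unsupported search mode: {mode}")
    try:
        return handler(value)
    except (ValueError, OSError) as exc:
        return error_row(str(exc))
```

Keeping the mode-to-handler mapping in one dict means adding a fourth mode touches a single line of the router.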

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

size:M

Suggested reviewers

  • edwinjosechittilappilly
  • Yukiyukiyeah

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The provided title succinctly conveys the core change of consolidating Web Search, News Search, and RSS Reader into a unified component, mirroring the removal of separate modules and introduction of mode selection while avoiding unnecessary detail. It clearly aligns with the PR objectives and major code modifications. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 97.22%, which is sufficient. The required threshold is 80.00%. |


@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 25, 2025
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lfx/src/lfx/components/data/web_search.py (1)

191-205: Prevent invalid result URLs from crashing the web search.

self.ensure_url raises ValueError for empty/relative/unsupported URLs, and DuckDuckGo occasionally emits such links (e.g., missing uddg parameter). Because we only catch requests.RequestException, a single bad result aborts the whole search with an uncaught exception. While fixing that, please close the missing parenthesis in the fallback message.

-                try:
-                    final_url = self.ensure_url(decoded_link)
-                    page = requests.get(final_url, headers=headers, timeout=self.timeout)
-                    page.raise_for_status()
-                    content = BeautifulSoup(page.text, "lxml").get_text(separator=" ", strip=True)
-                except requests.RequestException as e:
-                    final_url = decoded_link
-                    content = f"(Failed to fetch: {e!s}"
+                try:
+                    final_url = self.ensure_url(decoded_link)
+                    page = requests.get(final_url, headers=headers, timeout=self.timeout)
+                    page.raise_for_status()
+                    content = BeautifulSoup(page.text, "lxml").get_text(separator=" ", strip=True)
+                except (ValueError, requests.RequestException) as e:
+                    final_url = decoded_link
+                    content = f"(Failed to fetch: {e!s})"
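
To see why widening the except clause matters, consider the self-contained sketch below. It does not use requests: `FetchError` is a stand-in for requests.RequestException (both derive from OSError), and `ensure_url`, `fetch_contents`, and `fake_fetch` are simplified, hypothetical versions written for this illustration:

```python
from urllib.parse import urlparse


class FetchError(OSError):
    """Stand-in for requests.RequestException, which also derives from OSError."""


def ensure_url(url: str) -> str:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid URL: {url}")
    return url


def fetch_contents(links, fetch):
    """Fetch each link; a bad link yields a placeholder instead of aborting the loop."""
    results = []
    for link in links:
        try:
            final_url = ensure_url(link)
            content = fetch(final_url)
        except (ValueError, FetchError) as exc:
            # Both validation and network failures end up as a readable placeholder.
            final_url = link
            content = f"(Failed to fetch: {exc!s})"
        results.append({"link": final_url, "content": content})
    return results


def fake_fetch(url: str) -> str:
    if "down" in url:
        raise FetchError("connection refused")
    return "page text"


rows = fetch_contents(["/relative", "https://down.example", "https://ok.example"], fake_fetch)
```

With only `FetchError` in the except clause, the relative link would raise out of the loop and the two remaining links would never be processed.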
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e9cf4c and 916bab1.

📒 Files selected for processing (7)
  • src/backend/tests/unit/components/data/test_news_search.py (0 hunks)
  • src/backend/tests/unit/components/data/test_rss.py (0 hunks)
  • src/backend/tests/unit/components/data/test_web_search.py (3 hunks)
  • src/lfx/src/lfx/components/data/__init__.py (0 hunks)
  • src/lfx/src/lfx/components/data/news_search.py (0 hunks)
  • src/lfx/src/lfx/components/data/rss.py (0 hunks)
  • src/lfx/src/lfx/components/data/web_search.py (4 hunks)
💤 Files with no reviewable changes (5)
  • src/lfx/src/lfx/components/data/news_search.py
  • src/backend/tests/unit/components/data/test_news_search.py
  • src/lfx/src/lfx/components/data/__init__.py
  • src/lfx/src/lfx/components/data/rss.py
  • src/backend/tests/unit/components/data/test_rss.py
🧰 Additional context used
📓 Path-based instructions (5)
src/backend/tests/unit/components/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

src/backend/tests/unit/components/**/*.py: Mirror the component directory structure for unit tests in src/backend/tests/unit/components/
Use ComponentTestBaseWithClient or ComponentTestBaseWithoutClient as base classes for component unit tests
Provide file_names_mapping for backward compatibility in component tests
Create comprehensive unit tests for all new components

Files:

  • src/backend/tests/unit/components/data/test_web_search.py
{src/backend/**/*.py,tests/**/*.py,Makefile}

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

{src/backend/**/*.py,tests/**/*.py,Makefile}: Run make format_backend to format Python code before linting or committing changes
Run make lint to perform linting checks on backend Python code

Files:

  • src/backend/tests/unit/components/data/test_web_search.py
src/backend/tests/unit/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

Test component integration within flows using create_flow, build_flow, and get_build_events utilities

Files:

  • src/backend/tests/unit/components/data/test_web_search.py
src/backend/tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/testing.mdc)

src/backend/tests/**/*.py: Unit tests for backend code must be located in the 'src/backend/tests/' directory, with component tests organized by component subdirectory under 'src/backend/tests/unit/components/'.
Test files should use the same filename as the component under test, with an appropriate test prefix or suffix (e.g., 'my_component.py' → 'test_my_component.py').
Use the 'client' fixture (an async httpx.AsyncClient) for API tests in backend Python tests, as defined in 'src/backend/tests/conftest.py'.
When writing component tests, inherit from the appropriate base class in 'src/backend/tests/base.py' (ComponentTestBase, ComponentTestBaseWithClient, or ComponentTestBaseWithoutClient) and provide the required fixtures: 'component_class', 'default_kwargs', and 'file_names_mapping'.
Each test in backend Python test files should have a clear docstring explaining its purpose, and complex setups or mocks should be well-commented.
Test both sync and async code paths in backend Python tests, using '@pytest.mark.asyncio' for async tests.
Mock external dependencies appropriately in backend Python tests to isolate unit tests from external services.
Test error handling and edge cases in backend Python tests, including using 'pytest.raises' and asserting error messages.
Validate input/output behavior and test component initialization and configuration in backend Python tests.
Use the 'no_blockbuster' pytest marker to skip the blockbuster plugin in tests when necessary.
Be aware of ContextVar propagation in async tests; test both direct event loop execution and 'asyncio.to_thread' scenarios to ensure proper context isolation.
Test error handling by mocking internal functions using monkeypatch in backend Python tests.
Test resource cleanup in backend Python tests by using fixtures that ensure proper initialization and cleanup of resources.
Test timeout and performance constraints in backend Python tests using 'asyncio.wait_for' and timing assertions.
Test Langflow's Messag...

Files:

  • src/backend/tests/unit/components/data/test_web_search.py
src/backend/**/components/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)

In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).

Files:

  • src/backend/tests/unit/components/data/test_web_search.py
🧠 Learnings (1)
📚 Learning: 2025-07-18T18:25:54.486Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-07-18T18:25:54.486Z
Learning: Applies to src/backend/tests/unit/components/**/*.py : Create comprehensive unit tests for all new components

Applied to files:

  • src/backend/tests/unit/components/data/test_web_search.py
🧬 Code graph analysis (2)
src/lfx/src/lfx/components/data/web_search.py (1)
src/lfx/src/lfx/custom/custom_component/component.py (1)
  • log (1475-1492)
src/backend/tests/unit/components/data/test_web_search.py (1)
src/lfx/src/lfx/components/data/web_search.py (9)
  • ensure_url (137-144)
  • validate_url (129-135)
  • _sanitize_query (146-148)
  • clean_html (150-152)
  • update_build_config (106-127)
  • perform_web_search (154-208)
  • perform_news_search (210-268)
  • perform_rss_read (270-309)
  • perform_search (311-322)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
  • GitHub Check: Lint Backend / Run Mypy (3.10)
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
  • GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
  • GitHub Check: Lint Backend / Run Mypy (3.12)
  • GitHub Check: Lint Backend / Run Mypy (3.13)
  • GitHub Check: Lint Backend / Run Mypy (3.11)
  • GitHub Check: Test Starter Templates
  • GitHub Check: Update Starter Projects
  • GitHub Check: Ruff Style Check (3.13)

Comment on lines 31 to 40
     async def test_invalid_url_handling(self):
-        # Create a test instance of the component
+        """Test invalid URL handling."""
         component = WebSearchComponent()
 
-        # Set an invalid URL
+        # Test invalid URL
         invalid_url = "htp://invalid-url"
 
         # Ensure the URL is invalid
         with pytest.raises(ValueError, match="Invalid URL"):
             component.ensure_url(invalid_url)

⚠️ Potential issue

Remove the unnecessary async declaration from this test

Declaring the test as async def without decorating it with @pytest.mark.asyncio makes pytest collect it as a coroutine and it never runs (or outright errors, depending on plugin config). This breaks coverage of the URL validation path. Convert it back to a plain synchronous test.

-    async def test_invalid_url_handling(self):
+    def test_invalid_url_handling(self):
         """Test invalid URL handling."""
         component = WebSearchComponent()
 
         # Test invalid URL
         invalid_url = "htp://invalid-url"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    async def test_invalid_url_handling(self):
-        # Create a test instance of the component
-        """Test invalid URL handling."""
-        component = WebSearchComponent()
-        # Set an invalid URL
-        # Test invalid URL
-        invalid_url = "htp://invalid-url"
-        # Ensure the URL is invalid
-        with pytest.raises(ValueError, match="Invalid URL"):
-            component.ensure_url(invalid_url)
+    def test_invalid_url_handling(self):
+        """Test invalid URL handling."""
+        component = WebSearchComponent()
+        # Test invalid URL
+        invalid_url = "htp://invalid-url"
+        # Ensure the URL is invalid
+        with pytest.raises(ValueError, match="Invalid URL"):
+            component.ensure_url(invalid_url)
🤖 Prompt for AI Agents
In src/backend/tests/unit/components/data/test_web_search.py around lines 31 to
40, the test is declared as async which causes pytest to collect it as a
coroutine and not execute it; change the test signature from async def
test_invalid_url_handling(self): to a regular def
test_invalid_url_handling(self): (remove the async keyword) so pytest runs it
synchronously, and do not add @pytest.mark.asyncio since the test body is
synchronous — keep the rest of the test (creating WebSearchComponent, setting
invalid_url, and asserting pytest.raises(ValueError, match="Invalid URL") on
component.ensure_url(invalid_url)) unchanged.
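
The underlying Python behavior can be checked without pytest: calling an `async def` function only creates a coroutine object, and the body runs only when an event loop drives it. The sketch below uses a hypothetical `check_something` function in place of the test; how a given pytest setup reports such a test depends on the installed plugins:

```python
import asyncio
import inspect

calls = []


async def check_something():
    calls.append("ran")


# Calling the function only creates a coroutine object; the body has not run.
coro = check_something()
assert inspect.iscoroutine(coro)
assert calls == []
coro.close()  # avoid a "never awaited" RuntimeWarning

# The body runs only when an event loop drives the coroutine.
asyncio.run(check_something())
assert calls == ["ran"]
```

Since the test body in question awaits nothing, a plain `def` is the simplest way to make sure its assertions execute.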

Comment on lines +218 to +236
        ceid = f"{gl}:{hl.split('-')[0]}"

        # Build RSS URL based on parameters
        if topic:
            # Topic-based feed
            base_url = f"https://news.google.com/rss/headlines/section/topic/{quote_plus(topic.upper())}"
            params = f"?hl={hl}&gl={gl}&ceid={ceid}"
            rss_url = base_url + params
        elif location:
            # Location-based feed
            base_url = f"https://news.google.com/rss/headlines/section/geo/{quote_plus(location)}"
            params = f"?hl={hl}&gl={gl}&ceid={ceid}"
            rss_url = base_url + params
        elif query:
            # Keyword search feed
            base_url = "https://news.google.com/rss/search?q="
            query_encoded = quote_plus(query)
            params = f"&hl={hl}&gl={gl}&ceid={ceid}"
            rss_url = f"{base_url}{query_encoded}{params}"

⚠️ Potential issue

Honor the user-provided ceid parameter for News mode.

We surface ceid as an advanced input, but this code recomputes it from gl/hl and ignores whatever the user set (e.g., ceid="BR:pt" keeps resolving to US:en). That breaks locale targeting parity with the retired NewsSearch component. Please respect the supplied value while retaining the current fallback for users who leave it blank.

-        ceid = f"{gl}:{hl.split('-')[0]}"
+        ceid_default = f"{gl}:{hl.split('-')[0]}"
+        ceid_value = getattr(self, "ceid", "") or ""
+        ceid = ceid_value.strip() or ceid_default

(If we want the auto-derived value to stay in sync when gl/hl change, consider making the input default empty so ceid_value only overrides when the user explicitly sets it.)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        ceid = f"{gl}:{hl.split('-')[0]}"
+        ceid_default = f"{gl}:{hl.split('-')[0]}"
+        ceid_value = getattr(self, "ceid", "") or ""
+        ceid = ceid_value.strip() or ceid_default
         # Build RSS URL based on parameters
         if topic:
             # Topic-based feed
             base_url = f"https://news.google.com/rss/headlines/section/topic/{quote_plus(topic.upper())}"
             params = f"?hl={hl}&gl={gl}&ceid={ceid}"
             rss_url = base_url + params
         elif location:
             # Location-based feed
             base_url = f"https://news.google.com/rss/headlines/section/geo/{quote_plus(location)}"
             params = f"?hl={hl}&gl={gl}&ceid={ceid}"
             rss_url = base_url + params
         elif query:
             # Keyword search feed
             base_url = "https://news.google.com/rss/search?q="
             query_encoded = quote_plus(query)
             params = f"&hl={hl}&gl={gl}&ceid={ceid}"
             rss_url = f"{base_url}{query_encoded}{params}"
🤖 Prompt for AI Agents
In src/lfx/src/lfx/components/data/web_search.py around lines 218 to 236, the
code always recomputes ceid from gl/hl and ignores a user-supplied ceid; change
it to honor the user-provided ceid when present and only compute the fallback
from gl/hl when ceid is empty/None. Concretely, read the incoming ceid parameter
into a local ceid_value (or reuse the param), set ceid = ceid_value if truthy
else f"{gl}:{hl.split('-')[0]}", and then use that ceid in the existing URL
construction paths so user-specified locale targeting is respected while
preserving the existing fallback behavior.
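
Pulled out of the component, the suggested behavior can be sketched as a standalone function. The URL shapes follow the code quoted in this comment; the function name `build_news_rss_url`, the default values for `hl` and `gl`, and the ValueError raised when no input is given are assumptions made for this illustration:

```python
from urllib.parse import quote_plus


def build_news_rss_url(query="", topic="", location="", hl="en-US", gl="US", ceid=""):
    """Build a Google News RSS URL, honoring an explicit ceid when one is given."""
    # Fall back to "<gl>:<language part of hl>" only when ceid is blank.
    ceid = (ceid or "").strip() or f"{gl}:{hl.split('-')[0]}"
    locale = f"hl={hl}&gl={gl}&ceid={ceid}"
    base = "https://news.google.com/rss"
    if topic:
        return f"{base}/headlines/section/topic/{quote_plus(topic.upper())}?{locale}"
    if location:
        return f"{base}/headlines/section/geo/{quote_plus(location)}?{locale}"
    if query:
        return f"{base}/search?q={quote_plus(query)}&{locale}"
    # Assumption: the original's handling of this case is not shown in the excerpt.
    raise ValueError("Provide a query, topic, or location")
```

For example, `build_news_rss_url(topic="technology", ceid="BR:pt")` keeps `ceid=BR:pt` in the resulting URL, while leaving `ceid` blank derives `US:en` from the defaults.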

@codecov

codecov Bot commented Sep 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 24.10%. Comparing base (0d3c7d9) to head (a8db154).
⚠️ Report is 1 commits behind head on main.

❌ Your project status has failed because the head coverage (46.96%) is below the target coverage (55.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #9975      +/-   ##
==========================================
+ Coverage   23.97%   24.10%   +0.13%     
==========================================
  Files        1091     1091              
  Lines       40014    40014              
  Branches     5543     5543              
==========================================
+ Hits         9594     9647      +53     
+ Misses      30249    30196      -53     
  Partials      171      171              
Flag Coverage Δ
backend 46.96% <ø> (+0.34%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 6 files with indirect coverage changes


@erichare erichare assigned erichare and unassigned erichare Sep 30, 2025
@erichare erichare self-requested a review September 30, 2025 21:37
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 30, 2025
@erichare erichare enabled auto-merge September 30, 2025 21:51
@erichare erichare added this pull request to the merge queue Sep 30, 2025
Merged via the queue into main with commit 181237f Sep 30, 2025
32 of 33 checks passed
@erichare erichare deleted the feat/unified-web-search branch September 30, 2025 22:32
from lfx.schema import DataFrame


class NewsSearchComponent(Component):
Collaborator

can we safely remove instead of deprecating?

Labels

enhancement New feature or request

3 participants