feat: Consolidate Web Search, News Search, and RSS Reader into unified component #9975
…d component

Merge three separate components into a single Web Search component with tab-based mode selection:
- Web mode: DuckDuckGo search functionality (default)
- News mode: Google News RSS feed search with topic/location support
- RSS mode: Generic RSS feed reader

This reduces code duplication and provides a more intuitive user experience with a single component that handles all search-related functionality.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Walkthrough
Removes NewsSearch and RSS components and their tests; updates public exports accordingly. Introduces a unified WebSearchComponent supporting Web, News, and RSS modes with routing, URL handling, sanitization, and RSS parsing. Adds comprehensive tests for the new unified component and its behaviors.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant W as WebSearchComponent
participant R as Router (perform_search)
participant H as HTTP Client
participant P as Parser (HTML/XML)
participant DF as DataFrame
User->>W: Trigger results
W->>R: perform_search(mode, inputs)
alt Mode: Web
R->>W: perform_web_search()
W->>H: GET DuckDuckGo results
H-->>W: HTML
W->>P: Parse/clean links & snippets
P-->>W: Items
W->>H: GET each result (optional)
H-->>W: Content/Errors
W->>DF: Build results
DF-->>User: DataFrame
else Mode: News
R->>W: perform_news_search()
W->>H: GET Google News RSS
H-->>W: XML or Error
W->>P: Parse RSS items
P-->>W: Articles or Empty/Error
W->>DF: Build results
DF-->>User: DataFrame
else Mode: RSS
R->>W: perform_rss_read()
W->>H: GET RSS URL
H-->>W: XML or Error
W->>P: Validate/parse
P-->>W: Items or Empty/Error
W->>DF: Build results
DF-->>User: DataFrame
end
note over W,DF: Errors produce consistent DataFrame rows with status messages.
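The routing in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the component's actual code: the handler functions, the `Row` shape, and `error_row` are hypothetical stand-ins, and plain lists of dicts stand in for the DataFrame. The point is only that every mode, including the failure paths, yields rows with the same columns.

```python
from typing import Callable

# Hypothetical row shape; the real component builds a DataFrame instead.
Row = dict[str, str]


def error_row(mode: str, message: str) -> Row:
    """Build a row with the same columns as a successful result."""
    return {"mode": mode, "title": "", "status": f"error: {message}"}


def web_search(query: str) -> list[Row]:
    # Stand-in for the Web (DuckDuckGo) path; no network access here.
    return [{"mode": "Web", "title": f"result for {query}", "status": "ok"}]


def news_search(query: str) -> list[Row]:
    # Stand-in for the News (Google News RSS) path.
    return [{"mode": "News", "title": f"article about {query}", "status": "ok"}]


def rss_read(url: str) -> list[Row]:
    # Stand-in for the RSS path; rejects anything that is not http(s).
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"Invalid URL: {url}")
    return [{"mode": "RSS", "title": f"item from {url}", "status": "ok"}]


HANDLERS: dict[str, Callable[[str], list[Row]]] = {
    "Web": web_search,
    "News": news_search,
    "RSS": rss_read,
}


def perform_search(mode: str, value: str) -> list[Row]:
    """Dispatch on mode and convert failures into uniform error rows."""
    handler = HANDLERS.get(mode)
    if handler is None:
        return [error_row(mode, f"Unsupported mode: {mode}")]
    try:
        return handler(value)
    except ValueError as exc:
        return [error_row(mode, str(exc))]
```

A dictionary dispatch keeps the mode names in one place; an `if`/`elif` chain would work equally well for three modes.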
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lfx/src/lfx/components/data/web_search.py (1)
191-205: Prevent invalid result URLs from crashing the web search.
self.ensure_url raises ValueError for empty/relative/unsupported URLs, and DuckDuckGo occasionally emits such links (e.g., missing uddg parameter). Because we only catch requests.RequestException, a single bad result aborts the whole search with an uncaught exception. While fixing that, please close the missing parenthesis in the fallback message.
- try:
-     final_url = self.ensure_url(decoded_link)
-     page = requests.get(final_url, headers=headers, timeout=self.timeout)
-     page.raise_for_status()
-     content = BeautifulSoup(page.text, "lxml").get_text(separator=" ", strip=True)
- except requests.RequestException as e:
-     final_url = decoded_link
-     content = f"(Failed to fetch: {e!s}"
+ try:
+     final_url = self.ensure_url(decoded_link)
+     page = requests.get(final_url, headers=headers, timeout=self.timeout)
+     page.raise_for_status()
+     content = BeautifulSoup(page.text, "lxml").get_text(separator=" ", strip=True)
+ except (ValueError, requests.RequestException) as e:
+     final_url = decoded_link
+     content = f"(Failed to fetch: {e!s})"
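To illustrate why the wider `except` matters, here is a self-contained sketch of the per-result loop. It is an approximation under stated assumptions: `FetchError` is a hypothetical stand-in for `requests.RequestException`, `ensure_url` is a simplified version of the component's method, and `fake_fetch` replaces the real HTTP call so the example runs without network access.

```python
class FetchError(Exception):
    """Hypothetical stand-in for requests.RequestException."""


def ensure_url(url: str) -> str:
    # Simplified stand-in: only absolute http(s) URLs are accepted.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"Invalid URL: {url!r}")
    return url


def fetch_contents(links, fetch):
    """Fetch each link; a bad link yields an error row instead of aborting."""
    rows = []
    for link in links:
        try:
            final_url = ensure_url(link)
            content = fetch(final_url)
        except (ValueError, FetchError) as exc:
            # Both validation and transport failures are recorded per result.
            final_url = link
            content = f"(Failed to fetch: {exc!s})"
        rows.append({"url": final_url, "content": content})
    return rows


def fake_fetch(url: str) -> str:
    # Hypothetical fetcher used only for this illustration.
    if "down" in url:
        raise FetchError("connection refused")
    return f"content of {url}"


rows = fetch_contents(["https://ok.example", "", "https://down.example"], fake_fetch)
```

With only `FetchError` in the `except` clause, the empty link in the middle would raise out of the loop and the third result would never be processed.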
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/backend/tests/unit/components/data/test_news_search.py (0 hunks)
src/backend/tests/unit/components/data/test_rss.py (0 hunks)
src/backend/tests/unit/components/data/test_web_search.py (3 hunks)
src/lfx/src/lfx/components/data/__init__.py (0 hunks)
src/lfx/src/lfx/components/data/news_search.py (0 hunks)
src/lfx/src/lfx/components/data/rss.py (0 hunks)
src/lfx/src/lfx/components/data/web_search.py (4 hunks)
💤 Files with no reviewable changes (5)
- src/lfx/src/lfx/components/data/news_search.py
- src/backend/tests/unit/components/data/test_news_search.py
- src/lfx/src/lfx/components/data/__init__.py
- src/lfx/src/lfx/components/data/rss.py
- src/backend/tests/unit/components/data/test_rss.py
🧰 Additional context used
📓 Path-based instructions (5)
src/backend/tests/unit/components/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
src/backend/tests/unit/components/**/*.py: Mirror the component directory structure for unit tests in src/backend/tests/unit/components/
Use ComponentTestBaseWithClient or ComponentTestBaseWithoutClient as base classes for component unit tests
Provide file_names_mapping for backward compatibility in component tests
Create comprehensive unit tests for all new components
Files:
src/backend/tests/unit/components/data/test_web_search.py
{src/backend/**/*.py,tests/**/*.py,Makefile}
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
{src/backend/**/*.py,tests/**/*.py,Makefile}: Run make format_backend to format Python code before linting or committing changes
Run make lint to perform linting checks on backend Python code
Files:
src/backend/tests/unit/components/data/test_web_search.py
src/backend/tests/unit/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
Test component integration within flows using create_flow, build_flow, and get_build_events utilities
Files:
src/backend/tests/unit/components/data/test_web_search.py
src/backend/tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/testing.mdc)
src/backend/tests/**/*.py: Unit tests for backend code must be located in the 'src/backend/tests/' directory, with component tests organized by component subdirectory under 'src/backend/tests/unit/components/'.
Test files should use the same filename as the component under test, with an appropriate test prefix or suffix (e.g., 'my_component.py' → 'test_my_component.py').
Use the 'client' fixture (an async httpx.AsyncClient) for API tests in backend Python tests, as defined in 'src/backend/tests/conftest.py'.
When writing component tests, inherit from the appropriate base class in 'src/backend/tests/base.py' (ComponentTestBase, ComponentTestBaseWithClient, or ComponentTestBaseWithoutClient) and provide the required fixtures: 'component_class', 'default_kwargs', and 'file_names_mapping'.
Each test in backend Python test files should have a clear docstring explaining its purpose, and complex setups or mocks should be well-commented.
Test both sync and async code paths in backend Python tests, using '@pytest.mark.asyncio' for async tests.
Mock external dependencies appropriately in backend Python tests to isolate unit tests from external services.
Test error handling and edge cases in backend Python tests, including using 'pytest.raises' and asserting error messages.
Validate input/output behavior and test component initialization and configuration in backend Python tests.
Use the 'no_blockbuster' pytest marker to skip the blockbuster plugin in tests when necessary.
Be aware of ContextVar propagation in async tests; test both direct event loop execution and 'asyncio.to_thread' scenarios to ensure proper context isolation.
Test error handling by mocking internal functions using monkeypatch in backend Python tests.
Test resource cleanup in backend Python tests by using fixtures that ensure proper initialization and cleanup of resources.
Test timeout and performance constraints in backend Python tests using 'asyncio.wait_for' and timing assertions.
Test Langflow's Messag...
Files:
src/backend/tests/unit/components/data/test_web_search.py
src/backend/**/components/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)
In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).
Files:
src/backend/tests/unit/components/data/test_web_search.py
🧠 Learnings (1)
📚 Learning: 2025-07-18T18:25:54.486Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-07-18T18:25:54.486Z
Learning: Applies to src/backend/tests/unit/components/**/*.py : Create comprehensive unit tests for all new components
Applied to files:
src/backend/tests/unit/components/data/test_web_search.py
🧬 Code graph analysis (2)
src/lfx/src/lfx/components/data/web_search.py (1)
src/lfx/src/lfx/custom/custom_component/component.py (1)
log(1475-1492)
src/backend/tests/unit/components/data/test_web_search.py (1)
src/lfx/src/lfx/components/data/web_search.py (9)
ensure_url (137-144)
validate_url (129-135)
_sanitize_query (146-148)
clean_html (150-152)
update_build_config (106-127)
perform_web_search (154-208)
perform_news_search (210-268)
perform_rss_read (270-309)
perform_search (311-322)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
- GitHub Check: Lint Backend / Run Mypy (3.10)
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
- GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
- GitHub Check: Lint Backend / Run Mypy (3.12)
- GitHub Check: Lint Backend / Run Mypy (3.13)
- GitHub Check: Lint Backend / Run Mypy (3.11)
- GitHub Check: Test Starter Templates
- GitHub Check: Update Starter Projects
- GitHub Check: Ruff Style Check (3.13)
async def test_invalid_url_handling(self):
    # Create a test instance of the component
    """Test invalid URL handling."""
    component = WebSearchComponent()

    # Set an invalid URL
    # Test invalid URL
    invalid_url = "htp://invalid-url"

    # Ensure the URL is invalid
    with pytest.raises(ValueError, match="Invalid URL"):
        component.ensure_url(invalid_url)
Remove the unnecessary async declaration from this test
Declaring the test as async def without decorating it with @pytest.mark.asyncio makes pytest collect it as a coroutine and it never runs (or outright errors, depending on plugin config). This breaks coverage of the URL validation path. Convert it back to a plain synchronous test.
- async def test_invalid_url_handling(self):
+ def test_invalid_url_handling(self):
      """Test invalid URL handling."""
      component = WebSearchComponent()
      # Test invalid URL
      invalid_url = "htp://invalid-url"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- async def test_invalid_url_handling(self):
-     # Create a test instance of the component
-     """Test invalid URL handling."""
-     component = WebSearchComponent()
-     # Set an invalid URL
-     # Test invalid URL
-     invalid_url = "htp://invalid-url"
-     # Ensure the URL is invalid
-     with pytest.raises(ValueError, match="Invalid URL"):
-         component.ensure_url(invalid_url)
+ def test_invalid_url_handling(self):
+     """Test invalid URL handling."""
+     component = WebSearchComponent()
+     # Test invalid URL
+     invalid_url = "htp://invalid-url"
+     # Ensure the URL is invalid
+     with pytest.raises(ValueError, match="Invalid URL"):
+         component.ensure_url(invalid_url)
🤖 Prompt for AI Agents
In src/backend/tests/unit/components/data/test_web_search.py around lines 31 to
40, the test is declared as async which causes pytest to collect it as a
coroutine and not execute it; change the test signature from async def
test_invalid_url_handling(self): to a regular def
test_invalid_url_handling(self): (remove the async keyword) so pytest runs it
synchronously, and do not add @pytest.mark.asyncio since the test body is
synchronous — keep the rest of the test (creating WebSearchComponent, setting
invalid_url, and asserting pytest.raises(ValueError, match="Invalid URL") on
component.ensure_url(invalid_url)) unchanged.
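The underlying Python behavior can be shown without pytest at all. The sketch below is illustrative only (the function name `check_invalid_url` and the `calls` list are made up for the example): calling an `async def` function merely creates a coroutine object, and the body does not execute until an event loop drives it. How pytest itself reacts to an unmarked async test depends on its version and installed plugins, as the comment above notes.

```python
import asyncio
import inspect

calls = []


async def check_invalid_url():
    # Stand-in for the test body; records that it actually executed.
    calls.append("ran")


# Calling the function only creates a coroutine object; the body has not run.
pending = check_invalid_url()
is_coroutine = inspect.iscoroutine(pending)
ran_without_loop = list(calls)
pending.close()  # Discard it explicitly to avoid a "never awaited" warning.

# Only when an event loop drives the coroutine does the body execute.
asyncio.run(check_invalid_url())
ran_with_loop = list(calls)
```

Since `ensure_url` is a plain synchronous method, dropping `async` is the simpler fix than adding an asyncio marker.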
ceid = f"{gl}:{hl.split('-')[0]}"

# Build RSS URL based on parameters
if topic:
    # Topic-based feed
    base_url = f"https://news.google.com/rss/headlines/section/topic/{quote_plus(topic.upper())}"
    params = f"?hl={hl}&gl={gl}&ceid={ceid}"
    rss_url = base_url + params
elif location:
    # Location-based feed
    base_url = f"https://news.google.com/rss/headlines/section/geo/{quote_plus(location)}"
    params = f"?hl={hl}&gl={gl}&ceid={ceid}"
    rss_url = base_url + params
elif query:
    # Keyword search feed
    base_url = "https://news.google.com/rss/search?q="
    query_encoded = quote_plus(query)
    params = f"&hl={hl}&gl={gl}&ceid={ceid}"
    rss_url = f"{base_url}{query_encoded}{params}"
Honor the user-provided ceid parameter for News mode.
We surface ceid as an advanced input, but this code recomputes it from gl/hl and ignores whatever the user set (e.g., ceid="BR:pt" keeps resolving to US:en). That breaks locale targeting parity with the retired NewsSearch component. Please respect the supplied value while retaining the current fallback for users who leave it blank.
- ceid = f"{gl}:{hl.split('-')[0]}"
+ ceid_default = f"{gl}:{hl.split('-')[0]}"
+ ceid_value = getattr(self, "ceid", "") or ""
+ ceid = ceid_value.strip() or ceid_default
(If we want the auto-derived value to stay in sync when gl/hl change, consider making the input default empty so ceid_value only overrides when the user explicitly sets it.)
📝 Committable suggestion
- ceid = f"{gl}:{hl.split('-')[0]}"
- # Build RSS URL based on parameters
- if topic:
-     # Topic-based feed
-     base_url = f"https://news.google.com/rss/headlines/section/topic/{quote_plus(topic.upper())}"
-     params = f"?hl={hl}&gl={gl}&ceid={ceid}"
-     rss_url = base_url + params
- elif location:
-     # Location-based feed
-     base_url = f"https://news.google.com/rss/headlines/section/geo/{quote_plus(location)}"
-     params = f"?hl={hl}&gl={gl}&ceid={ceid}"
-     rss_url = base_url + params
- elif query:
-     # Keyword search feed
-     base_url = "https://news.google.com/rss/search?q="
-     query_encoded = quote_plus(query)
-     params = f"&hl={hl}&gl={gl}&ceid={ceid}"
-     rss_url = f"{base_url}{query_encoded}{params}"
+ ceid_default = f"{gl}:{hl.split('-')[0]}"
+ ceid_value = getattr(self, "ceid", "") or ""
+ ceid = ceid_value.strip() or ceid_default
+ # Build RSS URL based on parameters
+ if topic:
+     # Topic-based feed
+     base_url = f"https://news.google.com/rss/headlines/section/topic/{quote_plus(topic.upper())}"
+     params = f"?hl={hl}&gl={gl}&ceid={ceid}"
+     rss_url = base_url + params
+ elif location:
+     # Location-based feed
+     base_url = f"https://news.google.com/rss/headlines/section/geo/{quote_plus(location)}"
+     params = f"?hl={hl}&gl={gl}&ceid={ceid}"
+     rss_url = base_url + params
+ elif query:
+     # Keyword search feed
+     base_url = "https://news.google.com/rss/search?q="
+     query_encoded = quote_plus(query)
+     params = f"&hl={hl}&gl={gl}&ceid={ceid}"
+     rss_url = f"{base_url}{query_encoded}{params}"
🤖 Prompt for AI Agents
In src/lfx/src/lfx/components/data/web_search.py around lines 218 to 236, the
code always recomputes ceid from gl/hl and ignores a user-supplied ceid; change
it to honor the user-provided ceid when present and only compute the fallback
from gl/hl when ceid is empty/None. Concretely, read the incoming ceid parameter
into a local ceid_value (or reuse the param), set ceid = ceid_value if truthy
else f"{gl}:{hl.split('-')[0]}", and then use that ceid in the existing URL
construction paths so user-specified locale targeting is respected while
preserving the existing fallback behavior.
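As a standalone illustration of the fallback described above, the following sketch separates the `ceid` resolution from URL construction. The helper names `resolve_ceid` and `build_news_search_url` are hypothetical and not part of the component; only the keyword-search URL shape is taken from the reviewed code.

```python
from urllib.parse import quote_plus


def resolve_ceid(gl: str, hl: str, ceid: "str | None" = None) -> str:
    """Prefer a user-supplied ceid; otherwise derive it from gl and hl."""
    default = f"{gl}:{hl.split('-')[0]}"
    # None, empty, or whitespace-only values fall back to the derived default.
    return (ceid or "").strip() or default


def build_news_search_url(query: str, hl: str = "en-US", gl: str = "US",
                          ceid: "str | None" = None) -> str:
    """Build the keyword-search feed URL in the same shape as the reviewed code."""
    resolved = resolve_ceid(gl, hl, ceid)
    return (
        "https://news.google.com/rss/search?q="
        f"{quote_plus(query)}&hl={hl}&gl={gl}&ceid={resolved}"
    )


derived = resolve_ceid("US", "en-US")
explicit = resolve_ceid("US", "en-US", "BR:pt")
blank = resolve_ceid("US", "en-US", "   ")
url = build_news_search_url("ai news")
```

Treating whitespace-only input as unset keeps the derived default in effect when a user clears the field without fully emptying it.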
Codecov Report
✅ All modified and coverable lines are covered by tests.
❌ Your project status has failed because the head coverage (46.96%) is below the target coverage (55.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #9975 +/- ##
==========================================
+ Coverage 23.97% 24.10% +0.13%
==========================================
Files 1091 1091
Lines 40014 40014
Branches 5543 5543
==========================================
+ Hits 9594 9647 +53
+ Misses 30249 30196 -53
Partials 171 171
Flags with carried forward coverage won't be shown.
from lfx.schema import DataFrame


class NewsSearchComponent(Component):
Can we safely remove this instead of deprecating it?
Summary
Changes
Benefits
Test Results
All 25 tests passing:
Breaking Changes
- NewsSearchComponent removed - use WebSearchComponent with search_mode="News"
- RSSReaderComponent removed - use WebSearchComponent with search_mode="RSS"
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Refactor
Tests