[WEB-4172] feat: Crawl work item links for title and favicon #7117

sriramveeraghanta merged 21 commits into preview
Conversation
Walkthrough

A background task module is added to asynchronously retrieve webpage titles and favicons for issue links. The view triggers this task after creating or updating an issue link, updating the record with the fetched metadata. The frontend now prioritizes displaying the favicon or a fallback icon with an optional subtitle. Error handling and helper functions are included.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant IssueLinkViewSet
    participant CeleryWorker
    participant ExternalSite
    participant Database

    User->>IssueLinkViewSet: Create/Update IssueLink (with URL)
    IssueLinkViewSet->>Database: Save IssueLink
    IssueLinkViewSet->>CeleryWorker: crawl_work_item_link_title.delay(id, url)
    CeleryWorker->>ExternalSite: HTTP GET url
    CeleryWorker->>ExternalSite: Fetch favicon
    CeleryWorker->>Database: Update IssueLink with title and favicon data
```
Actionable comments posted: 3
🧹 Nitpick comments (1)
apiserver/plane/bgtasks/work_item_link_task.py (1)
156-158: Use proper logging instead of print statements.

Using `print()` for error logging is not ideal for production applications as it doesn't integrate with logging frameworks and can't be easily managed.

Replace print with proper logging:

```diff
+import logging
+
+logger = logging.getLogger(__name__)

     except Exception as e:
-        print(f"Failed to fetch favicon: {e}")
+        logger.warning(f"Failed to fetch favicon: {e}")
         return None
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- apiserver/plane/app/views/issue/link.py (2 hunks)
- apiserver/plane/bgtasks/work_item_link_task.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (javascript)
🔇 Additional comments (2)
apiserver/plane/app/views/issue/link.py (2)
48-50: Good integration of the background task.

The task is properly integrated:

- Called asynchronously with `.delay()`
- Executed after the model is saved, ensuring the ID exists
- Parameters correctly extracted from serializer data
- Maintains the existing activity logging flow
18-18: Import integration looks good.

The import is correctly placed with other background task imports and follows the project's import organization.
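The save-then-enqueue ordering the review praises can be sketched without Celery. All names below are hypothetical stand-ins (the real code enqueues with `crawl_work_item_link_title.delay(id, url)` from the view); the point is only the ordering: the row is persisted before the task is queued, so the worker's lookup by primary key cannot race ahead of the insert.

```python
# Toy stand-ins for the database table and the Celery broker (illustration only).
saved_rows = {}   # pretend IssueLink table
task_queue = []   # pretend Celery broker

def save_issue_link(link_id, url):
    # persist the row first
    saved_rows[link_id] = {"id": link_id, "url": url, "metadata": None}

def enqueue_crawl(link_id, url):
    # mirrors crawl_work_item_link_title.delay(id, url)
    task_queue.append(("crawl_work_item_link_title", link_id, url))

def create_link(link_id, url):
    save_issue_link(link_id, url)   # 1. row exists in the "database"
    enqueue_crawl(link_id, url)     # 2. only then is the background task queued

create_link("42", "https://example.com")
```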
@sangeethailango Go through the comments from code rabbit and make necessary changes. Also add type hints
@sangeethailango |
Actionable comments posted: 0
♻️ Duplicate comments (4)
apiserver/plane/bgtasks/work_item_link_task.py (4)
53-60: Consider merging favicon-related functions

As suggested in the previous review, consider moving the `find_favicon_url` logic into `fetch_and_encode_favicon` for better encapsulation. This would also allow you to handle the default favicon case more cleanly.

Refactor to merge the functions:

```diff
-    # Find favicon URL
-    favicon_url = find_favicon_url(soup, url)
-
-    # Fetch and encode favicon
-    favicon_base64 = (
-        fetch_and_encode_favicon(favicon_url, headers)
-        if favicon_url
-        else DEFAULT_FAVICON
-    )
+    # Fetch and encode favicon
+    favicon_base64 = fetch_and_encode_favicon(soup, url, headers)
```

And update `fetch_and_encode_favicon` to include the URL finding logic:

```python
def fetch_and_encode_favicon(soup, base_url, headers):
    """Find, fetch and encode favicon as base64."""
    # Find favicon URL
    favicon_url = None
    favicon_selectors = [
        'link[rel="icon"]',
        'link[rel="shortcut icon"]',
        'link[rel="apple-touch-icon"]',
        'link[rel="apple-touch-icon-precomposed"]',
    ]
    for selector in favicon_selectors:
        favicon_tag = soup.select_one(selector)
        if favicon_tag and favicon_tag.get("href"):
            favicon_url = urljoin(base_url, favicon_tag["href"])
            break

    # Try fallback if no favicon found
    if not favicon_url:
        parsed_url = urlparse(base_url)
        fallback_url = f"{parsed_url.scheme}://{parsed_url.netloc}/favicon.ico"
        try:
            response = requests.head(fallback_url, timeout=5)
            if response.status_code == 200:
                favicon_url = fallback_url
        except Exception:
            pass

    # Return default if no favicon URL found
    if not favicon_url:
        return DEFAULT_FAVICON

    # Fetch and encode favicon
    try:
        response = requests.get(favicon_url, headers=headers, timeout=10)
        response.raise_for_status()
        content_type = response.headers.get("content-type", "image/x-icon")
        favicon_base64 = base64.b64encode(response.content).decode("utf-8")
        return f"data:{content_type};base64,{favicon_base64}"
    except Exception as e:
        print(f"Failed to fetch favicon: {e}")
        return DEFAULT_FAVICON
```

Also applies to: 94-131, 133-160
15-23: ⚠️ Potential issue — Add error handling and set the title field

The function has two issues:

- No error handling if the IssueLink doesn't exist
- The `title` field is not being set despite the function name suggesting it should be

Apply this fix to handle errors and set the title field:

```diff
 @shared_task
 def crawl_work_item_link_title(id, url):
-    meta_data = crawl_work_item_link_title_and_favicon(url)
-    issue_link = IssueLink.objects.get(id=id)
-
-    issue_link.metadata = meta_data
-
-    issue_link.save()
+    try:
+        meta_data = crawl_work_item_link_title_and_favicon(url)
+        meta_dict = json.loads(meta_data)
+        issue_link = IssueLink.objects.get(id=id)
+        issue_link.title = meta_dict.get("title")
+        issue_link.metadata = meta_data
+        issue_link.save()
+    except IssueLink.DoesNotExist:
+        print(f"IssueLink with id {id} not found")
+    except Exception as e:
+        print(f"Failed to update issue link {id}: {e}")
```
25-92: ⚠️ Potential issue — Add URL validation to prevent SSRF attacks

The function lacks URL validation, which could be exploited for Server-Side Request Forgery (SSRF) attacks to access internal services.

Add URL validation before making requests:

```diff
 def crawl_work_item_link_title_and_favicon(url):
     """
     Crawls a URL to extract the title and favicon.

     Args:
         url (str): The URL to crawl

     Returns:
         str: JSON string containing title and base64-encoded favicon
     """
     try:
+        # Validate URL scheme and prevent internal network access
+        parsed = urlparse(url)
+        if parsed.scheme not in ['http', 'https']:
+            raise ValueError("Only HTTP and HTTPS URLs are allowed")
+
+        # Prevent access to private IP ranges
+        import ipaddress
+        try:
+            ip = ipaddress.ip_address(parsed.hostname)
+            if ip.is_private or ip.is_loopback or ip.is_reserved:
+                raise ValueError("Access to private/internal networks is not allowed")
+        except ValueError:
+            # Not an IP address, continue with domain validation
+            pass
+
         # Set up headers to mimic a real browser
         headers = {
             "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"  # noqa: E501
         }
```
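As a standalone illustration of the check the diff above adds, here is a minimal sketch (the function name `is_safe_url` is invented for this example). Note it only vets literal IP addresses; a complete SSRF defense would also validate the DNS-resolved address, which this sketch omits.

```python
import ipaddress
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject non-HTTP(S) schemes and literal private/loopback/reserved IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    if not parsed.hostname:
        return False
    try:
        ip = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        # Hostname rather than a literal IP; a full defense would also
        # check the resolved address before connecting.
        return True
    return not (ip.is_private or ip.is_loopback or ip.is_reserved)
```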
94-131: ⚠️ Potential issue — Return None instead of base64 string

The function returns `DEFAULT_FAVICON` (a base64-encoded string) on lines 128 and 130, but the calling code expects a URL or None. This will cause `fetch_and_encode_favicon()` to fail when trying to fetch the base64 string as a URL.

Fix by returning `None` instead of `DEFAULT_FAVICON`:

```diff
     # Check if fallback exists
     try:
         response = requests.head(fallback_url, timeout=5)
         if response.status_code == 200:
             return fallback_url
     except Exception:
-        return DEFAULT_FAVICON
+        return None

-    return DEFAULT_FAVICON
+    return None
```
🧹 Nitpick comments (1)
apiserver/plane/app/views/issue/link.py (1)
62-65: Remove unnecessary re-fetching of issue link

The re-fetching and re-serialization of the issue link immediately after calling the asynchronous background task is unnecessary. Since `crawl_work_item_link_title` is an async task, it won't have completed by the time these lines execute, so the fetched data won't include the updated title/metadata.

The original serializer data already contains the correct response.

Remove the redundant re-fetch in both methods:

```diff
             notification=True,
             origin=base_host(request=request, is_app=True),
         )
-
-        issue_link = self.get_queryset().get(id=serializer.data.get("id"))
-        serializer = IssueLinkSerializer(issue_link)
-
         return Response(serializer.data, status=status.HTTP_201_CREATED)
```

```diff
             notification=True,
             origin=base_host(request=request, is_app=True),
         )
-        issue_link = self.get_queryset().get(id=serializer.data.get("id"))
-        serializer = IssueLinkSerializer(issue_link)
-
         return Response(serializer.data, status=status.HTTP_200_OK)
```

Also applies to: 95-98
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- apiserver/plane/app/views/issue/link.py (5 hunks)
- apiserver/plane/bgtasks/work_item_link_task.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (javascript)
🔇 Additional comments (1)
apiserver/plane/bgtasks/work_item_link_task.py (1)
116-116: ✅ Verification successful

Verify URL joining handles edge cases correctly

The PR comments mention a potential issue with URL joining. While `urljoin` should handle most cases correctly, let's verify it handles edge cases properly.
🌐 Web query:

> Does Python's urljoin function correctly handle edge cases like protocol-relative URLs (//example.com/favicon.ico) and absolute URLs when joining with a base URL?

💡 Result:

Python's `urljoin` function from the `urllib.parse` module is designed to construct absolute URLs by combining a base URL with another URL, but it has specific behaviors when handling edge cases.

Protocol-Relative URLs

When joining a protocol-relative URL (starting with `//`) with a base URL, `urljoin` correctly preserves the scheme (protocol) from the base URL while using the hostname from the relative URL:

```python
from urllib.parse import urljoin

# Base URL with http scheme, joined with protocol-relative URL
result = urljoin('http://www.cwi.nl/%7Eguido/Python.html', '//www.python.org/%7Eguido')
# Result: 'http://www.python.org/%7Eguido'
```

In this example, the function takes the `http` scheme from the base URL and combines it with the protocol-relative URL, resulting in a proper absolute URL[10].

Absolute URLs

When the second argument to `urljoin` is an absolute URL (containing a scheme like `http:` or `https:`), the function will return that URL largely unchanged, ignoring most parts of the base URL:

```python
from urllib.parse import urljoin

# Joining with an absolute URL
base = "http://example.com/path/"
url = "https://another-site.com/resource"
result = urljoin(base, url)
# Result: 'https://another-site.com/resource'
```

Edge Cases and Limitations

There are some known issues and edge cases:

- Path-relative URLs with `.` and `..`: There are documented issues when joining two path-relative URLs that involve dot notation. This has been reported as a bug in the Python issue tracker[6].
- URLs with scheme but no authority: URLs like `http:g` are treated in a special way for backward compatibility, being translated to `http:///g`, which can lead to unexpected results[4].
- Relative paths with drive letters: Earlier versions had issues with joining relative paths when a drive was present, but this was fixed in Python updates[8].

Practical Usage in Web Scraping

For web scraping, `urljoin` is commonly used to convert relative URLs found in HTML to absolute URLs:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def make_absolute_links(url, soup):
    # Convert relative links in <a> tags
    for a in soup.findAll('a'):
        if a.get('href'):
            a['href'] = urljoin(url, a.get('href'))
    # Similar processing for other tags with links
    for img in soup.findAll('img'):
        if img.get('src'):
            img['src'] = urljoin(url, img.get('src'))
```

This pattern is commonly used in web scrapers to ensure all links are absolute before following them[7][5].

Despite some edge cases, `urljoin` generally works correctly for the most common URL joining scenarios, making it a valuable tool for URL manipulation in Python.

Citations:
- 1: https://coderbook.com/@marcus/python-tips-and-tricks-for-path-and-url-parsing/
- 2: https://discuss.python.org/t/deprecating-urllib-parse-urlparse/35028
- 3: https://webscraping.ai/faq/lxml/how-do-i-handle-relative-urls-when-scraping-with-lxml
- 4: python/cpython#85110
- 5: https://sanjayasubedi.com.np/python/scraping/web-scraping-with-python-first-scraper/
- 6: python/cpython#96015
- 7: https://mycoding.uk/a/python__how_to_make_absolute_links_in_beautifulsoup.html
- 8: https://python.readthedocs.io/fr/hack-in-language/whatsnew/changelog.html
- 9: https://stackabuse.com/bytes/parsing-urls-with-python/
- 10: https://docs.python.org/3/library/urllib.parse.html
URL joining usage is sound

We've confirmed that Python's `urljoin` correctly handles:

- Protocol-relative URLs (`//example.com/favicon.ico` retains the base scheme)
- Absolute URLs (returns the href unchanged)

While there are rare edge cases (dot segments, scheme-only URLs), they're unlikely for typical favicon `href` values. No changes are needed here.
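The behaviors confirmed above can be checked directly in a REPL (hostnames here are illustrative):

```python
from urllib.parse import urljoin

base = "http://example.com/pages/index.html"

# Protocol-relative href keeps the base URL's scheme
print(urljoin(base, "//cdn.example.org/favicon.ico"))  # http://cdn.example.org/favicon.ico

# Absolute href is returned unchanged
print(urljoin(base, "https://other.org/icon.png"))     # https://other.org/icon.png

# Root-relative and document-relative hrefs resolve against the base
print(urljoin(base, "/favicon.ico"))                   # http://example.com/favicon.ico
print(urljoin(base, "icons/fav.ico"))                  # http://example.com/pages/icons/fav.ico
```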
Actionable comments posted: 1
♻️ Duplicate comments (2)
apiserver/plane/bgtasks/work_item_link_task.py (2)
16-24: ⚠️ Potential issue — Add error handling for missing IssueLink records.

The task will crash if the IssueLink with the given ID doesn't exist, which is possible in distributed systems where records might be deleted between task creation and execution.

```diff
 @shared_task
 def crawl_work_item_link_title(id, url):
-    meta_data = crawl_work_item_link_title_and_favicon(url)
-    issue_link = IssueLink.objects.get(id=id)
-
-    issue_link.metadata = meta_data
-
-    issue_link.save()
+    try:
+        meta_data = crawl_work_item_link_title_and_favicon(url)
+        issue_link = IssueLink.objects.get(id=id)
+        issue_link.metadata = meta_data
+        issue_link.save()
+    except IssueLink.DoesNotExist:
+        print(f"IssueLink with id {id} not found")
+    except Exception as e:
+        print(f"Failed to update issue link {id}: {e}")
```
144-148: ⚠️ Potential issue — Fix DEFAULT_FAVICON handling logic.

When `find_favicon_url` returns `None`, the code sets `favicon_url` to `DEFAULT_FAVICON` (a base64 string) and then tries to make an HTTP request to it, which will fail since it's not a URL.

```diff
     try:
         favicon_url = find_favicon_url(soup, url)
         if favicon_url is None:
-            favicon_url = DEFAULT_FAVICON
-
-        response = requests.get(favicon_url, headers=headers, timeout=10)
-        response.raise_for_status()
-
-        # Get content type
-        content_type = response.headers.get("content-type", "image/x-icon")
-
-        # Convert to base64
-        favicon_base64 = base64.b64encode(response.content).decode("utf-8")
-
-        # Return as data URI
-        return {
-            "favicon_url": favicon_url,
-            "favicon_base64": f"data:{content_type};base64,{favicon_base64}",
-        }
+            # Use default favicon when no URL is found
+            return {
+                "favicon_url": None,
+                "favicon_base64": f"data:image/svg+xml;base64,{DEFAULT_FAVICON}",
+            }
+
+        response = requests.get(favicon_url, headers=headers, timeout=10)
+        response.raise_for_status()
+
+        # Get content type
+        content_type = response.headers.get("content-type", "image/x-icon")
+
+        # Convert to base64
+        favicon_base64 = base64.b64encode(response.content).decode("utf-8")
+
+        # Return as data URI
+        return {
+            "favicon_url": favicon_url,
+            "favicon_base64": f"data:{content_type};base64,{favicon_base64}",
+        }
```
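The encode step being preserved here is a small standalone transform. A sketch of it in isolation (the helper name `to_data_uri` is invented for this example; the task inlines the equivalent code):

```python
import base64

def to_data_uri(content: bytes, content_type: str = "image/x-icon") -> str:
    """Wrap raw favicon bytes in a data: URI, as the task does after fetching."""
    b64 = base64.b64encode(content).decode("utf-8")
    return f"data:{content_type};base64,{b64}"

uri = to_data_uri(b"abc", "image/png")
```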
🧹 Nitpick comments (1)
apiserver/plane/bgtasks/work_item_link_task.py (1)
7-7: Remove unused import.

The `json` module is imported but never used in the code.

```diff
-import json
```

🧰 Tools
🪛 Ruff (0.11.9)
7-7: `json` imported but unused

Remove unused import: `json` (F401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apiserver/plane/bgtasks/work_item_link_task.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
apiserver/plane/bgtasks/work_item_link_task.py
7-7: json imported but unused
Remove unused import: json
(F401)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (javascript)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
apiserver/plane/bgtasks/work_item_link_task.py (2)
26-91: Good implementation with SSRF protection.

The function correctly implements URL validation to prevent SSRF attacks and returns a proper dictionary structure with comprehensive error handling.

93-130: LGTM - Fixed return type issue.

The function now correctly returns `None` instead of `DEFAULT_FAVICON` when no favicon URL is found, addressing the previous type mismatch issue.
Actionable comments posted: 2
♻️ Duplicate comments (2)
apiserver/plane/bgtasks/work_item_link_task.py (2)
169-171: Improve error handling return value.

Returning `None` on error will cause issues for the caller who expects a dictionary with specific keys (`favicon_url` and `favicon_base64`).

This matches the previous review feedback. Apply this fix:

```diff
     except Exception as e:
         print(f"Failed to fetch favicon: {e}")
-        return None
+        return {
+            "favicon_url": None,
+            "favicon_base64": f"data:image/svg+xml;base64,{DEFAULT_FAVICON}",
+        }
```
18-26: ⚠️ Potential issue — Add error handling and missing title assignment.

This function has two critical issues:

- No error handling for `IssueLink.objects.get(id=id)` which will raise `DoesNotExist` if the record doesn't exist
- Missing title assignment - the crawled title should be assigned to `issue_link.title`

Apply this diff to fix both issues:

```diff
 @shared_task
 def crawl_work_item_link_title(id: str, url: str) -> None:
-    meta_data = crawl_work_item_link_title_and_favicon(url)
-    issue_link = IssueLink.objects.get(id=id)
-
-    issue_link.metadata = meta_data
-
-    issue_link.save()
+    try:
+        meta_data = crawl_work_item_link_title_and_favicon(url)
+        issue_link = IssueLink.objects.get(id=id)
+
+        issue_link.title = meta_data.get("title")
+        issue_link.metadata = meta_data
+        issue_link.save()
+    except IssueLink.DoesNotExist:
+        print(f"IssueLink with id {id} not found")
+    except Exception as e:
+        print(f"Failed to update issue link {id}: {e}")
```
🧹 Nitpick comments (2)
apiserver/plane/bgtasks/work_item_link_task.py (2)
28-37: Fix docstring inconsistency.

The function signature indicates it returns `Dict[str, Any]` but the docstring states it returns a "JSON string". The implementation actually returns a dictionary.

Update the docstring:

```diff
     """
     Crawls a URL to extract the title and favicon.

     Args:
         url (str): The URL to crawl

     Returns:
-        str: JSON string containing title and base64-encoded favicon
+        Dict[str, Any]: Dictionary containing title, favicon, url, and favicon_url
     """
```
96-97: Remove debug print statements.

These debug print statements should be removed from production code.

```diff
 def find_favicon_url(soup: BeautifulSoup, base_url: str) -> Optional[str]:
-    print(soup, "PRint soup")
-    print(base_url, "BaseURL")
     """
     Find the favicon URL from HTML soup.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apiserver/plane/bgtasks/work_item_link_task.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (javascript)
🔇 Additional comments (1)
apiserver/plane/bgtasks/work_item_link_task.py (1)
1-16: LGTM! Clean imports and constant definition.

The imports are well-organized and include all necessary dependencies. The DEFAULT_FAVICON constant provides a good fallback SVG icon.
Actionable comments posted: 0
♻️ Duplicate comments (3)
apiserver/plane/bgtasks/work_item_link_task.py (3)
38-49: Security vulnerability: Incomplete URL validation.

The URL validation is missing scheme validation and has a potential bug when `parsed.hostname` is None, as identified in previous reviews.

Apply this diff to fix the security issues:

```diff
     try:
         # Prevent access to private IP ranges
         parsed = urlparse(url)
+
+        # Validate URL scheme
+        if parsed.scheme not in ['http', 'https']:
+            raise ValueError("Only HTTP and HTTPS URLs are allowed")
+
+        # Validate hostname exists
+        if not parsed.hostname:
+            raise ValueError("Invalid URL: no hostname found")

         try:
             ip = ipaddress.ip_address(parsed.hostname)
             if ip.is_private or ip.is_loopback or ip.is_reserved:
                 raise ValueError("Access to private/internal networks is not allowed")
-        except ValueError:
+        except ValueError as ve:
+            # Re-raise if it's our security check
+            if "private/internal networks" in str(ve):
+                raise
             # Not an IP address, continue with domain validation
             pass
```
149-156: Critical bug: Using base64 string as URL.

When `find_favicon_url` returns `None`, the code incorrectly assigns `DEFAULT_FAVICON` (a base64 string) to `favicon_url` and then attempts to fetch it as a URL, which will cause a request failure. This issue was identified in previous reviews but remains unfixed.

Apply this diff to handle the None case properly:

```diff
     try:
         favicon_url = find_favicon_url(soup, url)
         if favicon_url is None:
-            favicon_url = DEFAULT_FAVICON
-
-        response = requests.get(favicon_url, headers=headers, timeout=10)
-        response.raise_for_status()
-
-        # Get content type
-        content_type = response.headers.get("content-type", "image/x-icon")
-
-        # Convert to base64
-        favicon_base64 = base64.b64encode(response.content).decode("utf-8")
-
-        # Return as data URI
-        return {
-            "favicon_url": favicon_url,
-            "favicon_base64": f"data:{content_type};base64,{favicon_base64}",
-        }
+            # Use default favicon directly
+            return {
+                "favicon_url": None,
+                "favicon_base64": f"data:image/svg+xml;base64,{DEFAULT_FAVICON}",
+            }
+
+        response = requests.get(favicon_url, headers=headers, timeout=10)
+        response.raise_for_status()
+
+        # Get content type
+        content_type = response.headers.get("content-type", "image/x-icon")
+
+        # Convert to base64
+        favicon_base64 = base64.b64encode(response.content).decode("utf-8")
+
+        # Return as data URI
+        return {
+            "favicon_url": favicon_url,
+            "favicon_base64": f"data:{content_type};base64,{favicon_base64}",
+        }
```
18-26: ⚠️ Potential issue — Critical: Missing title assignment and error handling.

The shared task is missing the title assignment that was discussed in previous reviews and lacks proper error handling for database operations.

Apply this diff to fix both issues:

```diff
 @shared_task
 def crawl_work_item_link_title(id: str, url: str) -> None:
-    meta_data = crawl_work_item_link_title_and_favicon(url)
-    issue_link = IssueLink.objects.get(id=id)
-
-    issue_link.metadata = meta_data
-
-    issue_link.save()
+    try:
+        meta_data = crawl_work_item_link_title_and_favicon(url)
+        issue_link = IssueLink.objects.get(id=id)
+        issue_link.title = meta_data.get("title")
+        issue_link.metadata = meta_data
+        issue_link.save()
+    except IssueLink.DoesNotExist:
+        print(f"IssueLink with id {id} not found")
+    except Exception as e:
+        print(f"Failed to update issue link {id}: {e}")
```
🧹 Nitpick comments (1)
apiserver/plane/bgtasks/work_item_link_task.py (1)
96-97: Remove debug print statements.

The debug print statements should be removed from production code.

Apply this diff to remove the debug statements:

```diff
 def find_favicon_url(soup: BeautifulSoup, base_url: str) -> Optional[str]:
-    print(soup, "PRint soup")
-    print(base_url, "BaseURL")
     """
     Find the favicon URL from HTML soup.
```
Actionable comments posted: 0
♻️ Duplicate comments (3)
apiserver/plane/bgtasks/work_item_link_task.py (3)
39-49: ⚠️ Potential issue — Add URL scheme validation and fix IP validation bug.

The current implementation has the same security issues identified in previous reviews:

- Missing URL scheme validation - should only allow HTTP/HTTPS
- IP validation bug: if `parsed.hostname` is None, `ipaddress.ip_address(None)` will raise `ValueError`

Apply this diff to fix both issues:

```diff
     try:
         # Prevent access to private IP ranges
         parsed = urlparse(url)
+
+        # Validate URL scheme
+        if parsed.scheme not in ['http', 'https']:
+            raise ValueError("Only HTTP and HTTPS URLs are allowed")
+
+        # Validate hostname exists
+        if not parsed.hostname:
+            raise ValueError("Invalid URL: no hostname found")

         try:
             ip = ipaddress.ip_address(parsed.hostname)
             if ip.is_private or ip.is_loopback or ip.is_reserved:
                 raise ValueError("Access to private/internal networks is not allowed")
-        except ValueError:
+        except ValueError as ve:
+            # Re-raise if it's our security check
+            if "private/internal networks" in str(ve):
+                raise
             # Not an IP address, continue with domain validation
             pass
```
147-166: ⚠️ Potential issue — Critical bug: Using base64 string as URL.

This is the same critical issue identified in previous reviews that remains unfixed. When `find_favicon_url` returns `None`, the code sets `favicon_url` to `DEFAULT_FAVICON` (a base64 string) and then tries to fetch it as a URL on line 152. This will cause a request failure.

Apply this diff to handle the None case properly:

```diff
     try:
         favicon_url = find_favicon_url(soup, url)
         if favicon_url is None:
-            favicon_url = DEFAULT_FAVICON
-
-        response = requests.get(favicon_url, headers=headers, timeout=10)
-        response.raise_for_status()
-
-        # Get content type
-        content_type = response.headers.get("content-type", "image/x-icon")
-
-        # Convert to base64
-        favicon_base64 = base64.b64encode(response.content).decode("utf-8")
-
-        # Return as data URI
-        return {
-            "favicon_url": favicon_url,
-            "favicon_base64": f"data:{content_type};base64,{favicon_base64}",
-        }
+            # Use default favicon directly
+            return {
+                "favicon_url": None,
+                "favicon_base64": f"data:image/svg+xml;base64,{DEFAULT_FAVICON}",
+            }
+
+        response = requests.get(favicon_url, headers=headers, timeout=10)
+        response.raise_for_status()
+
+        # Get content type
+        content_type = response.headers.get("content-type", "image/x-icon")
+
+        # Convert to base64
+        favicon_base64 = base64.b64encode(response.content).decode("utf-8")
+
+        # Return as data URI
+        return {
+            "favicon_url": favicon_url,
+            "favicon_base64": f"data:{content_type};base64,{favicon_base64}",
+        }
```
18-26: ⚠️ Potential issue — Add error handling for database operations.

The task lacks error handling for `IssueLink.objects.get(id=id)` which will raise `DoesNotExist` if the record is not found, causing the background task to fail.

Apply this diff to add proper error handling:

```diff
 @shared_task
 def crawl_work_item_link_title(id: str, url: str) -> None:
-    meta_data = crawl_work_item_link_title_and_favicon(url)
-    issue_link = IssueLink.objects.get(id=id)
-
-    issue_link.metadata = meta_data
-
-    issue_link.save()
+    try:
+        meta_data = crawl_work_item_link_title_and_favicon(url)
+        issue_link = IssueLink.objects.get(id=id)
+        issue_link.metadata = meta_data
+        issue_link.save()
+    except IssueLink.DoesNotExist:
+        print(f"IssueLink with id {id} not found")
+    except Exception as e:
+        print(f"Failed to update issue link {id}: {e}")
```
🧹 Nitpick comments (1)
apiserver/plane/bgtasks/work_item_link_task.py (1)
35-37: Fix incorrect docstring return type.

The docstring states the function returns a "JSON string" but it actually returns `Dict[str, Any]`.

Apply this diff to correct the docstring:

```diff
     Returns:
-        str: JSON string containing title and base64-encoded favicon
+        Dict[str, Any]: Dictionary containing title and base64-encoded favicon
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apiserver/plane/bgtasks/work_item_link_task.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (javascript)
🔇 Additional comments (1)
apiserver/plane/bgtasks/work_item_link_task.py (1)
95-132: LGTM! Favicon URL handling has been corrected.

The function now properly returns `None` when no favicon is found, addressing the previous feedback about not returning `DEFAULT_FAVICON` as a URL.
@sriramveeraghanta I have checked it. If href is an absolute URL, urljoin returns it directly. If not, it joins the href with the base URL.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 1
♻️ Duplicate comments (2)
apiserver/plane/bgtasks/work_item_link_task.py (2)
38-49: Add missing URL scheme validation.

The URL validation is incomplete and has the same issues identified in past comments. The code lacks URL scheme validation and has potential issues with hostname validation.

```diff
     try:
         # Prevent access to private IP ranges
         parsed = urlparse(url)
+
+        # Validate URL scheme
+        if parsed.scheme not in ['http', 'https']:
+            raise ValueError("Only HTTP and HTTPS URLs are allowed")
+
+        # Validate hostname exists
+        if not parsed.hostname:
+            raise ValueError("Invalid URL: no hostname found")

         try:
             ip = ipaddress.ip_address(parsed.hostname)
             if ip.is_private or ip.is_loopback or ip.is_reserved:
                 raise ValueError("Access to private/internal networks is not allowed")
-        except ValueError:
+        except ValueError as ve:
+            # Re-raise if it's our security check
+            if "private/internal networks" in str(ve) or "Only HTTP" in str(ve) or "Invalid URL" in str(ve):
+                raise
             # Not an IP address, continue with domain validation
             pass
```
18-25: ⚠️ Potential issue — Add error handling for IssueLink.objects.get().

The task function lacks error handling for the database query, which could cause the background task to fail if the IssueLink record doesn't exist.

```diff
 @shared_task
 def crawl_work_item_link_title(id: str, url: str) -> None:
-    meta_data = crawl_work_item_link_title_and_favicon(url)
-    issue_link = IssueLink.objects.get(id=id)
-
-    issue_link.metadata = meta_data
-
-    issue_link.save()
+    try:
+        meta_data = crawl_work_item_link_title_and_favicon(url)
+        issue_link = IssueLink.objects.get(id=id)
+        issue_link.metadata = meta_data
+        issue_link.save()
+    except IssueLink.DoesNotExist:
+        print(f"IssueLink with id {id} not found")
+    except Exception as e:
+        print(f"Failed to update issue link {id}: {e}")
```
🧹 Nitpick comments (1)
apiserver/plane/bgtasks/work_item_link_task.py (1)
170-170: Remove unused variable assignment.

Static analysis correctly identifies that the exception variable `e` is assigned but never used in the first except block.

```diff
-    except Exception as e:
+    except Exception:
```

🧰 Tools
🪛 Ruff (0.11.9)
170-170: Local variable `e` is assigned to but never used

Remove assignment to unused variable `e` (F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apiserver/plane/bgtasks/work_item_link_task.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
apiserver/plane/bgtasks/work_item_link_task.py
170-170: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (javascript)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
apiserver/plane/bgtasks/work_item_link_task.py (1)
67-67: ❌ Incorrect review comment

Handle potential None return from fetch_and_encode_favicon.

The `fetch_and_encode_favicon` function can return `None` in some error cases, but the code assumes it always returns a dictionary. This could cause a `TypeError` when accessing `favicon_base64["favicon_base64"]`.
🏁 Script executed:
```shell
#!/bin/bash
# Check if fetch_and_encode_favicon can return None in any code paths
ast-grep --pattern $'def fetch_and_encode_favicon($$$) {
  $$$
  return None
  $$$
}'
```

Length of output: 86
🏁 Script executed:
```shell
#!/bin/bash
# Locate and print the definition of fetch_and_encode_favicon
file=$(rg -l "def fetch_and_encode_favicon" -n)
if [[ -z "$file" ]]; then
  echo "No definition found for fetch_and_encode_favicon"
  exit 0
fi
echo "Function defined in: $file"
# Print the first 200 lines to capture its return logic
sed -n '1,200p' "$file"
```

Length of output: 6289
fetch_and_encode_favicon never returns None – guard is unnecessary
The
fetch_and_encode_faviconfunction always returns a dictionary in every code path (even on errors it falls back to a default dict), so there’s no case where it returnsNone. You can safely accessfavicon_base64["favicon_base64"]without an extraNonecheck.Likely an incorrect or invalid review comment.
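As a sketch of the pattern the verification confirmed — always returning a dictionary so callers can index it without a `None` check — here is a minimal illustration. The function name, the fallback dict, and its keys are hypothetical; the PR's actual default favicon payload may differ.

```python
import base64
from typing import Optional

# Hypothetical fallback payload; the real task's default dict may differ.
DEFAULT_FAVICON = {"favicon_base64": None, "content_type": None}


def fetch_and_encode_favicon_sketch(
    raw_bytes: Optional[bytes], content_type: str = "image/x-icon"
) -> dict:
    """Always return a dict so callers can index it safely."""
    if not raw_bytes:
        # Missing or empty response body: fall back to the default dict.
        return dict(DEFAULT_FAVICON)
    try:
        encoded = base64.b64encode(raw_bytes).decode("ascii")
        return {"favicon_base64": encoded, "content_type": content_type}
    except Exception:
        # Even on unexpected errors, keep the return type stable.
        return dict(DEFAULT_FAVICON)
```

Because every path returns a dict, `result["favicon_base64"]` is always a valid access; callers only need to handle a `None` *value*, not a `None` *return*.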
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
apiserver/plane/bgtasks/work_item_link_task.py (2)
**38-49: Add URL scheme validation and fix hostname validation bug.**

The security validation is incomplete and has bugs that were previously identified. The issues are:

- Missing URL scheme validation (should only allow HTTP/HTTPS)
- If `parsed.hostname` is `None`, `ipaddress.ip_address(None)` will raise `ValueError`

```diff
 try:
     # Prevent access to private IP ranges
     parsed = urlparse(url)
+
+    # Validate URL scheme
+    if parsed.scheme not in ['http', 'https']:
+        raise ValueError("Only HTTP and HTTPS URLs are allowed")
+
+    # Validate hostname exists
+    if not parsed.hostname:
+        raise ValueError("Invalid URL: no hostname found")

     try:
         ip = ipaddress.ip_address(parsed.hostname)
         if ip.is_private or ip.is_loopback or ip.is_reserved:
             raise ValueError("Access to private/internal networks is not allowed")
-    except ValueError:
+    except ValueError as ve:
+        # Re-raise if it's our security check
+        if "private/internal networks" in str(ve) or "Only HTTP" in str(ve) or "Invalid URL" in str(ve):
+            raise
         # Not an IP address, continue with domain validation
         pass
```
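The scheme and hostname checks being suggested can also be sketched as a standalone validator, which avoids the string-matching on exception messages. The function name is hypothetical, and a production SSRF guard would additionally need to resolve the domain and check the resulting IPs, since a public-looking hostname can point at a private address.

```python
import ipaddress
from urllib.parse import urlparse


def validate_outbound_url(url: str) -> None:
    """Raise ValueError for URLs that should not be crawled.

    A sketch of the checks suggested above, under the stated assumptions.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("Only HTTP and HTTPS URLs are allowed")
    if not parsed.hostname:
        raise ValueError("Invalid URL: no hostname found")
    try:
        ip = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        # Hostname is a domain name, not a literal IP; nothing more to check here.
        return
    if ip.is_private or ip.is_loopback or ip.is_reserved:
        raise ValueError("Access to private/internal networks is not allowed")
```

Keeping the "is this a literal IP?" probe in its own narrow `try` means the security `ValueError`s propagate naturally, with no need to inspect exception text before re-raising.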
**18-25: ⚠️ Potential issue — Add error handling for database operations.**

The function lacks error handling for `IssueLink.objects.get(id=id)`, which will raise `DoesNotExist` if the record is not found, causing the background task to fail.

```diff
 @shared_task
 def crawl_work_item_link_title(id: str, url: str) -> None:
-    meta_data = crawl_work_item_link_title_and_favicon(url)
-    issue_link = IssueLink.objects.get(id=id)
-
-    issue_link.metadata = meta_data
-
-    issue_link.save()
+    try:
+        meta_data = crawl_work_item_link_title_and_favicon(url)
+        issue_link = IssueLink.objects.get(id=id)
+        issue_link.metadata = meta_data
+        issue_link.save()
+    except IssueLink.DoesNotExist:
+        print(f"IssueLink with id {id} not found")
+    except Exception as e:
+        print(f"Failed to update issue link {id}: {e}")
```
🧹 Nitpick comments (1)
apiserver/plane/bgtasks/work_item_link_task.py (1)
**28-37: Fix inconsistent return type documentation.**

The docstring incorrectly states the function returns a "JSON string" but it actually returns a `Dict[str, Any]` as indicated by the type hint and implementation.

```diff
 """
 Crawls a URL to extract the title and favicon.

 Args:
     url (str): The URL to crawl

 Returns:
-    str: JSON string containing title and base64-encoded favicon
+    Dict[str, Any]: Dictionary containing title, favicon, and metadata
 """
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apiserver/plane/bgtasks/work_item_link_task.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
apiserver/plane/bgtasks/work_item_link_task.py
63-63: f-string without any placeholders. Remove extraneous `f` prefix (F541)

169-169: f-string without any placeholders. Remove extraneous `f` prefix (F541)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (javascript)
🔇 Additional comments (1)
apiserver/plane/bgtasks/work_item_link_task.py (1)
**103-190: Good modular structure and error handling improvements.**

The separation of concerns between the `find_favicon_url` and `fetch_and_encode_favicon` functions is well-implemented. The error handling with fallback to default favicon and appropriate timeout values (2 seconds) addresses previous feedback effectively.

🧰 Tools
🪛 Ruff (0.11.9)

169-169: f-string without any placeholders. Remove extraneous `f` prefix (F541)
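The PR's `find_favicon_url` implementation isn't shown in this thread, but the split the reviewer praises — discovering the favicon URL separately from fetching and encoding it — can be sketched with only the standard library: look for a `<link rel="...icon...">` tag in the page HTML, and fall back to the `/favicon.ico` convention. The class and function names here are illustrative, not the PR's.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _IconLinkFinder(HTMLParser):
    """Records the href of the first <link rel="...icon..."> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower()
        if "icon" in rel and attr.get("href"):
            self.href = attr["href"]


def find_favicon_url_sketch(page_url: str, html: str) -> str:
    """Return the declared favicon URL, resolved against the page URL,
    or fall back to the /favicon.ico convention."""
    finder = _IconLinkFinder()
    finder.feed(html)
    if finder.href:
        return urljoin(page_url, finder.href)
    return urljoin(page_url, "/favicon.ico")
```

Keeping discovery separate from fetching lets the fetch step own the timeout and the default-favicon fallback, which is exactly the structure the review comment calls out.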
…ane/plane into feat-crawl-work-item-link
Crawl links UI
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
web/core/components/issues/issue-detail/links/link-item.tsx (1)
**67-70: Review subtitle display logic for potential redundancy.**

The subtitle display shows `linkTitle` (metadata title) alongside `linkDetail.title`. This might create redundancy or confusion if both titles are similar or identical.

Consider this alternative approach to avoid redundant information:

```diff
- {linkDetail.title && linkDetail.title !== "" ? linkDetail.title : linkDetail.url}
-
- {linkTitle && linkTitle !== "" && <span className="text-custom-text-400 text-xs">{linkTitle}</span>}
+ {linkDetail.title && linkDetail.title !== "" ? linkDetail.title : linkDetail.url}
+
+ {linkTitle && linkTitle !== "" && linkTitle !== linkDetail.title && (
+   <span className="text-custom-text-400 text-xs">{linkTitle}</span>
+ )}
```

This ensures the metadata title is only shown when it's different from the user-provided title, reducing visual clutter.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
web/core/components/issues/issue-detail/links/link-item.tsx (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
web/core/components/issues/issue-detail/links/link-item.tsx (1)
packages/ui/src/tooltip/tooltip.tsx (1)
Tooltip(36-110)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (javascript)
🔇 Additional comments (3)
web/core/components/issues/issue-detail/links/link-item.tsx (3)
**5-5: Import addition looks good.**

The `Link` icon import is correctly added for the fallback icon display.
**40-42: Good implementation of metadata extraction.**

The optional chaining correctly handles cases where metadata might not be available, and the type annotations are appropriate.
**65-65: Verify flex container layout.**

The flex layout addition supports the new subtitle feature correctly.
This PR contains both frontend and backend changes for this feature, and all the code in the frontend has also been approved by prateek in a separate PR. |
* feat: added a python bg task to crawl work item links for title and description
* fix: return meta_data in the response
* fix: add validation for accessing IP ranges
* fix: remove json.dumps
* fix: handle exception by returning None
* refactor: call find_favicon_url inside fetch_and_encode_favicon function
* chore: type hints
* fix: Handle None
* fix: remove print statementsg
* chore: added favicon and title of links
* fix: return null if no title found
* Update apiserver/plane/bgtasks/work_item_link_task.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* fix: remove exception handling
* fix: reduce timeout seconds
* fix: handle timeout exception
* fix: remove request timeout handling
* feat: add Link icon to issue detail links and update rendering logic
* fix: use logger for exception

---------

Co-authored-by: sangeethailango <sangeethailango21@gmail.com>
Co-authored-by: JayashTripathy <jayashtripathy371@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Description
Created a function that provides a solution for crawling URLs to extract titles and favicons.
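The title-extraction half of that function can be illustrated with a stdlib-only sketch (the names here are hypothetical; the PR's implementation and its choice of HTML parser may differ): capture the text inside the first `<title>` element and collapse whitespace, returning `None` when no title is found, matching the "return null if no title found" commit above.

```python
import re
from html.parser import HTMLParser


class _TitleFinder(HTMLParser):
    """Captures the text content of the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only start capturing for the first <title> encountered.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


def extract_title(html: str):
    """Return the page title with whitespace collapsed, or None if absent."""
    finder = _TitleFinder()
    finder.feed(html)
    if finder.title is None:
        return None
    return re.sub(r"\s+", " ", finder.title).strip() or None
```

In the real task this would run on the body of an HTTP GET (with a short timeout) before the favicon lookup, and the result would be stored in the link's metadata.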
Type of Change
Summary by CodeRabbit
Reference
#7124