fix(alerts): implement Slack channel pagination to resolve timeout issues by gabotorresruiz · Pull Request #35832 · apache/superset

gabotorresruiz · 2025-10-24T15:13:41Z

SUMMARY

Problem

Users were unable to edit Slack recipients in Alerts & Reports when their Slack workspace had a large number of channels. The dropdown would fail to load, showing a timeout error after 30+ seconds, making the feature completely unusable for large organizations.

Root Cause

The previous implementation attempted to load all Slack channels at once without proper pagination:

- For large workspaces, this meant trying to fetch thousands of channels in a single blocking request
- The Slack API is paginated by design but doesn't support server-side search/filtering
- Timeouts occurred because the backend tried to load all channels before returning any results

Solution

This PR implements proper pagination with client-side search filtering:

Backend:

New get_channels_with_search() function with two optimized paths:
- Without search: Returns paginated results directly from Slack API (fast, 1-page responses)
- With search: Streams through Slack API pages, filtering matches client-side until limit reached
Increased page size from 200 → 1000 to reduce API calls by 5× and minimize rate limiting
Added support for comma-separated search strings (e.g., "engineering,marketing" for OR logic)
Deleted legacy get_channels() function and migrated cache_channels Celery task to use new pagination
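
The two backend paths described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `channels_with_search`, the `PageFetcher` signature, and the `page_size` parameter are assumptions made for the example, with `fetch_page` standing in for the Slack SDK's `conversations_list` call. The 50-page cap mirrors the safety limit mentioned in the PR's unit tests.

```python
from typing import Any, Callable, Optional

# Hypothetical page fetcher: (cursor, limit) -> (channels, next_cursor).
# In the PR this role is played by the Slack SDK's conversations_list call.
PageFetcher = Callable[
    [Optional[str], int], tuple[list[dict[str, Any]], Optional[str]]
]

MAX_PAGES = 50  # safety cap on pages scanned per search request


def channels_with_search(
    fetch_page: PageFetcher,
    search_string: str = "",
    cursor: Optional[str] = None,
    limit: int = 100,
    page_size: int = 1000,
) -> dict[str, Any]:
    # Comma-separated terms are OR-ed together.
    terms = [t.strip().lower() for t in search_string.split(",") if t.strip()]

    if not terms:
        # Fast path: a single upstream page, returned as-is.
        channels, next_cursor = fetch_page(cursor, limit)
        return {
            "result": channels,
            "next_cursor": next_cursor,
            "has_more": next_cursor is not None,
        }

    # Search path: walk upstream pages and keep only matching channels.
    matches: list[dict[str, Any]] = []
    for _ in range(MAX_PAGES):
        channels, cursor = fetch_page(cursor, page_size)
        matches.extend(
            c for c in channels if any(t in c["name"].lower() for t in terms)
        )
        if len(matches) >= limit or cursor is None:
            break
    return {
        "result": matches[:limit],
        "next_cursor": cursor,
        "has_more": cursor is not None,
    }
```

One simplification in this sketch: matches beyond `limit` on the last scanned page are dropped, because the upstream cursor has already moved past them. A real implementation would need to account for that.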

Frontend:

Replaced manual pagination logic with AsyncSelect's built-in pagination support
Added cursor tracking and caching for seamless scrolling
Graceful fallback to Slack V1 (manual input) on errors

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before

After

TESTING INSTRUCTIONS

Prerequisites

Enable feature flags in superset/config.py:

     FEATURE_FLAGS = {
         "ALERT_REPORTS": True,
         "ALERT_REPORT_SLACK_V2": True,
     }

Set SLACK_API_TOKEN in superset/config.py
Navigate to Settings --> Alerts & Reports --> Create Alert/Report
Select Notification Method: Slack
Verify initial load: Dropdown should load first 100 channels in ~1 second
Test pagination: Scroll down in dropdown to load more channels
Test search: Type in search box to filter channels
Test refresh button:

Select some channels
Click the sync icon to the right of dropdown
Icon should spin while loading
Selected channels should clear
Fresh channel list should load

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

korbit-ai

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Category	Issue	Status
	Missing limit validation constraints
	Unbounded cursor cache growth
	Redundant string lowercasing in search loop
	Premature schema loading before search filtering

Files scanned

File Path	Reviewed
superset/utils/slack.py	✅
superset/reports/schemas.py	✅
superset/reports/api.py	✅
superset-frontend/src/features/alerts/components/NotificationMethod.tsx	✅


bito-code-review · 2025-10-24T15:26:12Z

Interaction Diagram by Bito

sequenceDiagram
participant User as User<br/>
participant Modal as AlertReportModal
participant NotifComp as NotificationMethod<br/>🔄 Updated | ●●● High
participant AsyncSel as AsyncSelect<br/>🔄 Updated | ●●● High
participant API as REST API<br/>🔄 Updated | ●●○ Medium
participant SlackUtil as get_channels_with_search<br/>🔄 Updated | ●●● High
participant SlackSDK as Slack SDK
User->>Modal: Open alert/report settings
Modal->>NotifComp: Render notification method selector
User->>NotifComp: Select SlackV2 method
NotifComp->>AsyncSel: Initialize with fetchSlackChannels callback
User->>AsyncSel: Search or scroll for channels
AsyncSel->>NotifComp: Call fetchSlackChannels(search, page, pageSize)
NotifComp->>API: GET /api/v1/report/slack_channels/?q=...
API->>SlackUtil: get_channels_with_search(search_string, cursor, limit)
SlackUtil->>SlackSDK: conversations_list(cursor, limit, types)
SlackSDK-->>SlackUtil: Return channels page + next_cursor
SlackUtil-->>API: {result, next_cursor, has_more}
API-->>NotifComp: Paginated channel list
NotifComp-->>AsyncSel: {data: channels, totalCount}
AsyncSel-->>User: Display channels with pagination

Critical path: User->AlertReportModal->NotificationMethod->AsyncSelect->REST API->get_channels_with_search->Slack SDK

Note: The diff refactors Slack channel selection from eager-loaded Select to lazy-loaded AsyncSelect with cursor-based pagination. Frontend now streams channels on-demand from backend, which implements two-mode pagination: fast direct fetch without search, and streaming search through pages until limit reached. This improves performance for large workspaces.

bito-code-review

Code Review Agent Run #7b7415

Actionable Suggestions - 2

superset-frontend/src/features/alerts/components/NotificationMethod.tsx - 1
- Missing method update on SlackV2 error fallback · Line 267-268
tests/unit_tests/utils/slack_test.py - 1
- Broken pagination for large limits in non-search case · Line 420-420

Additional Suggestions - 13

tests/unit_tests/utils/slack_test.py - 5
- Magic number 50 used in comparison · Line 455-455
  
  Magic number `50` used in comparison. This occurs in multiple locations. Consider defining a constant like `MAX_PAGES = 50` for better maintainability.
  Code suggestion
  @@ -435,3 +435,4 @@
       def test_streaming_search_max_pages_safety_limit(self, mocker):
           """Test that streaming search stops after 50 pages to prevent runaway requests."""
  +        MAX_PAGES = 50
- Unused variable result assigned but never used · Line 610-610
  
  Variable `result` is assigned but never used in `test_cursor_format_with_special_characters`. Consider removing the assignment or using the variable in assertions.
  Code suggestion
  @@ -609,2 +609,1 @@
           # Call with special cursor
  -        result = get_channels_with_search(cursor=special_cursor, limit=100)
  +        get_channels_with_search(cursor=special_cursor, limit=100)
- Missing trailing comma in dictionary literal · Line 43-43
  
  Dictionary literal is missing trailing comma after `"is_member": True`. This issue occurs in multiple dictionary literals throughout the file. Consider adding trailing commas for consistency.
  Code suggestion
  @@ -42,2 +42,2 @@
  -            "is_member": True,
  -        }
  +            "is_member": True,
  +        },
- Docstring missing ending period punctuation · Line 322-322
  
  Docstring `"Test pagination returns single page with cursor"` should end with a period. This issue occurs in multiple docstrings throughout the file.
  Code suggestion
  @@ -322,1 +322,1 @@
  -        """Test pagination returns single page with cursor"""
  +        """Test pagination returns single page with cursor."""
- Line exceeds maximum length of 88 characters · Line 436-436
  
  Line exceeds maximum length of 88 characters (89 characters). Consider breaking the docstring into multiple lines or shortening the text.
  Code suggestion
  @@ -436,1 +436,2 @@
  -        """Test that streaming search stops after 50 pages to prevent runaway requests"""
  +        """Test that streaming search stops after 50 pages to prevent
  +        runaway requests."""

superset/utils/slack.py - 5

Exception message formatting issues · Line 188-188

Avoid f-string in exception and long messages. Assign f-string to variable first, then use in exception.

Code suggestion

 @@ -187,1 +187,2 @@
-    except (SlackClientError, SlackApiError) as ex:
-        raise SupersetException(f"Failed to list channels: {ex}") from ex
+    except (SlackClientError, SlackApiError) as ex:
+        error_msg = f"Failed to list channels: {ex}"
+        raise SupersetException(error_msg) from ex

Import Callable from collections.abc instead · Line 20-20

Import `Callable` from `collections.abc` instead of `typing` for better compatibility with Python 3.9+.
Code suggestion
```
 @@ -20,1 +20,1 @@
-from typing import Any, Callable, Optional
+from typing import Any, Optional
+from collections.abc import Callable
```
Use union syntax for cursor parameter · Line 95-95

Use `str | None` syntax for type annotation instead of `Optional[str]`.
Code suggestion
```
 @@ -95,1 +95,1 @@
-    cursor: Optional[str] = None,
+    cursor: str | None = None,
```

Docstring format and imperative mood issues · Line 98-98

Move docstring summary to first line and use imperative mood: start with 'Fetch' instead of 'Fetches'.

Code suggestion

 @@ -97,2 +97,1 @@
-) -> dict[str, Any]:
-    """
-    Fetches Slack channels with pagination and search support.
+) -> dict[str, Any]:
+    """Fetch Slack channels with pagination and search support.

Use elif instead of else if · Line 163-163

Use `elif` instead of `else` followed by `if` to reduce indentation level.

Code suggestion

 @@ -162,4 +162,3 @@
-                        matches.append(channel)
-                else:
-                    if (
-                        search_string.lower() in channel["name"].lower()
+                        matches.append(channel)
+                elif (
+                    search_string.lower() in channel["name"].lower()

tests/integration_tests/reports/api_tests.py - 3

Missing return type annotation for test method · Line 2054-2054

Test method `test_slack_channels_api_without_pagination` is missing return type annotation. Add `-> None` to follow type annotation standards. Multiple similar issues exist in other test methods.
Code suggestion
```
 @@ -2054,1 +2054,1 @@
-    def test_slack_channels_api_without_pagination(self, mock_get_channels):
+    def test_slack_channels_api_without_pagination(self, mock_get_channels) -> None:
```

Docstring formatting issues need correction · Line 2055-2055

Docstring should be a one-line format and end with a period. Consider: `"Test /api/v1/report/slack_channels/ endpoint without pagination."`

Code suggestion

 @@ -2055,3 +2055,1 @@
-        """
-        Test /api/v1/report/slack_channels/ endpoint without pagination
-        """
+        """Test /api/v1/report/slack_channels/ endpoint without pagination."""

Line length exceeds maximum allowed characters · Line 2063-2063

Lines 2063-2064 exceed 88 character limit. Consider breaking long dictionary entries across multiple lines for better readability.

Code suggestion

 @@ -2062,4 +2062,6 @@
-            "result": [
-                {"id": "C001", "name": "general", "is_private": False, "is_member": True},
-                {"id": "C002", "name": "random", "is_private": False, "is_member": True},
-            ],
+            "result": [
+                {"id": "C001", "name": "general", "is_private": False,
+                 "is_member": True},
+                {"id": "C002", "name": "random", "is_private": False,
+                 "is_member": True},
+            ],

Review Details

Files reviewed - 7 · Commit Range: 2738d23..2738d23
- superset-frontend/src/features/alerts/components/NotificationMethod.test.tsx
- superset-frontend/src/features/alerts/components/NotificationMethod.tsx
- superset/reports/api.py
- superset/reports/schemas.py
- superset/utils/slack.py
- tests/integration_tests/reports/api_tests.py
- tests/unit_tests/utils/slack_test.py
Files skipped - 0
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful


codecov · 2025-10-24T15:27:01Z

Codecov Report

❌ Patch coverage is 20.12579% with 127 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.21%. Comparing base (5320730) to head (4f8a504).
⚠️ Report is 1316 commits behind head on master.

Files with missing lines	Patch %	Lines
superset/utils/slack.py	6.95%	107 Missing ⚠️
superset/tasks/slack.py	23.07%	20 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #35832       +/-   ##
===========================================
+ Coverage        0   68.21%   +68.21%     
===========================================
  Files           0      629      +629     
  Lines           0    46314    +46314     
  Branches        0     5030     +5030     
===========================================
+ Hits            0    31591    +31591     
- Misses          0    13472    +13472     
- Partials        0     1251     +1251

Flag	Coverage Δ
hive	`43.82% <11.94%> (?)`
mysql	`67.32% <20.12%> (?)`
postgres	`67.37% <20.12%> (?)`
presto	`47.43% <11.94%> (?)`
python	`68.17% <20.12%> (?)`
sqlite	`66.99% <20.12%> (?)`
unit	`100.00% <ø> (?)`


Vitor-Avila · 2025-11-18T05:20:03Z

hey @gabotorresruiz thanks for working on this! I left a couple of comments.

Currently, this is the flow for an API request to get Slack channels:

First, we check whether there are any cached results. The cached list is always complete and never filtered. If cached results are available (and force was not passed), we filter them in memory (if a search_string was passed) and return the filtered results.
If no cached data is available (or force was passed), we hit conversations.list with limit set to 999 (the maximum allowed value).
We watch for a cursor value in the response and paginate on it, extending the full list of channels.
Once all channels are returned, we cache the results and filter them in memory.

This would cause issues mainly because Slack rate-limits this endpoint, so for instances with a considerable number of channels it could take several API calls, hitting rate limits and eventually timing out. For smaller instances, this works pretty well and keeps the full list of channels cached.
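
For reference, the existing flow described above can be sketched as below. This is an illustration under stated assumptions rather than Superset's actual code: `get_channels_legacy`, the in-process `_cache` dict, and the `CACHE_KEY` name are hypothetical, and `fetch_page(cursor, limit)` stands in for the call to conversations.list.

```python
from typing import Any, Callable, Optional

# Stand-in for Superset's cache backend; the key name is hypothetical.
_cache: dict[str, list[dict[str, Any]]] = {}
CACHE_KEY = "slack_channels"

PageFetcher = Callable[
    [Optional[str], int], tuple[list[dict[str, Any]], Optional[str]]
]


def get_channels_legacy(
    fetch_page: PageFetcher, search_string: str = "", force: bool = False
) -> list[dict[str, Any]]:
    channels = None if force else _cache.get(CACHE_KEY)
    if channels is None:
        # Cache miss (or forced refresh): fetch every page before answering.
        channels, cursor = [], None
        while True:
            page, cursor = fetch_page(cursor, 999)
            channels.extend(page)
            if cursor is None:
                break
        _cache[CACHE_KEY] = channels
    if search_string:
        # Filtering always happens in memory, on the complete list.
        needle = search_string.lower()
        channels = [c for c in channels if needle in c["name"].lower()]
    return channels
```

With N channels, a cold or forced request costs roughly N / 999 sequential upstream calls before anything is returned, which is where rate limits and timeouts come from on large workspaces.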

The benefit of your PR is that we're not trying to fetch the full list. Instead, we request either filtered or unfiltered values via the API and paginate only until the limit is reached. I think this makes sense: for example, if I have hundreds of Slack channels starting with customer- and I search for this, I would get, say, the first 100 values, which might not be all of them, but as I type more in the field (like customer-ac) it would fire new requests and narrow down the results until the desired channel is reached. The trade-off is that more API calls will be fired (each change to the field would produce new API calls that are not cached), which could lead to more frequent rate limits (depending on what's available in cache).

With that in mind, I wonder if we might want to explore other alternatives. Do we know if this is still an issue after this PR? #35622

One potential alternative would be to have a toggle in the Alert/Report modal UI to choose between using the Slack APIs to search for the name and get an ID (which is subject to the issues discussed here) or manually entering a channel ID (we could point the UI to docs showing how to get that info from Slack via the UI). It's not as intuitive as search, but I think until Slack supports filtering on the endpoint we might not be able to avoid rate limits for larger enterprises.

gabotorresruiz · 2025-11-19T23:38:38Z

Hey @Vitor-Avila, thanks for the thorough review! Let me address your concerns:

Rate Limits Concern

Your concern about "more API calls from typing" is valid but largely mitigated by the caching architecture:

With cache warm (normal operation):

All user requests hit cache --> 0 API calls
In-memory filtering for searches --> instant results
No rate limit risk during normal usage

API calls only happen:

During initial cache warmup (once, in background)
When user clicks "Refresh channels" button
When cache expires (configurable TTL via SLACK_CACHE_TIMEOUT)

So the typing scenario you described (each keystroke = new API call) doesn't happen when the cache is available. The cache stores all channels and filtering happens entirely in memory.

How Search Works

Since Slack's conversations.list API doesn't support server-side filtering, I've implemented search "client-side":

Cache path (normal operation):

All channels are pre-fetched and stored in cache
When a user types some text to search for a channel, it is filtered in memory against the cached list
Results returned instantly with no Slack API calls

Search features:

Partial matching: "eng" matches "engineering-team", "frontend-eng", etc.
Comma-separated OR logic: "engineering,marketing" matches channels containing either term
Matches both name and ID: Search works on channel names and Slack channel IDs
Pagination: Large result sets are paginated with synthetic cursors for cache data
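
The search features listed above could look roughly like this when applied to a cached channel list. It is a sketch, not the PR's implementation: the function names and the `cache:<offset>` synthetic cursor format are assumptions chosen for illustration.

```python
from typing import Any, Optional


def channel_matches(channel: dict[str, Any], search_string: str) -> bool:
    # Comma-separated terms are OR-ed; each term is a partial,
    # case-insensitive match against the channel name or ID.
    terms = [t.strip().lower() for t in search_string.split(",") if t.strip()]
    if not terms:
        return True
    fields = (channel["name"].lower(), channel["id"].lower())
    return any(term in field for term in terms for field in fields)


def search_cached_channels(
    channels: list[dict[str, Any]],
    search_string: str = "",
    cursor: Optional[str] = None,
    limit: int = 100,
) -> dict[str, Any]:
    # Hypothetical synthetic cursor: "cache:<offset>" into the filtered list.
    offset = int(cursor.split(":", 1)[1]) if cursor else 0
    filtered = [c for c in channels if channel_matches(c, search_string)]
    end = offset + limit
    next_cursor = f"cache:{end}" if end < len(filtered) else None
    return {
        "result": filtered[offset:end],
        "next_cursor": next_cursor,
        "has_more": next_cursor is not None,
    }
```

Because the cursor is only an offset into the filtered list, it stays valid only while the cached list and the search string are unchanged between requests.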

Cache Architecture

The implementation uses a background Celery task to warm the cache:

Background warmup: cache_channels task fetches all channels with pagination, respecting rate limits via RateLimitErrorRetryHandler
Complete cache: Stores ALL channels (no arbitrary limit)
Configurable retry: SLACK_API_RATE_LIMIT_RETRY_COUNT controls retry behavior for rate limits
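
To illustrate what retrying on rate limits involves, here is a stdlib-only sketch of the general pattern: wait for the interval the server asks for, then try again, up to a configured number of retries. It does not reproduce the Slack SDK's RateLimitErrorRetryHandler; the `RateLimited` exception and `call_with_rate_limit_retry` helper are hypothetical names, and `max_retry_count` only plays the role that SLACK_API_RATE_LIMIT_RETRY_COUNT plays in the PR.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error carrying the server's requested wait in seconds."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_rate_limit_retry(
    call: Callable[[], T],
    max_retry_count: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    retries = 0
    while True:
        try:
            return call()
        except RateLimited as ex:
            if retries >= max_retry_count:
                raise  # retry budget exhausted; surface the error
            retries += 1
            sleep(ex.retry_after)  # honor the server-provided delay
```

The `sleep` parameter is injectable so the behavior can be exercised in tests without actually waiting.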

Non-Blocking Cache Warmup

The cache warmup is designed to never block the UI or degrade user experience:

Asynchronous execution:

Cache warmup runs as a Celery background task in a separate worker process
The API endpoint returns immediately after triggering the task
UI shows a brief "refreshing" state but remains fully interactive

User experience flow:

User clicks "Refresh channels" button
API clears cache and triggers cache_channels.delay() (async)
API returns immediately --> UI updates instantly
Celery worker fetches channels in background (can take minutes for large workspaces)
Next user request sees the newly warmed cache
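
The refresh flow above can be sketched with a thread pool standing in for the Celery worker. This is only an illustration of the non-blocking shape: `refresh_channels`, `warm_cache`, the module-level `cache` dict, and the cache key are hypothetical, and in the PR the background work is dispatched with cache_channels.delay() instead.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable

cache: dict[str, Any] = {}  # stand-in for the shared cache backend
executor = ThreadPoolExecutor(max_workers=1)  # stand-in for a Celery worker

ChannelList = list[dict[str, Any]]


def warm_cache(fetch_all: Callable[[], ChannelList]) -> None:
    # Runs in the background; may take minutes for a large workspace.
    cache["slack_channels"] = fetch_all()


def refresh_channels(fetch_all: Callable[[], ChannelList]) -> Future:
    # Clear the stale list, schedule the warmup, and return immediately.
    cache.pop("slack_channels", None)
    return executor.submit(warm_cache, fetch_all)
```

The caller gets control back as soon as the job is queued; requests that arrive before the warmup finishes see an empty cache.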

Cold cache behavior:

If cache is empty, the first request fetches one page from API (up to 999 channels)
User sees immediate results while background task continues warming
Subsequent requests benefit from progressively warmer cache

No hanging or timeouts:

Individual API requests have reasonable timeouts
Long-running warmup happens entirely in background worker

How This PR Relates to #35622:

They're complementary, not alternatives:

fix(alerts): improve Slack API rate limiting for large workspaces #35622: Better retry handling WHEN rate limits occur
This PR: Reduce API calls to PREVENT rate limits via caching

#35622 helps us recover from rate limits; this PR helps avoid them entirely.


Vitor-Avila · 2025-11-21T03:56:21Z

+SLACK_CHANNELS_CONTINUATION_CURSOR_KEY = (
+    f"{SLACK_CHANNELS_CACHE_KEY}_continuation_cursor"
+)


Is this still being used? I don't see a set event for this key anymore

Vitor-Avila · 2025-11-21T03:56:38Z

+    next_cursor: Optional[str]
+    if not has_more and continuation_cursor:
+        # Return special cursor that signals transition to API
+        next_cursor = "api:continue"


Same here: is this still being used? I think this was related to the SLACK_CHANNELS_CONTINUATION_CURSOR_KEY usage.

Vitor-Avila · 2025-11-21T04:06:01Z


 logger = logging.getLogger(__name__)

+CACHE_WARMUP_TIME_LIMIT = 300  # 5 minutes


One thing I thought about is whether this constant could take its value from a setting in config.py. That way users could set a custom value there that would get assigned here (assuming this local constant works).

Do you know if that would work? I'm just asking because I imagine there are some larger organizations that might need more than 5 minutes. Having the ability to customize from there would be better.
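
A minimal sketch of that idea, assuming a hypothetical config key named SLACK_CACHE_WARMUP_TIME_LIMIT (no such key is confirmed anywhere in this thread). The helper reads the value from a config mapping and falls back to the current five-minute default; how the value then reaches the Celery task is left open here.

```python
from collections.abc import Mapping
from typing import Any

DEFAULT_CACHE_WARMUP_TIME_LIMIT = 300  # 5 minutes, the PR's current constant


def get_cache_warmup_time_limit(config: Mapping[str, Any]) -> int:
    # "SLACK_CACHE_WARMUP_TIME_LIMIT" is a hypothetical key for illustration.
    value = config.get(
        "SLACK_CACHE_WARMUP_TIME_LIMIT", DEFAULT_CACHE_WARMUP_TIME_LIMIT
    )
    limit = int(value)
    if limit <= 0:
        raise ValueError("SLACK_CACHE_WARMUP_TIME_LIMIT must be positive")
    return limit
```

Inside Superset the mapping would presumably be current_app.config, so a larger organization could raise the limit in its own config without patching the task module.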

Vitor-Avila · 2025-11-21T04:14:31Z

+            if force:
+                cache_manager.cache.delete(SLACK_CHANNELS_CACHE_KEY)
+                cache_manager.cache.delete(
+                    SLACK_CHANNELS_CONTINUATION_CURSOR_KEY
+                )
+                logger.info("Slack channels cache cleared due to force=True")
+
+                # Trigger async cache warmup if caching is enabled
+                if current_app.config.get("SLACK_ENABLE_CACHING", True):
+                    cache_channels.delay()
+                    logger.info("Triggered async cache warmup task")
+
+            channels_data = get_channels_with_search(
                search_string=search_string,
                types=types,
                exact_match=exact_match,
-                force=force,
+                cursor=cursor,
+                limit=limit,
            )


This might be an issue. When force is True, here we're calling cache_channels.delay() to warm up the cache + also calling get_channels_with_search(). This will hit the Slack APIs in parallel (one thread for async cache warm up and another for the frontend search) which will likely cause a lot more rate limits.

Would it make sense to have the force button to show a toast that the channel list is being updated and prevent the user from searching until the cache is warmed up? @eschutho what do you think?
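
One way the suggestion above might look on the backend, sketched under assumptions: a "warming" flag in the cache makes the endpoint answer with a status instead of calling Slack while the background task is running. The key names, `handle_channels_request`, and `finish_warmup` are hypothetical, and `enqueue_warmup` stands in for cache_channels.delay().

```python
from typing import Any, Callable

CHANNELS_KEY = "slack_channels"  # hypothetical cache keys
WARMING_KEY = "slack_channels_warming"


def handle_channels_request(
    cache: dict[str, Any],
    enqueue_warmup: Callable[[], None],
    search_string: str = "",
    force: bool = False,
) -> dict[str, Any]:
    if force and not cache.get(WARMING_KEY):
        # Start exactly one warmup; repeated clicks do not queue more.
        cache.pop(CHANNELS_KEY, None)
        cache[WARMING_KEY] = True
        enqueue_warmup()
    if cache.get(WARMING_KEY):
        # No Slack call from the request path while the worker is fetching.
        return {"status": "warming", "result": []}
    needle = search_string.lower()
    channels = cache.get(CHANNELS_KEY, [])
    return {
        "status": "ok",
        "result": [c for c in channels if needle in c["name"].lower()],
    }


def finish_warmup(cache: dict[str, Any], channels: list[dict[str, Any]]) -> None:
    # Called by the background task once the full list has been fetched.
    cache[CHANNELS_KEY] = channels
    cache.pop(WARMING_KEY, None)
```

The frontend could map the "warming" status to the toast and disabled search field proposed above. In a real deployment the flag would probably need an expiry so that a crashed worker cannot leave the UI locked.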

Vitor-Avila · 2025-11-21T04:42:12Z

thanks @gabotorresruiz, left a few more comments with the latest changes 🙏 one important detail is that currently we support caching the list of channels without Celery workers. This PR, on the other hand, limits caching to instances that have Celery workers configured -- @eschutho do you think this is a concern? I would imagine all large organizations that would really need several calls to the Slack APIs would have Celery configured, but I'm wondering if we want to keep this functionality.

jkleinkauff · 2026-03-02T20:19:28Z

+1, this is affecting our instance and forcing us to use v1 instead.

pull-request-size Bot added the size/XXL label Oct 24, 2025

github-actions Bot added the api Related to the REST API label Oct 24, 2025

dosubot Bot added the alert-reports Namespace | Anything related to the Alert & Reports feature label Oct 24, 2025

korbit-ai Bot suggested changes Oct 24, 2025


bito-code-review Bot reviewed Oct 24, 2025


gabotorresruiz force-pushed the fix/alerts-reports-unable-to-edit-slack-recipients branch 9 times, most recently from c2e80c3 to 84b3930 Compare October 27, 2025 23:46

sadpandajoe requested review from Vitor-Avila and eschutho October 28, 2025 17:20