feat(starrocks): add catalog support for StarRocks database connections #37026

Merged: nytai merged 1 commit into master from feat/starrocks-catalog on Jan 12, 2026

@nytai
Member

@nytai nytai commented Jan 10, 2026

SUMMARY

Adds comprehensive catalog support to the StarRocks database engine specification, enabling users to browse and query data across multiple catalogs (e.g., iceberg, hive, default_catalog).

Key changes:

  • Add catalog support flags (supports_catalog, supports_dynamic_catalog, supports_dynamic_schema)
  • Implement get_catalog_names() to retrieve available catalogs via SHOW CATALOGS
  • Implement get_default_catalog() to identify the default catalog from the URI
  • Update adjust_engine_params() to handle catalog.schema URI format
  • Override get_schema_names() to query schemas using SHOW DATABASES
  • Update URI placeholder to show catalog.db is optional
  • Add comprehensive test coverage (32 passing tests)
  • Refactor types in starrocks.py to use union operator

Connection string format:

  • starrocks://host:port/catalog.schema - connects to specific schema in catalog
  • starrocks://host:port/catalog. - sets catalog context for browsing schemas
  • starrocks://host:port - no catalog or schema specified
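
The three connection string shapes above can be illustrated with a small parser. This is a minimal sketch using only the standard library, not the PR's `adjust_engine_params()` or `get_default_catalog()` code; the helper name `split_catalog_schema` and the port `9030` are assumptions made for the example. Following the review discussion further down, a database value without a dot is treated as a catalog with no schema.

```python
from typing import Optional, Tuple
from urllib.parse import unquote, urlsplit


def split_catalog_schema(uri: str) -> Tuple[Optional[str], Optional[str]]:
    """Split the database part of a StarRocks URI into (catalog, schema).

    Hypothetical helper for illustration only.
    """
    # The database portion is the URL path without its leading slash.
    database = unquote(urlsplit(uri).path.lstrip("/"))
    if not database:
        return None, None
    # Split on the first dot only, so "catalog." yields an empty schema.
    catalog, _, schema = database.partition(".")
    return catalog or None, schema or None


print(split_catalog_schema("starrocks://host:9030/iceberg.sales"))
print(split_catalog_schema("starrocks://host:9030/iceberg."))
print(split_catalog_schema("starrocks://host:9030"))
```

Splitting on the first dot keeps the trailing-dot form (`catalog.`) distinguishable from a fully qualified `catalog.schema` value.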

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A - Backend-only change

TESTING INSTRUCTIONS

  1. Connect to a StarRocks instance with multiple catalogs (e.g., default_catalog, iceberg, hive)
  2. Enable "Allow changing catalogs" in database connection settings
  3. Verify that the catalog dropdown shows all available catalogs
  4. Select different catalogs and verify that the schema dropdown shows schemas from the selected catalog
  5. Create queries using tables from different catalogs
  6. Run the test suite: pytest tests/unit_tests/db_engine_specs/test_starrocks.py -v

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codeant-ai-for-open-source
Contributor

CodeAnt AI is reviewing your PR.


@bito-code-review
Contributor

bito-code-review Bot commented Jan 10, 2026

Code Review Agent Run #1da233

Actionable Suggestions - 0
Additional Suggestions - 2
  • superset/db_engine_specs/starrocks.py - 2
    • Schema extraction bug · Line 234-234
      Use `database.split('.', 1)[1]` to extract the schema portion of multi-part names (e.g., 'catalog.schema.sub' → 'schema.sub'), and keep the existing guard (`if not database or '.' not in database: return None`) so that single-part database names do not raise an error.
    • Try-except within loop performance issue · Line 274-274
      Move the try-except block outside the loop to improve performance, or restructure the logic to avoid exception handling within the loop.
      Code suggestion
       @@ -263,15 +263,18 @@
                    result = inspector.bind.execute("SHOW CATALOGS")
                    catalogs = set()
       
                    for row in result:
      -                try:
      -                    if hasattr(row, "keys") and "Catalog" in row.keys():
      +                if hasattr(row, "keys"):
      +                    try:
      +                        if "Catalog" in row:
      +                            catalogs.add(row["Catalog"])
      +                    except (KeyError, TypeError):
      +                        pass
      +                elif hasattr(row, "Catalog"):
      +                    try:
                                catalogs.add(row["Catalog"])
      -                    elif hasattr(row, "Catalog"):
      +                    except AttributeError:
      +                        pass
      +                else:
      +                    try:
                                catalogs.add(row.Catalog)
      -                    else:
      -                        catalogs.add(row[0])
      -                except (AttributeError, TypeError, IndexError, KeyError) as ex:
      -                    logger.warning("Unable to extract catalog name from row: %s (%s)", row, ex)
      -                    continue
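
As an alternative way to look at the suggestion above, the per-row branching can be moved into a small helper so that the loop body needs no exception handling at all. The following is a minimal sketch, not the code in `starrocks.py`; the names `extract_catalog_name` and `collect_catalogs` are hypothetical, and the three row shapes (mapping, attribute access, positional) are assumed from the branches shown in the diff.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Iterable, Optional, Set


def extract_catalog_name(row: Any) -> Optional[str]:
    """Return the catalog name from one SHOW CATALOGS row, or None."""
    # Mapping-style rows: look the column up by name.
    if isinstance(row, Mapping):
        return row.get("Catalog")
    # Attribute-style rows (for example, named tuples).
    name = getattr(row, "Catalog", None)
    if name is not None:
        return name
    # Positional rows: assume the catalog name is the first column.
    if isinstance(row, Sequence) and not isinstance(row, (str, bytes)) and len(row) > 0:
        return row[0]
    return None


def collect_catalogs(rows: Iterable[Any]) -> Set[str]:
    """Collect catalog names, skipping rows that yield nothing usable."""
    return {name for name in map(extract_catalog_name, rows) if name}
```

Rows that match none of the shapes are skipped silently here; a real implementation would probably want to log them, as the original code does.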
Review Details
  • Files reviewed - 2 · Commit Range: 74996ff..74996ff
    • superset/db_engine_specs/starrocks.py
    • tests/unit_tests/db_engine_specs/test_starrocks.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

@codeant-ai-for-open-source
Contributor

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Inspector usage
    The code calls raw execute on inspector.bind (e.g., inspector.bind.execute("SHOW CATALOGS") / inspector.bind.execute("SHOW DATABASES")). Relying on inspector.bind may be fragile across SQLAlchemy versions and the connection is not acquired/closed explicitly. Consider using an explicit connection (context manager) and SQLAlchemy text() for better resource handling and compatibility.

  • Catalog quoting
    In adjust_engine_params, schema is URL-quoted but catalog isn't. If catalogs can contain characters that need quoting (or could be user-provided), this may lead to invalid URLs or injection-like issues. Ensure catalog is validated/quoted consistently before being embedded into the database field.

  • Schema parsing ambiguity
    get_schema_from_engine_params treats a database value without a dot as "ambiguous" and returns None. This is reasonable for the new "catalog.schema" convention, but callers must be aware that a plain database value will be ignored as schema. Double-check consumers expect this behavior and that get_default_catalog handles those cases correctly.

  • Mock getitem signature
    The tests create MagicMock rows and assign __getitem__ using lambdas that accept two arguments (self, key). When __getitem__ is set on an instance, Python will call it with one argument (key). This mismatch can cause TypeError when test code indexes the mock row. Use dicts, SimpleNamespace, or set the MagicMock's __getitem__.side_effect with a single-arg callable.

  • Trailing dot expectation
    Several tests expect the database portion of the URL to include a trailing dot (e.g. "db1.", "default_catalog."). Confirm this behaviour is intentional and consistent with production code (DEFAULT_CATALOG, adjust_engine_params). In particular, ensure other codepaths that consume Database.database handle the trailing dot correctly (and that the trailing dot isn't accidentally interpreted as a schema name).
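
The "Catalog quoting" and "Trailing dot expectation" points can be made concrete with a short sketch. It is not the PR's `adjust_engine_params()`; the helper name `build_database` is hypothetical, and the only assumption about behaviour is the `catalog.schema` / `catalog.` convention described in the PR summary.

```python
from typing import Optional
from urllib.parse import quote


def build_database(catalog: str, schema: Optional[str] = None) -> str:
    """Build the database portion of the URI, quoting both parts.

    Hypothetical helper: the trailing dot is kept when no schema is
    given, matching the "catalog." form from the PR description.
    """
    quoted_catalog = quote(catalog, safe="")
    quoted_schema = quote(schema, safe="") if schema else ""
    return f"{quoted_catalog}.{quoted_schema}"


print(build_database("iceberg", "sales"))
print(build_database("hive"))
print(build_database("my catalog", "a/b"))
```

Note that `urllib.parse.quote` never escapes a literal dot, so a catalog or schema name that itself contains `.` would still be ambiguous and would need separate validation.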

@netlify

netlify Bot commented Jan 10, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit 74996ff
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/6961a0e72ac4e50008e93728
😎 Deploy Preview https://deploy-preview-37026--superset-docs-preview.netlify.app

Comment on lines +225 to +235
mock_row_1 = mocker.MagicMock()
mock_row_1.keys.return_value = ["Catalog", "Type", "Comment"]
mock_row_1.__getitem__ = lambda self, key: "default_catalog" if key == "Catalog" else None

mock_row_2 = mocker.MagicMock()
mock_row_2.keys.return_value = ["Catalog", "Type", "Comment"]
mock_row_2.__getitem__ = lambda self, key: "hive" if key == "Catalog" else None

mock_row_3 = mocker.MagicMock()
mock_row_3.keys.return_value = ["Catalog", "Type", "Comment"]
mock_row_3.__getitem__ = lambda self, key: "iceberg" if key == "Catalog" else None

Suggestion: The mocks for catalog rows override __getitem__ on MagicMock instances, but special methods are looked up on the class, so row["Catalog"] still returns MagicMock objects instead of the expected strings, causing get_catalog_names to return a set of mocks instead of {"default_catalog", "hive", "iceberg"} and making the test assertions fail. [logic error]

Severity Level: Minor ⚠️

Suggested change
-mock_row_1 = mocker.MagicMock()
-mock_row_1.keys.return_value = ["Catalog", "Type", "Comment"]
-mock_row_1.__getitem__ = lambda self, key: "default_catalog" if key == "Catalog" else None
-mock_row_2 = mocker.MagicMock()
-mock_row_2.keys.return_value = ["Catalog", "Type", "Comment"]
-mock_row_2.__getitem__ = lambda self, key: "hive" if key == "Catalog" else None
-mock_row_3 = mocker.MagicMock()
-mock_row_3.keys.return_value = ["Catalog", "Type", "Comment"]
-mock_row_3.__getitem__ = lambda self, key: "iceberg" if key == "Catalog" else None
+mock_row_1 = {"Catalog": "default_catalog", "Type": None, "Comment": None}
+mock_row_2 = {"Catalog": "hive", "Type": None, "Comment": None}
+mock_row_3 = {"Catalog": "iceberg", "Type": None, "Comment": None}
Why it matters? ⭐

The claim is correct: assigning __getitem__ on MagicMock instances does not affect Python's special-method lookup which resolves special methods on the object's class. As written the test may yield MagicMock values when code does row["Catalog"] (or similar), causing the assertion to fail or be flaky. Replacing the MagicMock rows with plain dicts (or configuring the mock's class-level __getitem__) produces real strings and makes the test deterministic. This is a real logic fix for the unit test rather than a cosmetic change.
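
The claim is straightforward to check in isolation. The `unittest.mock` documentation describes configuring magic methods by assigning a function that takes `self` as its first argument; the mock then installs it on its own per-instance subclass, so ordinary special-method lookup finds it. Under that reading the original two-argument lambdas should work as written. The sketch below reproduces the test's pattern with the standard-library `MagicMock` (the PR uses `mocker.MagicMock` from pytest-mock, which is assumed here to be the same class); `make_row` is a hypothetical helper.

```python
from unittest.mock import MagicMock


def make_row(catalog: str) -> MagicMock:
    """Build a mock row the same way the test in the PR does."""
    row = MagicMock()
    row.keys.return_value = ["Catalog", "Type", "Comment"]
    # Assigning a plain function to a magic method: it receives self first.
    row.__getitem__ = lambda self, key: catalog if key == "Catalog" else None
    return row


rows = [make_row(name) for name in ("default_catalog", "hive", "iceberg")]
names = {row["Catalog"] for row in rows}
print(names)
```

Plain dicts, as proposed in the suggested change, remain the simpler option even if the mocks behave correctly, since they remove any doubt about how indexing is resolved.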

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/db_engine_specs/test_starrocks.py
**Line:** 225:235
**Comment:**
	*Logic Error: The mocks for catalog rows override `__getitem__` on `MagicMock` instances, but special methods are looked up on the class, so `row["Catalog"]` still returns MagicMock objects instead of the expected strings, causing `get_catalog_names` to return a set of mocks instead of `{"default_catalog", "hive", "iceberg"}` and making the test assertions fail.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

@codeant-ai-for-open-source
Contributor

CodeAnt AI finished reviewing your PR.

@nytai nytai force-pushed the feat/starrocks-catalog branch from 74996ff to 738d37e on January 10, 2026 02:34
@nytai nytai force-pushed the feat/starrocks-catalog branch from 738d37e to 19a83c2 on January 10, 2026 02:36
@bito-code-review
Contributor

bito-code-review Bot commented Jan 10, 2026

Code Review Agent Run #a8e334

Actionable Suggestions - 0
Additional Suggestions - 2
  • tests/unit_tests/db_engine_specs/test_starrocks.py - 1
    • Misleading Test Comment · Line 82-82
      The comment incorrectly states that a single value is treated as schema in the default catalog, but the implementation treats it as a catalog. This could mislead developers reading the test.
  • superset/db_engine_specs/starrocks.py - 1
    • Try-except within loop performance issue · Line 271-271
      Move the try-except block outside the loop to avoid performance overhead, or restructure the logic to handle exceptions more efficiently.
      Code suggestion
       @@ -259,20 +259,20 @@
                try:
                    result = inspector.bind.execute("SHOW CATALOGS")
                    catalogs = set()
       
                    for row in result:
      -                try:
      -                    if hasattr(row, "keys") and "Catalog" in row.keys():
      +                if hasattr(row, "keys") and "Catalog" in row:
      +                    try:
                                catalogs.add(row["Catalog"])
      -                    elif hasattr(row, "Catalog"):
      +                    except (KeyError, TypeError) as ex:
      +                        logger.warning(
      +                            "Unable to extract catalog name from row: %s (%s)", row, ex,
      +                        )
      +                elif hasattr(row, "Catalog"):
      +                    try:
                                catalogs.add(row.Catalog)
      -                    else:
      +                    except AttributeError as ex:
      +                        logger.warning(
      +                            "Unable to extract catalog name from row: %s (%s)", row, ex,
      +                        )
      +                else:
      +                    try:
                                catalogs.add(row[0])
      -                except (AttributeError, TypeError, IndexError, KeyError) as ex:
      +                    except (IndexError, TypeError) as ex:
                                logger.warning(
      -                            "Unable to extract catalog name from row: %s (%s)", row, ex
      +                            "Unable to extract catalog name from row: %s (%s)", row, ex,
                                )
                                continue
Review Details
  • Files reviewed - 2 · Commit Range: 19a83c2..19a83c2
    • superset/db_engine_specs/starrocks.py
    • tests/unit_tests/db_engine_specs/test_starrocks.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Member

@villebro villebro left a comment

Lots of great improvements to this spec, thanks for this!

@nytai nytai merged commit 481bfa0 into master Jan 12, 2026
61 checks passed
@nytai nytai deleted the feat/starrocks-catalog branch January 12, 2026 21:02
jesperct pushed a commit to jesperct/superset that referenced this pull request Jan 19, 2026
rusackas pushed a commit that referenced this pull request Apr 17, 2026
- theming.mdx: document ECharts array property overrides (PR #37965) —
  array values like color palettes are fully supported and replaced entirely
  (not merged); adds Array Property Overrides section with color palette example
- configuring-superset.mdx: document PKCE support for database OAuth2
  (PR #37067) — add PKCE section under Custom OAuth2 with code_challenge_method
  config and when to use it
- cache.mdx: document ETag support for thumbnail/screenshot endpoints
  (PR #37663) — conditional GET with If-None-Match to avoid downloading
  unchanged images
- exploring-data.mdx: document SQL Lab UX improvements (PRs #37298, #37694,
  #37756) — treeview table browser, Ctrl+F find widget, resizable panels;
  also adds time range natural language expressions section (PR #37098)
- creating-your-first-dashboard.mdx: document Table chart features — boolean
  and categorical conditional formatting (PRs #36338, #37756), gradient toggle
  (PR #36280), HTML cell rendering with security note (PR #37685), column
  header tooltips from dataset descriptions (PR #37179), Display Controls
  modal in dashboard view (PR #36062)
- databases.json: update StarRocks supports_catalog and supports_dynamic_catalog
  to true — the engine spec (PR #37026) already implemented full catalog support
  with get_catalog_names(), get_default_catalog(), and SHOW CATALOGS; the
  committed JSON was stale and did not reflect this

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
qfcwell pushed a commit to qfcwell/superset that referenced this pull request May 12, 2026
2 participants