Skip to content

feat(wren): add YTsaurus (CHYT) connector#2258

Open
nar3k wants to merge 9 commits into
Canner:mainfrom
nar3k:feat/ytsaurus-connector
Open

feat(wren): add YTsaurus (CHYT) connector#2258
nar3k wants to merge 9 commits into
Canner:mainfrom
nar3k:feat/ytsaurus-connector

Conversation

@nar3k
Copy link
Copy Markdown

@nar3k nar3k commented May 12, 2026

Closes #2257.

Summary

Adds a first-class ytsaurus data source that talks to a YTsaurus cluster through its CHYT (ClickHouse-over-YT) clique. CHYT exposes a ClickHouse-compatible HTTP protocol on the YT proxy, so the bulk of Wren's existing ClickHouse / Ibis / sqlglot path is reused — only the auth and routing shim is new.

What's in scope

  • DataSource.ytsaurus enum + factory entry + sqlglot dialect map (ytsaurus -> clickhouse)
  • YTsaurusConnectionInfo Pydantic model (proxy, clique, token w/ YT_TOKEN env fallback, secure, port, query_path, settings, kwargs)
  • wren.connector.ytsaurus.YTsaurusConnector — subclass of the Ibis ClickHouse connector
  • wren-engine[ytsaurus] pip extra (pulls ibis-framework[clickhouse])
  • Optional MDL-level physical-SQL rewrite in engine.py: when data_source == ytsaurus, <schema>.<table> references are replaced with backticked YT paths sourced from each model's properties.ytPath. Path map is cached; gated entirely on data_source == ytsaurus, so other connectors are unaffected
  • Docs: core/wren/docs/connectors/ytsaurus.md, README install row, docs/connections.md JSON example

CHYT-specific details (why this is more than a ClickHouse alias)

  1. clickhouse_connect.get_client() does not accept http_headers as a kwarg → headers are injected post-construction, with a belt-and-braces fallback for clients that re-snapshot state.
  2. CHYT mounts at URL path /query, not / → new query_path field (default /query) maps to clickhouse_connect's proxy_path.
  3. YT routes by chyt.clique_alias=<*alias> URL param, not by database= → the connector patches HttpClient.params at construction so the alias is present before the startup SELECT version() runs.
  4. YT proxy accepts only Authorization: OAuth <token> (rejects Basic / Bearer) → the connector subclasses HttpClient with a one-line override of _init_common_settings that injects the OAuth header before the parent's startup query.
  5. ibis backend.sql() introspects via CREATE VIEW, which CHYT rejects with std::out_of_rangeYTsaurusConnector.query / dry_run bypass ibis and call clickhouse_connect directly (native protocol → pyarrow.Table; EXPLAIN AST for dry-runs).

Test plan

  • wren context build and wren --sql ... against a live YT cluster with a CHYT clique
  • COUNT(*), JOIN-and-aggregate, and dialect-specific functions (toUnixTimestamp, startsWith, INTERVAL arithmetic, COUNT(DISTINCT ...)) all return matching results vs. the same SQL run against the cluster directly
  • connector.dry_run(sql) returns the CHYT EXPLAIN AST without executing
  • cargo / ruff / existing wren test suite untouched — no other connector paths altered
  • Reviewers: please flag any preferred test fixtures for a CHYT-against-mock test; happy to add a clickhouse_connect-mock-based unit test if useful

Out of scope

  • Native YQL (Query Tracker) — CHYT only
  • SPYT (Spark-on-YT)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • YTsaurus (CHYT) added as a supported data source for queries, including connector and engine integration.
    • OAuth token support with optional YT_TOKEN environment fallback.
    • Statement-timeout handling for YTsaurus queries.
    • SQL rewriting to resolve YTsaurus-backed tables to YT paths at runtime.
  • Documentation

    • New installation, connector, and connection-format docs with examples and field references.
  • Chores

    • New optional install extra to ease YTsaurus dependency installation.

Review Change Stack

Adds a new data source that talks to a YTsaurus cluster through its CHYT
(ClickHouse-over-YT) clique. CHYT exposes a ClickHouse HTTP protocol on
the YT proxy, so the connector reuses the existing ClickHouse / Ibis
path with YT-flavored auth and routing.

Wiring:
- DataSource.ytsaurus enum + factory entry
- YTsaurusConnectionInfo (proxy, clique, token, secure, port, query_path,
  settings, kwargs); token falls back to YT_TOKEN env var
- pip extra `wren-engine[ytsaurus]` (pulls ibis-framework[clickhouse])
- sqlglot dialect map: ytsaurus -> clickhouse
- docs: docs/connectors/ytsaurus.md, README install row,
  docs/connections.md JSON example

CHYT integration details (the reason this is not a 5-line ClickHouse
alias):

1. clickhouse_connect.get_client() does not accept http_headers as a
   kwarg. Headers are injected post-construction with a belt-and-braces
   fallback for clients that re-snapshot state.
2. CHYT mounts at URL path /query, not /. New `query_path` field on
   YTsaurusConnectionInfo (default "/query") is mapped to
   clickhouse_connect's proxy_path.
3. YT routes by chyt.clique_alias=<*alias> URL param, not by the
   `database=` field clickhouse_connect would send as a header. The
   connector patches HttpClient.params at construction so the alias is
   present before the startup `SELECT version()` runs.
4. YT proxy accepts only `Authorization: OAuth <token>` (rejects Basic
   and Bearer). The connector subclasses HttpClient with a one-line
   override of _init_common_settings that injects the OAuth header
   before the parent's startup query.
5. ibis backend.sql() introspects via CREATE VIEW, which CHYT rejects
   (std::out_of_range). YTsaurusConnector.query / dry_run bypass ibis
   and call clickhouse_connect directly (native protocol -> pyarrow.Table;
   EXPLAIN AST for dry-runs).

Physical-SQL rewrite (engine.py):
WrenEngine applies a YT-specific post-transpile sqlglot rewrite that
replaces <schema>.<table> references with backticked YT paths sourced
from each model's `properties.ytPath` in the manifest. Only runs when
`data_source == ytsaurus`; path map is cached on first build, so other
connectors are unaffected.

Verified end-to-end against a live YT cluster via a CHYT clique:
COUNT(*), JOIN-and-aggregate, and dialect-specific functions
(toUnixTimestamp, startsWith, INTERVAL arithmetic, COUNT DISTINCT) all
return matching results vs. the same SQL run against the cluster
directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added documentation Improvements or additions to documentation dependencies Pull requests that update a dependency file python Pull requests that update Python code core labels May 12, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a first-class YTsaurus data source (CHYT): Pydantic connection model, connection factory with token/patching, ClickHouse-compatible connector, engine SQL rewriting for YT paths, registry wiring, optional pyproject extra, and documentation.

Changes

YTsaurus connector integration

Layer / File(s) Summary
Connection model and types
core/wren/src/wren/model/__init__.py
Add YTsaurusConnectionInfo and extend ConnectionInfo union and exported types.
Connection factory and patching
core/wren/src/wren/model/data_source.py
Add DataSource.ytsaurus, token resolution (info.token or YT_TOKEN), statement-timeout handling, reject connectionUrl for ytsaurus, and implement get_ytsaurus_connection() with _YTSAURUS_PATCH_LOCK to temporarily monkey-patch clickhouse_connect HttpClient for OAuth header and chyt.clique_alias.
YTsaurus connector implementation
core/wren/src/wren/connector/ytsaurus.py
Add YTsaurusConnector that runs SQL via CHYT HTTP client (optional validated LIMIT wrapper), converts column_names+result_columnspyarrow.Table, remaps non-timeout backend errors to WrenError(INVALID_SQL), implements dry_run() using EXPLAIN AST, and exposes create_connector().
Registry and dialect mapping
core/wren/src/wren/connector/factory.py, core/wren/src/wren/mdl/cte_rewriter.py
Register DataSource.ytsaurus in the connector registry and map sqlglot dialect ytsaurus -> clickhouse.
Engine SQL physical overrides
core/wren/src/wren/engine.py
Add _apply_physical_overrides() and _yt_path_map() to build a manifest-derived map from schema.table/tableproperties.ytPath, rewrite sqlglot exp.Table nodes to backtick-quoted YT paths for DataSource.ytsaurus, and call this in dry_plan().
Documentation and examples
core/wren/README.md, core/wren/docs/connections.md, core/wren/docs/connectors/ytsaurus.md
Document wren-engine[ytsaurus] install extra; add YTsaurus connection JSON example and link to connectors/ytsaurus.md which details installation, usage, connection fields, token resolution, timeout handling, and limitations.
Optional dependency extra
core/wren/pyproject.toml
Add ytsaurus = ["ibis-framework[clickhouse]"] and include it in the aggregated all extra.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • yichieh-lu
  • cyyeh

Poem

🐰 I hopped through manifests and patch,
Backticked paths snug in my batch,
OAuth snug in a header,
Cliques whisper to CHYT,
Queries bloom, docs light the thatch.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'feat(wren): add YTsaurus (CHYT) connector' accurately and concisely summarizes the main change: introducing a new YTsaurus data source connector to Wren. The title is specific, clear, and directly reflects the primary objective.
Linked Issues check ✅ Passed All coding requirements from issue #2257 are fully met: DataSource.ytsaurus with factory entry, YTsaurusConnectionInfo model, YTsaurusConnector implementation with OAuth/clique routing, sqlglot dialect mapping, ytsaurus pip extra, optional MDL physical-SQL rewrite, and comprehensive documentation.
Out of Scope Changes check ✅ Passed All changes are directly scoped to issue #2257 requirements. No out-of-scope modifications detected; even the engine.py physical-SQL rewrite is explicitly listed as an optional objective and represents necessary YTsaurus-specific functionality.
Docstring Coverage ✅ Passed Docstring coverage is 94.44% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/connector/ytsaurus.py`:
- Around line 60-63: The code assumes result.column_names and
result.result_columns are same length before building a PyArrow table (variables
result, columns, data in the block after self._ch_client.query(wrapped)), which
can silently drop columns due to zip(); add an explicit check comparing
len(columns) and len(data) immediately after those lists are created, and if
they differ either raise a descriptive exception (e.g., ValueError including
both lengths and samples) or log an error and fail fast so callers won't get a
truncated/malformed table; keep the pa.table construction only when lengths
match.
- Around line 53-55: The code interpolates limit directly into the SQL (wrapped,
sql, limit) which allows injection; validate and sanitize limit before using it:
if limit is not None, ensure it's an integer (and not a bool) and optionally
non-negative (e.g., if not isinstance(limit, int) or isinstance(limit, bool) or
limit < 0: raise TypeError/ValueError), then safely build wrapped = f"SELECT *
FROM (\n{sql}\n) LIMIT {limit}"; this prevents non-integer strings like "100;
DROP ..." from being injected.

In `@core/wren/src/wren/engine.py`:
- Around line 172-189: The _rewrite function can accidentally match an
empty-string table name because _yt_path_map generates an empty key when a
model's tableReference.table is missing; update _rewrite (inside the
transformation that handles exp.Table) to early-return the original node if
tbl_name is empty or falsy (i.e., add a guard after computing tbl_name) so it
does not lookup path_map with an empty key; reference exp.Table, the local
variables db/name/tbl_name, and _yt_path_map to locate related logic and ensure
no empty-key matches occur.
- Around line 209-221: The manifest parsing can still produce mappings for
models with no valid table name; after computing table = (tr.get("table") or
m.get("name") or "").strip() in the loop over manifest.get("models", []), add an
explicit guard (if not table: continue) so models without a non-empty table/name
are skipped entirely and neither the schema.table nor the table key is written
into out; update the loop that builds out and the use of self._yt_path_map_cache
accordingly.

In `@core/wren/src/wren/model/data_source.py`:
- Around line 372-423: The code mutates global class state
(_BaseHttpClient.params, _ch_http.HttpClient, _ch_driver.HttpClient, and
_CHYTHttpClient._wren_yt_token) around the ibis.clickhouse.connect() call, which
can leak settings/tokens under concurrency; fix by introducing a module-level
lock (e.g., a threading.Lock named something like _ytsaurus_client_lock) and
wrap the whole patching/connection/finally block with that lock (using a context
manager) so only one thread can perform the patch, connect, and restore sequence
at a time; ensure the lock is declared at module scope and used to guard the
region that sets _CHYTHttpClient._wren_yt_token, mutates _BaseHttpClient.params
and replaces _ch_http.HttpClient/_ch_driver.HttpClient, and that the finally
block still restores originals while inside the locked section.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4b259c5a-4ba6-464f-8296-069715ef549e

📥 Commits

Reviewing files that changed from the base of the PR and between ba187b7 and ed5270b.

📒 Files selected for processing (10)
  • core/wren/README.md
  • core/wren/docs/connections.md
  • core/wren/docs/connectors/ytsaurus.md
  • core/wren/pyproject.toml
  • core/wren/src/wren/connector/factory.py
  • core/wren/src/wren/connector/ytsaurus.py
  • core/wren/src/wren/engine.py
  • core/wren/src/wren/mdl/cte_rewriter.py
  • core/wren/src/wren/model/__init__.py
  • core/wren/src/wren/model/data_source.py

Comment thread core/wren/src/wren/connector/ytsaurus.py
Comment thread core/wren/src/wren/connector/ytsaurus.py
Comment thread core/wren/src/wren/engine.py
Comment thread core/wren/src/wren/engine.py
Comment thread core/wren/src/wren/model/data_source.py
- ytsaurus.py: validate `limit` is a non-negative int before f-string
  interpolation (defense in depth against SQL injection if a caller
  bypasses the type hint).
- ytsaurus.py: fail fast with WrenError when CHYT returns mismatched
  column metadata (column_names vs result_columns length), instead of
  silently truncating via zip().
- engine.py: in the YT path-rewrite, early-return when the parsed
  table name is empty so we never look up an empty key in the path
  map.
- engine.py: skip manifest models whose resolved `table` is empty,
  so the path map can't grow an empty-string entry.
- data_source.py: introduce a module-level threading.Lock and hold it
  around the HttpClient monkey-patching block in
  `get_ytsaurus_connection`. Concurrent connections were racing on
  `HttpClient.params` and the per-class `_wren_yt_token`, which could
  leak one caller's token into another caller's connection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nar3k
Copy link
Copy Markdown
Author

nar3k commented May 12, 2026

Addressed all five CodeRabbit findings in b234b45:

# File Issue Fix
1 connector/ytsaurus.py:53-55 limit interpolated into f-string without runtime type check Reject anything that isn't a non-negative int (also rejects bool) before building the wrapper SQL
2 connector/ytsaurus.py:60-63 zip() could silently truncate if column_names / result_columns disagree Raise WrenError(INVALID_SQL) when lengths differ
3 engine.py:172-189 _rewrite could match an empty-key path map entry Early-return the node when tbl_name is empty
4 engine.py:209-221 Path map could grow an empty-string key when a model has no resolvable table continue on empty table; also drop the redundant if table: guard around setdefault
5 model/data_source.py:372-423 Class-level monkey-patching of HttpClient.params and _wren_yt_token raced under concurrency — could leak one caller's token into another's connection Added module-level _YTSAURUS_PATCH_LOCK (threading.Lock) and wrapped the patch/connect/finally block in with _YTSAURUS_PATCH_LOCK:

The lock is at module scope rather than on DataSourceExtension because the latter is an Enum and a plain threading.Lock() attribute would be treated as an enum member.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/wren/src/wren/model/data_source.py (1)

109-114: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject or normalize connectionUrl for YTsaurus.

Adding DataSource.ytsaurus to the structured-connection flow here is still incompatible with the existing connectionUrl fast path. If a caller sends connectionUrl, _build_connection_info() returns a generic ConnectionUrl, so this branch can blow up on info.settings, and the later hasattr(info, "connection_url") path will bypass get_ytsaurus_connection() entirely. That skips the CHYT OAuth/clique setup and makes the new datasource behave incorrectly for a valid-looking input shape.

Either explicitly reject connectionUrl for YTsaurus up front, or parse it into a real YTsaurusConnectionInfo before this branch runs.

Possible minimal fix
 def _build_connection_info(self, data: dict) -> ConnectionInfo:
     if "connectionUrl" in data or "connection_url" in data:
+        if self == DataSource.ytsaurus:
+            raise WrenError(
+                ErrorCode.INVALID_CONNECTION_INFO,
+                "YTsaurus does not support connectionUrl; use proxy/clique/token fields.",
+            )
         if self == DataSource.clickhouse:
             return self._handle_clickhouse_url(
                 urllib.parse.urlparse(
                     data.get("connectionUrl", data.get("connection_url"))
                 )
             )
         return ConnectionUrl.model_validate(data)

Also applies to: 189-190

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/src/wren/model/data_source.py` around lines 109 - 114, The YTsaurus
branch in _build_connection_info is incompatible with callers that pass a
generic ConnectionUrl: before touching info.settings or branching on
DataSource.ytsaurus, detect if info is a ConnectionUrl and either raise/reject
connectionUrl for YTsaurus or convert/normalize it into a YTsaurusConnectionInfo
(populate required fields including settings and OAuth/clique fields) so
subsequent code (including get_ytsaurus_connection) runs correctly; also guard
any access to info.settings (create if None) and apply the same
normalization/guard to the analogous block around the later DataSource.ytsaurus
handling (the code referenced at 189-190).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@core/wren/src/wren/model/data_source.py`:
- Around line 109-114: The YTsaurus branch in _build_connection_info is
incompatible with callers that pass a generic ConnectionUrl: before touching
info.settings or branching on DataSource.ytsaurus, detect if info is a
ConnectionUrl and either raise/reject connectionUrl for YTsaurus or
convert/normalize it into a YTsaurusConnectionInfo (populate required fields
including settings and OAuth/clique fields) so subsequent code (including
get_ytsaurus_connection) runs correctly; also guard any access to info.settings
(create if None) and apply the same normalization/guard to the analogous block
around the later DataSource.ytsaurus handling (the code referenced at 189-190).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9132143d-dad2-4006-ac60-3525c6d359e7

📥 Commits

Reviewing files that changed from the base of the PR and between ed5270b and b234b45.

📒 Files selected for processing (3)
  • core/wren/src/wren/connector/ytsaurus.py
  • core/wren/src/wren/engine.py
  • core/wren/src/wren/model/data_source.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • core/wren/src/wren/engine.py
  • core/wren/src/wren/connector/ytsaurus.py

Adding YTsaurus to the connectionUrl fast path returned a generic
ConnectionUrl that lacks .settings (used in the clickhouse/ytsaurus
session-timeout branch) and bypassed get_ytsaurus_connection() via
the hasattr(info, "connection_url") path, skipping CHYT OAuth/clique
setup. Reject connectionUrl upfront with INVALID_CONNECTION_INFO.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/model/data_source.py`:
- Around line 409-437: get_clickhouse_connection currently calls
ibis.clickhouse.connect without holding _YTSAURUS_PATCH_LOCK, allowing a race
that can leak _CHYTHttpClient._wren_yt_token and modified _BaseHttpClient.params
into plain ClickHouse clients; to fix, ensure the ibis.clickhouse.connect
invocation in get_clickhouse_connection is executed while holding
_YTSAURUS_PATCH_LOCK (same lock used by get_ytsaurus_connection) or refactor
both into a shared helper that acquires _YTSAURUS_PATCH_LOCK before temporarily
patching _BaseHttpClient and swapping _ch_http.HttpClient/_ch_driver.HttpClient
to _CHYTHttpClient and restores them in a finally block, making sure
_CHYTHttpClient._wren_yt_token and original_class_params handling mirrors the
existing pattern.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 425b1d47-a9bc-47ce-9d8e-9fe41b0d978d

📥 Commits

Reviewing files that changed from the base of the PR and between b234b45 and 494a0ed.

📒 Files selected for processing (1)
  • core/wren/src/wren/model/data_source.py

Comment thread core/wren/src/wren/model/data_source.py
Narek Tatevosyan and others added 3 commits May 12, 2026 14:17
get_ytsaurus_connection temporarily monkey-patches clickhouse_connect's
HttpClient class and `params` dict to inject CHYT auth. Without holding
the same lock in get_clickhouse_connection, a concurrent plain ClickHouse
connect could snapshot the patched state and leak the YT OAuth token and
chyt.clique_alias into a regular ClickHouse client.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit docstring-coverage check flagged the new YTsaurus code as
under the 80% threshold. Add one-line docstrings to YTsaurusConnector
methods, the create_connector factory, get_ytsaurus_connection (and the
serialization comment for get_clickhouse_connection), the inner
_CHYTHttpClient hook, and the inner _rewrite transformer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit docstring coverage was still under the 80% threshold because
the modified DataSource / DataSourceExtension classes and their public
dispatch methods (get_connection, get_connection_info,
_build_connection_info) lacked docstrings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nar3k
Copy link
Copy Markdown
Author

nar3k commented May 12, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/engine.py`:
- Around line 223-224: The unqualified-table mapping (the out dict entry created
via out.setdefault(table, yt_path)) can silently collide; change the
construction of the path_map/out so you detect when table is already present
with a different yt_path and handle it explicitly: either log a warning (use the
module logger) and keep both qualified mappings while removing or avoiding the
unqualified key, or omit creating the unqualified mapping when a collision is
detected; update the code that currently does out[f"{schema}.{table}"] = yt_path
and out.setdefault(table, yt_path) to check existing out.get(table) and compare
values before setting, and ensure this behavior is documented where path_map is
used by the rewrite logic that calls path_map.get(f"{db_name}.{tbl_name}") or
path_map.get(tbl_name).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8a0d1862-4543-4605-858c-a5e5495c0ce5

📥 Commits

Reviewing files that changed from the base of the PR and between e9ce428 and 7a9113e.

📒 Files selected for processing (3)
  • core/wren/src/wren/connector/ytsaurus.py
  • core/wren/src/wren/engine.py
  • core/wren/src/wren/model/data_source.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • core/wren/src/wren/connector/ytsaurus.py
  • core/wren/src/wren/model/data_source.py

Comment thread core/wren/src/wren/engine.py Outdated
When two models in different schemas share the same table name,
out.setdefault(table, ...) silently kept the first-seen yt_path. A
query referencing the bare table name could then rewrite to the wrong
YT path. Detect the conflict, drop the bare-name mapping, and log a
warning telling the user to qualify with a schema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nar3k
Copy link
Copy Markdown
Author

nar3k commented May 12, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/engine.py`:
- Around line 117-118: dry_plan currently returns SQL with synthetic
schema.table identifiers that differ from what YTsaurus actually executes
because _apply_physical_overrides is only applied later in query() and
dry_run(); update dry_plan (or the code path used by DataSource.ytsaurus) to
call _apply_physical_overrides(dialect_sql) before returning so the public
dry_plan API returns the same physically-overridden SQL as executed by
query()/dry_run(). Ensure you modify the dry_plan implementation (and any helper
used by DataSource.ytsaurus) so the returned dialect_sql has had
_apply_physical_overrides applied, keeping behavior consistent with query() and
dry_run().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fdb5f585-0325-47c3-b780-99144b24cb9d

📥 Commits

Reviewing files that changed from the base of the PR and between 7a9113e and 58f5e58.

📒 Files selected for processing (1)
  • core/wren/src/wren/engine.py

Comment thread core/wren/src/wren/engine.py Outdated
dry_plan() returned the raw dialect SQL without the YT path substitution
that query() and dry_run() applied later. Callers that used dry_plan as
the public planning API for YTsaurus saw synthetic schema.table
identifiers that don't match what CHYT actually executes. Move the
override into dry_plan so all three APIs agree, and drop the now-
redundant call sites in query()/dry_run().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nar3k
Copy link
Copy Markdown
Author

nar3k commented May 12, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@goldmedal
Copy link
Copy Markdown
Collaborator

Thanks @nar3k, there are some CI failures. Could you fix them before I review this PR?

Comment on lines +354 to +362
return ibis.clickhouse.connect(
host=info.host,
port=int(info.port),
database=info.database,
user=info.user,
password=(info.password and info.password.get_secret_value()),
settings=info.settings if info.settings else {},
**info.kwargs if info.kwargs else {},
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familer with ytsaurus. Is it fully compatible with ClickHouse? Because we're planning to remove the ibis-framework dependency and use the native clickhouse_connect driver instead.
I think you can connect with ytsaurus through clickhouse_connect directly in this PR.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, i will check it out and get back!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refer to #2275
You can also wait for me to finish #2275, then follow the way to connect with ytsaurus.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, #2313 is merged, which includes the change for the clickhouse (#2275). I think you can check how we are using clickhouse_connect to connect with Clickhouse.

CI surfaced two failures on the YTsaurus PR:

- ui tests: test_all_datasources_covered failed because the new
  DataSource.ytsaurus enum value was missing from
  field_registry.DATASOURCE_MODELS. Added the entry plus UI label
  overrides for proxy/clique/token/query_path.

- lint: ruff format wanted ytsaurus.py reflowed (one ValueError onto a
  single line) and ruff check (PLC0415) flagged the deferred
  clickhouse_connect imports inside get_ytsaurus_connection. The
  imports are intentionally deferred because clickhouse_connect is an
  optional extra, so noqa them and document the reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@goldmedal
Copy link
Copy Markdown
Collaborator

@nar3k, could you add some tests or document how to test YTSaurus? It will be helpful to protect the follow-up change so that it won't break the YTsaurus connector.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core dependencies Pull requests that update a dependency file documentation Improvements or additions to documentation python Pull requests that update Python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(wren): add YTsaurus (CHYT) connector

2 participants