aitools: parallelize discover-schema across tables and probes#5097
Merged — simonfaltum merged 14 commits into main · Apr 28, 2026
Conversation
This was referenced Apr 27, 2026
Contributor comment (🔍 Reviewed by nitpicker): Column names containing backticks break the null-counts SQL
arsenyinfo (Contributor) approved these changes on Apr 28, 2026 and left a comment:
Approved, but I think this change should also be propagated to https://github.com/databricks/databricks-agent-skills/tree/main/skills/databricks-core
Refactor `executeAndPoll` in `experimental/aitools/cmd/query.go` to extract a pure `pollStatement(ctx, api, resp)` helper. The helper polls until the statement reaches a terminal state and returns the response without any signal handling, spinner, or server-side cancellation; those concerns stay in `executeAndPoll` where they belong.

Also pin `OnWaitTimeout: CONTINUE` explicitly on the `ExecuteStatement` call. The SDK default happens to be CONTINUE today, but relying on it is a hidden coupling: a server-side default flip would silently break the poll loop by killing the statement before our first GET.

Behavior is unchanged for the existing `query` command. Follow-up PRs (parallel batch queries, statement lifecycle command tree) will reuse the helper.

Co-authored-by: Isaac
Allow `databricks experimental aitools tools query` to accept several SQLs
in a single invocation and run them in parallel against the warehouse.
Pass multiple positional arguments and/or repeat `--file` to fan out:
databricks experimental aitools tools query \
--warehouse <wh> --output json \
"SELECT count(*) FROM t" \
"SELECT min(ts), max(ts) FROM t" \
"SELECT col, count(*) FROM t GROUP BY 1"
Multi-query output is always a JSON array of one object per input,
preserving input order. The shape is `{sql, statement_id, state,
elapsed_ms, columns, rows, error}`. Individual statement failures don't
abort siblings; each is encoded in the per-result `error` field, and the
exit code is non-zero when any statement failed.
A new `--concurrency` flag (default 8) caps in-flight statements. On
Ctrl+C the still-running statements are cancelled server-side via
CancelExecution before exit.
Single-query behavior is unchanged. The previous restriction that
forbade mixing `--file` and a positional SQL is lifted, since both now
contribute to the batch.
Co-authored-by: Isaac
Address two findings from a cursor PR review:

1. --concurrency was passed straight into errgroup.SetLimit. A value of 0 deadlocks (errgroup refuses to add goroutines), and a negative value silently removes the cap. Add a PreRunE check that rejects anything <= 0 with errInvalidBatchConcurrency, matching the shape used by cmd/fs/cp.go for the same flag.

2. The Long help previously said multi-query results come back "in input order", which was ambiguous when --file and positional SQLs are mixed. The actual behavior (already covered by TestResolveSQLsMixedFileAndPositional) is: --file inputs first in flag order, then positional SQLs in arg order. Tighten the help text to state that contract precisely.

Adds two unit tests that verify --concurrency 0 and -1 are rejected before any API call.

Co-authored-by: Isaac
… cases

Two pairs of cobra-level tests were each testing one rejection code path with two flag values. Fold them into table-driven subtests so the shared assertion lives in one place:

- TestQueryCommandBatchTextOutputRejected + ...CsvOutputRejected → TestQueryCommandBatchOutputRejection (text, csv subtests)
- TestQueryCommandConcurrencyZeroRejected + ...NegativeRejected → TestQueryCommandConcurrencyRejection (0, -1 subtests)

Same coverage, half the test functions.

Co-authored-by: Isaac
Address Arseni's P2 finding on the batch PR.

cancelInFlight (batch.go) and cancelStatement (query.go) used to derive the cancel-RPC ctx via context.WithTimeout(ctx, cancelTimeout). On the actual hot path (Ctrl+C or parent ctx cancelled), the inbound ctx is already cancelled by the time we reach the cancel sweep. The SDK then short-circuits on ctx.Err() and the cancel RPC never reaches the warehouse, leaving in-flight statements running server-side.

Wrap with context.WithoutCancel(ctx) (Go 1.21+) so the timeout context keeps the caller's values but drops the cancellation signal. The cancel RPC now actually fires.

Also tighten the existing tests:

- TestExecuteBatchContextCancellationCancelsInFlight
- TestExecuteAndPollCancelledContextCallsCancelExecution

Both previously matched mock.Anything for the ctx argument, so they passed regardless of whether the bug was present. They now use mock.MatchedBy(c.Err() == nil) to assert the cancel-RPC ctx is alive. This is a regression guard; reverting the production fix makes the tests fail with "unexpected call" because the matcher no longer matches.

Co-authored-by: Isaac
Adds a low-level command tree for asynchronous SQL statement management, complementing the synchronous 'tools query':

    databricks experimental aitools tools statement submit "SELECT ..."
    databricks experimental aitools tools statement get <statement_id>
    databricks experimental aitools tools statement status <statement_id>
    databricks experimental aitools tools statement cancel <statement_id>

submit fires an ExecuteStatement with WaitTimeout=0s and OnWaitTimeout=CONTINUE, returning the statement_id immediately. get polls (via pollStatement from #5092) until terminal and emits rows on success or an error object on failure. status performs a single GET without polling. cancel sends CancelExecution.

All four subcommands emit a uniform JSON shape {statement_id, state, warehouse_id, columns, rows, error} with omitempty so the payload only includes the fields that subcommand has.

Important UX nuance: 'statement get' Ctrl+C stops polling but does NOT cancel the server-side statement. Users that want server-side termination call 'statement cancel' explicitly. (This differs from 'tools query', which cancels server-side on Ctrl+C because the user invoked the synchronous path.) The pollStatement helper from #5092 is already designed to propagate ctx errors without touching the server, so 'get' inherits this behavior for free.

Co-authored-by: Isaac
Address a cursor PR review finding: 'statement get' and 'statement status' previously only set info.Error when pollResp.Status.Error was non-nil. The Statements API can return a non-success terminal state (FAILED, CANCELED, CLOSED) with no Error payload, so the JSON contract "emits rows on success or an error object on failure" wasn't actually guaranteed. Skill consumers couldn't branch on `error == null` alone: they had to also inspect `state`. Especially bad for 'get', which exits non-zero on non-success terminal states without giving the caller structured failure detail.

Add a shared helper, statementErrorFromStatus, that returns a batchResultError for any terminal non-success state, populated from the SDK's ServiceError when present and synthesizing "statement reached terminal state X" when the backend doesn't supply one. Mirrors the pattern already used by runOneBatchQuery in batch.go, so the contract is uniform across batch and single-statement paths. Both 'get' and 'status' now use the helper. PENDING and RUNNING still emit no error (legitimately mid-flight).

New tests:

- table-driven coverage of statementErrorFromStatus across nil, succeeded, running, pending, failed-with-error, failed-no-error, canceled-no-error, closed-no-error
- getStatementResult with CLOSED state and no Error
- getStatementResult with FAILED state and no Error
- getStatementStatus with FAILED state and no Error
- getStatementStatus with RUNNING state confirms no error is set

Co-authored-by: Isaac
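The error-synthesis contract above can be captured in a few lines. This is a hedged sketch: the real helper takes SDK status/`ServiceError` types, whereas here plain strings model the state and an empty `backendMsg` models a missing `Status.Error` payload.

```go
package main

import "fmt"

// batchResultError mirrors the error object in the JSON contract; names are
// illustrative, not the CLI's structs.
type batchResultError struct {
	Message   string `json:"message"`
	ErrorCode string `json:"error_code,omitempty"`
}

// statementErrorFromStatus returns an error object for any terminal
// non-success state, synthesizing a message when the backend supplies none,
// and nil for success or mid-flight states.
func statementErrorFromStatus(state, backendMsg, code string) *batchResultError {
	switch state {
	case "", "PENDING", "RUNNING", "SUCCEEDED":
		return nil // mid-flight or successful: no error field
	}
	if backendMsg != "" {
		return &batchResultError{Message: backendMsg, ErrorCode: code}
	}
	// FAILED/CANCELED/CLOSED with no backend payload: synthesize.
	return &batchResultError{Message: fmt.Sprintf("statement reached terminal state %s", state)}
}

func main() {
	fmt.Println(statementErrorFromStatus("RUNNING", "", ""))
	fmt.Println(statementErrorFromStatus("FAILED", "near 'oops': syntax error", "SYNTAX_ERROR").Message)
	fmt.Println(statementErrorFromStatus("CLOSED", "", "").Message)
}
```

With this shape, consumers can branch on `error == null` alone and never need to cross-check `state` for the failure/success decision.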
…tests

Self-review pass on the test suite found ~8 functions worth trimming without losing coverage.

Drop (cobra built-ins, not our contract):

- TestStatementSubmitArgsBound: tests cobra's MaximumNArgs(1)
- TestStatementGetRequiresStatementID: tests cobra's ExactArgs(1)
- TestStatementCancelRequiresStatementID: tests cobra's ExactArgs(1)

Drop (already covered by TestStatementErrorFromStatus, the table-driven helper test added with the cursor-fix commit):

- TestGetStatementResultClosedTerminalSynthesizesError
- TestGetStatementResultFailedWithoutBackendErrorSynthesizesError
- TestGetStatementStatusFailedWithoutBackendErrorSynthesizesError
- TestGetStatementStatusRunningHasNoError

Fold:

- TestRenderStatementInfo + TestRenderStatementInfoOmitsEmptyFields → one table-driven TestRenderStatementInfo with the full and minimal cases as subtests.

Kept the validation we actually wrote (TestStatementSubmitRejectsMultipleSQLs) and the wiring tests that pin distinct contracts (TestGetStatementResultPolls, TestGetStatementResultFailedStateReportsError, TestGetStatementResultDoesNotCancelServerSideOnContextCancel, TestGetStatementStatusSinglePoll, TestGetStatementStatusReportsError, the cancel pair, and submit pair).

Co-authored-by: Isaac
…put before auth

Address two findings from Arseni's review.

P2 (statement_get.go): getStatementResult used to return (info, err) when fetchAllRows failed after a SUCCEEDED state. RunE then discarded the populated info and surfaced only the raw Go error, so the user got an unstructured "fetch result chunk N: ..." string with no statement_id and no machine-readable error field. That contradicts the contract in the failed-terminal path two cases above, which renders JSON and returns root.ErrAlreadyPrinted.

Now: on chunk-fetch failure, populate info.Error with the chunk-fetch message and return (info, nil). RunE renders the partial info as JSON and signals exit-non-zero based on info.Error != nil. The caller still gets statement_id and columns; the error field carries the failure detail. New test TestGetStatementResultChunkFetchFailureRendersPartialInfo locks this in.

P3 (statement_submit.go): The PR description claims submit validates input before accessing WorkspaceClient. The code didn't actually deliver that: PreRunE was root.MustWorkspaceClient (auth/profile setup), then RunE did the resolveSQLs / "exactly one" checks. So a malformed invocation hit auth errors before ever surfacing the input error.

Move resolveSQLs and the length check into a custom PreRunE that runs before root.MustWorkspaceClient, mirroring the pattern in query.go:113-118. The result is stashed in a closure variable (sqlStatement) for RunE to consume. Existing test TestStatementSubmitRejectsMultipleSQLs is renamed to ...BeforeWorkspaceClient and no longer needs to stub out PreRunE: the new ordering means a bad invocation gets the validation error without ever attempting workspace-client setup.

Co-authored-by: Isaac
discover-schema previously walked tables sequentially and ran each table's three probes (DESCRIBE, sample SELECT, null counts) one after the other. For ai-dev-kit's data-exploration phase that meant warehouse-bound work was idle most of the time. Same root cause as the multi-query exploration latency that PR 2 fixed; same fix.

Two layers of parallelism:

1. Tables fan out via errgroup with --concurrency (default 8). A failure on one table never aborts the others; it gets rendered inline as "Error discovering ...".

2. Within a table, DESCRIBE still runs first because the column list feeds the null-counts query. After DESCRIBE returns, the sample SELECT and null-counts probes run concurrently. The output text is assembled once both finish, preserving the existing column order (COLUMNS, SAMPLE DATA, NULL COUNTS).

Switch executeSQL from the SDK's ExecuteAndWait helper to ExecuteStatement + pollStatement (the helper extracted in #5092). This brings discover-schema in line with query.go and statement.go: explicit OnWaitTimeout=CONTINUE on every call, and any future polling-helper improvement (e.g. signal handling) lands here for free. Failed states now flow through checkFailedState, which yields more specific error messages (e.g. "query failed: SYNTAX_ERROR ...") than the previous hand-rolled branch. The user-visible "SAMPLE DATA: Error - %v" / "NULL COUNTS: Error - %v" wrapping is unchanged.

Add --concurrency validation matching the cmd/fs/cp.go and experimental/aitools/cmd/query.go pattern: PreRunE rejects values <= 0 with errInvalidBatchConcurrency.

Tests added in discover_schema_test.go:

- quoteTableName (table-driven across valid identifiers, missing parts, injection attempts, empty parts, leading-digit identifiers)
- parseDescribeResult skipping metadata rows
- executeSQL pins OnWaitTimeout=CONTINUE
- executeSQL propagates server-reported FAILED state
- executeSQL wraps transport errors
- discoverTable: sample and null-count probes run concurrently after DESCRIBE (atomic peak-counter assertion)
- discoverTable: a sample failure does not abort null counts
- --concurrency 0 and -1 rejected at PreRunE time
- invalid table name (not CATALOG.SCHEMA.TABLE) rejected at RunE validation before any API call

Co-authored-by: Isaac
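The per-table ordering above (DESCRIBE first, then both dependent probes concurrently, output assembled in a fixed order) can be sketched with a `sync.WaitGroup`. The function signature and the probe callbacks are illustrative stand-ins for the real SQL calls, not the CLI's `discoverTable`:

```go
package main

import (
	"fmt"
	"sync"
)

// discoverTable sketch: DESCRIBE must complete first because its column list
// feeds the null-counts query; the sample and null-counts probes then run
// concurrently, and output is assembled only once both finish, preserving the
// COLUMNS / SAMPLE DATA / NULL COUNTS section order.
func discoverTable(describe func() []string, sample, nullCounts func(cols []string) string) string {
	cols := describe() // sequential: dependent probes need the column list

	var wg sync.WaitGroup
	var sampleOut, nullsOut string
	wg.Add(2)
	go func() { defer wg.Done(); sampleOut = sample(cols) }()
	go func() { defer wg.Done(); nullsOut = nullCounts(cols) }()
	wg.Wait()

	// Assemble after both probes are done; ordering is fixed regardless of
	// which probe finished first.
	return fmt.Sprintf("COLUMNS: %v\nSAMPLE DATA: %s\nNULL COUNTS: %s", cols, sampleOut, nullsOut)
}

func main() {
	out := discoverTable(
		func() []string { return []string{"id", "ts"} },
		func(cols []string) string { return "2 rows" },
		func(cols []string) string { return "id_nulls=0, ts_nulls=3" },
	)
	fmt.Println(out)
}
```

Because each probe writes its own variable and the assembly happens after `wg.Wait()`, probe completion order never changes the rendered text, which is what keeps the output byte-identical to the sequential version.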
…l on Ctrl+C
Address two findings from a cursor PR review.
1. --concurrency previously capped table-level fan-out via
errgroup.SetLimit, but each table issued up to two probes after
DESCRIBE, so peak warehouse load was 2*concurrency rather than the
advertised "max in-flight statements." A user setting
--concurrency 1 to stay under a warehouse cap still saw two
statements concurrently. Replace the table-level limit with a
shared sqlGate (chan struct{} of capacity N + statement_id
tracking) that wraps every executeSQL call. Now --concurrency
really means "max statements in flight at any moment, across all
tables and probes." Update the help text to match.
2. After switching from ExecuteAndWait to ExecuteStatement +
pollStatement (PR4 first commit), Ctrl+C left up to 2*concurrency
statements running server-side because nothing called
CancelExecution. Add the same cancellation discipline used in
batch.go: signal handler cancels a derived pollCtx, gate records
each statement_id post-submission, and on cancellation we sweep
the recorded IDs via CancelExecution before returning
root.ErrAlreadyPrinted.
Also addressed:
- Move table-name validation into PreRunE so a malformed identifier
is rejected before MustWorkspaceClient runs (real CLI-lifecycle
improvement, not just a test trick).
- Replace the timing-based parallelism test with a deterministic
barrier (atomic counter + sync.OnceFunc + channel close): both
probes must arrive before either is allowed to leave; if they ran
sequentially the first probe times out and surfaces an error.
Tests reorganized:
- sqlGate.run: pins OnWaitTimeout, propagates FAILED, wraps transport
errors, records ids, respects cancelled context
- cancelDiscoverInFlight: per-id calls, empty list is a no-op
- discoverTable: deterministic concurrent-probes assertion;
per-probe failure does not abort siblings
- cobra-level: invalid table name and injection attempts rejected
before any workspace client setup
Co-authored-by: Isaac
…tion

Self-review pass on the test suite found ~3 functions worth trimming.

Drop:

- TestDiscoverSchemaInjectionAttemptRejected: TestQuoteTableName already has an "injection in catalog" case; the cobra-level wiring is already tested by TestDiscoverSchemaInvalidTableNameRejectedBeforeWorkspaceClient with a different bad input.
- TestCancelDiscoverInFlightHandlesEmptyList: just verifies "no API calls when list is empty"; the mock would fail loudly on any unexpected call, making this a tautology.

Fold:

- TestDiscoverSchemaConcurrencyZeroRejected + ...NegativeRejected → TestDiscoverSchemaConcurrencyRejection (0, -1 subtests).

Co-authored-by: Isaac
sqlGate.run used to enter a select with two ready cases when the caller
passed an already-cancelled context: the gate had free slots, so
`g.sem <- struct{}{}` was ready, and `<-ctx.Done()` was also ready. Go
picks pseudo-randomly between simultaneously-ready cases, so on roughly
half of those calls we proceeded to submit a statement under a
cancelled context.
Added an early `ctx.Err()` check before the select. The flaky test
TestSQLGateRunRespectsCancelledContext is deterministic now (verified
with -count=20).
Surfaced by rebasing PR 4 on top of the trimmed PR 3, which changed
test execution conditions enough to flip the coin.
Co-authored-by: Isaac
Address Arseni's P3 finding on the discover-schema PR.

parseDescribeResult returned column names verbatim and discoverTable interpolated them into the null-counts SQL inside backtick-quoted identifier positions, e.g.

    SUM(CASE WHEN `<col>` IS NULL THEN 1 ELSE 0 END) AS `<col>_nulls`

Databricks/Delta DDL allows column names containing backticks via doubled-backtick escaping (`weird``col`). Without escaping in the SQL we generate, an embedded backtick in the column name terminates the quoted identifier mid-string and produces a PARSE_SYNTAX_ERROR. Sample-data uses SELECT * so it succeeds, and the user sees only a confusing "NULL COUNTS: Error - ..." line that's easy to misattribute to the warehouse.

Escape via strings.ReplaceAll(col, "`", "``") in both the identifier and the alias positions before interpolation. New test TestDiscoverTableEscapesBackticksInColumnNames pins the doubled form in both spots and asserts the no-error code path.

Co-authored-by: Isaac
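The escaping fix is small enough to show end to end. A sketch with hypothetical helper names (`quoteIdent`, `nullCountExpr` are illustrative; the commit applies `strings.ReplaceAll` inline rather than through named helpers):

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent escapes embedded backticks by doubling them, per Databricks/Delta
// identifier quoting, then wraps the name in backticks.
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// nullCountExpr builds one SUM(...) term of the null-counts probe with the
// column name escaped in both the identifier and the alias position.
func nullCountExpr(col string) string {
	return fmt.Sprintf("SUM(CASE WHEN %s IS NULL THEN 1 ELSE 0 END) AS %s",
		quoteIdent(col), quoteIdent(col+"_nulls"))
}

func main() {
	// An embedded backtick no longer terminates the quoted identifier early.
	fmt.Println(nullCountExpr("weird`col"))
	// SUM(CASE WHEN `weird``col` IS NULL THEN 1 ELSE 0 END) AS `weird``col_nulls`
}
```

Without the doubling, the generated SQL would read `` `weird`col` `` and the identifier would end at the embedded backtick, which is exactly the PARSE_SYNTAX_ERROR path described above.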
mkazia pushed a commit to mkazia/cli that referenced this pull request on Apr 30, 2026:
…ks#5092)

## Stack

This PR is part of a 4-PR stack making `aitools` data exploration faster for ai-dev-kit. Each PR is independently reviewable; merge in order.

1. **databricks#5092 — aitools: extract pollStatement helper and pin OnWaitTimeout** *(base: `main`)* — **this PR**
2. databricks#5093 — aitools: run multiple SQL queries in parallel from one query invocation *(base: databricks#5092)*
3. databricks#5095 — aitools: add 'tools statement' lifecycle commands *(base: databricks#5093)*
4. databricks#5097 — aitools: parallelize discover-schema across tables and probes *(base: databricks#5095)*

Use `git diff <base>...HEAD` or set the comparison base in the GitHub UI to see only this PR's changes; the default "Files changed" diff against `main` includes ancestor PRs.

## Why

The query command in `experimental/aitools/cmd/query.go` works today, but two things make it fragile and hard to reuse:

1. The polling loop, signal handling, spinner, and server-side cancellation are entangled in one ~100-line function. Upcoming features (parallel batch queries, a statement lifecycle command tree) need pure polling without the signal-handler side effects, so the helper has to come out cleanly.
2. The `ExecuteStatement` request sets `WaitTimeout: 0s` but does not set `OnWaitTimeout`. That relies on the SDK's default being `CONTINUE`. It is today, but a flip would silently break the command: the statement would be cancelled before our first GET and we'd never see the result.

This PR is a pure refactor + one explicit-default fix. No user-visible behavior change.

## Changes

- Extract `pollStatement(ctx, api, resp)` from `executeAndPoll`. The helper polls until the statement reaches a terminal state and returns the response. It does not call `CancelExecution` on context cancellation; that's the caller's job (and a deliberate design choice for the upcoming `statement get` command, where Ctrl+C should stop polling without killing the server-side statement).
- Pin `OnWaitTimeout: CONTINUE` explicitly on the `ExecuteStatement` call.
- Update `executeAndPoll` to delegate to `pollStatement` and keep the existing signal-handling, spinner, and server-side cancel-on-Ctrl+C semantics intact.
- Add five unit tests covering the new helper:
  - Immediate terminal short-circuit (no Get calls)
  - Failed terminal returned without error (caller decides)
  - Eventual success across multiple polls
  - Context cancellation returns ctx error and does NOT call CancelExecution
  - GetStatement transport error is wrapped and propagated
- Update the existing `TestExecuteAndPollImmediateSuccess` matcher to assert `OnWaitTimeout == CONTINUE` so a future SDK default flip cannot regress us.

## Test plan

- [x] `go test ./experimental/aitools/...` passes (10 polling-related cases including the 5 new ones).
- [x] `make checks` clean (tidy, whitespace, dead code).
- [x] `make fmt` no drift.
- [x] `make lint` 0 issues.
- [x] Existing `executeAndPoll` tests (immediate success, immediate failure, polling, fail-during-poll, ctx-cancellation-calls-cancel-execution) all still pass without modification beyond the matcher tweak.
mkazia pushed a commit to mkazia/cli that referenced this pull request on Apr 30, 2026:
…tabricks#5093)

## Stack

This PR is part of a 4-PR stack making `aitools` data exploration faster for ai-dev-kit. Each PR is independently reviewable; merge in order.

1. databricks#5092 — aitools: extract pollStatement helper and pin OnWaitTimeout *(base: `main`)*
2. **databricks#5093 — aitools: run multiple SQL queries in parallel from one query invocation** *(base: databricks#5092)* — **this PR**
3. databricks#5095 — aitools: add 'tools statement' lifecycle commands *(base: databricks#5093)*
4. databricks#5097 — aitools: parallelize discover-schema across tables and probes *(base: databricks#5095)*

Use `git diff <base>...HEAD` or set the comparison base in the GitHub UI to see only this PR's changes; the default "Files changed" diff against `main` includes ancestor PRs.

## Why

Today `databricks experimental aitools tools query` runs one SQL at a time. ai-dev-kit's data-exploration phase fires 5-10 probes per dashboard (cardinality, top values, distributions, trend viability) and they all run in series because each is a separate CLI invocation that blocks. End-to-end exploration takes about a minute when it could take seconds.

Quentin already wired up a bash workaround that fans out via the raw `/api/2.0/sql/statements` endpoint with `wait_timeout=0s` and harvests results separately. This PR exposes that pattern natively so the skill can drop the hack and other CLI users get the same speed-up.

## Changes

**Before:** `query` accepted at most one positional SQL or a single `--file`. Mixing the two errored. JSON output was an array of row objects.

**Now:** `query` accepts any number of positional SQLs and/or repeated `--file` paths. With one input, behavior is unchanged (back-compat). With two or more, the queries run in parallel against the warehouse and the result is a JSON array of one object per input in input order:

```json
[
  {
    "sql": "SELECT count(*) FROM t",
    "statement_id": "01ef...",
    "state": "SUCCEEDED",
    "elapsed_ms": 412,
    "columns": ["count"],
    "rows": [["12345"]]
  },
  {
    "sql": "SELECT bad_syntax",
    "statement_id": "01ef...",
    "state": "FAILED",
    "elapsed_ms": 87,
    "error": { "message": "near 'bad_syntax': syntax error", "error_code": "SYNTAX_ERROR" }
  }
]
```

Implementation:

- New `experimental/aitools/cmd/batch.go` with `executeBatch` (errgroup with bounded parallelism) and `runOneBatchQuery`. Each goroutine submits with `OnWaitTimeout: CONTINUE`, polls via the helper from databricks#5092, and encodes its outcome into a `batchResult` struct. Failures don't abort siblings.
- New `--concurrency` flag (default 8). Same value used by `cmd/fs/cp.go` for similar fan-out. Validated `> 0` in `PreRunE` (a 0 value would deadlock `errgroup.SetLimit`).
- `--file` is now a repeatable string slice. The previous `--file` + positional conflict error is removed; both compose.
- `resolveSQL` is replaced by `resolveSQLs` returning `[]string`. Result order is `--file` inputs first (in flag order), then positional SQLs (in arg order).
- Multi-query output is JSON-only. `--output text` and `--output csv` are rejected with an actionable error before any API call.
- On Ctrl+C, in-flight statements are cancelled server-side via `CancelExecution` after `g.Wait()` returns. Statements that finished normally before the cancel are left alone.
- Exit code is non-zero (`root.ErrAlreadyPrinted`) when any statement failed; the JSON already contains the error detail, no extra stderr noise.

## Test plan

- [x] `go test ./experimental/aitools/...` passes.
- [x] `make checks` clean.
- [x] `make fmt` no drift.
- [x] `make lint` 0 issues.
- [x] New unit tests cover:
  - all-succeed batch with input-order preservation
  - server-reported failure on one of N (others still complete)
  - submission-time transport error encoded into per-result error
  - explicit `OnWaitTimeout: CONTINUE` on every `ExecuteStatement`
  - staggered completion (1 slow + 2 fast) preserves input order in results
  - context cancellation triggers `CancelExecution` for each in-flight statement
  - cobra-level rejection of `--output text` and `--output csv` with multiple positionals
  - cobra-level rejection of `--concurrency 0` and `--concurrency -1`
  - `resolveSQLs` covering mixed sources, multiple files, multiple positionals, indexed-error message
- [x] Manual smoke against a real warehouse:

```bash
databricks experimental aitools tools query \
  --warehouse <wh> --output json \
  "SELECT 1" "SELECT 2" "SELECT current_timestamp()"
```
mkazia pushed a commit to mkazia/cli that referenced this pull request on Apr 30, 2026:
## Stack

This PR is part of a 4-PR stack making `aitools` data exploration faster for ai-dev-kit. Each PR is independently reviewable; merge in order.

1. databricks#5092 — aitools: extract pollStatement helper and pin OnWaitTimeout *(base: `main`)*
2. databricks#5093 — aitools: run multiple SQL queries in parallel from one query invocation *(base: databricks#5092)*
3. **databricks#5095 — aitools: add 'tools statement' lifecycle commands** *(base: databricks#5093)* — **this PR**
4. databricks#5097 — aitools: parallelize discover-schema across tables and probes *(base: databricks#5095)*

Use `git diff <base>...HEAD` or set the comparison base in the GitHub UI to see only this PR's changes; the default "Files changed" diff against `main` includes ancestor PRs.

## Why

Quentin's ai-dev-kit skill works against synchronous `tools query`. That covers most cases, but there are workflows where the agent wants a server-side handle it can poll separately: long-running maintenance queries, parallel exploration where the agent does other work in between, and any "submit-now-harvest-later" pattern.

`tools query` with a single SQL is for "I want results now." This PR adds a low-level command tree, `tools statement`, for "I want a handle." Cleaner separation than overloading `query` with `--async`/`--cancel` flags (which would be semantically forced — a `query` shouldn't manage someone else's statement_id).

## Changes

Four new subcommands under `databricks experimental aitools tools statement`:

```bash
# Fire and exit with a handle.
databricks experimental aitools tools statement submit \
  --warehouse <wh> "SELECT pg_sleep(60)"
# Output:
# { "statement_id": "01ef...", "state": "PENDING", "warehouse_id": "..." }

# Block until terminal and emit rows.
databricks experimental aitools tools statement get <statement_id>

# Peek at current state without polling.
databricks experimental aitools tools statement status <statement_id>

# Request cancellation.
databricks experimental aitools tools statement cancel <statement_id>
```

Implementation notes:

- All four subcommands emit a uniform `statementInfo` JSON shape: `{statement_id, state, warehouse_id, columns, rows, error}` with `omitempty` on every field except `statement_id`. So `submit` doesn't include `columns/rows`, `cancel` doesn't include `warehouse_id`, etc. Consumer parsing is uniform.
- `submit` uses `WaitTimeout: "0s"` and `OnWaitTimeout: CONTINUE` (matching the helper from databricks#5092).
- `get` uses `pollStatement` (from databricks#5092) and inherits its "ctx cancellation does NOT cancel server-side" semantics. This is the **important UX difference from `tools query`**: hitting Ctrl+C on `get` stops polling but leaves the statement running on the warehouse. Use `cancel` for explicit termination. That asymmetry is intentional, since `get` is poll-only by design — the user already submitted async.
- `status` does a single `GetStatementByStatementId` with no polling.
- `cancel` calls `CancelExecution` and optimistically reports `state=CANCELED`. The Statements API returns no body on cancel; the actual server-side state transitions asynchronously. The `Long` help points users at `status` if they need certainty.
- A shared helper `statementErrorFromStatus` populates the `error` field for every non-success terminal state (FAILED, CANCELED, CLOSED), even when the server returns no `Status.Error` payload. So skill consumers can branch on `error == null` alone instead of inspecting `state`.
- Each subcommand has a small testable helper (`submitStatement`, `getStatementResult`, `getStatementStatus`, `cancelStatementExecution`) extracted from the cobra `RunE`. Tests target the helpers directly with a mock `StatementExecutionInterface`.
- Parent `statement.go` registers the four subcommands and is wired into `tools.go` next to `query`, `discover-schema`, and `get-default-warehouse`.
- `submit` validates input (rejects mixed --file + positional) BEFORE accessing `WorkspaceClient`, so the error surfaces cleanly without an auth or warehouse roundtrip.

## Test plan

- [x] `go test ./experimental/aitools/...` passes.
- [x] `make checks` clean.
- [x] `make fmt` no drift.
- [x] `make lint` 0 issues.
- [x] New tests cover:
  - `submit` returns the statement_id and pins `OnWaitTimeout: CONTINUE`
  - `submit` wraps transport errors with `execute statement: ...`
  - `get` polls until terminal and assembles rows
  - `get` reports server-side errors in the JSON without raising a Go error
  - `get` ctx cancellation propagates **without** calling `CancelExecution` (the deliberate UX difference from `query`)
  - `get` synthesizes `error` for terminal CLOSED / FAILED with no backend payload
  - `status` does a single GET, no polling
  - `status` reports server-side errors in the JSON; running/pending stay error-free
  - `status` synthesizes `error` for FAILED with no backend payload
  - `cancel` calls `CancelExecution` and reports `state=CANCELED`
  - `cancel` wraps API errors
  - `statementErrorFromStatus` table-driven across nil, succeeded, running, failed-with-error, failed/canceled/closed-without-error
  - `renderStatementInfo` JSON shape (full and minimal)
  - cobra-level: `submit` rejects mixed --file + positional, `submit` enforces MaximumNArgs(1), `get` and `cancel` require a positional statement_id
- [x] Manual smoke against a real warehouse:

```bash
SID=$(databricks experimental aitools tools statement submit \
  --warehouse <wh> "SELECT pg_sleep(5)" | jq -r '.statement_id')
databricks experimental aitools tools statement status "$SID"
databricks experimental aitools tools statement get "$SID"
```
Stack
This PR is part of a 4-PR stack making `aitools` data exploration faster for ai-dev-kit. Each PR is independently reviewable; merge in order.

1. databricks#5092 — aitools: extract pollStatement helper and pin OnWaitTimeout *(base: `main`)*
2. databricks#5093 — aitools: run multiple SQL queries in parallel from one query invocation *(base: databricks#5092)*
3. databricks#5095 — aitools: add 'tools statement' lifecycle commands *(base: databricks#5093)*
4. **databricks#5097 — aitools: parallelize discover-schema across tables and probes** *(base: databricks#5095)* — **this PR**

Use `git diff <base>...HEAD` or set the comparison base in the GitHub UI to see only this PR's changes; the default "Files changed" diff against `main` includes ancestor PRs.

Why
`discover-schema` previously walked tables sequentially and ran each table's three probes (DESCRIBE, sample SELECT, null counts) one after the other. For ai-dev-kit's data-exploration phase that meant warehouse-bound work was idle most of the time. Same root cause as the multi-query exploration latency that #5093 (batch query) fixed; same fix.

This is a pure latency win. No new user-facing API surface, no output-shape change.
Changes
Two layers of parallelism plus a shared statement budget:

- Tables fan out: `RunE` becomes an `errgroup.Group`. A failure on one table never aborts the others; it's rendered inline as `"Error discovering ..."` exactly as before.
- `discoverTable` still runs DESCRIBE first because the column list feeds the null-counts query. After DESCRIBE returns, the sample SELECT and null-counts probes run concurrently. Output text is assembled once both probes finish, preserving the existing `COLUMNS / SAMPLE DATA / NULL COUNTS` order.
- A shared `sqlGate` (chan struct{} of capacity N + statement_id tracking) wraps every `executeSQL` call. `--concurrency` (default 8) caps total in-flight statements globally, regardless of how many tables you pass. So `--concurrency 1` actually serializes statement load, not just table fan-out.
- Switch `executeSQL` to use `pollStatement` (the helper extracted in #5092) instead of the SDK's `ExecuteAndWait`. Pins `OnWaitTimeout: CONTINUE`. Failed states flow through `checkFailedState`, yielding more specific error messages (e.g. `"query failed: SYNTAX_ERROR near 'oops'"`) than the previous hand-rolled branch. The user-visible `"SAMPLE DATA: Error - %v"` / `"NULL COUNTS: Error - %v"` wrapping is unchanged. Future polling-helper improvements land here for free.
- Cancellation discipline mirroring batch.go (#5093): signal handler cancels a derived `pollCtx`; `sqlGate` records each `statement_id` post-submission; on cancellation the recorded IDs are swept via `CancelExecution` before returning `root.ErrAlreadyPrinted`. Without this, parallelism would orphan up to N×2 statements server-side on Ctrl+C.
- `--concurrency` validation mirrors `cmd/fs/cp.go` and #5093: `PreRunE` rejects values <= 0 with `errInvalidBatchConcurrency`. Table-name validation also runs in `PreRunE` so malformed identifiers are rejected before `MustWorkspaceClient` runs (no unnecessary auth roundtrip on bad input).

Output unchanged for any input that previously succeeded. Same dividers, same header/probe ordering, same per-probe error wrapping.
Test plan
- [x] `go test ./experimental/aitools/...` passes.
- [x] `make checks` clean.
- [x] `make fmt` no drift.
- [x] `make lint` 0 issues.
- [x] New unit tests in `discover_schema_test.go`:
  - `quoteTableName` table-driven (valid, missing parts, too many parts, injection attempts, empty parts, leading-digit identifiers, backtick in name)
  - `parseDescribeResult` skips metadata rows (#-prefixed and empty)
  - `sqlGate.run` pins `OnWaitTimeout: CONTINUE`, propagates FAILED state, wraps transport errors, records IDs, respects cancelled context
  - `cancelDiscoverInFlight` calls API per ID; empty list is a no-op
  - `discoverTable`: sample and null-count probes run concurrently after DESCRIBE (deterministic atomic-counter + sync.OnceFunc + channel-close barrier; sequential execution surfaces a timeout error)
  - `discoverTable`: a sample-probe failure does not abort null counts
  - `--concurrency 0` and `-1` rejected at PreRunE
  - invalid table names (not `CATALOG.SCHEMA.TABLE`) and injection attempts rejected at PreRunE before any API call
- [x] Manual smoke against a real warehouse: