feat: streaming output, orjson, and memory-efficient search rendering#254
Merged
36c044a to
4a0602d
Compare
tomaz-lc
commented
Mar 17, 2026
orjson is a mandatory dependency (specified in pyproject.toml) but the
fallback ensures the package still works if orjson cannot be installed
on a particular platform (e.g. missing Rust compiler for source builds).
Contributor
Author
Keep in mind this only applies to source builds - orjson is a well-maintained library that ships pre-built wheels for all of the most common platforms (Linux, OS X, Windows) and architectures (x86, arm).
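For readers following the thread, the fallback being discussed could look roughly like the sketch below. It is an illustration only, not the PR's actual `json_compat.py`: the function names come from the commit message, while the signatures, the choice of `backend_name` as a module-level string, and the stdlib formatting options are assumptions.

```python
"""Hypothetical sketch of an orjson-with-fallback shim (not the PR's real module)."""
import json

try:
    import orjson  # normally installed from a pre-built wheel
except ImportError:  # e.g. a source build on a platform without a Rust toolchain
    orjson = None

# Assumption: exposed as a plain string so a debug log can report the active backend.
backend_name = "orjson" if orjson is not None else "json"


def dumps(obj) -> str:
    """Serialize obj to a compact JSON string."""
    if orjson is not None:
        return orjson.dumps(obj).decode("utf-8")  # orjson.dumps returns bytes
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


def dumps_pretty(obj) -> str:
    """Serialize obj to an indented JSON string."""
    if orjson is not None:
        return orjson.dumps(obj, option=orjson.OPT_INDENT_2).decode("utf-8")
    return json.dumps(obj, indent=2, ensure_ascii=False)


def loads(data):
    """Parse a JSON document from str or bytes."""
    if orjson is not None:
        return orjson.loads(data)
    return json.loads(data)
```

Callers import only these wrappers, so the choice of backend stays invisible to the rest of the CLI.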
0ade604 to
7a37232
Compare
maximelb
previously approved these changes
Mar 17, 2026
7a37232 to
8c01ee0
Compare
Add streaming output to avoid OOM on large searches.

Previously all search results were buffered in a list before output, causing OOM on constrained VMs (e.g. 4GB RAM with 500K+ events). Now results stream one at a time for JSONL, JSON, expand, and table formats.

Streaming behavior by format:
- JSONL: one result per line, constant memory (all paths)
- JSON: streaming array ([, item, item, ]), constant memory (all paths)
- expand: one event block at a time, constant memory (all paths)
- table (live search): sample first N pages for column widths, then stream remaining rows. O(sample + columns) memory.
- table (checkpoint): two-pass over file - pass 1 computes exact column widths O(columns), pass 2 streams rows. Perfectly accurate layout.
- CSV/YAML: still buffered (inherent to format, rarely used for large data)

Key changes:
- _stream_search_output(): core streaming function for JSONL, JSON, expand, and table from any iterable. Returns False for CSV/YAML.
- _stream_table_events(): sample-based streaming table for live searches (configurable via _TABLE_SAMPLE_PAGES constant).
- _stream_table_from_file(): two-pass streaming table for checkpoint files.
- _run_normal and saved_run: try streaming first, fall back to list() only for CSV/YAML.
- _run_with_checkpoint: search loop does not accumulate results in memory.

Add orjson as dependency for ~3-10x faster JSON serialization:
- New limacharlie/json_compat.py module: unified API (dumps, dumps_pretty, loads, backend_name) with graceful fallback to stdlib json.
- output.py: format_json, format_jsonl, _table_value, _csv_value all use json_compat. Benefits ALL CLI commands, not just search.
- Debug log (--debug) shows which JSON backend is active.

CLI improvements:
- Add -h as alias for --help on all commands (context_settings).
- Add help strings to all search subcommands (run, validate, estimate, saved-list, saved-get, saved-create, saved-delete, saved-run).
- Fix checkpoint-show --checkpoint error to show "--checkpoint" not "checkpoint_path" in missing parameter message.
- Warn on large time range searches (>7 days) without --checkpoint when using buffered output formats (table/CSV/YAML). Suggests --checkpoint or --output jsonl. Threshold configurable via _LARGE_TIME_RANGE_WARN_SECONDS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
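The two-pass table approach named in this commit can be illustrated with a small sketch. This is a simplified stand-in, not the code of `_stream_table_from_file()`: the function name, the `write` callback, and the column layout are assumptions made for the example.

```python
import json


def stream_table_from_file(path, write=print):
    """Render a JSONL file as a table in two passes (hypothetical helper)."""
    # Pass 1: remember only the widest value seen per column, O(columns) memory.
    widths = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            for key, value in json.loads(line).items():
                widths[key] = max(widths.get(key, len(key)), len(str(value)))

    columns = list(widths)
    write(" | ".join(col.ljust(widths[col]) for col in columns))
    write("-+-".join("-" * widths[col] for col in columns))

    # Pass 2: re-read the file and emit one formatted row at a time.
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            row = json.loads(line)
            write(" | ".join(str(row.get(col, "")).ljust(widths[col]) for col in columns))
```

Reading the file twice trades extra I/O for exact column widths without ever holding the rows in memory, which only works because the checkpoint file is already on disk.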
8c01ee0 to
6920054
Compare
Add PyPI classifiers for Python 3.9-3.14, development status, topic, and audience. CI already tests on Python 3.14 via cloudbuild_pr.yaml. Add packaging tests: classifiers present, current Python version included, requires-python minimum, production/stable status. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add project URLs (Documentation, Repository, Issues, Changelog, REST API Docs) so links render on the PyPI page. Update description to better reflect the package scope. Add packaging tests for URL presence and HTTPS validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… dist checks

orjson 3.11+ requires Python 3.10+, but we support 3.9. Split the dependency into two environment markers:
- Python <3.10: orjson >=3.10.0,<3.11 (last series with 3.9 support)
- Python >=3.10: orjson >=3.10.0 (latest)

Add distribution install checks for Python 3.9-3.13 in CI (3.14 already covered by existing steps). All run in parallel. Python 3.9 step also verifies orjson 3.10.x is installed (not 3.11+) and runs the full unit test suite to catch syntax/compat issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…allel

Consolidate the separate "Unit Tests" and "Dist Check" steps into unified per-version steps that build, install, verify orjson, and run the full unit test suite. All 6 versions run in parallel.

Use E2_HIGHCPU_8 machine type (8 vCPUs) to handle ~10 concurrent steps efficiently. Previously used the default E2_MEDIUM (2 vCPUs).

Integration tests and benchmarks remain on Python 3.14 only since they test API behavior, not Python version compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
test_jwt_cache.py used `float | int` union syntax which requires Python 3.10+. Adding `from __future__ import annotations` makes it work on 3.9. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
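A minimal illustration of why the `__future__` import is enough here. The function below is hypothetical and only stands in for whatever `test_jwt_cache.py` annotates.

```python
from __future__ import annotations  # must be the first statement in the module


def remaining_ttl(expires_at: float | int, now: float | int) -> float | int:
    """Hypothetical helper; its annotations are stored as strings and never evaluated."""
    return max(0, expires_at - now)
```

This only defers annotations. A `float | int` expression evaluated at runtime, for example as the second argument to `isinstance()`, would still fail on Python 3.9.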
platform.freedesktop_os_release was added in Python 3.10. Use create=True on mock.patch so the test works on 3.9 where the attribute doesn't exist. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
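The pattern looks roughly like this. `linux_distro_id()` is a hypothetical function under test, included only so the example is self-contained.

```python
import platform
from unittest import mock


def linux_distro_id() -> str:
    """Hypothetical function under test: return the distro ID, or 'unknown'."""
    try:
        return platform.freedesktop_os_release().get("ID", "unknown")
    except (AttributeError, OSError):
        return "unknown"


def test_linux_distro_id():
    # create=True lets the patch succeed even where the attribute does not
    # exist (Python 3.9); without it mock.patch raises AttributeError there.
    with mock.patch(
        "platform.freedesktop_os_release",
        create=True,
        return_value={"ID": "debian"},
    ):
        assert linux_distro_id() == "debian"
```

On interpreters where the attribute was created by the patch, `mock.patch` removes it again on exit, so the test leaves `platform` unchanged.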
Split each Python version into separate "Dist" and "Unit Tests" steps for clearer CI output and easier debugging. Each step:
- Dist steps: build wheel in /tmp/build-<ver>, install, verify pip show, limacharlie --version, orjson backend. Clean isolation per step.
- Unit test steps: install from source with dev deps in /tmp/test-<ver>, run full pytest suite. Clean isolation per step.

All steps use unique /tmp dirs to avoid cross-step interference. Added echo banners (======) and phase markers (--- phase ---) so CI logs are easy to scan. Also added sdist check as separate parallel step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Details
Adds streaming output support across all search command paths, replacing the previous approach that buffered all results in memory before rendering. Output formats that support it (JSONL, JSON, expand, table) now stream results with constant or near-constant memory usage, making large searches (100K+ events) feasible on memory-constrained VMs.
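As a rough illustration of the constant-memory idea for the JSON format, the sketch below writes a valid JSON array from any iterable while holding one item at a time. It is not the PR's `_stream_search_output()`; the function name, the `fake_search` generator, and the use of stdlib `json` are assumptions for the example.

```python
import io
import json
import sys


def stream_json_array(results, out=sys.stdout):
    """Write an iterable as a JSON array, one item in memory at a time."""
    out.write("[")
    first = True
    for item in results:
        # The separator goes before each item so no trailing comma is emitted.
        out.write("\n  " if first else ",\n  ")
        out.write(json.dumps(item))
        first = False
    out.write("]\n" if first else "\n]\n")


def fake_search(n):
    """Stand-in for a paginated search that yields results lazily."""
    for i in range(n):
        yield {"event_id": i}


buf = io.StringIO()
stream_json_array(fake_search(3), out=buf)
```

Because the brackets and separators are written by hand, the output is only complete JSON once the iterable is exhausted; an interrupted run leaves a truncated array, which is one reason JSONL is the safer choice for very long searches.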
Also adds `orjson` as an optional-but-preferred JSON backend (~3-10x faster than stdlib `json`), with automatic fallback to stdlib if orjson is unavailable.
Output behavior by path and format
`search run` (no checkpoint)
`search run --checkpoint`
`search run --resume`
`search checkpoint-show`
`search saved-run`
Peak memory by format
`[`, items, `]`
Memory profile - live validation
Measured during a real 30-day `--resume --checkpoint` search over a production org. The checkpoint file grew to 632 MB (484 result pages) while RSS stayed flat at ~138 MB, confirming O(1) memory for the checkpoint write path.
Key observations:
Warnings
Three independent warnings based on search time range:
`search estimate` command to check costs.
`--checkpoint`.
`--checkpoint` so interrupted searches can be resumed.
Cost notice (>30 day search)
Validate/estimate exit codes and output
`search validate` and `search estimate` now exit with code 1 when the server returns an error (e.g. invalid query syntax). Previously they always exited 0.
For table output, stats and estimatedPrice fields are flattened into individual columns (`stats.bytesScanned`, `price.value`, etc.) so full values are visible without truncation.
Validate: invalid query (exit code 1)
Validate: valid query (exit code 0)
Estimate: JSON output (preserves nested structure)
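The table flattening and exit-code rule described in this section could be sketched as follows. Both helpers are hypothetical. Note that this generic version would name the price column `estimatedPrice.value`, whereas the PR reports `price.value`, so the real implementation evidently maps names differently.

```python
def flatten(record, prefix=""):
    """Turn nested dicts into a single level of dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def exit_code_for(response):
    """Return 1 when the server response carries an error field, else 0."""
    return 1 if response.get("error") else 0
```

JSON output would skip `flatten()` entirely and keep the nested structure, as the heading above indicates.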
-h/--help flag fix
The `-h` short flag for help was not wired up at the CLI root level. This affected all commands and subcommands. Added `context_settings={"help_option_names": ["-h", "--help"]}` to the root group, which propagates to all subcommands.
Missing help/description strings for search subcommands
Added missing Click docstrings for: `validate`, `estimate`, `saved-get`, `saved-create`, `saved-delete`, `saved-run`.
Checkpoints list improvements
Changes
- `_stream_search_output()` handles JSONL, JSON, expand, and table formats without buffering. Returns `False` for CSV/YAML/raw so the caller can fall back to `list()`.
- `_stream_table_from_file()` for checkpoint paths - pass 1 scans the JSONL file to compute exact column widths (O(cols) memory), pass 2 streams rows with the computed layout.
- `_stream_table_events()` for live search paths - buffers first ~3 pages to determine column widths, then streams remaining rows.
- `limacharlie/json_compat.py` provides `dumps`, `loads`, `dumps_pretty` using orjson when available, stdlib json fallback.
- `_warn_cost_if_over_30_days()` warns when search spans >30 days and shows the `search estimate` command. Threshold matches server-side billing logic (replay: 31d, insight-go: 30d with `>`).
- `_output_validate_or_estimate()`.
- `-h` as help short flag (affects all commands/subcommands); added missing docstrings for search subcommands.
- `search run` explain text covering streaming vs buffered behavior, `--expand`, `--raw`, and memory guidance.
- `test_search_helpers.py` with 144 unit tests covering all helper functions, streaming, warnings, cost notice, validate/estimate exit codes, and output formatting.
Blast radius / isolation
`search run`, `search run --checkpoint`, `search run --resume`, `search checkpoint-show`, `search saved-run`), `search validate`, `search estimate`, search help text, checkpoints list display, CLI `-h` flag.
`-h` fix which benefits all), SDK classes, authentication, config.
Performance characteristics
Notable contracts / APIs
`json_compat` module is internal, not part of the public SDK API.
`pyproject.toml` but the fallback ensures backward compatibility if it cannot be installed.
`error` field. This is a correctness fix.
Test plan
`TestWarnCostIfOver30Days`, `TestCostWarningCli`)
`TestValidateEstimateExitCode`)
`TestOutputValidateOrEstimate`)
`TestLargeTimeRangeWarning`)
`TestCheckpointRecommendWarning`)
`test_search_helpers.py`)
`search run --resume --checkpoint` with 30-day range - RSS stable at 138 MB while file grew to 632 MB
`search validate` with invalid query returns exit code 1
`search estimate` with >30 day range shows cost notice
🤖 Generated with Claude Code