Skip to content

Merging CipherOwl Improvements#115

Open
leozc wants to merge 191 commits intocoinbase:masterfrom
cipherowl-ai:master
Open

Merging CipherOwl Improvements#115
leozc wants to merge 191 commits intocoinbase:masterfrom
cipherowl-ai:master

Conversation

@leozc
Copy link
Copy Markdown
Contributor

@leozc leozc commented Apr 1, 2025

What changed? Why?

  1. This change includes a list of improvements from CipherOwl based on the original ChainStorage from Coinbase
  2. Key improvement:
    2.1 Adding LTC / Tron Support,
    2.2 Adding ZSTD support
    2.3 Making ChainStorage more friendly for opensource / k8s environment

How did you test the change?

  • [ X ] unit test
  • [ X ] integration test
  • functional test
  • adhoc test (described below)
  • running in production

ImNumber4 and others added 30 commits January 24, 2024 19:36
…replicator

# Conflicts:
#	internal/workflow/replicator.go
Signed-off-by: Henry Yang <henry.yang@cipherowl.com>
* Port ethereum beacon support
PikaZ76 and others added 30 commits January 5, 2026 10:44
Add two layers of protection against truncated/corrupted JSON responses
from Tron REST API endpoints:

1. Content-Length validation in REST API client: Detects when actual
   response bytes don't match the Content-Length header, indicating
   premature connection close. This is retried at the HTTP level.

2. JSON validation in Tron client: Validates response is valid JSON
   before attempting to unmarshal. Catches cases where Content-Length
   is not present (chunked encoding) or data is corrupted.

Both validations include logging with response preview for debugging.
Errors are marked as retryable to allow automatic retry.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…onse

fix: Add validation for truncated JSON responses in Tron client
* chore: Migrate AWS SDK from v1 to v2

This commit completes the migration from github.com/aws/aws-sdk-go (v1) to
github.com/aws/aws-sdk-go-v2 (v2) to address CVE-2020-8911 security vulnerability.

Changes:
- internal/aws/*: Rewrite config and retryer for v2, delete session.go
- internal/storage/metastorage/dynamodb/*: Migrate to v2 DynamoDB API
  - dynamodbattribute -> attributevalue
  - types.AttributeValue instead of *dynamodb.AttributeValue
- internal/s3/*: Migrate to v2 S3 API with manager package
- internal/storage/blobstorage/s3/*: Update ACL to use v2 types
- internal/dlq/sqs/*: Migrate to v2 SQS API
- cmd/admin/migrate.go: Update DynamoDB client usage
- Regenerate all mocks for v2 interfaces
- Update test files to use v2 SDK types

The v1 SDK has been removed from direct dependencies in go.mod.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* remove unused file

* chore: Upgrade Go version from 1.22 to 1.24 in Dockerfile

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Use thread-safe WriteAtBuffer from AWS SDK v2 for S3 downloads

Replace custom writeAtBuffer implementation with manager.WriteAtBuffer
from AWS SDK v2. The S3 Downloader uses 5 concurrent goroutines by
default, and the custom implementation was not thread-safe, potentially
causing race conditions and data corruption.

Also fix minor formatting in migrate.go.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
)

Upgrade multiple Go dependencies to address known CVEs flagged by Vanta:
- op-geth: v1.101500.0 → v1.101609.0
- golang.org/x/time: v0.5.0 → v0.9.0
- consensys/gnark-crypto: v0.14.0 → v0.18.0
- golang/snappy: v0.0.5-pre → v1.0.0
- prometheus/client_golang: v1.14.0 → v1.15.0
- supranational/blst: v0.3.13 → v0.3.16
- bits-and-blooms/bitset: v1.17.0 → v1.20.0
- gofrs/flock: v0.8.1 → v0.12.1
- spf13/pflag: v1.0.5 → v1.0.6

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
BCHN upgrade 12 introduced a new standard script type TX_SCRIPT ("script")
for bare scripts <= 201 bytes that were previously classified as nonstandard.
This was causing block parsing failures on Bitcoin Cash mainnet at block 940884.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* add dash

* add dash raw transaction data in blob data

* simply dash client code

* refactor: separate Dash parser from Bitcoin with preprocessBlock hook

Dash omits tx.hash in RPC responses. Instead of removing the required
validation on BitcoinTransaction.Hash, add a preprocessBlock hook that
normalizes hash=txid before validation runs.
* add zcash

* keep only transparent transactons
)

Dash has special transaction types (e.g. quorum commitments) that can
have empty Vin and/or Vout, causing block parsing to fail. This follows
the same approach used for Zcash shielded transactions.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* cache unmarshaled tx in client

* opt getInputTransactions

* fix validation in client

* fetch tx in currency
* add heartbeat for extractor

* fix test
* add more heartbeat

* add heartbeat when every retry

* refactor: per-chain RPC method customization and per-retry heartbeat

- Add rpcMethods struct with newRPCMethods() override support for per-chain RPC config
- Add jsonrpc.WithOnAttempt for per-retry heartbeat in fetchInputTransactions
- Replace package-level method vars with instance-level methods field
* fix seismic timestamp to second

* fix timestamp, keep ms

* add uitls

* drop code duplicate
#100)

* feat: add SRC20TokenTransfer as distinct protobuf type instead of reusing ERC20

SRC20 (Seismic's encrypted token standard) was incorrectly stored as
ERC20TokenTransfer in the oneof. This adds a dedicated SRC20TokenTransfer
message with an encrypt_key_hash field and updates the parser and tests.

* delete EncryptKeyHash from SRC20TokenTransfer
…108)

* fix: make Bitcoin RPC method timeouts configurable via chain config

The BCH mainnet poller was failing on blocks with many input transactions
because getrawtransaction batch calls to NowNodes exceeded the hardcoded
30s timeout. This adds configurable RPC timeout fields to ClientConfig
(rpc_timeout_get_block, rpc_timeout_get_raw_tx, rpc_timeout_get_block_hash)
so any chain can tune timeouts via YAML config. Sets BCH mainnet
getrawtransaction timeout to 60s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review feedback and add getblock timeout for BCH

- Use default constants (defaultGetBlockByHash.Name etc.) instead of
  hardcoded RPC method name strings in rpcMethodsOverrideFromConfig
- Fix Zcash override order so config overrides take precedence over the
  hardcoded 10s default (config applied last wins)
- Add rpc_timeout_get_block: 30s for BCH mainnet since getblock calls
  are also timing out against NowNodes at the default 10s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses CVE-2026-33186 (CVSS 9.1 Critical): gRPC-Go authorization
bypass via missing leading slash in the HTTP/2 :path pseudo-header.
Fixed upstream in grpc-go v1.79.3.

- Vanta: Critical vulnerabilities identified in packages are addressed
  (AWS Container) — cipherowl/chainstorage
- References:
  - GHSA-p77j-4mvh-x3m3
  - https://nvd.nist.gov/vuln/detail/CVE-2026-33186

Transitive deps picked up by `go mod tidy` (OTel, x/crypto, x/net,
genproto, protobuf, etc.) are all minor/patch bumps, no code changes
required.

Verified locally:
- `make bin` — all 10 binaries build clean
- `go vet ./...` — clean
- `TEST_TYPE=unit go test ./... -short` — all packages pass
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Adds gomod and docker ecosystems with weekly cadence, cooldown policy,
and grouped PRs. Pins golang base image to current major.minor since
Go upgrades require cross-service coordination.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on (#114)

* feat: add parser options, streaming iterator, disk-spooled downloader

- Parse options (`WithSkipScripts`, `WithSkipWitnesses`, `WithSkipShielded`)
  threaded through sdk.Parser → NativeParser.ParseBlock as variadic. Cuts
  persistent NativeBlock size by 62–86% on bitcoin/zcash fixtures when
  tables don't consume those fields. Chainsformer blocks table can adopt
  today without a schema change.

- BitcoinBlockStream (iter.Seq2) with "free after iteration / paid
  before" Header() optimization. Exposed via bitcoin.StreamBlockIter and
  re-exported through parser.BitcoinStreamer.

- Chain-agnostic BlockDownloader.DownloadStream: spools compressed blob
  to a local temp file with io.Copy (32 KB RAM during download vs ~800
  MB ReadAll), then decompresses + proto.Unmarshal, invokes a chain-
  agnostic consumer with *api.Block. Temp file deleted on return.

- sdk.Client.StreamBlock mirrors the GetBlock ergonomics: delivers a
  chain-agnostic *StreamedBlock view whose GetBitcoin() returns a
  BitcoinBlockStream for bitcoin-family blocks, nil otherwise. No
  explicit Close for consumers — lifecycle is callback-scoped.

- storage/utils gains DecompressReader for streaming gzip/zstd decode.

Phase 1 streaming win on zcash: ~−600 MB peak on 3.8 GB blocks (eliminates
compressed-bytes in RAM). Peak is still dominated by
decompressed_bytes + api.Block during proto.Unmarshal; Phase 2 wire
walker will attack that next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: proto wire walker + DownloadStreamBitcoin (Phase 2)

Walker lives in protos/coinbase/chainstorage/blockchain_bitcoin.stream.go
next to the generated proto, so any edit to blockchain.proto /
blockchain_bitcoin.proto surfaces the contract test in the same PR diff.

- WalkBitcoinEnvelope hand-reads the proto wire format, decoding every
  field except BitcoinBlobdata.Header (returned as file offset+length).
  The header JSON never materializes as []byte in RAM.

- Walker is strict: unknown field numbers on the wire produce an error
  pointing at the file + constant to update. No silent skip.

- BlockDownloader.DownloadStreamBitcoin: spool compressed -> tempfile1,
  decompress -> tempfile2 via TeeReader while wire-walking, hand the
  consumer a partial *api.Block (Header = nil) + a re-openable header
  reader factory that seeks into tempfile2. Both temp files deleted on
  return.

- sdk.Client.StreamBlock auto-dispatches: bitcoin-family chains take
  the Phase 2 path; others fall through to Phase 1 (DownloadStream).

Drift-defense tests in protos/coinbase/chainstorage/:
- TestWireWalkerContract_*: bidirectional reflection check between the
  walker's KnownBlockFields / KnownBitcoinBlobFields maps and the proto
  descriptor. Catches new fields, stale entries, kind/cardinality/oneof
  drift.
- TestWireWalkerSwitchCoverage_*: for every field in Known*, build a
  proto containing only that field, walk it, assert no "unknown field"
  error. Catches Known* entries that lack a switch case.

Together any proto change that the walker doesn't fully handle fails
at least one of the two tests.

Measured on real zcash mainnet blocks (3.3-3.8 GB decompressed):
  block 322102: 4.43 s / 8.51 GB peak (baseline) -> 2.81 s / 4.81 GB (Phase 2)
  block 298695: 3.79 s / 7.38 GB peak (baseline) -> 2.49 s / 4.07 GB (Phase 2)
  = -35% wall time, -45% peak heap across both blocks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: generic WalkBlockEnvelope, extend Phase 2 to all chains

- New protos/coinbase/chainstorage/block_stream.go: chain-agnostic
  reflection-driven walker that populates a full *api.Block directly
  from a proto-encoded stream. Replaces the ioutil.ReadAll +
  proto.Unmarshal pair in DownloadStream; peak RAM during
  DownloadStream drops from ~2x block size to ~1x block size on every
  chain (not just bitcoin).

- blockchain_bitcoin.stream.go slimmed down (~200 LOC removed): the
  bitcoin-specific walker now shares the generic walker struct and
  delegates every field except Header to decodeField. The only
  bitcoin-specific path is the Header-as-offset skip for the
  streaming iterator use case. Duplicated varint/bytes/tag primitives
  and the hand-rolled map-entry decoder are gone.

- Because walkBitcoinBlob no longer uses proto.Unmarshal per
  RepeatedBytes group, the per-group transient []byte + destination
  duplicate is eliminated. On zcash 322102 the bitcoin walker's peak
  drops from 4.82 GB to 3.85 GB (-20%); on zcash 298695 from 4.15 GB
  to 3.37 GB (-19%).

Measured peak on zcash mainnet (3.3-3.8 GB decompressed):
  block 322102: 8.52 GB (legacy) -> 3.89 GB (generic) -> 3.85 GB (bitcoin walker + iter)
  block 298695: 7.38 GB (legacy) -> 3.41 GB (generic) -> 3.37 GB (bitcoin walker + iter)
  = -55% peak vs legacy.

NOTE: the remaining ~3.8 GB peak on zcash is dominated by
BitcoinBlobdata.InputTransactions which buildInputMetadataMap reads
upfront. A follow-up (lazy per-tx prev-output resolution) will drop
that further.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: lazy per-tx InputTransactions + skip RawInputTransactions

Drops parser peak RAM on 4 GB zcash blocks from ~8.5 GB to ~19 MB.

Key changes:
- Walker now skips BitcoinBlobdata.RawInputTransactions (field 3) in
  addition to Header + InputTransactions. On zcash the field duplicates
  the InputTransactions payload, so materializing it doubled peak RAM.
- DownloadStreamBitcoin exposes a per-tx loadGroup(i) closure that
  seeks into the decompressed spool file and proto.Unmarshals a single
  RepeatedBytes group on demand — no more O(block) in-memory groups.
- StreamBlockIter / bitcoinNativeParserImpl build prev-output metadata
  per transaction (buildMetadataForGroup), so only one tx's worth of
  input-tx state is retained at any moment.
- zstd decompressor configured with WithDecoderConcurrency(1),
  WithDecoderLowmem(true), WithDecoderMaxMemory(256MB) to cap
  internal buffering on multi-GB blocks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: consolidate bitcoin stream files, drop dead public API

- Remove StreamingNativeParser interface + public StreamBlock method +
  BitcoinBlockVisitor[Func] types. Public streaming surface is now just
  StreamBlockIter / BlockStream (iter.Seq2-based).
- Merge bitcoin_native_stream.go and bitcoin_native_stream_iter.go into
  a single bitcoin_native_stream.go. With the visitor/StreamBlock types
  now package-private, the two files were one unit; split only remained
  because the former had public types.
- Replace the internal visitor interface with a plain emit callback
  (func(*BitcoinTransaction) error) consumed directly by the iter loop.
- Rewrite BenchmarkBitcoinStreamBlock against StreamBlockIter.
- Drop the now-dead StreamBlock section from zcash_large_bench_test.go
  (zcash_stream_bench_test.go already benchmarks the streaming path
  end-to-end).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: pull-based streaming API with explicit Close()

Replace callback-scoped DownloadStream / DownloadStreamBitcoin / StreamBlock
with a return-based API that hands back a handle the caller must Close().
More flexible (handles can be passed across goroutines or composed into
pipelines) and mirrors GetBlock's shape.

- downloader.BlockDownloader
  - DownloadStream(ctx, blockFile) (*api.Block, error) — no resources
    outlive the call; caller has no cleanup responsibility.
  - DownloadStreamBitcoin(ctx, blockFile) (*BitcoinStream, error) —
    returns a handle carrying the partial block plus OpenHeaderReader /
    LoadInputTxGroup closures. Caller MUST Close() to remove the
    decompressed spool file.
  - BitcoinStream uses sync.Once for idempotent Close and wires
    runtime.AddCleanup as a safety net against leaks.

- sdk.Client.StreamBlock(ctx, tag, height, hash, opts...) (*StreamedBlock, error)
  StreamedBlock gains Close() that delegates to the underlying
  downloader handle (no-op for non-bitcoin chains).

Also drops StreamConsumer / BitcoinStreamConsumer callback types and
the nil-consumer guard (there is no consumer anymore).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: move chain dispatch to parser, downloader becomes dumb plumbing

Re-architect the streaming path so chain-specific logic (wire walker
choice, lazy-load semantics) lives in the parser package instead of
the downloader. The downloader now produces a chain-agnostic spool
handle; the parser consumes it and returns a chain-specific view.

Downloader (chain-agnostic):
- DownloadStreamBitcoin removed.
- DownloadStream(ctx, bf) now returns *SpooledBlock with
  BlockFile + Open() + Close(). No proto walking at this layer.
- SpooledBlock uses sync.Once for idempotent Close and wires
  runtime.AddCleanup as a leak safety net.

Parser (chain dispatch):
- Parser.StreamBitcoinBlock removed in favor of the chain-agnostic
  Parser.StreamBlock(ctx, spooled, opts...) (StreamedBlock, error).
- Internal dispatch: bitcoin family (NativeParser implements
  BitcoinStreamer) walks via WalkBitcoinEnvelope and wires
  seek-based loaders over the spool; other chains materialize via
  WalkBlockEnvelope.
- New interfaces: StreamedBlock (GetMetadata + Close),
  BitcoinStreamedBlock (StreamedBlock + BitcoinBlockIter).
- Internal BitcoinBlockStream renamed BitcoinBlockIter
  (iterator-only contract that native parsers return);
  BitcoinStreamedBlock is the full public contract.

SDK:
- StreamedBlock is now parser.StreamedBlock (interface), not a
  struct wrapper. Callers type-assert to BitcoinStreamedBlock to
  reach iterators.
- Client.StreamBlock simplified to: DownloadStream +
  parser.StreamBlock. Bitcoin-family dispatch logic now lives in
  the parser, not the SDK.

All call sites, mocks, and tests updated. Downloader tests assert
on SpooledBlock.Open/Close; SDK stream test mocks DownloadStream
returning a SpooledBlock backed by a real tempfile so the parser's
seek-based loaders exercise end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: per-chain typed stream methods instead of runtime type-assert

Replace the generic StreamBlock + runtime view.(BitcoinStreamedBlock)
cast with chain-specific methods that return the typed stream
directly:

    // Before
    view, _ := client.StreamBlock(...)
    bs, ok := view.(sdk.BitcoinStreamedBlock)

    // After
    bs, err := client.StreamBitcoinBlock(...)

Compile-time chain safety at the call site; chain mismatch becomes a
typed error inside the method. Each new chain that gains streaming
support adds its own typed method (StreamEthereumBlock, ...) on the
same pattern.

Changes:
- Parser.StreamBlock → Parser.StreamBitcoinBlock (returns
  BitcoinStreamedBlock directly). Errors cleanly if the configured
  native parser is not bitcoin-family.
- Client.StreamBlock → Client.StreamBitcoinBlock, same shape.
- Drop the chain-agnostic StreamedBlock-dispatch path and its
  genericStreamedBlock wrapper. Skipped bitcoin blocks get their
  own skippedBitcoinStream that implements BitcoinStreamedBlock
  with empty iteration.
- All mocks + tests updated to the new typed methods.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: apply chain-specific hooks in bitcoin-family streaming path

dash and zcash have chain-specific parser behavior that the inherited
StreamBlockIter path silently ignored, producing outputs that diverged
from ParseBlock:

- Dash: RPC omits tx.hash; ParseBlock runs preprocessBlock=backfillTxHash
  to copy TxId into Hash. Streaming never applied this, yielding txs with
  empty Hash.
- Zcash: buildTransparentTransactionMask filters out shielded-only txs
  before they reach NativeBlock.Transactions. Streaming included them.

Replace the block-level preprocessBlock hook with two per-tx hooks that
both ParseBlock and streaming apply identically:

- preprocessTx(*BitcoinTransaction): normalizes each tx before
  validation (e.g. hash backfill for dash/zcash).
- txFilter(json.RawMessage) (bool, error): classifies each raw tx as
  keep/skip (e.g. zcash shielded-tx drop).

Zcash's ParseBlock override is deleted — the base ParseBlock via the
hooks produces identical output. buildTransparentTransactionMask and
buildZcashTransactions are folded into the base parser.

Also adds a new bitcoin_native_stream_parity_test.go that runs
ParseBlock and StreamBlockIter against the same fixture for bitcoin,
dash, and zcash and asserts transaction lists match exactly. This
catches the pre-existing gaps and guards against future overrides
silently diverging streaming from ParseBlock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: SDK Client.StreamNativeBlock with chain-agnostic NativeStreamedBlock

Consolidate the streaming SDK surface into a single entry point that
mirrors GetBlock/ParseNativeBlock ergonomics: one method, one defer,
one error path. Chain-specific iteration is available through
nil-checkable accessors on the returned view, parallel to
*api.NativeBlock's oneof getters.

Consumer pattern:

    native, err := client.StreamNativeBlock(ctx, tag, height, hash, opts...)
    if err \!= nil { return err }
    defer native.Close()

    if bs := native.GetBitcoin(); bs \!= nil {
        for tx, err := range bs.Transactions() { ... }
        header, _ := bs.Header()
    }

Changes:

- New interfaces (internal/blockchain/parser/internal/stream.go):
  * NativeStreamedBlock { GetMetadata, Close, GetBitcoin, GetEthereum }
  * BitcoinNativeStream { Transactions, Header }
  * EthereumNativeStream { Transactions, Header } — stub; GetEthereum()
    returns nil until the ethereum streaming walker lands.
- Internal Parser: StreamBitcoinBlock replaced by ParseStreamNative
  (chain-agnostic input, chain-dispatched output). Bitcoin-family
  chains populate GetBitcoin via the native parser's StreamBlockIter;
  other chains leave all Get<Chain>() nil so callers fall back to
  GetBlock + ParseNativeBlock.
- SDK Client: StreamBitcoinBlock replaced by StreamNativeBlock.
  Implementation uses an unexported streamingParser mirror interface
  to reach the internal parser method without surfacing it on
  sdk.Parser.
- sdk.Parser trimmed (no streaming methods); sdk.SpooledBlock and
  sdk.BitcoinStreamedBlock re-exports dropped — consumers never
  construct SpooledBlock, and the old BitcoinStreamedBlock interface
  is superseded.

Mocks, tests, and interceptor passthroughs updated. All existing
tests pass; the bitcoin-family stream/parse parity tests still hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf: add fast path in bitcoin streaming decode when no txFilter is set

The txFilter hook added in dc562e9 forced a two-stage JSON decode
(json.RawMessage → json.Unmarshal into BitcoinTransaction) for every
chain, including bitcoin/bitcoincash/litecoin/dash which don't use a
filter. That added ~30% overhead and ~11MB of extra allocations per
15MB block.

Branch on b.txFilter:
- nil (bitcoin, bitcoincash, litecoin, dash): decode directly into
  BitcoinTransaction, same as before the hook refactor.
- set (zcash): keep the two-stage decode so the raw bytes remain
  available to the filter.

Benchmark (raw_block_731379 fixture, M3 Max, -count=2 avg):
  Before: BenchmarkBitcoinStreamBlockIter    115ms  37MB  379k allocs
  After:  BenchmarkBitcoinStreamBlockIter     76ms  26MB  346k allocs

Now comparable to ParseBlock (77ms) with slightly fewer allocations
because streaming doesn't build the full BitcoinBlock struct.

Parity tests still pass, confirming zcash's two-stage path is
exercised correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: add phase-level timing to zcash streaming bench

Split the streaming path's elapsed time into download+spool, walk
envelope, and iterate transactions. Split the legacy baseline into
download+unmarshal and parse block. Makes the source of the
streaming-vs-legacy gap visible (disk write penalty + a second 4GB
read for the walker pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf: reduce tempfile + syscall overhead in streaming pipeline

Three changes that dramatically improve concurrent streaming throughput
without hurting memory profile:

1. Single tempfile in DownloadStream. Previously two spools
   (compressed intermediate + decompressed). Now the HTTP body flows
   through the decompressor directly into the single decompressed
   spool. Retries truncate + rewind and reissue the GET.

2. Drop fdatasync on the decompressed spool. The same process reads
   it right back through the OS page cache; forcing disk flush only
   adds latency under concurrent I/O.

3. Cache a single *os.File handle in the bitcoin stream loaders and
   use io.ReaderAt (pread) for header + per-tx group reads. Previous
   loadGroup did Open+Seek+Read+Close per tx = 4 syscalls × N txs ×
   concurrent workers. Under a 30-way bitcoin workload with ~5000-tx
   blocks this was 600k+ syscalls per batch. Now: one open + N preads.

Also tightens heapPeakTracker sampler (zcash_large_bench_test.go) to
5 ms with start + stop snapshots so short ops register a peak.

Concurrent timing impact (30 workers, 1000 blocks per chain, full
pipeline including read + decompress + parse/stream):

  Before               After       Change
  Bitcoin  421 ms  →   155 ms      -63%
  Dash     138 ms  →    10 ms      -93%
  Zcash    175 ms  →    35 ms      -80%

Streaming is now within ±10% of Download+ParseBlock mean time across
all bitcoin-family chains. Memory profile unchanged — still 5-152×
peak-RAM reduction depending on block size.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: add prod-blob parity validator for bitcoin-family chains

New TestProdParityBitcoinFamily downloads random blocks from the prod
S3 buckets (bitcoin, dash, zcash) and asserts ParseBlock and
StreamBlockIter produce identical transaction lists over the full
pipeline: read + decompress + parse/stream.

Gated by CHAINSTORAGE_PROD_PARITY=1 with AWS_PROFILE=cipherowl-prod.
Blocks are cached under ~/data/chainstorage/{chain}/ so re-runs skip
S3; deterministic height sampling via seed makes the cache reusable.

A local httptest server serves cached blobs to the real
downloader.BlockDownloader so measurements cover the production code
path, not just the final parse step.

Two modes:
- Default (workers=30): concurrent full-pipeline timing + parity
  comparison. 1000 blocks/chain in ~24s on M3 Max.
- CHAINSTORAGE_PROD_PARITY_PROFILE_MEM=1 (workers=1): per-block peak
  heap delta for both paths; streaming path does not accumulate txs
  so peak reflects intrinsic streaming profile. 1000 blocks/chain in
  ~175s on M3 Max.

Observed on 9000 block comparisons (3 modes × 3000 blocks): zero
parity mismatches. Peak heap at max:
  Bitcoin:  58.7 MB (Parse)  vs  11.4 MB (Stream)  -  5×
  Dash:    670.5 MB (Parse)  vs  15.4 MB (Stream)  - 43×
  Zcash:     8.38 GB (Parse) vs  56.3 MB (Stream)  - 152×
  (on a 3.71 GB zcash block: 8.38 GB → 22.4 MB = 384×)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* lint: gofmt + errcheck pass

- gofmt fixes in 4 files flagged by `make fmt` check (whitespace /
  import grouping).
- Explicit `_ = x.Close()` / `defer func() { _ = x.Close() }()` on
  Close paths where the error is intentionally ignored. errcheck is
  satisfied; behavior unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* lint: resolve ineffassign warnings via closure-scoped GC cleanup

The "assign nil before runtime.GC()" idiom for dropping references
looks dead to ineffassign. Scope the relevant work in a closure so
the locals go out of scope at the closure return; the subsequent
runtime.GC() then reclaims them without needing an explicit nil
assignment.

- measureFullPipelineProfile: parse phase (rawBlock, baseline) runs
  in a closure; streaming measurement starts from a clean baseline
  once the closure returns.
- TestZcashLargeBlockBench: read + Unmarshal runs in a closure so
  the multi-GB `data` slice is reclaimable before the block is
  measured.

Also an explicit `_ = spooled.Close()` on a discarded error path
that errcheck would catch on the next pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: update handler_test ParseNativeBlock DoAndReturn for variadic opts

The mock generated from the current Parser interface takes
...parser.ParseOption as a third argument. Six DoAndReturn closures
in handler_test.go still had the pre-variadic 2-arg signature,
causing runtime mock mismatches under CI's stricter check.

Updated to `func(ctx context.Context, rawBlock *api.Block,
_ ...parser.ParseOption) (*api.NativeBlock, error)`.

ParseRosettaBlock (non-variadic) callers unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.