Closed
Labels: api: storage (issues related to the Cloud Storage API), type: cleanup (an internal cleanup or hygiene concern)
Description
BlobReadChannel discards its buffer on any seek, and therefore so does CloudStorageReadChannel.
This causes unexpected, pathological behavior, e.g. in hadoop-bam/spark-bam, where the read pattern is roughly "read 100 bytes, rewind 99 bytes, repeat 1000x": essentially the same 2MB is re-fetched over the network on every iteration.
spark-bam uses a CachingChannel abstraction that LRU-caches blocks of the underlying channel, which fixes this.
Just wanted to call out this "gotcha"; might be worth fixing here.
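
For illustration, here is a minimal sketch of the block-caching idea described above. It is not spark-bam's actual CachingChannel nor anything in this library; the class name `BlockCachingChannel`, its constructor parameters, and the `fetches` counter are all hypothetical. It assumes a read-only, blocking `SeekableByteChannel` whose size does not change, and it makes seeks free by only updating a local position.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.NonWritableChannelException;
import java.nio.channels.SeekableByteChannel;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical read-only wrapper that LRU-caches fixed-size blocks of a seekable channel. */
class BlockCachingChannel implements SeekableByteChannel {
    private final SeekableByteChannel inner;
    private final int blockSize;
    private final long size;            // assumed not to change while the channel is open
    private final Map<Long, byte[]> cache;
    private long position = 0;
    int fetches = 0;                    // block reads sent to the underlying channel

    BlockCachingChannel(SeekableByteChannel inner, int blockSize, int maxBlocks) throws IOException {
        this.inner = inner;
        this.blockSize = blockSize;
        this.size = inner.size();
        // An access-ordered LinkedHashMap evicts the least recently used block.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    /** Returns the block with the given index, fetching it only on a cache miss. */
    private byte[] block(long index) throws IOException {
        byte[] cached = cache.get(index);
        if (cached != null) {
            return cached;
        }
        long start = index * blockSize;
        ByteBuffer buf = ByteBuffer.allocate((int) Math.min(blockSize, size - start));
        inner.position(start);
        while (buf.hasRemaining() && inner.read(buf) >= 0) {
            // keep reading until the block is full or the channel reports end of stream
        }
        fetches++;
        byte[] data = Arrays.copyOf(buf.array(), buf.position());
        cache.put(index, data);
        return data;
    }

    @Override
    public int read(ByteBuffer dst) throws IOException {
        if (position >= size) {
            return -1;
        }
        int total = 0;
        while (dst.hasRemaining() && position < size) {
            byte[] data = block(position / blockSize);
            int offset = (int) (position % blockSize);
            int n = Math.min(dst.remaining(), data.length - offset);
            if (n <= 0) {
                break; // short block: the underlying channel ended early
            }
            dst.put(data, offset, n);
            position += n;
            total += n;
        }
        return total;
    }

    // Seeking only moves the local position; cached blocks are kept.
    @Override
    public SeekableByteChannel position(long newPosition) {
        this.position = newPosition;
        return this;
    }

    @Override public long position() { return position; }
    @Override public long size() { return size; }
    @Override public boolean isOpen() { return inner.isOpen(); }
    @Override public void close() throws IOException { inner.close(); }
    @Override public int write(ByteBuffer src) { throw new NonWritableChannelException(); }
    @Override public SeekableByteChannel truncate(long s) { throw new NonWritableChannelException(); }
}
```

With a wrapper like this, the "read 100 bytes, rewind 99 bytes" loop only reaches the underlying channel when it crosses into a block that is not cached, rather than on every iteration. The block size and cache capacity are tuning knobs; sensible values depend on the access pattern.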