NIO channels could optimize buffer-invalidation on seeks #3041

@ryan-williams

Description

BlobReadChannel discards its buffer on any seek, and therefore so does CloudStorageReadChannel.

This causes unexpected, pathological behavior, e.g. in hadoop-bam/spark-bam, where the read pattern is roughly "read 100 bytes, rewind 99 bytes, repeat 1000x": essentially the same 2MB is re-fetched over the network on every iteration.

spark-bam uses a CachingChannel abstraction that LRU-caches blocks of the underlying channel, which fixes this.
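
To illustrate the idea (this is not spark-bam's actual CachingChannel, just a minimal sketch of the same approach), here is a read-only wrapper that keeps an LRU cache of fixed-size blocks, so a seek only moves a position and never throws data away. The names BlockCachingChannel, blockSize, maxBlocks, and fetches are all hypothetical, made up for this example:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: LRU-caches fixed-size blocks of an underlying channel. */
class BlockCachingChannel {
    private final SeekableByteChannel underlying;
    private final int blockSize;
    private final Map<Long, byte[]> cache;
    private long position = 0;
    long fetches = 0; // number of blocks actually read from the underlying channel

    BlockCachingChannel(SeekableByteChannel underlying, int blockSize, int maxBlocks) {
        this.underlying = underlying;
        this.blockSize = blockSize;
        // accessOrder=true orders entries from least to most recently used,
        // so removeEldestEntry evicts the LRU block once the cache is full.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    /** Seeking only updates the logical position; cached blocks are kept. */
    void seek(long newPosition) {
        position = newPosition;
    }

    /** Reads from cached blocks, fetching missing ones; returns -1 at end of data. */
    int read(ByteBuffer dst) throws IOException {
        int total = 0;
        while (dst.hasRemaining()) {
            byte[] block = loadBlock(position / blockSize);
            int offset = (int) (position % blockSize);
            if (offset >= block.length) {
                break; // position is at or past the end of the underlying data
            }
            int n = Math.min(dst.remaining(), block.length - offset);
            dst.put(block, offset, n);
            position += n;
            total += n;
        }
        return (total == 0 && dst.hasRemaining()) ? -1 : total;
    }

    private byte[] loadBlock(long blockIndex) throws IOException {
        byte[] cached = cache.get(blockIndex);
        if (cached != null) {
            return cached;
        }
        ByteBuffer buf = ByteBuffer.allocate(blockSize);
        underlying.position(blockIndex * blockSize);
        // Keep reading until the block is full or the channel reports EOF.
        while (buf.hasRemaining() && underlying.read(buf) >= 0) {
            // loop body intentionally empty
        }
        byte[] block = Arrays.copyOf(buf.array(), buf.position());
        cache.put(blockIndex, block);
        fetches++;
        return block;
    }
}
```

With a wrapper like this, the "read 100 bytes, rewind 99 bytes" pattern touches the underlying channel only when it crosses into a block that is not yet cached, rather than on every iteration. The block size and cache capacity would need tuning for a real workload.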

Just wanted to call out this "gotcha"; might be worth fixing here.

Labels

api: storage (Issues related to the Cloud Storage API.)
type: cleanup (An internal cleanup or hygiene concern.)
