Skip to content

feat(writer): configurable buffer size of unsized write#2143

Merged
Xuanwo merged 12 commits intoapache:mainfrom
wcy-fdu:configure_buffer_size_of_writer
May 6, 2023
Merged

feat(writer): configurable buffer size of unsized write#2143
Xuanwo merged 12 commits intoapache:mainfrom
wcy-fdu:configure_buffer_size_of_writer

Conversation

@wcy-fdu
Copy link
Copy Markdown
Contributor

@wcy-fdu wcy-fdu commented Apr 27, 2023

Currently only support configurable write buffer size for s3, gcs, and oss.
close #2087

Comment thread core/src/services/s3/backend.rs Outdated
@Xuanwo
Copy link
Copy Markdown
Member

Xuanwo commented Apr 27, 2023

I don't like current API:

  • Users should be able to specify the buffer size during write operation. (maybe not true?)
  • There are different buffer size/limit here, we should make the behavior more clear.
    • S3's buffer size requires every write size >= size
    • GCS's buffer size requires every write size == size to align with 256 KiB

For my current understanding, there are at least the following buffer related sizes:

  • the at_least_size for write operation: every write will have at least n bytes
  • the exact_size for write operation: every write will have exactly n times of bytes
  • the at_most_size for write operation: storage can buffer at most n bytes before write

I'm also seeking for better names. Any ideas?


Seems we do need to add value in config, the best value cloud be different for services.

@Xuanwo Xuanwo marked this pull request as draft April 27, 2023 11:20
@xyjixyjixyji
Copy link
Copy Markdown
Contributor

I'm also seeking for better names. Any ideas?

I think it is ok to specify the buffer size for each write, since one writer is constructed per-write operation.

Then maybe this write pattern could be a field in OpWrite like

enum Buffersize {
   AtLeast(usize),
   Exact(usize),
   AtMost(usize),
}

@Xuanwo
Copy link
Copy Markdown
Member

Xuanwo commented Apr 27, 2023

Then maybe this write pattern could be a field in OpWrite like

No, too much. And they could be set at the same time, for example, at least 4MiB, at most 16 MiB.

@suyanhanx
Copy link
Copy Markdown
Member

suyanhanx commented Apr 27, 2023

Why not just upper/lower? 👀

@wcy-fdu
Copy link
Copy Markdown
Contributor Author

wcy-fdu commented Apr 28, 2023

I don't like current API:

  • Users should be able to specify the buffer size during write operation. (maybe not true?)

  • There are different buffer size/limit here, we should make the behavior more clear.

    • S3's buffer size requires every write size >= size
    • GCS's buffer size requires every write size == size to align with 256 KiB

For my current understanding, there are at least the following buffer related sizes:

  • the at_least_size for write operation: every write will have at least n bytes
  • the exact_size for write operation: every write will have exactly n times of bytes
  • the at_most_size for write operation: storage can buffer at most n bytes before write

I'm also seeking for better names. Any ideas?

Seems we do need to add value in config, the best value cloud be different for services.

Seem that different services need different semantics, but maybe maintaining three related items is a little redundant. Waiting for an exact solution(naming scheme).

@wcy-fdu
Copy link
Copy Markdown
Contributor Author

wcy-fdu commented May 5, 2023

Any suggestion?
If there is no good solution for the time being, we can close this pr as well.

@xyjixyjixyji
Copy link
Copy Markdown
Contributor

Any suggestion? If there is no good solution for the time being, we can close this pr as well.

I think this need some further discussion before we implement this.

Comment thread core/src/services/gcs/backend.rs Outdated
Comment thread core/src/services/gcs/backend.rs Outdated
Comment thread core/src/services/gcs/backend.rs Outdated
Comment thread core/src/services/gcs/backend.rs Outdated
Comment thread core/src/services/oss/backend.rs Outdated
Comment thread core/src/services/s3/backend.rs Outdated
Comment thread core/src/services/gcs/backend.rs Outdated
Comment thread core/src/services/oss/backend.rs Outdated
Comment thread core/src/services/oss/backend.rs Outdated
Comment thread core/src/services/oss/backend.rs Outdated
Comment thread core/src/services/oss/writer.rs Outdated
Comment thread core/src/services/gcs/backend.rs Outdated
Comment thread core/src/services/gcs/writer.rs Outdated
Comment thread core/src/services/oss/writer.rs Outdated
Copy link
Copy Markdown
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks a lot!

@Xuanwo
Copy link
Copy Markdown
Member

Xuanwo commented May 6, 2023

Mostly LGTM, the only thing left is to resolve the conflicts.

@wcy-fdu wcy-fdu marked this pull request as ready for review May 6, 2023 16:18
@Xuanwo Xuanwo merged commit dfd286c into apache:main May 6, 2023
suyanhanx pushed a commit to suyanhanx/opendal that referenced this pull request May 8, 2023
* config buffer size for gcs

* config buffer size for oss s3

* refactor

* refactor buffer size limitation

* resolve comments

* minor

* minor

* resolve comments

* rebase main and resolve conflict

* minor

* typo
@Xuanwo Xuanwo mentioned this pull request May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make the buffer size of streaming write configurable

4 participants