Skip to content

Enlarge protobuf read buffer size#4020

Merged
snnn merged 2 commits intomasterfrom
snnn/model
May 24, 2020
Merged

Enlarge protobuf read buffer size#4020
snnn merged 2 commits intomasterfrom
snnn/model

Conversation

@snnn
Copy link
Contributor

@snnn snnn commented May 23, 2020

Description:

  1. Enlarge protobuf read buffer size from 8KB(default) to 1MB.

Our Azure build VM has a IOPS cap of 500, which is far less a physical SSD can do.
https://azure.microsoft.com/en-us/pricing/details/managed-disks/

A real SSD can do 100k-500k reads per second!
https://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ssd-750-spec.pdf

With 500 IOPS and 8KB buffer size, we can only achieve 3.9MB/s disk speed, which means it need about 2-3 minutes to load an ONNX model zoo model, 2 hours to read all of our test data, which is not acceptable. That's why you can frequently see job timeout in our Windows CI build pipeline.

  1. Temporarily disable windows static analysis CI job

Because the latest protobuf changed the proto classes to final, so some of our code can't be compiled. I don't know how to fix it. In order to unblock the other PRs, I'll temporarily disable the job and try to seek help in the next week when everyone is back.

p.s. Don't use std::fstream. The buffer size is hard coded to 4095, there is no way to change it.

Motivation and Context

  • Why is this change required? What problem does it solve?
  • If it fixes an open issue, please link to the issue here.

@snnn snnn requested a review from a team as a code owner May 23, 2020 06:21
@snnn snnn requested a review from tiagoshibata May 23, 2020 06:23
return Status(ONNXRUNTIME, INVALID_ARGUMENT, "Null model_proto ptr.");
}
google::protobuf::io::IstreamInputStream zero_copy_input(&model_istream);
google::protobuf::io::IstreamInputStream zero_copy_input(&model_istream, 1 << 20);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 1<<20 is a constant for multiple places, we'd better to define it as a constexpr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I'll resolve it next week when I add the static_analysis job back.


- template: templates/clean-agent-build-directory-step.yml

- job: 'static_analysis'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. I encounter the same problem too.

@snnn snnn merged commit aafe988 into master May 24, 2020
@snnn snnn deleted the snnn/model branch May 24, 2020 23:31
snnn pushed a commit that referenced this pull request Jun 1, 2020
1. Fix the nuget cpu pipeline and put code coverage pipeline back.
2. Reduce onnx_test_runner's default logging level from WARNING to ERROR. Because there are too many log messages now.
3. Enlarge the protobuf read buffer size for onnx_test_runner. It was missed from PR #4020.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants