
Conversation

@areusch (Contributor) commented May 21, 2021

  • Allows test to pass on ci-gpu with new 18.04 image.

@tkonolige @tqchen

@comaniac (Contributor) commented:

FYI: It's possible that this error is caused by the CUBLAS math flag:

inline void CUBLASTryEnableTensorCore(cublasHandle_t hdl) {

According to the cuBLAS documentation, this flag is being deprecated. We also ran some tests around this flag previously and found that it takes effect even for float32, meaning the cuBLAS kernel internally casts float32 to float16, does the computation, and casts the results back. As a result, this flag may introduce accuracy issues.
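
For context, the flag being discussed is cuBLAS's CUBLAS_TENSOR_OP_MATH math mode. Below is a minimal sketch of how a helper like CUBLASTryEnableTensorCore would enable it (illustrative only, assuming the cuBLAS v2 API; not the exact TVM source):

```cpp
#include <cublas_v2.h>

// Illustrative sketch: ask cuBLAS to use Tensor Cores on this handle.
// With CUBLAS_TENSOR_OP_MATH set, cuBLAS may use Tensor Cores even for
// float32 GEMMs by internally down-converting inputs to float16, which is
// the source of the accuracy concern discussed above.
inline void TryEnableTensorCore(cublasHandle_t hdl) {
  int version = 0;
  if (cublasGetVersion(hdl, &version) != CUBLAS_STATUS_SUCCESS) return;
  // Tensor Core math mode requires cuBLAS 9.0 or later (version >= 9000).
  if (version >= 9000) {
    cublasSetMathMode(hdl, CUBLAS_TENSOR_OP_MATH);
  }
}
```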

@areusch (Contributor, Author) commented May 24, 2021

@comaniac I was sending this through CI for @tkonolige as he was out Friday. I'll let him reply to your comment.

@tkonolige (Contributor) commented:

@comaniac Disabling the flag makes the tests pass. What should we do here? Accept lower accuracy for performance?
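
For reference, "disabling the flag" here would amount to leaving the handle at (or restoring it to) the default math mode; a hedged sketch, not the exact change made in this PR:

```cpp
#include <cublas_v2.h>

// Illustrative sketch: restore the default math mode so float32 GEMMs are
// computed in full float32 precision rather than via FP16 Tensor Cores.
inline void UseDefaultMathMode(cublasHandle_t hdl) {
  cublasSetMathMode(hdl, CUBLAS_DEFAULT_MATH);
}
```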

@comaniac (Contributor) commented:

> @comaniac Disabling the flag makes the tests pass. What should we do here? Accept lower accuracy for performance?

I personally prefer to keep the accuracy, because it doesn't seem right to tolerate an error of 1e-2 for a single batch_matmul op. It means the end-to-end error of any model using cublas.batch_matmul could exceed 1e-2. cc @Hzfengsy @Laurawly, as they added this flag back when it had not yet been deprecated.

@Hzfengsy (Member) commented:

I also prefer to keep the accuracy. As @comaniac said, a 1e-2 tolerance is too loose for larger end-to-end workloads.

@areusch (Contributor, Author) commented May 27, 2021

Superseded by #8130.

@areusch closed this May 27, 2021