Skip to content

Expose DirectML provider to python (conflicts resolved from #3359)#4630

Merged
hariharans29 merged 8 commits intomicrosoft:masterfrom
cameronmaske:dml
Sep 8, 2020
Merged

Expose DirectML provider to python (conflicts resolved from #3359)#4630
hariharans29 merged 8 commits intomicrosoft:masterfrom
cameronmaske:dml

Conversation

@cameronmaske
Copy link
Contributor

@cameronmaske cameronmaske commented Jul 27, 2020

Description: Exposed the DirectML (DML) provider to Python.
This is based on #3359 but addresses the conflicts on that branch and include the DirectML.dll in the setup.py.

Motivation and Context
Previously the DML provider was not exposed to python.
Fixes #3358

@cameronmaske cameronmaske requested a review from a team as a code owner July 27, 2020 09:38
@ghost
Copy link

ghost commented Jul 27, 2020

CLA assistant check
All CLA requirements met.

fdwr
fdwr previously approved these changes Aug 5, 2020
Copy link
Contributor

@fdwr fdwr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@faxu
Copy link
Contributor

faxu commented Aug 8, 2020

/azp run Linux CPU CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,MacOS CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,MacOS NoContribops CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Windows CPU CI Pipeline

@faxu
Copy link
Contributor

faxu commented Aug 8, 2020

/azp run orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-mac-ci-pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@azure-pipelines
Copy link

Azure Pipelines successfully started running 9 pipeline(s).

@faxu
Copy link
Contributor

faxu commented Aug 12, 2020

@cameronmaske looks like there are some more conflicts to resolve on this.

@hariharans29
Copy link
Member

@cameronmaske @fdwr - does this PR look okay now ? (Resolved conflicts but making sure it makes sense)

@hariharans29
Copy link
Member

/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,MacOS CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline

@hariharans29
Copy link
Member

/azp run orttraining-linux-ci-pipeline,orttraining-mac-ci-pipeline,orttraining-linux-gpu-ci-pipeline,centos7_cpu,Linux OpenVINO CI Pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 5 pipeline(s).

@azure-pipelines
Copy link

Azure Pipelines successfully started running 9 pipeline(s).

@hariharans29
Copy link
Member

The Windows GPU build are failing. Is it because it isn't able to locate the DirectML.dll for some reason ?

@faxu
Copy link
Contributor

faxu commented Aug 17, 2020

@cameronmaske Can you take a look at the failure issue and also resolve conflicts?

@cameronmaske
Copy link
Contributor Author

@faxu I’m currently traveling but will try and resolve these issue when I return next week

@cameronmaske
Copy link
Contributor Author

cameronmaske commented Aug 24, 2020

Is there any way to see the full error output of why the build failed? Clicking Details just gives the error "Build Not Found"

@cameronmaske
Copy link
Contributor Author

@faxu Issue should be resolved. For context, the DirectML.dll was not being copied over to the /capi directory to be included with the python wheel. I've updated the cmake files to do that now.

@faxu
Copy link
Contributor

faxu commented Aug 24, 2020

/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,MacOS CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline

@faxu
Copy link
Contributor

faxu commented Aug 24, 2020

/azp run orttraining-linux-ci-pipeline,orttraining-mac-ci-pipeline,orttraining-linux-gpu-ci-pipeline,centos7_cpu,Linux OpenVINO CI Pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 5 pipeline(s).

@azure-pipelines
Copy link

Azure Pipelines successfully started running 9 pipeline(s).

@faxu
Copy link
Contributor

faxu commented Aug 25, 2020

@cameronmaske Still some issues, it seems...

@cameronmaske
Copy link
Contributor Author

cameronmaske commented Aug 25, 2020

@faux Reading over the test failures, I'd like some guidance from the maintainers about which route to take to address this, as the direction seems quite opinionated.

The error occurring is...

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Memory pattern must be disabled before registering DMLExecutionProvider

The DML provider does not support enable_mem_pattern = True which is the default behavior. There are two approaches I can think of to address this.

My thoughts are either,

Alter the tests to explicitly disable that option when running with the DMLExecutionProvider. Some special care will need to be taken for when the session options are loaded from the model file.

or

Change the session options to default enable_mem_pattern to False if the DMLExecutionProvider is used. This may be harder to do (I'm very new to the codebase, so guidance is appreciated if this is the route to do).

Maybe there are different approaches, appreciate any thoughts/guidance.

@jywu-msft
Copy link
Member

@faux Reading over the test failures, I'd like some guidance from the maintainers about which route to take to address this, as the direction seems quite opinionated.

The error occurring is...

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Memory pattern must be disabled before registering DMLExecutionProvider

The DML provider does not support enable_mem_pattern = True which is the default behavior. There are two approaches I can think of to address this.

My thoughts are either,

Alter the tests to explicitly disable that option when running with the DMLExecutionProvider. Some special care will need to be taken for when the session options are loaded from the model file.

or

Change the session options to default enable_mem_pattern to False if the DMLExecutionProvider is used. This may be harder to do (I'm very new to the codebase, so guidance is appreciated if this is the route to do).

Maybe there are different approaches, appreciate any thoughts/guidance.

Between those two options, I would lean towards the latter. enable_memory_pattern is an optimization, so if it is incompatible with DMLExecutionProvider, it seems better to make it a no-op, rather than place onus on user to know what optimizations are compatible with which provider.
Defaulting the correct option seems also preferable to changing all the various test code to special case DMLExecutionProvider to instantiate sess_options and set enable_mem_pattern to false.
As for how to default enable_mem_pattern to false, I think we can either remove the error message (and change the setting there). or perhaps better would be to set enable_mem_pattern to false in InferenceSession::ConstructorCommon
sometime after FinalizeSessionOptions()
@pranavsharma , what do you think?

@hariharans29
Copy link
Member

This error should go way now, please rebase with master and I can merge this PR. Thanks.

@hariharans29
Copy link
Member

/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,MacOS CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline

@hariharans29
Copy link
Member

/azp run orttraining-linux-ci-pipeline,orttraining-mac-ci-pipeline,orttraining-linux-gpu-ci-pipeline,centos7_cpu,Linux OpenVINO CI Pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines
Copy link

Azure Pipelines successfully started running 5 pipeline(s).

@hariharans29
Copy link
Member

/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,MacOS CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline

@hariharans29
Copy link
Member

/azp run orttraining-linux-ci-pipeline,orttraining-mac-ci-pipeline,orttraining-linux-gpu-ci-pipeline,centos7_cpu,Linux OpenVINO CI Pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines
Copy link

Azure Pipelines successfully started running 5 pipeline(s).

@hariharans29 hariharans29 merged commit 4553b2e into microsoft:master Sep 8, 2020
@cameronmaske
Copy link
Contributor Author

@hariharans29 Many thanks for the help in getting this merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

How to use DirectML execution provider from Python?

5 participants