This repository was archived by the owner on Nov 17, 2023. It is now read-only.
Commit 29d6f27
Use RTC for elementwise and broadcast ops (#18622)
* Reapplying PR #17767
* Making RTC required
* Move cuda utils to src/common/cuda and refactor RTC part
* Unary ops via RTC
* Support binary_scalar forward
Remove elemwise_scatter_op.*
Fix BinaryScalar usage in NumPy
* Backward of binary scalar
* Binary forward
* Fix for binary_scalar
* Moving all binary forward to RTC
Reorganization
* Backward of binary ops
* Support broadcast
Add RTC to NumPy ops
* RTC for elementwise sum
Fixes
* RTC for backward usenone of broadcast
* RTC for broadcast bwd usein
* Remove non-RTC vectorization support
* Remove template from ReduceWorkspaceSize
* Fixes from rebase
* Guarding RTC usage behind MXNET_USE_CUDA
* More guards
* C++17 for CUDA code
* MixedUnaryBackwardInOut as RTC
* Removing unused variable
* Revert "C++17 for CUDA code"
This reverts commit b09090c.
* Get rid of CI tests without RTC
Get rid of if constexpr as CUDA 10 does not support it
* Fix lint
* Change a few more elemwise functions
Fix for too long value
* Fix large tensor build
* Another try with DBL_MAX
* Fix Windows compilation
* Fix the large int test
* Add the printing of error code value to CUDA_DRIVER_CALL
* Fix
* Fix binary scalar
* Get more information when cuLaunchKernel fails
* Going easy on Windows compiler
* Fix lint
* Reorganization to split strings due to Windows compilation problems
* Fix error with uninitialized value
* Fix handling of different types for backward of binary scalar
* Decreasing RTC overhead
* Fix lint and remove rest of mentions of ENABLE_RTC
* Jetson with RTC
* Fix the aws s3 command
* Debugging Windows failure
* More debugging of Windows failure
* Debug
* Fix the issue on Windows (long -> long long for 8B)
* libcuda.so for Jetson
* Enable debug information for RTC kernels and cleaning debug ptx dump
* Fix lint
* Try without linking the stub of libcuda.so to different place in Jetson
* Add docstring
* Answering review comments
* Unifying vectorization
* Fix
* Fixes for reduce ops
* Fix M=1 case
* Fixes from rebase
Fixes for mixed type gradient functions
Set the launch bounds on RTC kernels
* Fix
* Fix tests
* Adding tutorial for RTC
* Fixes after merge
* Fixes from review
* Change env var doc and undo the change to toctree
1 parent bbc39fa commit 29d6f27
File tree
141 files changed
+7274
-3548
lines changed
- 3rdparty/mshadow/mshadow
- ci
  - docker
  - jenkins
- config
- docs
  - python_docs/python/tutorials/extend
  - static_site/src/pages/api/faq
- include/mxnet
- python/mxnet
  - contrib/amp/lists
- src
  - c_api
  - common
    - cuda
      - rtc
  - engine
  - imperative
  - kvstore
  - ndarray
  - operator
    - contrib
      - nn
    - fusion
    - nn
      - cudnn
    - numpy
      - linalg
      - random
    - quantization
    - random
    - tensor
  - profiler
  - storage
- tests/python
  - gpu
  - unittest