Refactor arithmetic ops to use std::variant + std::visit #723
base: master
Conversation
Remove Arithmetic_internal.hpp/cpp and Ari_ii from linalg_internal_interface
Not sure what happened in the GitHub Action: https://github.com/Cytnx-dev/Cytnx/actions/runs/19826283869/job/56800387981?pr=723
Codecov Report ❌ Patch coverage is […]. Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master     #723      +/-   ##
==========================================
+ Coverage   32.35%   35.85%   +3.50%
==========================================
  Files         215      214       -1
  Lines       36363    32020    -4343
  Branches    14597    13303    -1294
==========================================
- Hits        11764    11481     -283
+ Misses      22659    18636    -4023
+ Partials     1940     1903      -37
```

☔ View full report in Codecov by Sentry.
This is looking good - you've removed about 20,000 lines of code in the last couple of weeks? 😁 There is still a lot of repetition with the `Add` functions:

```cpp
template <typename T>
Tensor Add(const T &lc, const Tensor &Rt) {
  auto type = cy_typeid_v<T>;
  Storage Cnst(1, type);
  Cnst.at<T>(0) = lc;
  Tensor out;
  out._impl = Rt._impl->_clone_meta_only();
  out._impl->storage() =
    Storage(Rt._impl->storage().size(), std::min(type, Rt.dtype()), Rt.device());
  if (Rt.device() == Device.cpu) {
    std::visit(
      [&](auto *rptr) {
        using TR = std::remove_pointer_t<decltype(rptr)>;
        cytnx::linalg_internal::AddInternalImpl<type, TR>(
          out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
          Rt._impl->storage()._impl->size(), {}, {}, {});
      },
      Rt.ptr());
  } else {
#ifdef UNI_GPU
    checkCudaErrors(cudaSetDevice(Rt.device()));
    linalg_internal::lii.cuAri_ii[type][Rt.dtype()](
      out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
      Rt._impl->storage()._impl->size(), {}, {}, {}, 0);
#else
    cytnx_error_msg(true, "[Add] fatal error, the tensor is on GPU without CUDA support.%s",
                    "\n");
#endif
  }
  return out;
}
```

Now the type of […]. Since […]. But what we really want is something like

```cpp
cytnx::linalg_internal::AddInternalImpl<lhs_type, rhs_type, out_type>(lhs, rhs, out);
```

This involves either a triple dispatch of […].
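To make the dispatch cost concrete, here is a minimal self-contained sketch — not Cytnx's actual API; `ElemPtr`, `add_kernel`, and `add_dispatch` are made-up names and the dtype list is shortened — of dispatching two runtime-typed operands with nested `std::visit`. Adding an independently dispatched output type on top of this is what turns it into the triple dispatch mentioned above:

```cpp
#include <cstddef>
#include <cstdio>
#include <variant>

// Illustrative only: a variant over the element-pointer types a storage object
// might hold at runtime (the real list of dtypes is longer).
using ElemPtr = std::variant<double*, float*, long long*, unsigned long long*>;

// Hypothetical fully-typed kernel: one instantiation per (TL, TR) pair.
template <typename TL, typename TR>
void add_kernel(const TL* lhs, const TR* rhs, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    // In real code this would write into an output buffer of the promoted type.
    std::printf("%g\n", static_cast<double>(lhs[i]) + static_cast<double>(rhs[i]));
  }
}

// Double dispatch: the outer visit fixes the left element type, the inner one
// fixes the right element type. A third nested visit (or a computed output
// variant) would be needed to also pick the output type at compile time.
void add_dispatch(ElemPtr lhs, ElemPtr rhs, std::size_t n) {
  std::visit(
    [&](auto* lp) {
      std::visit([&](auto* rp) { add_kernel(lp, rp, n); }, rhs);
    },
    lhs);
}

int main() {
  double a[2] = {1.5, 2.5};
  long long b[2] = {3, 4};
  add_dispatch(ElemPtr{&a[0]}, ElemPtr{&b[0]}, 2);  // instantiates add_kernel<double, long long>
}
```

The instantiation count is the real issue: with N supported dtypes this is N² kernels, and an independently dispatched output type would make it N³, which is why deriving `out_type` from the two input types (e.g. via a `type_promote` rule) is the more practical option.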
Now, for the GPU version, I just noticed:

Cytnx/src/backend/linalg_internal_gpu/cuAdd_internal.cu, lines 5291 to 5300 in c5dd9c4

ARGH: the output type is hard-coded to be […]. Something along these lines would be closer to what we want:

```cpp
template <typename T>
Tensor Add(const T &lc, const Tensor &Rt) {
  // CPU path: use the templated AddInternalImpl directly, with the output type
  // determined by type_promote().
  if (Rt.device() == Device.cpu) {
    const auto lhs_type = cy_typeid_v<T>;
    const auto rhs_type = Rt.dtype();
    const auto out_type = cytnx::Scalar_init_interface::type_promote(lhs_type, rhs_type);
    // Construct constant tensor and output tensor with the correct promoted dtype
    Tensor Cnst({1}, lhs_type);  // one element, rank-1 tensor
    Cnst.at<T>(0) = lc;
    Tensor out;
    out._impl = Rt._impl->_clone_meta_only();
    out._impl->storage() = Storage(Rt._impl->storage().size(), out_type, Rt.device());
    // Dispatch to the fully-typed templated kernel
    cytnx::linalg_internal::AddInternalImpl(Cnst, Rt, out);
    return out;
  } else {
#ifdef UNI_GPU
    // GPU path: the existing cuAdd kernels assume the output dtype is
    // std::min(lhs_type, rhs_type), so we must match this until the kernels
    // are regenerated with correct type promotion semantics.
    const auto lhs_type = cy_typeid_v<T>;
    const auto rhs_type = Rt.dtype();
    const auto out_type = std::min(lhs_type, rhs_type);
    Storage Cnst(1, lhs_type);
    Cnst.at<T>(0) = lc;
    Tensor out;
    out._impl = Rt._impl->_clone_meta_only();
    out._impl->storage() = Storage(Rt._impl->storage().size(), out_type, Rt.device());
    checkCudaErrors(cudaSetDevice(Rt.device()));
    linalg_internal::lii.cuAri_ii[lhs_type][rhs_type](
      out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
      Rt._impl->storage()._impl->size(), {}, {}, {}, 0);
    return out;
#else
    cytnx_error_msg(true, "[Add] fatal error: GPU tensor without CUDA support.%s", "\n");
#endif
  }
}
```

There is nothing we can do about the GPU version; in order to call the existing cuAdd kernels […]. It is also obvious that there are no unit tests covering tricky cases involving type promotion and mixed signed/unsigned arithmetic. Someone should add some, with annotations that some corner cases are currently expected to fail. For example, addition of a […].
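Purely as an illustration in plain standard C++ (not Cytnx code — `std::int64_t`/`std::uint64_t` just stand in for the corresponding dtypes), this is the kind of signed/unsigned surprise such tests should pin down when the output type is chosen by something other than a real promotion rule:

```cpp
#include <cstdint>
#include <iostream>

int main() {
  std::int64_t  s = -1;
  std::uint64_t u = 1;

  // Usual C++ arithmetic conversions promote the signed operand to unsigned:
  // the mathematically expected result 0 appears only because -1 wraps to
  // 2^64 - 1 and the sum wraps back; with other values the wrap is visible.
  auto wrapped = s + u;                   // type is std::uint64_t
  std::cout << wrapped << "\n";           // prints 0

  std::int64_t s2 = -2;
  auto wrapped2 = s2 + u;                 // still std::uint64_t
  std::cout << wrapped2 << "\n";          // prints 18446744073709551615 (2^64 - 1)

  // A promotion rule that picks a type able to represent both operands
  // (e.g. a wider signed or floating-point type) avoids the surprise:
  auto promoted = static_cast<double>(s2) + static_cast<double>(u);
  std::cout << promoted << "\n";          // prints -1
}
```

Tensor-level tests for `Add` could assert both the promoted output dtype and the numeric result for such operand pairs, marking the cases the current `std::min`-based output type gets wrong as expected failures until the kernels are regenerated.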
For the HTTP error, it is fine to just rerun the failed job.
Yeah, you’re right, it’s a critical problem. I will address the other issues later.