More tweaks #375
base: main
@@ -0,0 +1,28 @@
function TensorKit._copyto!(A::StridedView{TA, 1, <:CuArray{TA}}, B::StridedView{TB, 2, <:CuArray{TB}}) where {TA, TB}
    length(A) == length(B) || throw(DimensionMismatch(lazy"length of A ($(length(A))) does not match length of B ($(length(B)))"))
    Adata = parent(A)
    Astr = stride(A, 1)
    IA = A.offset
    Bdata = parent(B)
    Bstr = strides(B)
    IB_1 = B.offset
    # build index arrays so the copy becomes a single vectorized gather/scatter
    IAs = Int[]
    IBs = Int[]
    @inbounds for _ in axes(B, 2)
        IB = IB_1
        for _ in axes(B, 1)
            IA += Astr
            push!(IAs, IA)
            IB += Bstr[1]
            push!(IBs, IB)
        end
        IB_1 += Bstr[2]
    end
    Adata[IAs] .= Bdata[IBs]
    return A
end
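As a hedged illustration (not part of the PR), the same gather-index idea can be exercised on plain CPU arrays. The helper name `gather_copy!` and the unit strides/zero offsets are stand-ins for the `StridedView` fields used above:

```julia
# Hypothetical CPU sketch of the gather-index copy above. Plain Arrays stand
# in for the CuArray-backed StridedViews; dense column-major strides assumed.
function gather_copy!(Adata, Bdata; Astr = 1, Bstr = (1, 2), Aoff = 0, Boff = 0)
    IA = Aoff
    IB_1 = Boff
    IAs, IBs = Int[], Int[]
    for _ in axes(Bdata, 2)
        IB = IB_1
        for _ in axes(Bdata, 1)
            IA += Astr
            push!(IAs, IA)       # collect destination linear indices
            IB += Bstr[1]
            push!(IBs, IB)       # collect source linear indices
        end
        IB_1 += Bstr[2]
    end
    Adata[IAs] .= Bdata[IBs]     # one vectorized gather/scatter, no scalar loop
    return Adata
end

A = gather_copy!(zeros(Int, 4), [1 3; 2 4])  # → [1, 2, 3, 4]
```

Doing the copy through two index vectors and a single broadcast is what lets the GPU path avoid per-element (scalar) indexing.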
@@ -46,7 +46,7 @@ function AbelianTreeTransformer(transform, p, Vdst, Vsrc)
end

const _GenericTransformerData{T, N} = Tuple{
-    Matrix{T},
+    DenseMatrix{T},
Member: I think this change makes the types below abstractly typed, do we need this?

Member (author): Yes, in order to allow device-side matrices to get passed in. Otherwise you get attempts to multiply

Member: Ok, but in that case we would really have to make that an additional type parameter in the

Member (author): OK, it would have been helpful to have had a comment or anything that this was why they were there
    Tuple{NTuple{N, Int}, Vector{Tuple{NTuple{N, Int}, Int}}},
    Tuple{NTuple{N, Int}, Vector{Tuple{NTuple{N, Int}, Int}}},
}
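The concern about abstract typing can be made concrete with a small example that is not from the PR: the struct names below are hypothetical, and only illustrate the general trade-off between an abstract field type (like `DenseMatrix{T}` stored in a field) and the type parameter the reviewer suggests:

```julia
# Illustrative only: an abstract field type forces boxed storage, while a
# type parameter keeps the field concretely typed per instance.
struct AbstractField
    m::DenseMatrix{Float64}            # abstract field type (DenseArray{Float64,2})
end

struct ParametricField{M <: DenseMatrix{Float64}}
    m::M                               # concrete once M is fixed
end

isconcretetype(fieldtype(AbstractField, :m))                     # false
isconcretetype(fieldtype(ParametricField{Matrix{Float64}}, :m))  # true
```

The parametric version still admits device-side matrices (any `DenseMatrix` subtype can instantiate `M`), which is presumably why the reviewer asks for an extra type parameter rather than reverting to `Matrix{T}`.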
Does this make sense to include, and should this not simply fall back to the default copyto!? This really is just a performance optimization to avoid a bunch of the overhead of Strided.jl, but I would be surprised that building the index arrays like this really gives an improvement over just a regular strided copyto!. I think this entire thing should boil down to the following, which is not obvious and I should have added a comment/fallback definition (up to some off-by-one errors though):
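A minimal sketch of what such a fallback could reduce to, based on the `copy!(sreshape(A, size(B)), B)` suggestion later in this thread. Base `reshape` stands in for StridedViews' `sreshape`, and plain Arrays stand in for the CuArray-backed views, so this only illustrates the shape of the idea:

```julia
# Hedged sketch, not the PR's code: reduce the 2-D -> 1-D copy to a single
# reshape plus one bulk copy (valid only when the memory layout permits it).
B = [1 3; 2 4]                      # 2-D source
A = zeros(Int, 4)                   # 1-D destination
copyto!(A, reshape(B, length(B)))   # column-major flatten, then one bulk copy
```

For dense, contiguous data this performs the same element mapping as the index-array version above, with none of the index bookkeeping.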
It seems to be necessary to avoid scalar indexing sadness 🤷. Happy to use the fallback, though!
Just investigated this a bit more, couple comments:
- Base.copyto!(A, B), which then dispatches to Strided.jl
- copy!(sreshape(A, size(B)), B), which I think works with your last changes.

It might be reasonable to turn around the logic here, and simply go from opt-out to opt-in, i.e. TensorKit._copyto!(A, B) = copyto!(A, B) and then only specialize this for <:Vector + <:Memory parent types.
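The proposed opt-in design could look roughly like the following toy sketch. The name `_mycopyto!` and the `Vector`-only specialization are stand-ins, not TensorKit's actual signatures:

```julia
# Hypothetical opt-in dispatch: the generic method just defers to copyto!,
# and only CPU (Vector-backed) arguments opt in to a specialized fast path.
_mycopyto!(dst, src) = copyto!(dst, src)                  # safe generic default
_mycopyto!(dst::Vector, src::Vector) = (dst .= src; dst)  # opt-in fast path

x = zeros(Int, 3)
_mycopyto!(x, [1, 2, 3])   # hits the Vector specialization
```

With this orientation, exotic array types (CuArray parents included) get correct behavior by default, and only the cases known to benefit are specialized.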
TBH all this might be obviated by the fixes that have now been merged into Strided and StridedViews, right? Why don't I just nuke this and we can see if we need it?
Yes, but this function still bypasses all of the Strided stuff because in this really specific case I have a bit more information and could squeeze out a tiny bit more performance. Ultimately though, if this turns out to be too much of a hassle it might be reasonable to choose maintainability and simply replace this at the call sites.