Skip to content

WIP: some changes to the ad test utils#180

Open
Jutho wants to merge 1 commit intomainfrom
jh/testimprovements
Open

WIP: some changes to the ad test utils#180
Jutho wants to merge 1 commit intomainfrom
jh/testimprovements

Conversation

@Jutho
Copy link
Member

@Jutho Jutho commented Feb 27, 2026

Some changes as discussed. Somehow the AD correctness tests fail for Mooncake LQ, whereas they don't do for Mooncake QR (which uses the same approach), nor for Enzyme or ChainRules (I think). I haven't figured out what is going on yet, but maybe some reviewer has an idea.

@Jutho Jutho changed the title some changes to the ad test utils WIP: some changes to the ad test utils Feb 27, 2026
@codecov
Copy link

codecov bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines Coverage Δ
src/pullbacks/svd.jl 89.25% <100.00%> (ø)

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

r = MatrixAlgebraKit.qr_rank(R)
function remove_qr_gauge_dependence!(ΔQ, ΔR, A, Q, R; rank_atol = MatrixAlgebraKit.default_pullback_rank_atol(R))
r = MatrixAlgebraKit.qr_rank(R; rank_atol)
Q₁ = @view Q[:, 1:r]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is now no longer used? Can you comment on the todo that is left? Is the idea that the zeroing out should use the rank, but the projecting out should only use the minmn?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is no longer used?

On the TODO: yes I think we can distinguish between the cases of rank-deficiency, and just the extra columns in Q if Q is square but resulting from qr_full applied to a full-rank tall matrix. In that case, Q should be completely fixed and differentiable. However, to treat the most general case (a rank deficient rectangular matrix), I have to redo the analysis because this was not covered in my notes ( https://jutho.github.io/LinAlgAD/main.html#general-result ): this is about the $Q_3$ part but I omitted this from the analysis.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable Q1 is not used anymore I think

ΔQR = randn!.(copy.(QR))
remove_qr_gauge_dependence!(ΔQR..., A, QR...)
ΔQR = structured_randn!.(copy.(QR))
A isa Diagonal || remove_qr_gauge_dependence!(ΔQR..., A, QR...)
Copy link
Member

@lkdvos lkdvos Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to project out gauge dependence for rank-deficient Diagonals?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I tried for my changes to be more stylistic, i.e. to be mostly functionally equivalent to the existing code. The existing Diagonal specialisation didn't have any gauge dependence removal, but yes, I agree it would make sense to also have this, for a A::Diagonal with some small or zero diagonal elements.

ΔQR = randn!.(copy.(QR))
remove_qr_gauge_dependence!(ΔQR..., A, QR...)
ΔQR = structured_randn!.(copy.(QR))
A isa Diagonal || remove_qr_gauge_dependence!(ΔQR..., A, QR...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here

mul!(ΔN, Q, Q' * ΔN)
return ΔN
end
remove_left_null_gauge_dependence!(ΔN, A, N) = remove_qr_null_gauge_dependence!(ΔN, A, N)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to remove one of the functions, if they are both the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yes, unless it is useful to know that there is a remove_method_gauge_dependence! for every method (or at least decomposition) we have.

r = MatrixAlgebraKit.lq_rank(L)
function remove_lq_gauge_dependence!(ΔL, ΔQ, A, L, Q; rank_atol = MatrixAlgebraKit.default_pullback_rank_atol(L))
r = MatrixAlgebraKit.lq_rank(L; rank_atol)
Q₁ = @view Q[1:r, :]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here

T = complex(eltype(A))
D = eig_vals(A)
ΔD = randn!(similar(A, complex(T), m))
ΔD = randn!(similar(D))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are using similar, in the other functions copy. I don't have any preference, but let's be consistent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both were used, but I agree on consistency. I think similar is slightly more efficient, but maybe I was worried in some cases if similar would produce the correct type, i.e. the same output type, but I think it does for all the cases where we use it.

Comment on lines +405 to +406
ΔS = zero(S)
randn!(diagview(ΔS))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this obvious that this has to be a diagonal variation? Are we checking for this anywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I had a comment stating that this is not strictly necessary, but apparently it got lost. I think only the diagonal of ΔS is used int he pullback (equivalent to projecting out all non-diagonal entries), so this is not strictly necessary, and we can just as well have structured_rand!(similar(S)).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, it is just that I tried to stay functionally equivalent to the old version, at least for this first iteration. Once the tests pass on this (which they seem to do, except for the LQ stuff), we can try to remove indeed these restrictions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants