Conversation
…`KrylovKit.svdsolve` pullback
pbrehmer
left a comment
Thanks for getting this in, I like this a lot. I will implement the same thing in #150.
One thing I was wondering: Have you checked to which degree this gets rid of the gauge sensitivity warnings? In any case, I think this is a strict improvement so let's get it merged :-)
This doesn't fix any gauge sensitivity warnings (in fact it might make them a bit worse if the tolerance is lowered...), but combined with Jutho/KrylovKit.jl#123 it at least gets rid of most of the … We can actually get rid of the warnings altogether by setting …
Co-authored-by: Lukas Devos <ldevos98@gmail.com>
Lost coverage now that #150 is merged since the default is the … EDIT: apparently there's still coverage, but I have no clue from where. I'll try to add some more combinations to the CTMRG gradient test to be sure; maybe it's a good time for that anyway now that a lot of stability issues are solved.
Sorry to keep dismissing the reviews, but it turns out using …
Okay, the gradient test at least gives an overview of the combinations that go wrong (see here). In summary, the CTMRG gradient computation is throwing an error for: …
Example stacktrace: …
Hmm, I am confused by this since I feel this definitely worked before. Judging by the stacktrace, this is not a numerical problem but an issue with the algorithm creation? I will take a look now and fix it.
Well, the SVD rrule alg should have an effect since …
It would make sense to use an iterative rrule only when the specified forward SVD algorithm is iterative, since then the SVD cannot return the full U, S and V, such that TensorKit's rrule cannot be used. If, however, TensorKit's …
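The dispatch logic described above can be sketched as follows. All type and function names here (`SVDAlg`, `FullSVD`, `IterSVD`, `select_rrule`, ...) are hypothetical placeholders, not the actual PEPSKit/TensorKit API:

```julia
# Hypothetical sketch: only fall back to the iterative rrule when the forward
# algorithm cannot provide the full factorization.
abstract type SVDAlg end
struct FullSVD <: SVDAlg end  # dense SVD: returns full U, S, V
struct IterSVD <: SVDAlg end  # iterative (Krylov) SVD: truncated factors only

abstract type SVDRrule end
struct FullRrule <: SVDRrule end       # TensorKit-style rrule, needs full U, S, V
struct IterativeRrule <: SVDRrule end  # linear-problem based rrule, works with truncated factors

# Dispatch on the forward algorithm to pick a compatible reverse rule.
select_rrule(::FullSVD) = FullRrule()
select_rrule(::IterSVD) = IterativeRrule()
```

This keeps the cheap dense rrule as the default whenever the full factorization is available.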
I agree it does not make much sense to use the iterative rrule with a …
* Implement norm-preserving retractions
* Drop implicit assumption that state has unit norm
* Update src/algorithms/optimization/peps_optimization.jl
Co-authored-by: Lukas Devos <ldevos98@gmail.com>
* Increase tol for cotangent gauge sensitivity warnings
* Fix typo
* Not too high, not too low?
* Definitely too low
* Very arbitrary
* Restore tolerance, reduce bond dimension instead?
* No real difference, so restore bond dimension
---------
Co-authored-by: Lukas Devos <ldevos98@gmail.com>
The problem was the typing of the SVD pullback. Apparently, some algorithm combinations led to the case of the TensorKit …
I would say this should run since this is a valid algorithm combination in principle. Sure, one wouldn't use this for practical purposes, but perhaps for methodological reasons. Also, given that the iterative rrules tend to have a few hiccups, I find it useful to always test them in different combinations to see whether they work.
…Kit.jl into lb/svdsolve_rrule_tol
Thanks a lot for the fixes @pbrehmer! Everything seems okay now, but the CTMRG tests time out because the gradient test covers a lot more combinations now. Since this is not super urgent I was thinking of taking some more time here and: …
How does that sound?
Likewise thanks for your work!
In my experience most of the time in that test is spent in first compiling Zygote (especially for the fermionic p-wave superconductor!) and then in all the combinations where …
I split off the gradient tests into a separate workflow and tried some more combinations for the p-wave superconductor gradient test. Everything runs, which is nice, but it seems that the only combination that gives a correct gradient for the current setup is … Another issue is the compilation time and the runtimes for …
Just as a side note, if we feel this is warranted we can also increase the allowed time. I manually reduced this a bit to … since this tends to indicate something fishy is going on, but if we know this is just how long it takes we can obviously increase that again.
I bumped the environment bond dimension, which fixes #158, and added in all working combinations for both models. I left out the … This should now give an idea of how long things take. I can try to reduce the bond dimension a bit in the hopes that the p-wave tests then still pass, but still I think the only way of getting this to not time out is to either increase the limit (we'll see how far it gets now), or drop the …
It turns out that a very small environment bond dimension increase is enough to make the p-wave gradient tests pass, so I reduced this again. Still, the gradient tests time out severely. I think completing all of them would take double the time of the current timeout, so just increasing the time limit would not be ideal, as the test suite would then take very long to complete. The only reasonable way to fix this, aside from just increasing the time limit, I can think of is to not test the trivial …
Which part of these tests are the expensive ones? Can we get away with doing a single gradient evaluation for all of them, and would that help? What I mean is, can we converge something, and then simply check if all methods give more or less the same gradient?
The really expensive ones are those where Zygote just automatically backpropagates through all CTMRG iterations to get the gradient (these take >5 minutes per case versus around one minute for all other gradient modes). In general, it's really the gradient computation itself that takes the most time, so even reusing a pre-converged CTMRG environment would not really help speed things up much, I think.
Have we ever tried the trick of first computing the environment, and then doing AD through a limited number of iterations?
I'm pretty sure that's what we're already doing in the test, actually. As far as I can tell, the environment is pre-converged first and then …
Okay, that's not entirely true. We're pre-converging the environment once at the initial point, but then for every retraction in the optimtest the gradient is computed on the full … I also thought of decreasing the number of step sizes tried in the …
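The "pre-converge, then differentiate only a few extra iterations" trick can be illustrated on a toy scalar fixed-point problem. Everything below is a hand-rolled sketch (no Zygote, no CTMRG); the function names are made up for illustration:

```julia
# Toy fixed-point map: x -> (x + θ/x)/2 converges to sqrt(θ).
step(x, θ) = (x + θ / x) / 2

# Converge WITHOUT any derivative tracking (cheap).
function converge(θ; tol=1e-12, maxiter=100)
    x = one(θ)
    for _ in 1:maxiter
        x′ = step(x, θ)
        abs(x′ - x) < tol && return x′
        x = x′
    end
    return x
end

# Then differentiate only through n extra iterations starting from the
# converged point, instead of through the whole convergence history
# (hand-rolled forward mode: propagate dx/dθ alongside x).
function grad_lastn(θ, n)
    x = converge(θ)  # treated as a θ-independent starting point
    dx = 0.0
    for _ in 1:n
        # d/dθ of (x + θ/x)/2 by the chain rule:
        dx = (dx + (1 - θ * dx / x) / x) / 2
        x = step(x, θ)
    end
    return dx
end
```

For θ = 4 the fixed point is 2 and the exact derivative d√θ/dθ = 1/(2√θ) = 0.25; a single differentiated iteration already recovers it, which is the point of the trick.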
I would suggest cutting down on the combinations with …
Note that …
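As a sketch, pruning the combination matrix could look like the following; the algorithm symbols are made-up placeholders, not the real test parameters:

```julia
# Hypothetical test-matrix pruning: enumerate all combinations, then filter
# out the ones we decided not to cover (e.g. iterative rrule with dense SVD).
svd_algs = (:full_svd, :iter_svd)
rrule_algs = (:full_rrule, :iter_rrule)
grad_modes = (:geomsum, :manualiter, :linsolver)

combinations = [
    (svd, rrule, grad)
    for svd in svd_algs, rrule in rrule_algs, grad in grad_modes
    if !(svd == :full_svd && rrule == :iter_rrule)
]
```

Dropping a single forward/reverse pairing like this removes a whole slice of the matrix (here 3 of 12 cases), which is where most of the CI time savings would come from.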
Even after these cuts, there are two timeouts on v1.11, by at most a few minutes. I don't think we should cut any more test combinations, and I don't really see any more obvious ways of speeding things up. How would we feel about increasing the timeout by 15 minutes?
You can do so by adding a …
pbrehmer
left a comment
Thanks @leburgel for vastly improving the gradient tests! I left a comment on having a user option for gradient_alg=nothing but that can be addressed in another PR. We should keep in mind that the SVD interface is still quite messy but that's okay for now, I guess.
```julia
)
# instantiate because hook_pullback doesn't go through the keyword selector...
concrete_gradient_alg = if isnothing(gradient_alg)
    nothing # TODO: add this to the PEPSKit.GradMode selector?
```
I was also considering this and I am not sure about it. From the user perspective it would make a lot of sense to have a :naive option which differentiates through all the CTMRG iterations via gradient_alg=nothing, mostly as a reference method. However, internally it would be weird to do this inside the GradMode constructor, since Nothing is definitely not a subtype of GradMode. Of course we could ignore that for the sake of usability, but I am unsure about that.
In any case, this should probably be handled in a different PR, so let's keep this in mind for now.
I think my main problem here was that I needed to instantiate the algorithm to use with hook_pullback, and my first attempt of calling GradMode(; alg=nothing) didn't work, so I left the comment more as a reminder to myself. I don't think this is really much of a problem, as the keyword selector as a whole handles calling fixedpoint(...; gradient_alg=nothing, ...) perfectly well. So we can open an issue and make it so we can use GradMode directly for this in the future, but I'm not sure if this is very useful.
A working example of how to do naive AD would solve everything as well, and I think we're anyway going to add that in the docs update.
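A minimal sketch of how a keyword selector could pass `nothing` through as "naive AD"; all names here (`select_gradient_alg`, the `GradMode`/`GeomSum` stubs) are hypothetical, not the actual PEPSKit implementation:

```julia
# Hypothetical selector: `nothing` is not a GradMode subtype, so it is passed
# through untouched and the caller falls back to differentiating the full
# CTMRG fixed-point loop with Zygote.
abstract type GradMode end
struct GeomSum <: GradMode
    tol::Float64
end

select_gradient_alg(::Nothing; kwargs...) = nothing  # naive AD, reference method
select_gradient_alg(alg::GradMode; kwargs...) = alg  # already instantiated
function select_gradient_alg(alg::Symbol; tol=1e-6)
    alg === :geomsum ? GeomSum(tol) : error("unknown gradient algorithm: $alg")
end
```

This keeps `Nothing` out of the `GradMode` type hierarchy while still giving users a uniform keyword interface.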
An attempt to address (most?) occurrences of Jutho/KrylovKit.jl#110 we encounter, specifically the ones in #151 and #154. This will kind of be superseded by #150 right away, but I think it's anyway not a bad idea to address this directly as well in parallel.
There's a lot of extra warnings now with the lowered tolerance, but these are more a consequence of the precise check being performed in KrylovKit.jl rather than an actual issue, and I've confirmed that this is solved by Jutho/KrylovKit.jl#123.
One question I had here is what we want to do with the verbosity. It seems that `KrylovKitChainRulesCoreExt.compute_svdsolve_pullback_data` handles a lot of its warnings based on the verbosity of the primal algorithm. Originally, this was just set to a dummy `GKL` algorithm which effectively has `verbosity=1`, giving a lot of warning messages even though by default the `rrule_alg` has `verbosity=-1`. I at least set an explicit `verbosity=1` in the minimal primal algorithm for now so this is a bit more obvious, but I wonder if we should:

* … `KrylovKitChainRulesCoreExt.compute_svdsolve_pullback_data`, so we can at least get a warning when the cotangent linear problem does not converge.
* … `rrule_alg.verbosity`, so users can actually choose whether or not they want to see warnings. In particular, this would allow suppressing some previously unsuppressible logging spam if people want.

Should also fix #158 once merged.
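A sketch of the second option, forwarding `rrule_alg.verbosity` into the dummy primal algorithm used by the pullback; `GKLStub`, `RruleAlg` and `primal_for_pullback` are made-up placeholder types and functions, not the actual KrylovKit API:

```julia
# Hypothetical stand-ins for the primal (GKL) and rrule algorithm configs.
struct GKLStub
    verbosity::Int
end
struct RruleAlg
    verbosity::Int
end

# Instead of constructing a fixed dummy with verbosity = 1, inherit the
# verbosity the user chose for the rrule, so pullback warnings can actually
# be silenced (verbosity = -1) or enabled on demand.
primal_for_pullback(rrule_alg::RruleAlg) = GKLStub(rrule_alg.verbosity)
```

With this, `rrule_alg.verbosity` becomes the single knob controlling both the cotangent linear-problem warnings and the primal-side logging spam.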