
Scale rrule_alg tolerance with singular value magnitude when using KrylovKit.svdsolve pullback#155

Merged
leburgel merged 24 commits intomasterfrom
lb/svdsolve_rrule_tol
Mar 17, 2025
Conversation

@leburgel
Member

@leburgel leburgel commented Mar 11, 2025

An attempt to address (most?) occurrences of Jutho/KrylovKit.jl#110 that we encounter, specifically the ones in #151 and #154. This will more or less be superseded by #150 right away, but I think it's worth addressing directly in parallel anyway.

There are a lot of extra warnings now with the lowered tolerance, but these are more a consequence of the precise check being performed in KrylovKit.jl than an actual issue, and I've confirmed that this is solved by Jutho/KrylovKit.jl#123.

One question I had is what we want to do with the verbosity here. It seems that KrylovKitChainRulesCoreExt.compute_svdsolve_pullback_data handles a lot of its warnings based on the verbosity of the primal algorithm. Originally, this was just set to a dummy GKL algorithm which effectively has verbosity=1, giving a lot of warning messages even though the rrule_alg has verbosity=-1 by default. For now I've set an explicit verbosity=1 in the minimal primal algorithm so this is a bit more obvious, but I wonder if we should:

  • Keep an explicit nonzero verbosity in the primal alg when calling KrylovKitChainRulesCoreExt.compute_svdsolve_pullback_data, so we can at least get a warning when the cotangent linear problem does not converge.
  • Set the primal alg verbosity to rrule_alg.verbosity, so users can actually choose whether or not they want to see warnings. In particular, this would allow users to suppress some previously unsuppressible logging spam if they want.

Should also fix #158 once merged.
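For reference, the scaling idea can be sketched as follows (a hypothetical sketch, not the actual implementation in src/utility/svd.jl; `scaled_rrule_alg` is an illustrative name):

```julia
using KrylovKit

# Sketch: rescale the tolerance of the linear solver used in the svdsolve
# pullback by the largest singular value of the primal solution, so that
# the user-specified tolerance acts as a relative rather than absolute one.
function scaled_rrule_alg(rrule_alg::GMRES, S::AbstractVector{<:Real})
    σmax = maximum(S)  # singular values are nonnegative
    return GMRES(;
        tol=rrule_alg.tol * σmax,
        krylovdim=rrule_alg.krylovdim,
        maxiter=rrule_alg.maxiter,
        verbosity=rrule_alg.verbosity,
    )
end
```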

@codecov

codecov bot commented Mar 11, 2025

Codecov Report

Attention: Patch coverage is 86.95652% with 3 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/utility/svd.jl | 86.95% | 3 Missing ⚠️

Files with missing lines | Coverage Δ
src/utility/svd.jl | 89.43% <86.95%> (-1.74%) ⬇️

... and 1 file with indirect coverage changes


@leburgel leburgel requested a review from pbrehmer March 12, 2025 09:20
pbrehmer previously approved these changes Mar 12, 2025
Collaborator

@pbrehmer pbrehmer left a comment


Thanks for getting this in, I like this a lot. I will implement the same thing in #150.

One thing I was wondering: Have you checked to which degree this gets rid of the gauge sensitivity warnings? In any case, I think this is a strict improvement so let's get it merged :-)

@leburgel
Member Author

This doesn't fix any gauge sensitivity warnings (in fact it might make them a bit worse if the tolerance is lowered...), but combined with Jutho/KrylovKit.jl#123 it at least gets rid of most of the cotangent linear problem returned unexpected result warnings, which is something.

We can actually get rid of the warnings altogether by setting verbosity=rrule_alg.verbosity in the dummy forward algorithm and then controlling the verbosity from there instead of hardcoding verbosity=1, but I was a bit reluctant to do this, since it also means there are no warnings if the cotangent linear problem fails to converge at all.
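Concretely, that option would amount to something like (hypothetical sketch; `dummy_primal_alg` is an illustrative name):

```julia
using KrylovKit

# Sketch: build the dummy primal GKL algorithm with the verbosity taken from
# the rrule algorithm instead of a hardcoded verbosity=1, so that the warnings
# emitted by compute_svdsolve_pullback_data follow the rrule_alg verbosity.
dummy_primal_alg(rrule_alg) = GKL(; tol=rrule_alg.tol, verbosity=rrule_alg.verbosity)
```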

Co-authored-by: Lukas Devos <ldevos98@gmail.com>
@leburgel
Member Author

leburgel commented Mar 12, 2025

Lost coverage now that #150 is merged, since the default is the TensorKit.tsvd! rrule now. Seems like that one's faster too, so in itself that's very good, but I'll tweak some tests to get back coverage for the KrylovKit.svdsolve rrule.

EDIT: apparently there's still coverage, but I have no clue from where. I'll try to add some more combinations to the CTMRG gradient test to be sure, maybe it's a good time for that anyway now that a lot of stability issues are solved.

lkdvos previously approved these changes Mar 12, 2025
@leburgel
Member Author

Sorry to keep dismissing the reviews, but it turns out using rrule_alg=:arnoldi in the svd adjoint now breaks the CTMRG gradients. This probably didn't show because the default was changed to rrule_alg=:tsvd at the same time so tests kept passing, but probably best to fix this here as well.

@leburgel
Member Author

leburgel commented Mar 13, 2025

Okay, the gradient test at least gives an overview of the combinations that go wrong (see here). In summary, the CTMRG gradient computation is throwing an error for:

  • gradient_alg=nothing, gradient_iterscheme=:fixed or gradient_iterscheme=:diffgauge, svd_rrule_alg=:arnoldi
    • This seems like it really should not give an issue, since neither the iterscheme nor the SVD rrule alg should have any effect when gradient_alg is nothing. However, the stacktrace (see below) is the same as in the other cases, so maybe it's the same underlying problem.
  • gradient_iterscheme=:diffgauge, svd_rrule_alg=:arnoldi
    • It seems using the iterative SVD rrule with the diffgauge iterscheme does not work, since there's some incompatibility in the tsvd calls (see below). I can try to pin this down further, but maybe an obvious question first: does it make sense to use the iterative rrule with the diffgauge iterscheme in the first place @pbrehmer? And am I passing the kwargs properly to do so? For reference, I'm constructing the CTMRG algorithms as
concrete_ctmrg_alg = PEPSKit.CTMRGAlgorithm(;
    alg=ctmrg_alg,
    verbosity=ctmrg_verbosity,
    projector_alg=projector_alg,
    svd_alg=(; rrule_alg=(; alg=svd_rrule_alg)),
)
Example stacktrace:
ctmrg_alg=:sequential, projector_alg=:halfinfinite, svd_rrule_alg=:arnoldi, gradient_alg=:nothing, gradient_iterscheme=:fixed: Error During Test at /home/runner/work/PEPSKit.jl/PEPSKit.jl/test/ctmrg/gradients.jl:45
  Got exception outside of a @test
  TaskFailedException
  Stacktrace:
    [1] wait(t::Task)
      @ Base ./task.jl:370
    [2] fetch
      @ ./task.jl:390 [inlined]
    [3] fetch
      @ ~/.julia/packages/StableTasks/cSY7d/src/internals.jl:9 [inlined]
    [4] mapreduce_first
      @ ./reduce.jl:421 [inlined]
    [5] _mapreduce(f::typeof(fetch), op::typeof(BangBang.append!!), ::IndexLinear, A::Vector{StableTasks.StableTask{Any}})
      @ Base ./reduce.jl:432
    [6] _mapreduce_dim(f::Function, op::Function, ::Base._InitialValue, A::Vector{StableTasks.StableTask{Any}}, ::Colon)
      @ Base ./reducedim.jl:337
    [7] mapreduce(f::Function, op::Function, A::Vector{StableTasks.StableTask{Any}})
      @ Base ./reducedim.jl:329
    [8] _tmapreduce(f::Function, op::Function, Arrs::Tuple{Vector{UnitRange{Int64}}}, ::Type{Any}, scheduler::OhMyThreads.Schedulers.DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}, mapreduce_kwargs::@NamedTuple{})
      @ OhMyThreads.Implementation ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:113
    [9] #tmapreduce#22
      @ ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:85 [inlined]
   [10] tmapreduce
      @ ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:69 [inlined]
   [11] _tmap(::OhMyThreads.Schedulers.DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}, ::Function, ::Vector{Tuple{Int64, Int64}})
      @ OhMyThreads.Implementation ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:451
   [12] #tmap#102
      @ ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:373 [inlined]
   [13] tmap
      @ ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:337 [inlined]
   [14] rrule(config::Zygote.ZygoteRuleConfig{Zygote.Context{false}}, ::typeof(PEPSKit.dtmap), f::Function, A::Vector{Tuple{Int64, Int64}}; kwargs::@Kwargs{})
      @ PEPSKit ~/work/PEPSKit.jl/PEPSKit.jl/src/utility/diffable_threads.jl:16
   [15] rrule(config::Zygote.ZygoteRuleConfig{Zygote.Context{false}}, ::typeof(PEPSKit.dtmap), f::Function, A::Vector{Tuple{Int64, Int64}})
      @ PEPSKit ~/work/PEPSKit.jl/PEPSKit.jl/src/utility/diffable_threads.jl:13
   [16] chain_rrule(::Zygote.ZygoteRuleConfig{Zygote.Context{false}}, ::Function, ::Function, ::Vararg{Any})
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/chainrules.jl:233
   [17] macro expansion
      @ ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0 [inlined]
   [18] _pullback(::Zygote.Context{false}, ::typeof(PEPSKit.dtmap), ::PEPSKit.var"#328#329"{PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, PEPSKit.HalfInfiniteProjector{PEPSKit.SVDAdjoint{TensorKit.SDD, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}, Nothing}, PEPSKit.FixedSpaceTruncation}}, ::Vector{Tuple{Int64, Int64}})
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:91
   [19] sequential_projectors
      @ ~/work/PEPSKit.jl/PEPSKit.jl/src/algorithms/ctmrg/sequential.jl:78 [inlined]
   [20] _pullback(::Zygote.Context{false}, ::typeof(PEPSKit.sequential_projectors), ::Int64, ::PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, ::PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, ::PEPSKit.HalfInfiniteProjector{PEPSKit.SVDAdjoint{TensorKit.SDD, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}, Nothing}, PEPSKit.FixedSpaceTruncation})
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0
   [21] ctmrg_leftmove
      @ ~/work/PEPSKit.jl/PEPSKit.jl/src/algorithms/ctmrg/sequential.jl:49 [inlined]
   [22] _pullback(::Zygote.Context{false}, ::typeof(PEPSKit.ctmrg_leftmove), ::Int64, ::PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, ::PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, ::PEPSKit.SequentialCTMRG)
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0
   [23] ctmrg_iteration
      @ ~/work/PEPSKit.jl/PEPSKit.jl/src/algorithms/ctmrg/sequential.jl:59 [inlined]
   [24] _pullback(::Zygote.Context{false}, ::typeof(PEPSKit.ctmrg_iteration), ::PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, ::PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, ::PEPSKit.SequentialCTMRG)
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0
   [25] #255
      @ ~/work/PEPSKit.jl/PEPSKit.jl/src/algorithms/ctmrg/ctmrg.jl:98 [inlined]
   [26] _pullback(::Zygote.Context{false}, ::PEPSKit.var"#255#259"{PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, PEPSKit.SequentialCTMRG, MPSKit.IterativeLoggers.IterLog})
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0
   [27] #35
      @ ~/.julia/packages/LoggingExtras/VLO3o/src/verbosity.jl:103 [inlined]
   [28] #rrule_via_ad#59
      @ ~/.julia/packages/Zygote/3To5I/src/compiler/chainrules.jl:270 [inlined]
   [29] rrule_via_ad
      @ ~/.julia/packages/Zygote/3To5I/src/compiler/chainrules.jl:258 [inlined]
   [30] #860
      @ ~/.julia/packages/ChainRules/Q16hj/src/rulesets/Base/CoreLogging.jl:10 [inlined]
   [31] with_logstate(f::ChainRules.var"#860#861"{Zygote.ZygoteRuleConfig{Zygote.Context{false}}, LoggingExtras.var"#35#37"{PEPSKit.var"#255#259"{PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, PEPSKit.SequentialCTMRG, MPSKit.IterativeLoggers.IterLog}}}, logstate::Base.CoreLogging.LogState)
      @ Base.CoreLogging ./logging/logging.jl:522
   [32] with_logger(f::Function, logger::LoggingExtras.EarlyFilteredLogger{LoggingExtras.LevelOverrideLogger{Base.CoreLogging.ConsoleLogger}, LoggingExtras.var"#36#38"{Int64}})
      @ Base.CoreLogging ./logging/logging.jl:632
   [33] rrule(rc::Zygote.ZygoteRuleConfig{Zygote.Context{false}}, ::typeof(Base.CoreLogging.with_logger), f::Function, logger::LoggingExtras.EarlyFilteredLogger{LoggingExtras.LevelOverrideLogger{Base.CoreLogging.ConsoleLogger}, LoggingExtras.var"#36#38"{Int64}})
      @ ChainRules ~/.julia/packages/ChainRules/Q16hj/src/rulesets/Base/CoreLogging.jl:9
   [34] chain_rrule(::Zygote.ZygoteRuleConfig{Zygote.Context{false}}, ::Function, ::Function, ::Vararg{Any})
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/chainrules.jl:233
   [35] macro expansion
      @ ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0 [inlined]
   [36] _pullback(::Zygote.Context{false}, ::typeof(Base.CoreLogging.with_logger), ::LoggingExtras.var"#35#37"{PEPSKit.var"#255#259"{PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, PEPSKit.SequentialCTMRG, MPSKit.IterativeLoggers.IterLog}}, ::LoggingExtras.EarlyFilteredLogger{LoggingExtras.LevelOverrideLogger{Base.CoreLogging.ConsoleLogger}, LoggingExtras.var"#36#38"{Int64}})
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:91
   [37] #withlevel#34
      @ ~/.julia/packages/LoggingExtras/VLO3o/src/verbosity.jl:99 [inlined]
   [38] _pullback(::Zygote.Context{false}, ::LoggingExtras.var"##withlevel#34", ::Int64, ::typeof(LoggingExtras.withlevel), ::PEPSKit.var"#255#259"{PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, PEPSKit.SequentialCTMRG, MPSKit.IterativeLoggers.IterLog}, ::Base.CoreLogging.LogLevel)
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0
   [39] withlevel
      @ ~/.julia/packages/LoggingExtras/VLO3o/src/verbosity.jl:98 [inlined]
  --- the above 2 lines are repeated 1 more time ---
   [42] _pullback(::Zygote.Context{false}, ::typeof(Core.kwcall), ::@NamedTuple{verbosity::Int64}, ::typeof(LoggingExtras.withlevel), ::PEPSKit.var"#255#259"{PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, PEPSKit.SequentialCTMRG, MPSKit.IterativeLoggers.IterLog})
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0
   [43] leading_boundary
      @ ~/work/PEPSKit.jl/PEPSKit.jl/src/algorithms/ctmrg/ctmrg.jl:94 [inlined]
   [44] _pullback(::Zygote.Context{false}, ::typeof(MPSKit.leading_boundary), ::PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, ::PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, ::PEPSKit.SequentialCTMRG)
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0
   [45] _apply
      @ ./boot.jl:946 [inlined]
   [46] adjoint
      @ ~/.julia/packages/Zygote/3To5I/src/lib/lib.jl:212 [inlined]
   [47] _pullback
      @ ~/.julia/packages/ZygoteRules/CkVIK/src/adjoint.jl:67 [inlined]
   [48] #leading_boundary#260
      @ ~/work/PEPSKit.jl/PEPSKit.jl/src/algorithms/ctmrg/ctmrg.jl:115 [inlined]
   [49] _apply
      @ ./boot.jl:946 [inlined]
   [50] adjoint
      @ ~/.julia/packages/Zygote/3To5I/src/lib/lib.jl:212 [inlined]
   [51] _pullback
      @ ~/.julia/packages/ZygoteRules/CkVIK/src/adjoint.jl:67 [inlined]
   [52] leading_boundary
      @ ~/work/PEPSKit.jl/PEPSKit.jl/src/algorithms/ctmrg/ctmrg.jl:114 [inlined]
   [53] #rrule_via_ad#59
      @ ~/.julia/packages/Zygote/3To5I/src/compiler/chainrules.jl:270 [inlined]
   [54] rrule_via_ad
      @ ~/.julia/packages/Zygote/3To5I/src/compiler/chainrules.jl:258 [inlined]
   [55] _rrule
      @ ~/work/PEPSKit.jl/PEPSKit.jl/src/utility/hook_pullback.jl:47 [inlined]
   [56] #rrule#50
      @ ~/work/PEPSKit.jl/PEPSKit.jl/src/utility/hook_pullback.jl:33 [inlined]
   [57] rrule
      @ ~/work/PEPSKit.jl/PEPSKit.jl/src/utility/hook_pullback.jl:29 [inlined]
   [58] chain_rrule_kw
      @ ~/.julia/packages/Zygote/3To5I/src/compiler/chainrules.jl:245 [inlined]
   [59] macro expansion
      @ ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0 [inlined]
   [60] _pullback(::Zygote.Context{false}, ::typeof(Core.kwcall), ::@NamedTuple{alg_rrule::Nothing}, ::typeof(PEPSKit.hook_pullback), ::typeof(MPSKit.leading_boundary), ::PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, ::PEPSKit.InfinitePEPS{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}, ::PEPSKit.SequentialCTMRG)
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:91
   [61] #2
      @ ~/work/PEPSKit.jl/PEPSKit.jl/test/ctmrg/gradients.jl:78 [inlined]
   [62] _pullback(ctx::Zygote.Context{false}, f::Main.var"##Gradients#231".var"#2#4"{PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, Int64, Nothing, PEPSKit.SequentialCTMRG}, args::PEPSKit.InfinitePEPS{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}})
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0
   [63] pullback(f::Function, cx::Zygote.Context{false}, args::PEPSKit.InfinitePEPS{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}})
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface.jl:96
   [64] pullback
      @ ~/.julia/packages/Zygote/3To5I/src/compiler/interface.jl:94 [inlined]
   [65] withgradient(f::Function, args::PEPSKit.InfinitePEPS{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}})
      @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface.jl:211
   [66] #1
      @ ~/work/PEPSKit.jl/PEPSKit.jl/test/ctmrg/gradients.jl:77 [inlined]
   [67] optimtest(fg::Main.var"##Gradients#231".var"#1#3"{Int64, Nothing, PEPSKit.SequentialCTMRG}, x::Tuple{PEPSKit.InfinitePEPS{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}, PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}}, d::PEPSKit.InfinitePEPS{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}; alpha::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, retract::typeof(PEPSKit.peps_retract), inner::typeof(PEPSKit.real_inner))
      @ OptimKit ~/.julia/packages/OptimKit/G6i79/src/OptimKit.jl:132
   [68] macro expansion
      @ ~/work/PEPSKit.jl/PEPSKit.jl/test/ctmrg/gradients.jl:70 [inlined]
   [69] macro expansion
      @ /opt/hostedtoolcache/julia/1.11.4/x64/share/julia/stdlib/v1.11/Test/src/Test.jl:1793 [inlined]
   [70] macro expansion
      @ ~/work/PEPSKit.jl/PEPSKit.jl/test/ctmrg/gradients.jl:45 [inlined]
   [71] macro expansion
      @ /opt/hostedtoolcache/julia/1.11.4/x64/share/julia/stdlib/v1.11/Test/src/Test.jl:1793 [inlined]
   [72] top-level scope
      @ ~/work/PEPSKit.jl/PEPSKit.jl/test/ctmrg/gradients.jl:1812
   [73] include(mod::Module, _path::String)
      @ Base ./Base.jl:557
   [74] include(x::String)
      @ Main.var"##Gradients#231" ~/.julia/packages/SafeTestsets/raUNr/src/SafeTestsets.jl:28
   [75] macro expansion
      @ ~/work/PEPSKit.jl/PEPSKit.jl/test/runtests.jl:18 [inlined]
   [76] macro expansion
      @ /opt/hostedtoolcache/julia/1.11.4/x64/share/julia/stdlib/v1.11/Test/src/Test.jl:1704 [inlined]
   [77] macro expansion
      @ ~/work/PEPSKit.jl/PEPSKit.jl/test/runtests.jl:18 [inlined]
   [78] top-level scope
      @ ~/.julia/packages/SafeTestsets/raUNr/src/SafeTestsets.jl:30
   [79] eval(m::Module, e::Any)
      @ Core ./boot.jl:430
   [80] macro expansion
      @ ~/.julia/packages/SafeTestsets/raUNr/src/SafeTestsets.jl:28 [inlined]
   [81] macro expansion
      @ ./timing.jl:581 [inlined]
   [82] macro expansion
      @ ~/work/PEPSKit.jl/PEPSKit.jl/test/runtests.jl:17 [inlined]
   [83] macro expansion
      @ ./timing.jl:581 [inlined]
   [84] top-level scope
      @ ~/work/PEPSKit.jl/PEPSKit.jl/test/runtests.jl:315
   [85] include(fname::String)
      @ Main ./sysimg.jl:38
   [86] top-level scope
      @ none:6
   [87] eval
      @ ./boot.jl:430 [inlined]
   [88] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:296
  
      nested task error: TypeError: in keyword argument alg, expected Union{TensorKit.SDD, TensorKit.SVD}, got a value of type PEPSKit.SVDAdjoint{TensorKit.SDD, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}, Nothing}
      Stacktrace:
        [1] kwcall(::@NamedTuple{alg::PEPSKit.SVDAdjoint{TensorKit.SDD, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}, Nothing}, trunc::TensorKit.TruncationSpace{TensorKit.ComplexSpace}, p::Int64}, ::typeof(ChainRulesCore.rrule), ::Zygote.ZygoteRuleConfig{Zygote.Context{false}}, ::Function, ::TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 3, Vector{ComplexF64}})
          @ ChainRulesCore ~/.julia/packages/ChainRulesCore/U6wNx/src/rules.jl:144
        [2] chain_rrule_kw
          @ ~/.julia/packages/Zygote/3To5I/src/compiler/chainrules.jl:245 [inlined]
        [3] macro expansion
          @ ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0 [inlined]
        [4] _pullback(::Zygote.Context{false}, ::typeof(Core.kwcall), ::@NamedTuple{alg::PEPSKit.SVDAdjoint{TensorKit.SDD, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}, Nothing}, trunc::TensorKit.TruncationSpace{TensorKit.ComplexSpace}, p::Int64}, ::typeof(TensorKit.tsvd!), ::TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 3, Vector{ComplexF64}})
          @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:91
        [5] #tsvd!#27
          @ ~/work/PEPSKit.jl/PEPSKit.jl/src/utility/svd.jl:110 [inlined]
        [6] _pullback(::Zygote.Context{false}, ::PEPSKit.var"##tsvd!#27", ::TensorKit.TruncationSpace{TensorKit.ComplexSpace}, ::Int64, ::typeof(TensorKit.tsvd!), ::TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 3, Vector{ComplexF64}}, ::PEPSKit.SVDAdjoint{TensorKit.SDD, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}, Nothing})
          @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0
        [7] tsvd!
          @ ~/work/PEPSKit.jl/PEPSKit.jl/src/utility/svd.jl:109 [inlined]
        [8] _pullback(::Zygote.Context{false}, ::typeof(Core.kwcall), ::@NamedTuple{trunc::TensorKit.TruncationSpace{TensorKit.ComplexSpace}}, ::typeof(TensorKit.tsvd!), ::TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 3, Vector{ComplexF64}}, ::PEPSKit.SVDAdjoint{TensorKit.SDD, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}, Nothing})
          @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0
        [9] compute_projector
          @ ~/work/PEPSKit.jl/PEPSKit.jl/src/algorithms/ctmrg/projectors.jl:149 [inlined]
       [10] _pullback(::Zygote.Context{false}, ::typeof(PEPSKit.compute_projector), ::Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 3, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 3, Vector{ComplexF64}}}, ::Tuple{Int64, Int64, Int64}, ::PEPSKit.HalfInfiniteProjector{PEPSKit.SVDAdjoint{TensorKit.SDD, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}, Nothing}, TensorKit.TruncationSpace{TensorKit.ComplexSpace}})
          @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0
       [11] sequential_projectors
          @ ~/work/PEPSKit.jl/PEPSKit.jl/src/algorithms/ctmrg/sequential.jl:94 [inlined]
       [12] _pullback(::Zygote.Context{false}, ::typeof(PEPSKit.sequential_projectors), ::Tuple{Int64, Int64, Int64}, ::PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, ::PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, ::PEPSKit.HalfInfiniteProjector{PEPSKit.SVDAdjoint{TensorKit.SDD, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}, Nothing}, TensorKit.TruncationSpace{TensorKit.ComplexSpace}})
          @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0
       [13] #328
          @ ~/work/PEPSKit.jl/PEPSKit.jl/src/algorithms/ctmrg/sequential.jl:80 [inlined]
       [14] _pullback(ctx::Zygote.Context{false}, f::PEPSKit.var"#328#329"{PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, PEPSKit.HalfInfiniteProjector{PEPSKit.SVDAdjoint{TensorKit.SDD, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}, Nothing}, PEPSKit.FixedSpaceTruncation}}, args::Tuple{Int64, Int64})
          @ Zygote ~/.julia/packages/Zygote/3To5I/src/compiler/interface2.jl:0
       [15] #rrule_via_ad#59
          @ ~/.julia/packages/Zygote/3To5I/src/compiler/chainrules.jl:270 [inlined]
       [16] rrule_via_ad
          @ ~/.julia/packages/Zygote/3To5I/src/compiler/chainrules.jl:258 [inlined]
       [17] #15
          @ ~/work/PEPSKit.jl/PEPSKit.jl/src/utility/diffable_threads.jl:17 [inlined]
       [18] iterate
          @ ./generator.jl:48 [inlined]
       [19] _collect(c::SubArray{Tuple{Int64, Int64}, 1, Vector{Tuple{Int64, Int64}}, Tuple{UnitRange{Int64}}, true}, itr::Base.Generator{SubArray{Tuple{Int64, Int64}, 1, Vector{Tuple{Int64, Int64}}, Tuple{UnitRange{Int64}}, true}, PEPSKit.var"#15#18"{Zygote.ZygoteRuleConfig{Zygote.Context{false}}, PEPSKit.var"#328#329"{PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, PEPSKit.HalfInfiniteProjector{PEPSKit.SVDAdjoint{TensorKit.SDD, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}, Nothing}, PEPSKit.FixedSpaceTruncation}}}}, ::Base.EltypeUnknown, isz::Base.HasShape{1})
          @ Base ./array.jl:811
       [20] collect_similar
          @ ./array.jl:720 [inlined]
       [21] map
          @ ./abstractarray.jl:3371 [inlined]
       [22] #132
          @ ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:448 [inlined]
       [23] mapreduce_first
          @ ./reduce.jl:421 [inlined]
       [24] _mapreduce(f::OhMyThreads.Implementation.var"#132#135"{PEPSKit.var"#15#18"{Zygote.ZygoteRuleConfig{Zygote.Context{false}}, PEPSKit.var"#328#329"{PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, PEPSKit.HalfInfiniteProjector{PEPSKit.SVDAdjoint{TensorKit.SDD, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}, Nothing}, PEPSKit.FixedSpaceTruncation}}}, Tuple{Vector{Tuple{Int64, Int64}}}}, op::typeof(BangBang.append!!), ::IndexLinear, A::SubArray{UnitRange{Int64}, 1, Vector{UnitRange{Int64}}, Tuple{UnitRange{Int64}}, true})
          @ Base ./reduce.jl:432
       [25] _mapreduce_dim
          @ ./reducedim.jl:337 [inlined]
       [26] mapreduce
          @ ./reducedim.jl:329 [inlined]
       [27] #27
          @ ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:110 [inlined]
       [28] (::OhMyThreads.Implementation.var"#28#36"{StableTasks.AtomicRef{Any}, OhMyThreads.Implementation.var"#27#35"{Tuple{SubArray{UnitRange{Int64}, 1, Vector{UnitRange{Int64}}, Tuple{UnitRange{Int64}}, true}}, @NamedTuple{}, OhMyThreads.Implementation.var"#132#135"{PEPSKit.var"#15#18"{Zygote.ZygoteRuleConfig{Zygote.Context{false}}, PEPSKit.var"#328#329"{PEPSKit.InfiniteSquareNetwork{Tuple{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 4, Vector{ComplexF64}}}}, PEPSKit.CTMRGEnv{TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 1, 1, Vector{ComplexF64}}, TensorKit.TensorMap{ComplexF64, TensorKit.ComplexSpace, 3, 1, Vector{ComplexF64}}}, PEPSKit.HalfInfiniteProjector{PEPSKit.SVDAdjoint{TensorKit.SDD, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}, Nothing}, PEPSKit.FixedSpaceTruncation}}}, Tuple{Vector{Tuple{Int64, Int64}}}}, typeof(BangBang.append!!)}})()
          @ OhMyThreads.Implementation ~/.julia/packages/StableTasks/cSY7d/src/internals.jl:101

@pbrehmer
Collaborator

Hmm, I am confused by this since I feel this definitely worked before. Judging by the stacktrace, this is not a numerical problem but an issue with the algorithm creation? I will take a look now and fix it.

...the SVD rrule alg should not have any effect when gradient_alg is nothing

Well, the SVD rrule alg should have an effect, since nothing just differentiates through all the tsvd! calls, which use the rrule alg that is specified in the SVDAdjoint. Still, this should work.

Does it make sense to use the iterative rrule with the diffgauge iterscheme in the first place?

It would make sense to use an iterative rrule only when the specified forward SVD algorithm is iterative since then the SVD cannot return the full U, S and V such that TensorKit's rrule cannot be used. If, however, TensorKit's tsvd! is used in the forward pass anyway, then of course the rrule algorithm shouldn't be iterative.
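The compatibility argument above can be made concrete with a small sketch. This is plain illustrative Python (function and option names are hypothetical, not PEPSKit's actual API): a truncated iterative forward SVD cannot return the full U, S and V, so the full-decomposition rrule is only an option when the forward pass computes a full SVD.

```python
# Hypothetical dispatcher mirroring the discussion: which SVD pullback flavors
# are compatible with which forward algorithms. Names are illustrative only.

def pick_svd_rrule(forward_alg, requested_rrule=None):
    """Choose an SVD pullback flavor given the forward algorithm.

    forward_alg: "full" (dense tsvd-style) or "iterative" (Krylov svdsolve-style)
    requested_rrule: "full", "iterative", or None (pick a sensible default)
    """
    if forward_alg == "iterative":
        if requested_rrule == "full":
            # An iterative forward SVD only produces the leading singular
            # triplets, so a pullback needing the full decomposition cannot run.
            raise ValueError(
                "full-decomposition rrule needs the full U, S and V, "
                "which an iterative forward SVD does not provide"
            )
        return "iterative"
    # A full forward SVD supports either pullback; an iterative rrule is still
    # valid here, just usually not the efficient choice.
    return requested_rrule or "full"
```

In this picture the combination questioned above (iterative rrule with a full tsvd forward pass) is allowed, matching the point that it is valid in principle even if rarely the smart choice.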

@leburgel
Member Author

It would make sense to use an iterative rrule only when the specified forward SVD algorithm is iterative since then the SVD cannot return the full U, S and V such that TensorKit's rrule cannot be used. If, however, TensorKit's tsvd! is used in the forward pass anyway, then of course the rrule algorithm shouldn't be iterative.

I agree it does not make much sense to use the iterative rrule with a tsvd forward pass, but my question was more whether this should actually run or not (even if it's not a smart thing to do). If we don't need this to run, we should probably explicitly disallow this through the keyword selector.

pbrehmer and others added 3 commits March 13, 2025 11:51
* Implement norm-preserving retractions

* Drop implicit assumption that state has unit norm

* Update src/algorithms/optimization/peps_optimization.jl

Co-authored-by: Lukas Devos <ldevos98@gmail.com>

* Increase tol for cotangent gauge sensitivity warnings

* Fix typo

* Not too high, not too low?

* Definitely too low

* Very arbitrary

* Restore tolerance, reduce bond dimension instead?

* No real difference, so restore bond dimension

---------

Co-authored-by: Lukas Devos <ldevos98@gmail.com>
@pbrehmer
Collaborator

The problem was the typing of the SVD pullback. Apparently, some algorithm combinations led to TensorKit's _tsvd function being used, which explicitly restricts the alg keyword to Union{SDD,SVD}. I circumvented this by handling all tsvd calls inside PEPSKit now (which I anyway find cleaner for the moment, given that the SVD interface will be refactored at some point) and defining the rrule on all the PEPSKit tsvd!s.

I agree it does not make much sense to use the iterative rrule with a tsvd forward pass, but my question was more whether this should actually run or not (even if it's not a smart thing to do).

I would say this should run since this is a valid algorithm combination in principle. Sure, one wouldn't use this for practical purposes, but perhaps for methodological reasons. Also, given that the iterative rrules tend to have a few hiccups, I find it useful to always test them in different combinations to see whether they work.

@leburgel
Member Author

leburgel commented Mar 13, 2025

Thanks a lot for the fixes @pbrehmer! Everything seems okay now, but the CTMRG tests time out because the gradient test covers a lot more combinations now. Since this is not super urgent I was thinking of taking some more time here and:

  • Merging CTMRG support for PEPS-PEPO-PEPS networks #134 first
  • Splitting off the CTMRG gradient test into a separate workflow since this is very big just by itself
  • Adding some extra combinations for the p-wave superconductor test to ensure the stability improvements actually make everything usable for more challenging models as well

How does that sound?

@pbrehmer
Collaborator

Likewise thanks for your work!

Everything seems okay now, but the CTMRG tests time out because the gradient test covers a lot more combinations now.

In my experience most of the time in that test is spent in first compiling Zygote (especially for the fermionic p-wave superconductor!) and then in all the combinations where gradient_alg=nothing. The rest usually is relatively fast. So it's good to keep that in mind if we want to speed up that test.

@leburgel
Member Author

leburgel commented Mar 14, 2025

I split off the gradient tests into a separate workflow and tried some more combinations for the p-wave superconductor gradient test. Everything runs, which is nice, but it seems that the only combination that gives a correct gradient for the current setup is ctmrg_alg=:sequential, projector_alg=:halfinfinite and gradient_iterscheme=:diffgauge. The other CTMRG, projector and iterscheme flavors fail. I filed a separate issue on this (#158) and will have a go at figuring this out in a follow-up, but I think actually solving this is a bit beyond the scope of this PR.

Another issue is the compilation time and the runtimes for gradient_alg=nothing, which really eat up most of the time limit of the workflow. I'll trim things down here until they no longer time out, but again we should probably figure out how we can test more combinations within a reasonable amount of time in a follow-up.

@lkdvos
Member

lkdvos commented Mar 14, 2025

Just as a side note: if we feel this is warranted, we can also increase the allowed time. I manually reduced this a bit since long runtimes tend to indicate something fishy is going on, but if we know this is just how long it takes we can obviously increase it again.

@leburgel
Member Author

leburgel commented Mar 14, 2025

I bumped the environment bond dimension, which fixes #158, and added in all working combinations for both models. I left out the gradient_alg = nothing tests for the p-wave superconductor, since this takes over 15 minutes per test case at 6 possible combinations so that's pretty much hopeless.

This should now give an idea of how long things take. I can try to reduce the bond dimension a bit in the hopes that the p-wave tests then still pass, but still I think the only way of getting this to not time out is to either increase the limit (we'll see how far it gets now), or drop the gradient_alg = nothing tests for the Heisenberg model as well.

@leburgel
Member Author

It turns out that a very small environment bond dimension increase is enough to make the p-wave gradient tests pass, so I reduced this again. Still, the gradient tests time out severely. I think completing all of them would take double the time of the current timeout, so just increasing the time limit would not be ideal as the test suite would then take very long to complete.

The only reasonable way to fix this aside from just increasing the time limit I can think of is to not test the trivial gradient_alg=nothing for all other algorithm combinations. These cases take several minutes each and really eat up the testing time. We could allow one test with gradient_alg=nothing per model just to make sure this works in the first place, and then skip this gradient mode for all other combinations. This is not ideal since we lose some coverage on naively differentiating through all algorithms, but all the building blocks of this are still covered by the other tests so making sure it works fully in one case might be enough. What do we think?

@lkdvos
Member

lkdvos commented Mar 17, 2025

Which part of these tests are the expensive ones? Can we get away with doing a single gradient evaluation for all of them, and would that help? What I mean is: can we converge something, and then simply check whether all methods give more or less the same gradient?

@leburgel
Member Author

leburgel commented Mar 17, 2025

The really expensive ones are those where Zygote just automatically backpropagates through all CTMRG iterations to get the gradient (these take >5 minutes per case versus around one minute for all other gradient modes). In general, it's really the gradient computation itself that takes the most time, so even reusing a pre-converged CTMRG environment would not really help speed things up much I think.

@lkdvos
Member

lkdvos commented Mar 17, 2025

Have we ever tried the trick of first computing the environment, and then doing AD through a limited number of iterations?

@leburgel
Member Author

I'm pretty sure that's what we're already doing in the test actually. As far as I can tell the environment is pre-converged first and then optimtest is called on the derivative of a leading_boundary with the default miniter=4. So unless the first run did not converge within maxiter without us knowing, the derivative is computed using only 4 CTMRG iterations. I'll check if the first leading_boundary converges properly, but if it does I don't really know if there's more we can do.

@leburgel
Member Author

I'm pretty sure that's what we're already doing in the test actually.

Okay that's not entirely true. We're pre-converging the environment once at the initial point but then for every retraction in the optimtest the gradient is computed on the full leading_boundary run for the retracted point starting from this one pre-converged environment. But these leading_boundary runs take at most 20 iterations for the step sizes chosen in the test, so that's really already not very much. I tried splitting the leading_boundary on the retracted points into two parts, where I first fully converge it and then compute the gradient using a dummy run with 4 iterations, but this breaks the tests for the naive gradient mode. It seems the gradient obtained by differentiating through a few effectively converged iterations is not accurate enough, which is also to be expected.

I also thought of decreasing the number of step sizes tried in the optimtest, but we're only using 5 step sizes right now so it really feels like we shouldn't decrease that further.
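The accuracy loss from differentiating through only a few post-convergence iterations can be seen in a toy model. This is a plain-Python illustration (not PEPSKit code, and a much simpler fixed-point map than CTMRG): for the linear iteration x_{k+1} = c·x_k + a with |c| < 1, the fixed point is x* = a/(1 - c), so the exact gradient dx*/da is 1/(1 - c), while unrolling K iterations from a pre-converged x_0 (held constant) only yields the truncated geometric sum 1 + c + … + c^(K-1).

```python
# Toy illustration: truncated unrolled backprop vs. the exact fixed-point
# gradient for x_{k+1} = c*x_k + a, whose fixed point is x* = a / (1 - c).

def truncated_unrolled_grad(c, K):
    # d x_K / d a when the starting point x_0 is treated as a constant:
    # each iteration contributes one more power of c to the chain rule.
    return sum(c**i for i in range(K))

def exact_fixed_point_grad(c):
    # Implicit differentiation of x* = c*x* + a gives dx*/da = 1 / (1 - c).
    return 1.0 / (1.0 - c)

c = 0.5
print(exact_fixed_point_grad(c))      # 2.0 (exact)
print(truncated_unrolled_grad(c, 4))  # 1.875, noticeably off after 4 steps
print(truncated_unrolled_grad(c, 20))  # within ~1e-6 of 2.0 after 20 steps
```

With a slowly contracting map (c close to 1, analogous to a slowly converging CTMRG), the truncated sum converges to the exact gradient even more slowly, which matches the observation that a 4-iteration dummy run is not accurate enough while ~20 iterations can be.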

@pbrehmer
Collaborator

The only reasonable way to fix this aside from just increasing the time limit I can think of is to not test the trivial gradient_alg=nothing for all other algorithm combinations.

I would suggest cutting down on the combinations with gradient_alg=nothing. We could for example limit these tests to the Heisenberg model with (ctmrg_alg, projector_alg, svd_rrule_alg, gradient_alg):

  • (:simultaneous, :halfinfinite, :tsvd, nothing)
  • (:simultaneous, :fullinfinite, :tsvd, nothing)
  • (:sequential, :halfinfinite, :tsvd, nothing)

Note that gradient_iterscheme doesn't make any difference with gradient_alg=nothing so we can fix that to one option. I think all other combinations should already be sufficiently tested by the other gradient algorithms, especially given the execution time constraints.
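The redundancy argument can be sketched as a small test-matrix generator. This is illustrative Python only (the option names roughly mimic PEPSKit's symbols but are assumptions, and the actual test suite is written differently): when the gradient alg is absent, the iterscheme is collapsed to a single representative before deduplicating.

```python
# Hypothetical sketch of trimming the test matrix: combinations that only
# differ in the iterscheme are identical when there is no gradient alg.
from itertools import product

ctmrg_algs = [":simultaneous", ":sequential"]
projector_algs = [":halfinfinite", ":fullinfinite"]
iterschemes = [":fixed", ":diffgauge"]
gradient_algs = [":geomsum", ":manualiter", None]  # None = naive differentiation

def test_matrix():
    seen = set()
    for ctmrg, proj, scheme, grad in product(
        ctmrg_algs, projector_algs, iterschemes, gradient_algs
    ):
        if grad is None:
            scheme = iterschemes[0]  # iterscheme has no effect without a gradient alg
        combo = (ctmrg, proj, scheme, grad)
        if combo not in seen:
            seen.add(combo)
            yield combo

combos = list(test_matrix())
# The full product has 2*2*2*3 = 24 entries; collapsing the iterscheme for the
# naive mode removes 4 redundant cases, leaving 20.
print(len(combos))
```

The same dedup idea extends to cutting the naive mode down to the two or three hand-picked combinations listed above.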

@leburgel
Member Author

Even after these cuts, there are still two timeouts on v1.11, by at most a few minutes. I don't think we should cut any more test combinations and I don't really see any more obvious ways of speeding things up. How would we feel about increasing the timeout by 15 minutes?

@lkdvos
Member

lkdvos commented Mar 17, 2025

You can do so by adding a timeout-minutes: 75 keyword here
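For reference, the suggested change would look roughly like the following in the workflow file (the job name and surrounding keys are assumptions about this repo's CI setup, not copied from it; only `timeout-minutes` is the actual GitHub Actions keyword being discussed):

```yaml
# Sketch of raising the per-job time limit in a GitHub Actions workflow.
jobs:
  tests:
    runs-on: ubuntu-latest
    timeout-minutes: 75  # raise the per-job limit from the default
```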

Member

@lkdvos lkdvos left a comment


I didn't go through this in detail again, but to me this definitely looks good. Maybe @pbrehmer can also approve and we can then merge?

Collaborator

@pbrehmer pbrehmer left a comment


Thanks @leburgel for vastly improving the gradient tests! I left a comment on having a user option for gradient_alg=nothing but that can be addressed in another PR. We should keep in mind that the SVD interface is still quite messy but that's okay for now, I guess.

)
# instantiate because hook_pullback doesn't go through the keyword selector...
concrete_gradient_alg = if isnothing(gradient_alg)
nothing # TODO: add this to the PEPSKit.GradMode selector?
Collaborator


I was also considering this and I am not sure about it. From the user perspective it would make a lot of sense to have a :naive option which differentiates through all the CTMRG iterations via gradient_alg=nothing, mostly as a reference method. However, internally it would be weird to do this inside the GradMode constructor since Nothing is definitely not a subtype of GradMode. Of course we could ignore that for the sake of usability, but I am unsure about that.

In any case, this should probably be handled in a different PR, so let's keep this in mind for now.

Member Author


I think my main problem here was that I needed to instantiate the algorithm to use with hook_pullback, and my first attempt at calling GradMode(; alg=nothing) didn't work, so I left the comment more as a reminder to myself. I don't think this is really much of a problem, as the keyword selector as a whole handles calling fixedpoint(...; gradient_alg=nothing, ...) perfectly well. So we can open an issue and make it so we can use GradMode directly for this in the future, but I'm not sure if this is very useful.

A working example of how to do naive AD would solve everything as well, and I think we're anyway going to add that in the docs update.

@leburgel leburgel merged commit 3f62147 into master Mar 17, 2025
39 checks passed
@leburgel leburgel deleted the lb/svdsolve_rrule_tol branch March 17, 2025 19:21


Development

Successfully merging this pull request may close these issues.

Failing CTMRG gradient tests for fermionic Hamiltonian

3 participants