
Normalize environments before SVD#276

Merged
lkdvos merged 2 commits into `master` from `lb/normalize_svd` on Oct 22, 2025

Conversation

@leburgel (Member)

Reviving #154 (which I couldn't just reopen for some reason), since it turns out that the issues that were being addressed there really weren't solved in the meantime.

This was originally closed after we merged #151, where we moved to consistently normalizing individual PEPS tensors to unit Euclidean norm, which we then assumed would also control the normalization of the half-infinite and full-infinite environments before they are SVD'd to obtain the projectors.

However, while this norm is now under control, it turns out that with unit-Euclidean-norm PEPS tensors the half- and full-infinite environments can still end up with a norm small enough that the pullback of the SVD is not handled properly in the TensorKit.tsvd rrule. This never really showed up in any of our test cases, but even there, normalizing the environments before the SVD gives significantly more accurate gradients; the difference was just not large enough to cause noticeable problems. For other models, however, and especially with non-Abelian symmetries, the current choice can still lead to very wrong gradients.
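The fix itself is tiny. As a minimal sketch (plain `LinearAlgebra` on a dense matrix, not the actual PEPSKit code), rescaling the environment to unit Euclidean norm before decomposing guarantees the SVD pullback sees singular values of order one rather than a uniformly tiny spectrum:

```julia
using LinearAlgebra

# Stand-in for a small-norm (half-)infinite environment; the names here are
# illustrative, not PEPSKit internals.
Q = 1e-8 * randn(10, 10)
Qn = Q / norm(Q)          # normalize before decomposing
U, S, V = svd(Qn)
# the singular values of the normalized environment sum to one in quadrature
@assert isapprox(norm(S), 1.0; atol = 1e-12)
```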

For example, even for a simple Heisenberg model, not normalizing actually gives a remarkably inaccurate gradient:

```julia
using Random
using TensorKit
using PEPSKit
using KrylovKit
using OptimKit
using Zygote

sd = 1234
Random.seed!(sd)

D = 3
χ = 20
H = heisenberg_XYZ(InfiniteSquare(1, 1))

boundary_alg = SimultaneousCTMRG(;
    tol = 1.0e-10,
    maxiter = 500,
    verbosity = 1,
    projector_alg = :halfinfinite,
    trscheme = FixedSpaceTruncation()
)
gradient_alg = LinSolver(;
    verbosity = 1,
    solver_alg = BiCGStab(;
        tol = 5.0e-8, maxiter = 500, verbosity = 1,
    ),
    iterscheme = :fixed,
)

function fg((peps, env))
    E, g = Zygote.withgradient(peps) do ψ
        env2, = PEPSKit.hook_pullback(
            leading_boundary,
            env,
            ψ,
            boundary_alg;
            alg_rrule = gradient_alg,
        )
        return cost_function(ψ, env2, H)
    end
    return E, only(g)
end

# initialize randomly
P = physicalspace(H)
Vpeps = fill(ComplexSpace(D), size(H))
Venv = fill(ComplexSpace(χ), size(H))
peps = PEPSKit.peps_normalize(InfinitePEPS(randn, ComplexF64, P, Vpeps))
env0 = CTMRGEnv(randn, ComplexF64, peps, Venv)

# test gradient against finite-difference
alphas, fs, dfs1, dfs2 = OptimKit.optimtest(
    fg, (peps, env0);
    alpha = LinRange(-1.0e-5, 1.0e-5, 2),
    retract = PEPSKit.peps_retract,
    inner = PEPSKit.real_inner,
)
@show dfs1
@show dfs2
```

Without normalization before the SVD, this gives:

```
dfs1 = [0.5985298290451385]
dfs2 = [0.5988318959829178]
```

With normalization before the SVD, the gradient is much more accurate:

```
dfs1 = [0.5982738294742912]
dfs2 = [0.5982738231510877]
```

So while fixing the PEPS tensor norm does bound the norm of the half- and full-infinite environments as we assumed initially, there are cases where the choice of unit tensor norm still gives rise to an 'extreme' case where things go quite wrong. This is a bit unfortunate, since, for example, fixing the tensor norm to 5.0 would have given much more accurate gradients with the current implementation, even though this choice should in principle be arbitrary.

I assume this will all be solved once we update to use the decompositions and rules from MatrixAlgebraKit.jl, but since that might still take a bit of work to sort out, and the temporary fix is so simple with a very big effect, I would argue in favor of merging this right away, so that we can at least guarantee accurate gradients on the master branch from now on.

In addition, this whole thing is a good demonstration that, while it shouldn't matter, the choice of PEPS normalization can really affect optimization results. As the simplest example, if we were to switch off the initial normalization at the start of a fixedpoint run, everything would still work as expected and the initial Euclidean norm would be preserved, but rescaling the initial state would then also rescale the resulting gradient (by the inverse factor). For this reason alone, it might be good to modify our cost function and gradient inner product so that the quantities OptimKit.jl sees are entirely independent of the norm of the input PEPS. This could be addressed in a follow-up.
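The inverse-factor scaling described above can be checked on a toy cost function (plain `LinearAlgebra`, not PEPSKit code): a cost that is homogeneous of degree zero in the state, like a Rayleigh quotient ⟨ψ|H|ψ⟩ / ⟨ψ|ψ⟩, has the same value after rescaling the state, but its gradient picks up the inverse factor.

```julia
using LinearAlgebra

# Rayleigh-quotient toy cost: scale-invariant in x.
f(x, A) = dot(x, A * x) / dot(x, x)
# Analytic gradient for symmetric A.
grad(x, A) = (2 / dot(x, x)) * (A * x - f(x, A) * x)

A = Symmetric(randn(5, 5))
x = randn(5)
α = 3.0
@assert isapprox(f(α * x, A), f(x, A); rtol = 1e-10)           # value is scale-invariant
@assert isapprox(grad(α * x, A), grad(x, A) / α; rtol = 1e-10) # gradient scales as 1/α
```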

@codecov

codecov bot commented Oct 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
| --- | --- |
| `src/algorithms/ctmrg/projectors.jl` | 82.69% <100.00%> (ø) |

@Confusio

I think even in MatrixAlgebraKit.jl, normalization will still be essential for the default tol. So normalizing is a simple way to make sure the AD is stable.

@lkdvos (Member) left a comment

I guess this indeed makes sense, thanks for adding this back.

Would you be okay with starting a changelog (e.g. https://keepachangelog.com/en/1.0.0/) and adding an entry for this? I think I'm okay with not considering this breaking, even though it technically does alter the truncation schemes that were based on tolerances, since these will now effectively be relative tolerances instead of absolute.
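The absolute-versus-relative tolerance point can be sketched with plain `LinearAlgebra` (an illustrative dense example, not the actual truncation code): with a fixed truncation tolerance, normalizing first turns the cutoff into a relative one, so a uniformly small spectrum is no longer truncated away entirely.

```julia
using LinearAlgebra

tol = 1e-3
# Stand-in matrix with small overall scale; values are illustrative.
Q = Matrix(1e-4 * Diagonal([1.0, 0.5, 1e-5]))
kept_abs = count(>(tol), svdvals(Q))            # absolute cutoff: drops everything
kept_rel = count(>(tol), svdvals(Q / norm(Q)))  # relative cutoff: keeps the large values
@assert kept_abs == 0 && kept_rel == 2
```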

@leburgel (Member, Author) commented Oct 21, 2025

Starting a changelog is fine by me. I think we're at a breaking release for the next tag anyway, since, for example, the InfiniteWeightPEPS type was removed after the last release and it was being exported.

@leburgel (Member, Author)

I started the changelog. It would be good to update it with the progress since the last release to have a proper starting point, but I think that's for a separate PR.

@Yue-Zhengyuan (Member) left a comment

Some suggestions that may help avoid creating copies.

@Yue-Zhengyuan (Member)

@lkdvos It seems that only you can merge the PR now, because of the failed tests on Julia 1.12.
