
Renormalize tensors before projections #148

Closed

Confusio wants to merge 0 commits into QuantumKitHub:master from Confusio:master

Conversation


@Confusio Confusio commented Mar 5, 2025

This commit addresses PEPSKit.jl Issue #137 by refactoring the CTMRG expanded-corner and projector implementations. Key changes include:

Core Improvements

  • Implement corner tensor renormalization during corner expansion.
  • Introduce a generalized compute_projector function that handles projectors for arbitrary left/right tensors connecting the bond.
  • Update both sequential_projectors and simultaneous_projectors to use the new unified interface.

Workflow Notes

  • Tests currently require iterscheme=:diffgauge as a temporary workaround for a KrylovKit.jl issue (exact root cause remains undiagnosed)

New Utility

```julia
⊙(t1::AbstractTensorMap, t2::AbstractTensorMap) =
    twist(t1, filter(i -> !isdual(space(t1, i)), domainind(t1))) * t2
```

This tensor contraction helper cleans up infinite_environment contractions in both full_infinite_environment and half_infinite_environment implementations.

@Confusio Confusio changed the title Renormalized tensors before projections Renormalize tensors before projections Mar 5, 2025

codecov bot commented Mar 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/PEPSKit.jl | 87.50% <ø> (ø) |
| src/algorithms/ctmrg/projectors.jl | 100.00% <100.00%> (+4.54%) ⬆️ |
| src/algorithms/ctmrg/sequential.jl | 98.48% <100.00%> (+0.12%) ⬆️ |
| src/algorithms/ctmrg/simultaneous.jl | 98.33% <100.00%> (+0.15%) ⬆️ |
| ...rithms/optimization/fixed_point_differentiation.jl | 94.89% <100.00%> (ø) |

... and 1 file with indirect coverage changes


pbrehmer (Collaborator) commented Mar 7, 2025

Thanks for getting this started! I haven't had time to look at the PR properly but what I just noticed: The SimpleGrad and FixedPointGrad construction is actually not required and in fact already implemented. If you want to compute the gradient the SimpleGrad way you just have to supply gradient_alg=nothing to PEPSOptimize. That way the fixed-point differentiation is bypassed and the CTMRG run is differentiated "naively" (by differentiating all iterations). Sorry for not yet documenting this properly, that option is hidden away through the hook_pullback function.

Or perhaps I am missing the point: Was the reason you implemented SimpleGrad to limit the number of CTMRG iterations in the backpropagation?

Confusio (Author) commented Mar 7, 2025

> Thanks for getting this started! I haven't had time to look at the PR properly but what I just noticed: The SimpleGrad and FixedPointGrad construction is actually not required and in fact already implemented. If you want to compute the gradient the SimpleGrad way you just have to supply gradient_alg=nothing to PEPSOptimize. That way the fixed-point differentiation is bypassed and the CTMRG run is differentiated "naively" (by differentiating all iterations). Sorry for not yet documenting this properly, that option is hidden away through the hook_pullback function.
>
> Or perhaps I am missing the point: Was the reason you implemented SimpleGrad to limit the number of CTMRG iterations in the backpropagation?

I got it. I didn't notice gradient_alg=nothing does this job.

lkdvos (Member) left a comment


Thanks for looking into this!

I left a couple comments throughout the code, but I have some general comments as well:

Could you try and separate out changes that are functional from ones that are cosmetic? It is a bit hard to review PRs where both are going on at the same time and, when things go wrong, to pinpoint what caused them.
Specifically, if I got this correctly, there is

  1. a restructuring of the callchain for computing the projectors
  2. a different way of contracting parts of the environment together (with ⊙)
  3. a normalization before and after the SVD

Especially 2. scares me a bit, since the fermionic contractions really are a bit fragile: implementations that seemingly should give the same results can be different, because contractions and multiplications are not the same.

L and R are the left and right tensors. After the projection, the arrow of the bond is (L⊙P_L)←(P_R⊙R).

```
L⊙R = (L⊙R) * (L⊙R)⁻¹ * (L⊙R)
    = (L⊙R) * (U*S*V)⁻¹ * (L⊙R)
    = (L⊙R) * V' * S⁻¹ * U' * (L⊙R)
    = L⊙(R * V' * S^(-1/2)) * (S^(-1/2) * U' * L)⊙R
    = L⊙P_L * P_R⊙R
```
Member

I'm not a huge fan of this docstring compared to the previous one, since the previous one had drawings. Here, the long equation is a bit hard to parse, so maybe you could merge the best of both worlds? (it could also be helpful to work a bit with spaces and indentation to improve readability)

Author

Here, the tensor contraction operator ⊙ is useful for deriving the projector operators on a selected bond. For example, consider the following diagrams:

             ----     ||       ----
    --->----|    |    ||      |    |
    ........|    |    ||      |    |---->----
    ........|    |----||->----|    |.........
    ---<--- |  L |....||......| R  |.........
    ........|    |----||-<----|    |----<----
    ---->---|    |    ||      |    |
             ----     ||       ----

No matter how many outer legs or contracted legs appear on the bond, the diagram can always be expressed as
  L ⊙ R
where ⊙ represents the tensor contraction rather than map composition. (Note that standard linear algebra operations, e.g. factorization and inversion, are valid only for maps, not for general tensor contractions.)

For the singular value decomposition (SVD) of L ⊙ R, we treat the entire contraction as a map. Thus, we can write

  L ⊙ R = U * S * V

where * denotes map composition, obeying the usual rules of linear algebra.

Since L ⊙ R is a map, it possesses an inverse given by

  (L ⊙ R)⁻¹ = V' * S⁻¹ * U'

so that

  (L ⊙ R) * (L ⊙ R)⁻¹ = (L ⊙ R)⁻¹ * (L ⊙ R) = I,

whereas in general

  (L ⊙ R) ⊙ (L ⊙ R)⁻¹ ≠ I,

since ⊙ is a tensor contraction, not map composition.

From this, we obtain (note: before applying linear-algebra identities such as (a * b)⁻¹ = b⁻¹ * a⁻¹, make sure the object is a map):

  L ⊙ R = (L ⊙ R) * (L ⊙ R)⁻¹ * (L ⊙ R)
        = (L ⊙ R) * V' * S⁻¹ * U' * (L ⊙ R)
        = L ⊙ (R * V' * S^(-1/2)) * (S^(-1/2) * U' * L) ⊙ R
        = L ⊙ P_L * P_R ⊙ R,

where

  P_L = R * V' * S^(-1/2)
  P_R = S^(-1/2) * U' * L.

This explicit distinction between map composition (*) and tensor contraction (⊙) helps beginners understand why extra care with fermion signs is needed when performing linear-algebra operations such as factorization and inversion. When no linear-algebra operations are involved, @tensor in TensorKit.jl automatically does the right thing.

Once this point is clear, handling the fermion signs in fPEPS becomes much more straightforward. The key rule is: whenever you perform linear-algebra operations, such as factorization or inversion, you must use * (map composition), not ⊙ (tensor contraction); when no linear-algebra operation is involved, use ⊙.

Although one can use the @tensor macro for contractions, it does not easily allow one to write out the detailed derivation of the projectors, which in turn creates a barrier to understanding the fermion signs in fPEPS (at least for me).

Another point is that a composition of * and twist operations suffices for constructing the CTMRG projectors. This works because the projectors are always derived from the expanded corners, which are already permuted into the proper shapes: C1(p1; p2), C2(p2; p3), C3(p3; p4), and C4(p4; p1). Therefore, a composition of * and twist operations suffices for both the half and full infinite environments. In my implementation, many contractions in the environment part of ctmrg_contraction.jl can be removed.

With this in mind, there is no need to distinguish between the full_infinite_environment and half_infinite_environment methods when constructing the projectors, since only the left and right tensors are of concern. The proper left and right tensors are clearly specified in the CTMRG steps found in sequential.jl and simultaneous.jl. Specifically, on the bond connecting C1 and C2:

  • For the half_infinite_environment, L = C1 and R = C2.
  • For the full_infinite_environment, L = C4 ⊙ C1 and R = C2 ⊙ C3,

where C1, C2, C3, and C4 are the expanded corners. Once L and R are provided, the compute_projector function gives you the projectors.
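For the non-symmetric (bosonic) case, where ⊙ reduces to ordinary matrix multiplication and no twists are needed, the identity L ⊙ R = L ⊙ P_L * P_R ⊙ R can be checked numerically. A minimal NumPy sketch (the shapes, the rank-4 construction, and the name chi are illustrative assumptions, not PEPSKit's API):

```python
import numpy as np

rng = np.random.default_rng(0)
m, b, chi = 6, 6, 4  # outer dimension, old bond dimension, truncated bond dimension

# Build L (m x b) and R (b x m) with rank chi, so truncating the bond to chi is exact
L = rng.standard_normal((m, chi)) @ rng.standard_normal((chi, b))
R = rng.standard_normal((b, chi)) @ rng.standard_normal((chi, m))

# SVD of the composite map L * R, keeping the chi dominant singular values
U, S, Vh = np.linalg.svd(L @ R, full_matrices=False)
U, S, Vh = U[:, :chi], S[:chi], Vh[:chi, :]
isqS = np.diag(S**-0.5)

# Projectors as in the derivation: P_L = R * V' * S^(-1/2), P_R = S^(-1/2) * U' * L
P_L = R @ Vh.conj().T @ isqS  # (b x chi): compresses the old bond
P_R = isqS @ U.conj().T @ L   # (chi x b)

# Inserting P_L * P_R on the bond reproduces L * R while truncating the bond b -> chi
assert np.allclose(L @ P_L @ P_R @ R, L @ R)
```

When chi is smaller than the rank of L * R, the same construction gives a truncated insertion instead of an exact identity.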

Member

Thanks for the more detailed explanation, I think this is definitely helpful. However, I mostly mean that it would be helpful to put that in the docstring, because while I know what is going on, others might not. You removed one of the drawings and replaced it with a single long equation without explanation, and the symbols alone are not that explanatory. That was mostly the point of my comment.

Comment on lines -250 to -276
````julia
"""
    contract_projectors(U, S, V, Q, Q_next)

Compute projectors based on an SVD of `Q * Q_next`, where the inverse square root
`isqS` of the singular values is computed.

Left projector:
```
 -- |~~~~~~| -- |~~|
    |Q_next|    |V'| -- isqS --
 == |~~~~~~| == |~~|
```

Right projector:
```
           |~~| -- |~~~| --
 -- isqS --|U'|    | Q |
           |~~| == |~~~| ==
```
"""
function contract_projectors(U, S, V, Q, Q_next)
    isqS = sdiag_pow(S, -0.5)
    P_left = Q_next * V' * isqS  # use * to respect fermionic case
    P_right = isqS * U' * Q
    return P_left, P_right
end
````

Member

Why do you need to remove this function? It seems like you are still using it in your new implementation, but manually pasted it in?

Confusio (Author) commented Mar 7, 2025

As I explained above, given only Q (= L) and Q_next (= R), we can compute the projectors in the compute_projector function.

Member

I know this, but by removing the function and the drawing, this is now slightly less transparent, and I'm wondering what the benefit is.


```
L⊙R = (L⊙R) * (L⊙R)⁻¹ * (L⊙R)
    = (L⊙R) * (U*S*V)⁻¹ * (L⊙R)
    = (L⊙R) * V' * S⁻¹ * U' * (L⊙R)
    = L⊙(R * V' * S^(-1/2)) * (S^(-1/2) * U' * L)⊙R
    = L⊙P_L * P_R⊙R
```
Member

You need to attach docstrings directly above the function in order to make them be correctly parsed as docstrings, Julia will interpret this as a simple string.

Author

I am thinking about how to express this clearly. A single formula alone can't explain the details.

Member

My comment might not have been very clear. What I mean is, given that this is the docstring for compute_projector, you need to explicitly have it before that function in the code, without newlines or other functions inbetween:

```julia
"""
docstring
"""
function myfunction end
```

This way, it will show up if you type ?compute_projector in the REPL

Comment on lines +82 to +84
```julia
function ⊙(t1::AbstractTensorMap, t2::AbstractTensorMap)
    return twist(t1, filter(i -> !isdual(space(t1, i)), domainind(t1))) * t2
end
```
Member

I'm not a huge fan of this symbol or this implementation.

For the symbol, there are two problems to me:

  • If you don't know what it does, there is no way to figure out what this function is except to search through the code and see where it is defined. It's not obvious to me why this would be a contraction with twists; if I did not know what it is, I would intuitively think it is some kind of dot operation.
  • It seems like we already have way too many functions to deal with fermionic symmetries etc., and adding one just increases the mental space you need to keep track of.

Implementation-wise, this is fully equivalent to, but less efficient than, the suggestion I had here in terms of explicit tensorcontract calls.

Do you think we could stick to tensorcontractions and multiplications? In particular, I would much rather just introduce a function combine_environment (or anything along those lines) that we can overload if necessary, because this is now making a lot of implicit assumptions about the structure of these environments.

Author

See my above explanation. Actually, the expanded corner tensors take the form C1(p1; p2), C2(p2; p3), C3(p3; p4), and C4(p4; p1) quite naturally.

Member

I don't think I see how this relates to the issues I'm mentioning. Can you maybe elaborate?

Member

It's not so great an idea to start changing tests in the same PR as you are introducing new functionality. This makes it hard to gauge whether your implementation is actually working, since from your comment it seems like it is not, and you had to adapt the tests to make them pass (which is of course not really the point of having tests...)

Author

Due to a known issue with KrylovKit (Issue #100), the test can't pass. I need to find another initial PEPS by changing the random seed, or switch to iterscheme=:diffgauge.

Member

I would prefer that you keep it as non-passing in that case anyway, since it was passing before. What is different here, such that it no longer works? We can then decide whether this can be safely ignored, because right now it looks like your change makes that issue appear, whereas before it did not.

Comment on lines +63 to +71
```julia
corner_functions = Dict(
    NORTHWEST => enlarge_northwest_corner,
    NORTHEAST => enlarge_northeast_corner,
    SOUTHEAST => enlarge_southeast_corner,
    SOUTHWEST => enlarge_southwest_corner,
)

haskey(corner_functions, dir) || error("Wrong direction: $dir")
corner = corner_functions[dir](Q.E_1, Q.C, Q.E_2, Q.A)
```
Member

Do we really need this change? It is somewhat orthogonal to the other changes, and in particular I don't think this is what we want: using the dictionary of functions most likely results in a forced runtime dispatch, which is not great for type stability or performance in general.

Author

Sorry, we don't need the changes. I didn't notice Copilot made the changes automatically.


```julia
haskey(corner_functions, dir) || error("Wrong direction: $dir")
corner = corner_functions[dir](Q.E_1, Q.C, Q.E_2, Q.A)
return corner / norm(corner)
```
Member

Would it be possible to move the normalization higher up the callchain? TensorMap is a constructor, and as a function really is supposed to contract the tensors and give you a tensor, and implicitly renormalizing seems not entirely fair. (You might not always want to do this, for example if you need access to the value of the norm)

Author

When the norm of the PEPS tensor is either very small or very large, the contraction of the expanded corners (C1 * C2 * C3 * C4) can become numerically unstable because the PEPS tensor appears four times. For reasons that are not entirely clear, the gradient can sometimes become excessively large. As a result, the trial update during optimization (i.e., peps' = peps + grad * α) may acquire a very large norm, which leads to the breakdown of the LBFGS algorithm.

So I think the normalization here is useful, but we need to find a proper place to do it.
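The scaling issue can be illustrated without any PEPS machinery: if each expanded corner carries an overall scale c, a product of four corners scales as c⁴ and silently underflows (or overflows), while dividing each corner by its norm keeps every intermediate O(1). A hypothetical NumPy sketch (plain 4×4 matrices stand in for the expanded corners):

```python
import numpy as np

rng = np.random.default_rng(1)
c = 1e-100  # a "very small norm" per corner, as described above
corners = [c * rng.standard_normal((4, 4)) for _ in range(4)]

# Naive contraction: the overall scale c**4 = 1e-400 underflows double precision
naive = corners[0] @ corners[1] @ corners[2] @ corners[3]
assert np.all(naive == 0.0)  # all information lost to underflow

# Renormalizing each corner first (corner / norm(corner)) keeps the product O(1)
normed = [C / np.linalg.norm(C) for C in corners]
stable = normed[0] @ normed[1] @ normed[2] @ normed[3]
assert np.linalg.norm(stable) > 1e-6
```

The same argument applies in the opposite direction: a large per-tensor norm makes the four-fold product overflow instead.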

Member

I do not mean that we should not normalize, I mean that this should not happen in this function :)

Comment on lines +89 to +91
```julia
if dim(codomain(L)) > dim(domain(L))
    _, L = leftorth!(L)
    R, _ = rightorth!(R)
```
Member

Did you check if this works with fermions? It seems like this would alter the arrows on the legs of what you are actually contracting/decomposing, and it confuses me a bit that there would be no extra parity factors showing up because of that.

Additionally, I think this is type-unstable: depending on the sizes of the tensors, L and R will now have an altered number of legs.

Author

In my notation, the resulting bond has the arrow
  (L ⊙ P_L) ← (P_R ⊙ R).
leftorth and rightorth are very helpful when L and R don't have full rank, particularly in the case you mentioned where they have a different number of outer legs.
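In the plain matrix analogue, this step corresponds to economy-size QR/LQ factorizations: the isometric factors drop out of the singular values, so the SVD (and hence the projectors) can be computed from much smaller cores when L has more rows than columns. A hedged NumPy sketch, with np.linalg.qr standing in for TensorKit's leftorth/rightorth:

```python
import numpy as np

rng = np.random.default_rng(2)
L = rng.standard_normal((10, 4))  # codomain dimension 10 > domain dimension 4
R = rng.standard_normal((4, 10))

# leftorth analogue: L = Q_l @ L_small with Q_l an isometry (Q_l' Q_l = I)
Q_l, L_small = np.linalg.qr(L)            # L_small is 4 x 4
# rightorth analogue via QR of the transpose: R = R_small @ Q_r with Q_r Q_r' = I
Q_rT, R_smallT = np.linalg.qr(R.T)
R_small, Q_r = R_smallT.T, Q_rT.T         # R_small is 4 x 4

# The isometries do not change the singular values, so the SVD for the
# projectors can be done on the small 4 x 4 core instead of the 10 x 10 map
s_big = np.linalg.svd(L @ R, compute_uv=False)      # 10 values, only 4 nonzero
s_small = np.linalg.svd(L_small @ R_small, compute_uv=False)
assert np.allclose(s_big[:4], s_small)
assert np.allclose(L @ R, Q_l @ L_small @ R_small @ Q_r)
```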

Member

Yes, but you should implement this differently in that case: before you use leftorth, L is an N-by-M TensorMap; after, it will be a 1-by-M TensorMap. The compiler does not know whether you are going to apply leftorth or not, hence after this block of code you lose type stability.
This can be resolved by an explicit function barrier before the actual implementation.

The reason I'm checking in about the fermions is that leftorth does not always produce arrows in the same way, but looking at it in more detail now I think it's fine because you are applying the correct twists in the line below.

Member

same comment about changing tests here

Author

Same reason: due to the known issue with KrylovKit (Issue #100).

Confusio (Author) commented Mar 7, 2025

> Thanks for looking into this!
>
> I left a couple comments throughout the code, but I have some general comments as well:
>
> Could you try and separate out changes that are functional from ones that are cosmetic? It is a bit hard to review PRs where both are going on at the same time, and when things go wrong point at what it is. Specifically, if I got this correctly, there is
>
> 1. a restructuring of the callchain for computing the projectors
> 2. a different way of contracting parts of the environment together (with ⊙)
> 3. a normalization before and after the SVD
>
> Especially 2. scares me a bit, since the fermionic contractions really are a bit fragile: implementations that seemingly should give the same results can be different because contractions and multiplications are not the same

Items 1 and 2 might be optional, since they mainly help clarify the fermion signs, though I'm not completely certain. Item 3 is essential, as it enhances numerical stability. Please see my detailed thoughts above.

lkdvos (Member) commented Mar 7, 2025

I'm not saying that we cannot change all of these, I'm saying that you should have separate PRs for them, since they are somewhat orthogonal. That way, we can more easily review the code and obtain the best solutions, without having to go through a large amount of separate issues at the same time

Confusio (Author) commented Mar 8, 2025

Sorry for the repeated commit message.
This pull request includes changes to the compute_projector function and related projector-computation functions in the src/algorithms/ctmrg directory. The updates aim to improve clarity, renormalize tensors before the SVD, and ensure consistency across the different projector algorithms. Tests in heisenberg.jl and tf_ising.jl still fail due to a KrylovKit issue (Jutho/KrylovKit.jl#110). Below are the changes:

Enhancements to compute_projector function:

  • The compute_projector function now accepts L::AbstractTensorMap and R::AbstractTensorMap as arguments instead of enlarged_corners and coordinate. The function has been updated to compute projection operators for dimension truncation between tensors L and R, with detailed documentation and visual representation added.

  • The mathematical foundation of the projector calculation has been elaborated, including the decomposition using SVD and the definition of left and right projectors P_L and P_R.

  • Explained the different roles of map composition and tensor contraction with respect to fermion signs in fPEPS.

Updates to sequential and simultaneous projector functions:

  • The sequential_projectors function now normalizes tensors Q1 and Q2 before computing projectors.

  • The simultaneous_projectors function has been similarly updated to normalize tensors.

  • The different infinite-environment projection schemes now use the unified compute_projector function.

Minor corrections:

  • Fixed a typo in the EigSolver function, correcting Defauls to Defaults for the default solver parameter.

Confusio (Author) commented Mar 8, 2025

fix: Use fixed tolerance in Arnoldi algorithm to resolve KrylovKit issue

Previously, the tolerance for rrule_alg was set using the variable ctmrg_tol (with a value of 1.0e-8). Debug output from svdsolve.jl (see this link) revealed that while the vals array contained all expected eigenvalues, the corresponding rvals array was missing an entry near 7e-8.

```
(vals, rvals) = ([0.9972572805593066, 0.06451815859605552, 0.036184598804887586, 0.002434937441416787, 0.000257804461731779, 5.179320746859194e-5, 1.2009304246253585e-5, 9.930554302595036e-6, 7.877564742933071e-6, 4.449498137506608e-6, 9.658547169810283e-7, 6.521371763146172e-7, 2.906002397520167e-7, 7.149473802806458e-8, 6.731283066272181e-8, 3.68284807988075e-8], ComplexF64[0.9972572805593064 + 1.4059950420919265e-32im, 0.0645181585960555 + 5.5002370940311424e-33im, 0.03618459880488763 - 5.903859117468247e-32im, 0.002434937441416788 + 2.5572350764397176e-32im, 0.00025780446173177785 - 3.533902315109301e-33im, 5.1793207468593083e-5 - 3.239920849691298e-31im, 1.2009304246254148e-5 - 4.257380530297327e-25im, 9.930554302595809e-6 + 3.627657579880785e-24im, 7.877564742934238e-6 + 3.579591975126304e-24im, 4.449498137507113e-6 - 2.008137399273548e-24im, 9.65854703934024e-7 + 4.751819423053537e-19im, 6.521369232255494e-7 + 1.8750608545020293e-18im, 2.905964397554036e-7 - 9.161679754055458e-18im, 6.960340417111356e-8 + 1.518834871110928e-17im, 3.699007030825129e-8 + 4.687018494971075e-17im])
```

This discrepancy—occurring close to the original tolerance—suggests a precision issue with the 1.0e-8 threshold.

To resolve this, the tolerance has been hardcoded to 1.0e-10 in the Arnoldi algorithm:

```julia
const rrule_alg = Arnoldi(; tol=1.0e-10, krylovdim=48, verbosity=-1)
```

This adjustment improves numerical stability and consistency, addressing the problem described in KrylovKit.jl issue #110.
