[Perf] EnlargedCorner optimizations #214

Merged
lkdvos merged 5 commits into master from performance
Jun 13, 2025

@lkdvos
Member

@lkdvos lkdvos commented Jun 9, 2025

Fixes #213.

This PR hard-codes the contraction order for the enlarged corners. It would be great not to have to check the optimal order by hand every time, but for now it seems reasonable to fix some of them manually.

In particular, `@autoopt` currently has no way of taking into account that contraction orders with equal leading-order cost can still differ in subleading cost because of the permutations they require, and I'm not entirely sure how to fix that.
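
To make the limitation concrete, here is a toy cost model in Python. It is not the actual `@autoopt` implementation; the function names, index labels and dimensions are all made up for illustration. It counts only the leading-order multiplications of each pairwise contraction, and two orders come out tied even though they leave the result with different index layouts.

```python
from math import prod


def pair_cost(a, b, dims):
    """Leading-order multiplication count for contracting tensors a and b.

    a and b are tuples of index labels; every distinct label is looped over once.
    """
    return prod(dims[i] for i in set(a) | set(b))


def contract(a, b):
    """Index labels left open after summing over the labels shared by a and b."""
    shared = set(a) & set(b)
    return tuple(i for i in a + b if i not in shared)


def sequence_cost(tensors, order, dims):
    """Total leading cost and final index layout when absorbing tensors in `order`."""
    current = tensors[order[0]]
    total = 0
    for k in order[1:]:
        total += pair_cost(current, tensors[k], dims)
        current = contract(current, tensors[k])
    return total, current


# Hypothetical three-tensor chain A[i,j] * B[j,k] * C[k,l], all dimensions equal.
dims = {"i": 10, "j": 10, "k": 10, "l": 10}
tensors = [("i", "j"), ("j", "k"), ("k", "l")]

cost_abc, layout_abc = sequence_cost(tensors, (0, 1, 2), dims)
cost_bca, layout_bca = sequence_cost(tensors, (1, 2, 0), dims)

print(cost_abc, layout_abc)  # 2000 ('i', 'l')
print(cost_bca, layout_bca)  # 2000 ('l', 'i')
```

If the caller wants the result laid out as `(i, l)`, the second order needs an extra permutation at the end, and a model that only counts multiplications cannot see that difference.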

Additionally, this fixes something that has been bothering me for a while: an enlarged corner now stores which corner it is (its `dir`), so `TensorMap(Q::EnlargedCorner)` no longer needs an additional argument.


Performance-wise, I tried a number of orders, and contracting the edges first, then the bra, then the ket consistently came out on top. With the SU(2) spaces linked in the parent issue, I get:

# original:
BenchmarkTools.Trial: 1 sample with 1 evaluation per sample.
 Single result which took 16.443 s (0.05% GC) to evaluate,
 with a memory estimate of 6.50 GiB, over 95320 allocations.

# ket then bra:
BenchmarkTools.Trial: 1 sample with 1 evaluation per sample.
 Single result which took 6.969 s (0.11% GC) to evaluate,
 with a memory estimate of 6.50 GiB, over 94124 allocations.

# bra then ket:
BenchmarkTools.Trial: 1 sample with 1 evaluation per sample.
 Single result which took 6.612 s (0.09% GC) to evaluate,
 with a memory estimate of 6.50 GiB, over 88989 allocations.

I did verify that these results are consistent for the other sizes as well.
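
For a quick read of the three runs, the Python snippet below just divides the reported wall-clock times. The numbers are copied from the benchmark output above; each run is a single sample, so the small gap between the two fixed orders is only meaningful together with the statement that it persists across sizes.

```python
# Wall-clock times in seconds, copied from the three benchmark runs above.
timings = {"original": 16.443, "ket then bra": 6.969, "bra then ket": 6.612}

baseline = timings["original"]
for name, seconds in timings.items():
    print(f"{name:>13}: {seconds:7.3f} s, speedup x{baseline / seconds:.2f}")

# Relative gap between the two manually fixed orders.
gap = timings["ket then bra"] / timings["bra then ket"] - 1
print(f"ket-then-bra takes {100 * gap:.1f}% longer than bra-then-ket")
```

Both fixed orders are roughly 2.4 to 2.5 times faster than the original, and they differ from each other by about 5%.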

@codecov

codecov bot commented Jun 9, 2025

Codecov Report

Attention: Patch coverage is 42.85714% with 16 lines in your changes missing coverage. Please review.

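| Files with missing lines                          | Patch % | Lines         |
|:--------------------------------------------------|:-------:|:-------------:|
| src/algorithms/contractions/ctmrg_contractions.jl | 20.00%  | 16 Missing ⚠️ |

| Files with missing lines                          | Coverage Δ                  |
|:--------------------------------------------------|:---------------------------:|
| src/algorithms/ctmrg/sequential.jl                | 98.36% <100.00%> (ø)        |
| src/algorithms/ctmrg/simultaneous.jl              | 98.27% <100.00%> (ø)        |
| src/algorithms/ctmrg/sparse_environments.jl       | 30.76% <100.00%> (ø)        |
| src/algorithms/contractions/ctmrg_contractions.jl | 56.10% <20.00%> (-2.00%) ⬇️ |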

@lkdvos lkdvos requested a review from pbrehmer June 9, 2025 21:15
@lkdvos lkdvos enabled auto-merge (squash) June 9, 2025 21:15
@lkdvos lkdvos marked this pull request as draft June 10, 2025 01:34
auto-merge was automatically disabled June 10, 2025 01:34

Pull request was converted to draft

@lkdvos
Member Author

lkdvos commented Jun 10, 2025

Update: it seems the intermediate permutations actually make a huge difference. I found a different order (technically due to @ogauthe) that is another factor of 2 faster. I'll look into how this can be implemented, and also how it could be automated in the future.


# PR state now
BenchmarkTools.Trial: 1 sample with 1 evaluation per sample.
 Single result which took 6.661 s (0.12% GC) to evaluate,
 with a memory estimate of 6.50 GiB, over 88989 allocations.

# PR state with updated intermediate permutations
BenchmarkTools.Trial: 2 samples with 1 evaluation per sample.
 Range (min … max):  2.990 s … 3.086 s  ┊ GC (min … max): 1.56% … 4.75%
 Time  (median):     3.038 s              ┊ GC (median):    3.18%
 Time  (mean ± σ):   3.038 s ± 68.183 ms  ┊ GC (mean ± σ):  3.18% ± 2.25%

  █                                                       █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.99 s         Histogram: frequency by time        3.09 s <

 Memory estimate: 6.49 GiB, allocs estimate: 21237.
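
One way to see why the index order of intermediates can matter at equal contraction cost: when a pairwise contraction is carried out as a matrix multiplication, the contracted indices have to sit together at the end of the first tensor and at the start of the second, and any tensor not already in that layout has to be copied into it first. The Python sketch below is a toy dense-array model of that bookkeeping, with made-up index labels and dimensions. It does not reproduce what TensorKit does for symmetric tensors, where the comments in this thread report the effect to be larger.

```python
from math import prod


def permute_traffic(a, b, dims):
    """Number of elements copied to bring a and b into matrix-multiply layout.

    a and b are tuples of index labels. The target layout is
    a = (open..., contracted...) and b = (contracted..., open...),
    with the contracted labels in the same order on both tensors.
    """
    shared = [i for i in a if i in b]  # contracted labels, in a's order
    n = len(shared)
    copied = 0
    if list(a[len(a) - n:]) != shared:  # contracted block not at the back of a
        copied += prod(dims[i] for i in a)
    if list(b[:n]) != shared:  # contracted block not at the front of b
        copied += prod(dims[i] for i in b)
    return copied


# Hypothetical dimensions; "k" is the contracted index in both cases.
dims = {"x": 100, "y": 100, "k": 50, "z": 100}

# Same contraction and same multiplication count, two layouts of the first tensor.
print(permute_traffic(("x", "y", "k"), ("k", "z"), dims))  # 0
print(permute_traffic(("x", "k", "y"), ("k", "z"), dims))  # 500000
```

In this model the second layout forces a full copy of the first tensor before the multiplication can start, which is the kind of subleading cost that choosing the intermediate index order by hand can avoid.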

Collaborator

@pbrehmer pbrehmer left a comment

Thanks a lot for taking care of this! I really wasn't aware of the big difference the intermediate permutations make. Let me know when I should review again or if I can help anywhere.

@lkdvos
Member Author

lkdvos commented Jun 10, 2025

However, this only seems to matter for non-abelian tensors, so there is still some interesting interplay going on here. I'll keep you posted.

@ogauthe
Contributor

ogauthe commented Jun 10, 2025

I'm glad you found the explanation.

I expect this effect to matter for all tensors, but it will indeed be more important for non-abelian ones. In my benchmark, frostspin was slightly faster in the Trivial case. With SU(2), reaching the asymptotic regime where contraction dominates requires very high bond dimensions.

@lkdvos lkdvos marked this pull request as ready for review June 12, 2025 21:22
@lkdvos
Member Author

lkdvos commented Jun 12, 2025

I have now updated all the PEPS corner contractions appropriately.
On my machine, I get the following updated timings for contracting the corners (quite impressive actually):

| sym/(dir, d, D, chi)  | master            | dirty             | master / dirty |
|:----------------------|:-----------------:|:-----------------:|:--------------:|
| su2/(1, 4, 11, 121)   | 0.638 ± 0.059 s   | 0.174 ± 0.035 s   | 3.67 ± 0.81    |
| su2/(1, 4, 16, 255)   | 16.8 s            | 3.02 ± 0.013 s    | 5.56           |
| su2/(1, 4, 4, 16)     | 4.52 ± 0.05 ms    | 2.61 ± 0.035 ms   | 1.73 ± 0.03    |
| su2/(1, 4, 7, 49)     | 19.5 ± 1.8 ms     | 7.97 ± 0.79 ms    | 2.44 ± 0.33    |
| su2/(2, 4, 11, 121)   | 0.555 ± 0.028 s   | 0.173 ± 0.0092 s  | 3.21 ± 0.24    |
| su2/(2, 4, 16, 255)   | 16.4 s            | 2.97 ± 0.1 s      | 5.51           |
| su2/(2, 4, 4, 16)     | 4.53 ± 0.037 ms   | 2.62 ± 0.054 ms   | 1.73 ± 0.038   |
| su2/(2, 4, 7, 49)     | 17.5 ± 0.88 ms    | 7.88 ± 0.98 ms    | 2.22 ± 0.3     |
| su2/(3, 4, 11, 121)   | 0.528 ± 0.0044 s  | 0.174 ± 0.0092 s  | 3.04 ± 0.16    |
| su2/(3, 4, 16, 255)   | 15.9 s            | 2.9 ± 0.15 s      | 5.48           |
| su2/(3, 4, 4, 16)     | 4.58 ± 0.15 ms    | 2.55 ± 0.024 ms   | 1.8 ± 0.061    |
| su2/(3, 4, 7, 49)     | 17.4 ± 0.97 ms    | 7.67 ± 0.68 ms    | 2.27 ± 0.24    |
| su2/(4, 4, 11, 121)   | 0.539 ± 0.021 s   | 0.177 ± 0.021 s   | 3.04 ± 0.38    |
| su2/(4, 4, 16, 255)   | 15.6 s            | 2.93 ± 0.094 s    | 5.32           |
| su2/(4, 4, 4, 16)     | 4.52 ± 0.025 ms   | 2.59 ± 0.024 ms   | 1.74 ± 0.019   |
| su2/(4, 4, 7, 49)     | 17.5 ± 1.3 ms     | 7.89 ± 1.3 ms     | 2.22 ± 0.4     |
| trivial/(1, 4, 4, 16) | 2.02 ± 0.28 ms    | 1.58 ± 0.27 ms    | 1.28 ± 0.28    |
| trivial/(1, 4, 5, 25) | 12.3 ± 3.1 ms     | 9.53 ± 2.2 ms     | 1.29 ± 0.44    |
| trivial/(1, 4, 6, 36) | 0.0579 ± 0.0027 s | 0.0382 ± 0.0024 s | 1.52 ± 0.12    |
| trivial/(1, 4, 7, 49) | 0.243 ± 0.012 s   | 0.143 ± 0.015 s   | 1.7 ± 0.2      |
| trivial/(1, 4, 8, 64) | 0.892 ± 0.1 s     | 0.593 ± 0.089 s   | 1.5 ± 0.29     |
| trivial/(2, 4, 4, 16) | 1.85 ± 0.45 ms    | 1.6 ± 0.13 ms     | 1.16 ± 0.3     |
| trivial/(2, 4, 5, 25) | 12.6 ± 3.9 ms     | 9.75 ± 2.3 ms     | 1.29 ± 0.51    |
| trivial/(2, 4, 6, 36) | 0.0572 ± 0.0012 s | 0.0408 ± 0.0031 s | 1.4 ± 0.11     |
| trivial/(2, 4, 7, 49) | 0.245 ± 0.0069 s  | 0.147 ± 0.01 s    | 1.67 ± 0.12    |
| trivial/(2, 4, 8, 64) | 0.87 ± 0.11 s     | 0.603 ± 0.091 s   | 1.44 ± 0.29    |
| trivial/(3, 4, 4, 16) | 1.52 ± 0.32 ms    | 1.6 ± 0.36 ms     | 0.948 ± 0.29   |
| trivial/(3, 4, 5, 25) | 11.7 ± 2.9 ms     | 9.84 ± 2.3 ms     | 1.19 ± 0.4     |
| trivial/(3, 4, 6, 36) | 0.0548 ± 0.003 s  | 0.0386 ± 0.004 s  | 1.42 ± 0.17    |
| trivial/(3, 4, 7, 49) | 0.247 ± 0.016 s   | 0.148 ± 0.013 s   | 1.67 ± 0.18    |
| trivial/(3, 4, 8, 64) | 0.884 ± 0.1 s     | 0.622 ± 0.09 s    | 1.42 ± 0.26    |
| trivial/(4, 4, 4, 16) | 1.96 ± 0.37 ms    | 1.61 ± 0.3 ms     | 1.22 ± 0.32    |
| trivial/(4, 4, 5, 25) | 12 ± 3.3 ms       | 9.58 ± 2.3 ms     | 1.26 ± 0.46    |
| trivial/(4, 4, 6, 36) | 0.0567 ± 0.0017 s | 0.0406 ± 0.0048 s | 1.4 ± 0.17     |
| trivial/(4, 4, 7, 49) | 0.244 ± 0.0054 s  | 0.147 ± 0.0089 s  | 1.66 ± 0.11    |
| trivial/(4, 4, 8, 64) | 0.898 ± 0.14 s    | 0.565 ± 0.11 s    | 1.59 ± 0.38    |
| u1/(1, 4, 11, 121)    | 1.99 ± 0.14 s     | 1.32 ± 0.082 s    | 1.51 ± 0.14    |
| u1/(1, 4, 4, 16)      | 0.921 ± 0.084 ms  | 0.854 ± 0.033 ms  | 1.08 ± 0.11    |
| u1/(1, 4, 7, 49)      | 0.0351 ± 0.0013 s | 24.2 ± 1.3 ms     | 1.45 ± 0.095   |
| u1/(2, 4, 11, 121)    | 1.83 ± 0.11 s     | 1.34 ± 0.091 s    | 1.37 ± 0.13    |
| u1/(2, 4, 4, 16)      | 0.923 ± 0.059 ms  | 0.86 ± 0.042 ms   | 1.07 ± 0.087   |
| u1/(2, 4, 7, 49)      | 0.0356 ± 0.0033 s | 23.9 ± 1.6 ms     | 1.49 ± 0.17    |
| u1/(3, 4, 11, 121)    | 1.92 ± 0.14 s     | 1.29 ± 0.13 s     | 1.49 ± 0.19    |
| u1/(3, 4, 4, 16)      | 0.925 ± 0.063 ms  | 0.866 ± 0.046 ms  | 1.07 ± 0.092   |
| u1/(3, 4, 7, 49)      | 0.0348 ± 0.001 s  | 24.6 ± 2.1 ms     | 1.42 ± 0.13    |
| u1/(4, 4, 11, 121)    | 1.87 ± 0.14 s     | 1.27 ± 0.013 s    | 1.47 ± 0.11    |
| u1/(4, 4, 4, 16)      | 0.92 ± 0.1 ms     | 0.86 ± 0.044 ms   | 1.07 ± 0.13    |
| u1/(4, 4, 7, 49)      | 0.0358 ± 0.0028 s | 24.1 ± 1.1 ms     | 1.49 ± 0.13    |
| time_to_load          | 1.44 ± 0.086 s    | 1.47 ± 0.039 s    | 0.979 ± 0.064  |
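
The uncertainties in the last column look consistent with first-order error propagation for a quotient, assuming the two timings are independent. This is an inference from the numbers, not something stated by the benchmark tooling. The Python sketch below (the function name is mine) reproduces the first `su2` row as a check.

```python
from math import sqrt


def ratio_with_error(a, da, b, db):
    """Return a / b and its first-order propagated uncertainty.

    Assumes a and b are independent, so relative errors add in quadrature.
    """
    r = a / b
    dr = r * sqrt((da / a) ** 2 + (db / b) ** 2)
    return r, dr


# master and dirty timings for su2/(1, 4, 11, 121), in seconds.
r, dr = ratio_with_error(0.638, 0.059, 0.174, 0.035)
print(f"{r:.2f} ± {dr:.2f}")  # 3.67 ± 0.81
```

Rows where `master` has a single sample (for example 16.8 s) carry no uncertainty, so only the plain ratio appears there.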

@lkdvos lkdvos requested a review from pbrehmer June 12, 2025 21:26
@lkdvos lkdvos enabled auto-merge (squash) June 12, 2025 21:27
Collaborator

@pbrehmer pbrehmer left a comment

Really impressive improvement! Thanks also for including the benchmark study - that might become useful in the future in case we want to set up a proper benchmark suite for PEPSKit.

@lkdvos lkdvos merged commit 5a28693 into master Jun 13, 2025
44 of 45 checks passed
@lkdvos lkdvos deleted the performance branch June 13, 2025 11:36
lkdvos added a commit that referenced this pull request Jun 13, 2025
* Store `dir` in `EnlargedCorner`

* Manually fix enlarged corner contractions

* Update contractions

---

Co-authored-by: Olivier Gauthe <olivier.gauthe.2011+github@polytechnique.org>

Development

Successfully merging this pull request may close these issues.

Enlarged corners contraction order

3 participants