
feat(layout-engine): balance columns at continuous section breaks (SD-2452) #2869

Open
tupizz wants to merge 7 commits into main from
tadeu/sd-2452-feature-implement-column-balancing-for-continuous-section

@tupizz
Contributor

@tupizz tupizz commented Apr 20, 2026

Comparison Results PDF
SD-2452-page-by-page.pdf

Summary

Implements ECMA-376 §17.18.77 column balancing for multi-column sections. Word produces a minimum-height balanced layout at the end of a continuous (and, empirically, next-page) multi-column section; SuperDoc was either leaving content stacked in the first column or, in some layouts, producing overlapping fragments.

Linear: SD-2452

What changed

  1. layoutDocument builds a block → section map by walking blocks in document order and tracking the current section from the most recent sectionBreak (pm-adapter only stamps attrs.sectionIndex on sectionBreak blocks, not on content paragraphs).
  2. New balanceSectionOnPage helper performs section-scoped balancing with its own fragment-level positioning (no Y-grouping). Fragments are ordered by (x, y) and each is treated as its own block. The previous balancePageColumns grouped fragments by Y into rows, which collapsed fragments from different source columns at the same Y and produced overlap.
  3. calculateBalancedColumnHeight is a proper binary search for the minimum H such that greedy left-to-right fill places every block with every column ≤ H. Matches Word's left-heavy packing preference (e.g. 7 blocks / 3 cols → 3+3+1, not 2+2+3).
  4. Mid-page hook at forceMidPageRegion balances the ending section on the current page before starting the new region, and collapses both cursors to balanceResult.maxY so the next region begins just below the balanced columns. Sections handled mid-page are tracked in alreadyBalancedSections so the post-layout pass doesn't double-balance.
  5. Per-section post-layout loop replaces the prior "last page of document" heuristic — each multi-column section's last page is balanced, skipping sections already handled mid-page.
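
As a rough illustration of item 3, here is a minimal sketch of a binary search for the smallest column height under a greedy left-to-right fill. It is not the PR's code: `greedyFill` and `minBalancedHeight` are hypothetical names, and block heights are assumed to be whole pixels.

```typescript
// Hypothetical helpers for illustration only; not the actual layout-engine implementation.

/** Greedy left-to-right fill. Returns per-column block counts, or null if the blocks do not fit. */
function greedyFill(heights: number[], columnCount: number, limit: number): number[] | null {
  const counts: number[] = new Array<number>(columnCount).fill(0);
  let col = 0;
  let used = 0;
  for (const h of heights) {
    if (h > limit) return null; // a block taller than the limit can never fit
    if (used + h > limit) {
      // current column is full: move right and start again from the top
      col += 1;
      used = 0;
      if (col >= columnCount) return null;
    }
    used += h;
    counts[col] = (counts[col] ?? 0) + 1;
  }
  return counts;
}

/** Smallest integer H such that greedyFill places every block with every column <= H. */
function minBalancedHeight(heights: number[], columnCount: number): number {
  let lo = Math.max(0, ...heights); // no column can be shorter than the tallest block
  let hi = heights.reduce((sum, h) => sum + h, 0); // everything in one column always fits
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (greedyFill(heights, columnCount, mid) !== null) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}
```

With seven blocks of height 20 and three columns, the search settles on H = 60 and the greedy fill yields 3+3+1, the left-heavy split described above. Feasibility is monotonic in H, which is what makes the binary search valid.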

Results (Word vs SuperDoc)

| Test | Scenario | Word | SuperDoc before | SuperDoc after |
| --- | --- | --- | --- | --- |
| 1 | 6 equal paragraphs, 2 cols (continuous break) | 3+3 | 6+0 (not balanced) | 3+3 (exact match) |
| 2 | 5 paragraphs with unequal heights, 2 cols | 2+3 | 5+0 (not balanced) | 2+3 (exact match) |
| 3 | 7 equal paragraphs, 3 cols | 3+3+1 | 7+0+0 (not balanced) | 3+3+1 (exact match) |
| 4 | 13 paragraphs with multi-line bodies, 2 cols | 7+6 | Overlapping fragments | 7+6 (exact match) |
| 5 | Continuous + next-page sections (5+5) | 3+2 / 3+2 | Not balanced | 3+2 / 3+2 (exact match) |

Side-by-side PDF comparison available locally at /tmp/sd-2452-fixtures/SD-2452-comparison.pdf (generated via new compare-word-vs-superdoc skill).

Test plan

  • 614 `@superdoc/layout-engine` tests pass (11 new for SD-2452)
  • 1,737 `@superdoc/pm-adapter` tests pass
  • 11,375 `super-editor` tests pass
  • 0 overlap regressions across local corpus (14 docs — none activate the balancing code path, fix is scope-gated to sections with `count > 1`)
  • Visual validation against Microsoft Word for all 5 fixtures
  • Browser sanity: scroll stable, zoom stable, no fragment overlaps
  • `pnpm test:layout` against production reference (blocked on wrangler re-auth locally — CI will run this)
  • Upload fixtures to R2 corpus for visual regression coverage

Demo tests

[Six screenshots: CleanShot captures from 2026-04-20, 15:11:27 through 15:12:39]

Fixtures

  • `spec-test-1.docx` — Basic 2-column balance
  • `spec-test-2.docx` — Unequal paragraph heights
  • `spec-test-3.docx` — Three-column balance
  • `spec-test-4.docx` — Long content / overlap scenario
  • `spec-test-5.docx` — Continuous + next-page break combo

Plan is to upload these to the R2 corpus after the PR lands.

@linear

linear Bot commented Apr 20, 2026

…-2452)

Implements ECMA-376 §17.18.77 column balancing for multi-column sections.
Word produces a minimum-height balanced layout at the end of a continuous
(and, empirically, next-page) multi-column section; SuperDoc was either
leaving content stacked in the first column or, in some layouts, producing
overlapping fragments.

The pagination pipeline now balances each multi-column section's last page
at layout time:

  - layoutDocument builds a block -> section map by walking blocks in
    document order and tracking the current section from the most recent
    sectionBreak (pm-adapter only stamps attrs.sectionIndex on sectionBreak
    blocks, not on content paragraphs).
  - A new balanceSectionOnPage helper performs section-scoped balancing
    with its own fragment-level positioning (no Y-grouping): fragments are
    ordered by (x, y) in document order and each is treated as its own
    block. The previous balancePageColumns grouped fragments by Y into
    "rows," which collapsed fragments from different source columns at the
    same Y and produced overlap.
  - calculateBalancedColumnHeight is now a proper binary search for the
    minimum column height H such that greedy left-to-right fill places
    every block with every column <= H. This matches Word's left-heavy
    packing preference (e.g. 7 blocks / 3 cols -> 3+3+1, not 2+2+3).
  - A mid-page hook at forceMidPageRegion balances the ending section on
    the current page before starting the new region, and collapses both
    cursors to balanceResult.maxY so the next region begins just below the
    balanced columns. Sections handled mid-page are tracked in
    alreadyBalancedSections so the post-layout pass doesn't double-balance.
  - The prior "last page of document" heuristic is replaced with a
    per-section post-layout loop that balances each multi-column section's
    last page, skipping sections already handled mid-page.
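
The block -> section map from the first bullet can be sketched as below. This is a minimal illustration, not the PR's code: the `FlowBlock` shape, the `buildBlockSectionMap` name, and the choice of section 0 for blocks that precede any sectionBreak are all assumptions.

```typescript
// Hypothetical block shape; only sectionBreak blocks are assumed to carry attrs.sectionIndex.
interface FlowBlock {
  id: string;
  kind: "paragraph" | "table" | "sectionBreak";
  attrs?: { sectionIndex?: number };
}

/** Walk blocks in document order and assign each one the section of the most recent sectionBreak. */
function buildBlockSectionMap(blocks: FlowBlock[]): Map<string, number> {
  const sectionByBlockId = new Map<string, number>();
  let currentSection = 0; // assumption: content before the first sectionBreak belongs to section 0
  for (const block of blocks) {
    if (block.kind === "sectionBreak" && block.attrs?.sectionIndex !== undefined) {
      currentSection = block.attrs.sectionIndex;
    }
    sectionByBlockId.set(block.id, currentSection);
  }
  return sectionByBlockId;
}
```

A single forward pass is enough here because content blocks carry no section stamp of their own; the sketch relies on each sectionBreak appearing in the flow before the content it governs.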

Tests:

  - 11 new unit/integration tests covering the 5 SD-2452 fixtures
    (2-col/3-col, equal and unequal heights, continuous and next-page
    breaks, multi-page sections, explicit column-break opt-out).
  - 614 layout-engine tests pass, 1737 pm-adapter tests pass,
    11375 super-editor tests pass.

Visual validation against Microsoft Word for all 5 fixtures:

  - Test 1 (6 paras / 2 cols):       3+3        exact match
  - Test 2 (5 mixed / 2 cols):       2+3        exact match
  - Test 3 (7 paras / 3 cols):       3+3+1      exact match
  - Test 4 (13 paras / 2 cols):      7+6        exact match, overlap gone
  - Test 5 (continuous + next-page): 3+2, 3+2   exact match
@tupizz tupizz force-pushed the tadeu/sd-2452-feature-implement-column-balancing-for-continuous-section branch from dd5aff7 to 5b2335a Compare April 20, 2026 17:40
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

…uction (SD-2452)

When a mid-page section break reduced the column count (e.g. 2-col ->
1-col for test 4's 13-paragraph fixture followed by OVERLAP CHECK), the
mid-page hook's forced-page-break guard ran before balancing:

  if (columnIndexBefore >= newColumns.count) {
    state = paginator.startNewPage();
  }
  // ... balance ran here, on the empty new page

At the section transition, columnIndexBefore=1 (paginator was in col 1)
and newColumns.count=1, so the guard forced a new page before balancing
had a chance to reposition the ending section's fragments. Balancing
then ran on the empty new page (no-op), the paginator placed the
post-columns single-column content on the new page, and the old page's
fragments were balanced by the post-layout pass. Net effect: columns
looked correct on page 0 but OVERLAP CHECK ended up on page 1, while
Word fits everything on one page.

The guard exists to prevent new 1-col content from overwriting earlier
column content on the same page. With balancing, that risk disappears:
all ending-section fragments are repositioned within the section's own
vertical region, and the cursor moves to maxY below the balanced
columns. The new region starts safely below.

Fix: balance first. Only fall through to the forced-page-break guard
when the ending section won't be balanced (single-col -> multi-col,
explicit column break, or no section-1 fragments on the page).
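
The reordered decision might look roughly like the sketch below. It is an illustration under assumptions rather than the code in the PR: `decideMidPageTransition`, the `TransitionInput` fields, and the three action names are hypothetical, and only the guard condition `columnIndexBefore >= newColumns.count` comes from the snippet above.

```typescript
// Hypothetical names throughout; this models the ordering of the checks, nothing more.
type TransitionAction = "balance" | "startNewPage" | "continue";

interface TransitionInput {
  endingColumnCount: number; // column count of the section that is ending
  hasExplicitColumnBreak: boolean; // explicit column breaks opt out of balancing
  endingSectionFragmentsOnPage: number; // fragments of the ending section on the current page
  columnIndexBefore: number; // column the paginator was in at the transition
  newColumnCount: number; // column count of the section that is starting
}

function decideMidPageTransition(input: TransitionInput): TransitionAction {
  const willBalance =
    input.endingColumnCount > 1 &&
    !input.hasExplicitColumnBreak &&
    input.endingSectionFragmentsOnPage > 0;
  // Balance first: the cursor then moves to maxY, so the new region starts below the columns.
  if (willBalance) return "balance";
  // Only when nothing will be balanced does the original forced-page-break guard apply.
  if (input.columnIndexBefore >= input.newColumnCount) return "startNewPage";
  return "continue";
}
```

For the test 4 transition (2-col -> 1-col with columnIndexBefore=1), this returns "balance" instead of forcing a new page.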

Test 4 now renders on a single page, matching Word:
  - 7+6 balanced columns
  - OVERLAP CHECK heading at y=758 (right below columns)
  - "If this overlaps..." at y=794
  - Total: 1 page (was 2)

All 5 SD-2452 fixtures now match Word's pagination exactly. 614
layout-engine tests still pass.
@tupizz tupizz self-assigned this Apr 20, 2026
@tupizz tupizz marked this pull request as ready for review April 20, 2026 18:15
@tupizz tupizz requested a review from harbournick April 20, 2026 18:19

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e4265964d6

Comment thread packages/layout-engine/layout-engine/src/index.ts
Comment thread packages/layout-engine/layout-engine/src/index.ts
…D-2646) (#2930)

* fix(pm-adapter): emit section break before non-paragraph nodes (SD-2646)

Per ECMA-376 §17.6.17, a <w:sectPr> inside a paragraph defines the section
that ENDS with that paragraph. All body children preceding it — paragraphs,
tables, top-level drawings, SDTs — belong to that section.

Section ranges were indexed purely by paragraph count, and section-break
blocks were emitted only inside handleParagraphNode. A table that sat
between two sectPr-marker paragraphs was emitted into the flow stream
BEFORE the section break that declared its column config, so the layout
engine laid it out under the prior section's settings.

This is the root cause of IT-945 rendering a 114-row 2-col continuous
table in column 0 across three pages with column 1 empty: the table was
placed in the 1-col section, not the 2-col section.

Fix:
- Track nodeIndex over every top-level doc.content child in
  findParagraphsWithSectPr and SectionRange (alongside paragraphIndex,
  which SDT handlers still use for intra-SDT transitions).
- Add maybeEmitNextSectionBreakForNode in sections/breaks.ts and call
  it from internal.ts's main dispatch loop BEFORE every top-level
  handler. Any non-paragraph node crossing a section boundary now
  triggers the break.
- Section-model primer in pm-adapter/README.md with spec citations.
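
A minimal sketch of the node-index idea follows, assuming each later section is described by the index of its first top-level node. The `SectionRange` shape here is simplified, and `interleaveSectionBreaks` is a hypothetical stand-in for the dispatch loop, not the pm-adapter code.

```typescript
// Hypothetical, simplified model: nodes are just type names and breaks are emitted as strings.
interface SectionRange {
  sectionIndex: number;
  startNodeIndex: number; // index of the first top-level node that belongs to this section
}

/** Emit a section break before ANY top-level node that crosses a section boundary. */
function interleaveSectionBreaks(nodeTypes: string[], ranges: SectionRange[]): string[] {
  const sorted = [...ranges].sort((a, b) => a.startNodeIndex - b.startNodeIndex);
  const flow: string[] = [];
  let next = 0;
  nodeTypes.forEach((type, nodeIndex) => {
    // Check before every handler, not only paragraph handlers.
    while (next < sorted.length) {
      const range = sorted[next];
      if (range === undefined || range.startNodeIndex > nodeIndex) break;
      flow.push(`sectionBreak:${range.sectionIndex}`);
      next += 1;
    }
    flow.push(type);
  });
  return flow;
}
```

With nodes paragraph, table, paragraph and a second section starting at node 1, the break lands ahead of the table, so the table is laid out under the section that declares its columns.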

Tests: 1739/1739 pass in pm-adapter (including new end-tagged.test.ts
and integration test in index.test.ts asserting flow-block order).

* fix(layout-engine): split dominant table at row boundary when balancing section-final page (SD-2646)

The column balancer treats each fragment as an atomic block. A
multi-page two-column continuous section's final page can end up with
a single table fragment taller than totalSectionHeight / columnCount.
The atomic-block binary search then places the whole table in one
column and leaves the other empty — diverging from Word, which
balances by splitting the table at a row boundary per ECMA-376
§17.18.77 ("a continuous section break balances the content of the
previous section").

Fix: add splitDominantTableAtRowBoundary as a preprocessor inside
balanceSectionOnPage. When the section has a single splittable table
fragment larger than target, split it at the row whose cumulative
height first meets or exceeds totalSectionHeight / columnCount. The
two halves are inserted in place of the original; the rest of the
balancer runs unchanged and naturally assigns one to each column.
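
The split-point rule can be sketched as follows. This is an illustration only: `findSplitRow` is a hypothetical helper, and it assumes per-row heights are available for the table fragment. It returns the number of rows kept in the first half, or null when no split leaves both halves non-empty.

```typescript
// Hypothetical helper; not the actual splitDominantTableAtRowBoundary implementation.

/** Split after the first row whose cumulative height meets or exceeds totalSectionHeight / columnCount. */
function findSplitRow(rowHeights: number[], totalSectionHeight: number, columnCount: number): number | null {
  const target = totalSectionHeight / columnCount;
  let cumulative = 0;
  // Stop before the last row so the second half always keeps at least one row.
  for (let i = 0; i < rowHeights.length - 1; i++) {
    cumulative += rowHeights[i] ?? 0;
    if (cumulative >= target) return i + 1; // rows [0, i] form the first half
  }
  return null;
}
```

For 28 rows of equal height in a two-column section containing only that table, the target is half the table height and the split falls after row 14, giving a 14/14 division.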

Also add getBalancingHeight so empty sectPr-marker paragraphs
(measured lines with width=0) contribute 0 to balancing — matching
Word's behavior of not rendering an empty line for such markers.
This keeps both columns top-aligned on the section-final page.

On IT-945: page 2 now splits 14/14 from y=96 in both columns, matching
Word's top-alignment. Before this fix page 2 rendered all 28 remaining
rows in col 1 with col 0 empty.

Tests: strengthened existing "balances the section-ending page" test
(it was passing trivially via `if (sectionFragments.length > 1)`
guard). Added narrow-table multi-page regression test. 616/616 pass.
@harbournick harbournick requested a review from a team as a code owner April 30, 2026 18:07
@harbournick harbournick self-assigned this Apr 30, 2026
@harbournick
Collaborator

@tupizz please double check layout testing and the below. I see some documents that have definitely regressed.
Also double check these:

  1. Use each section’s own page metrics when rebalancing

    The post-layout balancing pass appears to rebalance earlier multi-column sections using the final active margins/page
    size from a later section. If a two-column section with one set of margins is followed by another section with different
    margins or page size, fragments from the earlier page can be rewritten to the later section’s x/width values.

    Balancing should use the page/section metrics for the section being rebalanced, not the final active layout state.

  2. Keep balancing document-wide column layouts
    When callers use LayoutOptions.columns without section break metadata, sectionColumnsMap is empty, so the
    section-based balancing loop never runs. That leaves the final page stacked in column 0.
    The previous activeColumns.count > 1 path handled document-wide multi-column layouts, so this needs a fallback for
    the active/options column config when there are no section-scoped entries.

  3. Preserve blank paragraph height when balancing columns
    A normal blank paragraph can measure as a line with width === 0, but it should still consume line height. The
    current logic treats that like a zero-height sectPr marker, so the column cursor does not advance and the next
    paragraph can overlap the blank line.

    This should be gated on actual section-property marker metadata rather than line width alone.

  4. Write table row boundaries using the fragment metadata shape
    When a dominant table is split for balancing, row boundaries are stored using the renderer’s serialized keys:
    i, h, min, r. TableFragmentMetadata.rowBoundaries expects index, height, minHeight, and resizable. Because the
    DOM renderer later serializes those contract fields, split table fragments can produce row-boundary data with
    undefined values, breaking row resize handles.
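
Points 3 and 4 could be addressed along the lines of the sketch below. Everything here is an assumption for illustration: the `isSectPrMarker` flag, the measured-paragraph shape, the field types, and the pairing of i/h/min/r with index/height/minHeight/resizable in that order.

```typescript
// Hypothetical shapes; the real measure and TableFragmentMetadata types may differ.
interface MeasuredLine { width: number; lineHeight: number; }
interface MeasuredParagraph { lines: MeasuredLine[]; isSectPrMarker?: boolean; }

/** Point 3: only explicit sectPr markers contribute 0; a blank paragraph keeps its line height. */
function balancingHeight(paragraph: MeasuredParagraph): number {
  if (paragraph.isSectPrMarker === true) return 0;
  return paragraph.lines.reduce((sum, line) => sum + line.lineHeight, 0);
}

interface SerializedRowBoundary { i: number; h: number; min: number; r: boolean; }
interface RowBoundary { index: number; height: number; minHeight: number; resizable: boolean; }

/** Point 4: convert renderer-serialized keys back to the contract fields before storing them. */
function toRowBoundary(serialized: SerializedRowBoundary): RowBoundary {
  return {
    index: serialized.i,
    height: serialized.h,
    minHeight: serialized.min,
    resizable: serialized.r,
  };
}
```

Gating on a metadata flag rather than on width === 0 keeps a normal blank paragraph at its measured height, so the column cursor still advances past it.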

@harbournick
Collaborator

@luccas-harbour can you pls work with Tadeu on getting this one to the finish line next week? Be extra careful around layout testing pls make sure no regressions!

@luccas-harbour
Contributor

hey @tupizz! nothing to add right now apart from Nick's comment. I'll have another look once those are addressed.

also, note that the build is failing, which prevented the behavior tests from running.

thanks!
