_sum() stacks dead terms from masked variables #16

@FBumann

Problem

A variable created with a sparse mask has labels = -1 at inactive positions. When .sum(dim) is called, _sum() stacks the entire dimension into _term via .stack(), including every masked-out entry where vars == -1. These dead terms then propagate through every downstream operation.

Where in the code: expressions.py _sum(), lines ~1172-1176:

ds = (
    data[["coeffs", "vars"]]
    .reset_index(dim, drop=True)
    .rename({TERM_DIM: STACKED_TERM_DIM})
    .stack({TERM_DIM: [STACKED_TERM_DIM] + dim}, create_index=False)
)

This .stack() blindly includes all entries along dim, regardless of whether vars == -1.

Impact

With 300 contributors, 20 effects, and 30 active contributors per effect, _term has size 300 instead of 30, so 90% of the terms are dead. Every downstream operation (-, merge, add_constraints) pays for that waste. At real-world scale (277 contributors, 22 effects, 2190 timesteps) this contributed to an out-of-memory failure at 44+ GB.
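
A rough back-of-the-envelope sketch of the waste, using the sizes from the reproducer in this issue. The 16 bytes per term (one float64 coefficient plus one int64 variable label) is an assumption for illustration, not a measured figure, and `term_bytes` is a hypothetical helper:

```python
# Sizes taken from the reproducer in this issue.
n_contrib, n_effect, n_time, n_active = 300, 20, 2000, 30

# Assumption: each term stores one float64 coefficient and one int64 label.
BYTES_PER_TERM = 8 + 8


def term_bytes(n_term: int) -> int:
    """Bytes held by coeffs + vars for an (effect, time, _term) expression."""
    return n_effect * n_time * n_term * BYTES_PER_TERM


dense = term_bytes(n_contrib)    # _term = 300, as produced today
compact = term_bytes(n_active)   # _term = 30, if dead terms were dropped
dead_fraction = 1 - n_active / n_contrib

print(f"dense:   {dense / 1e6:.1f} MB")    # dense:   192.0 MB
print(f"compact: {compact / 1e6:.1f} MB")  # compact: 19.2 MB
print(f"dead:    {dead_fraction:.0%}")     # dead:    90%
```

This only counts the summed expression itself. Intermediate copies made by downstream operations multiply the same factor of ten.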

Reproducer

dev-scripts/story1.py:

import numpy as np, xarray as xr, linopy
m = linopy.Model()
mask = xr.DataArray(np.zeros((300, 20), dtype=bool), dims=["contributor", "effect"])
rng = np.random.default_rng(42)
for e in range(20):
    mask.values[rng.choice(300, 30, replace=False), e] = True
var = m.add_variables(coords=[range(300), range(20), range(2000)],
                      dims=["contributor", "effect", "time"],
                      name="share", mask=mask)
expr = var.sum("contributor")
print(expr.sizes["_term"])                   # 300, should be ~30
print((expr.data.vars.values == -1).mean())  # 0.9, i.e. 90% dead

What needs to change

_sum() should filter or compact dead terms (vars == -1) before or after the .stack(). The challenge is that the mask varies per slice along the remaining dimensions (each effect has a different set of active contributors), so a single pre-filter along dim cannot drop them. Post-stack compaction (dropping vars == -1 terms) or a per-slice approach could work.
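
One way post-stack compaction could look, sketched on plain NumPy arrays rather than on linopy's internal Dataset. `compact_terms` is a hypothetical helper, not an existing linopy function; it assumes the term axis is already stacked and that dead terms are marked with vars == -1. It moves the live terms of every slice to the front of the term axis and trims the axis to the largest live count:

```python
import numpy as np


def compact_terms(coeffs, vars_, term_axis=-1):
    """Hypothetical helper: push dead terms (vars == -1) to the end of
    the term axis, then trim the axis to the largest live-term count."""
    dead = vars_ == -1
    # Stable argsort on the boolean mask: live entries (False) sort
    # before dead ones (True), and live entries keep their order.
    order = np.argsort(dead, axis=term_axis, kind="stable")
    vars_sorted = np.take_along_axis(vars_, order, axis=term_axis)
    coeffs_sorted = np.take_along_axis(coeffs, order, axis=term_axis)
    # Slices with fewer live terms than the maximum stay padded with -1.
    n_live = int((~dead).sum(axis=term_axis).max(initial=0))
    index = [slice(None)] * vars_.ndim
    index[term_axis] = slice(0, n_live)
    return coeffs_sorted[tuple(index)], vars_sorted[tuple(index)]


# Toy data shaped like the reproducer: 20 effects, 300 stacked terms,
# 30 live terms per effect at different positions.
rng = np.random.default_rng(42)
n_contrib, n_effect, n_active = 300, 20, 30
vars_ = np.full((n_effect, n_contrib), -1, dtype=np.int64)
coeffs = np.full((n_effect, n_contrib), np.nan)
for e in range(n_effect):
    active = rng.choice(n_contrib, n_active, replace=False)
    vars_[e, active] = e * n_contrib + active  # made-up labels
    coeffs[e, active] = 1.0

c, v = compact_terms(coeffs, vars_)
print(v.shape)  # (20, 30)
```

Because the sort is done per slice, this handles the case where each effect has a different set of active contributors. The trade-off is one extra argsort over the stacked axis before the array shrinks.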

Neither PR #12 nor #13 addresses this issue.

Profiling

See branch feature/memory-usage-issues, dev-scripts/story1.py and dev-scripts/story1_profile.md.

Part of #14.
