# Bug: in Photonics2D, `simulate(returned_design)` ≠ `optimization_history[-1]`
## What you'd expect
If you optimise a design and then simulate the design `optimize()` returns, the score `simulate()` reports should match the last value in `optimization_history`. They're meant to be the same number.
## What actually happens
The two disagree by ~50%.

```python
import numpy as np

from engibench.utils.all_problems import BUILTIN_PROBLEMS

problem = BUILTIN_PROBLEMS["photonics2d"](seed=7)
start = np.full(problem.design_space.shape, 0.5, dtype=np.float32)
design, history = problem.optimize(start, config={"num_optimization_steps": 30})
score = problem.simulate(design)
print("history[-1] =", history[-1].obj_values[0])  # 3.976
print("simulate()  =", score[0])                   # 1.826
```
## Why
Both `optimize` and `simulate` push the design `rho` through the same internal pipeline before scoring it:

```
rho → blur → project → ε(rho) → FDFD → score
```

That pipeline is defined once in `epsr_parameterization` and re-used everywhere. The optimiser's history records the score of `rho` after this pipeline, step by step.
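For intuition, here is a minimal sketch of what such a pipeline typically looks like. The `blur`, `project`, and `epsr` helpers below, their Gaussian/tanh forms, and all parameter values (`sigma`, `beta`, `eta`, `eps_min`, `eps_max`) are assumptions modelled on standard density-filter topology optimisation, not engibench's actual implementation; the FDFD solve and scoring step are omitted.

```python
# Sketch of a rho → blur → project → ε pipeline.
# ASSUMPTIONS: the Gaussian blur, the tanh projection, and all parameter
# values are illustrative stand-ins, not engibench's actual code.
import numpy as np
from scipy.ndimage import gaussian_filter


def blur(rho: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Density filter: smooths rho so features respect a minimum length scale."""
    return gaussian_filter(rho, sigma=sigma)


def project(rho: np.ndarray, beta: float = 50.0, eta: float = 0.5) -> np.ndarray:
    """Smoothed Heaviside: pushes densities toward 0/1 as beta grows."""
    num = np.tanh(beta * eta) + np.tanh(beta * (rho - eta))
    den = np.tanh(beta * eta) + np.tanh(beta * (1.0 - eta))
    return num / den


def epsr(rho: np.ndarray, eps_min: float = 1.0, eps_max: float = 12.0) -> np.ndarray:
    """Map density to relative permittivity ε, then hand off to the FDFD solve."""
    return eps_min + (eps_max - eps_min) * project(blur(rho))
```

The key property for this bug is that the pipeline is a function of the raw `rho`, so it must be applied exactly once per evaluation.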
The bug is that `optimize()` runs `project` one more time on the design before returning it (v0.py:457–462):

```python
rho_optimum = rho_optimum_flat.reshape((num_elems_x, num_elems_y))
rho_optimum = operator_proj(rho_optimum, ...)  # ← extra projection
rho_optimum = np.rint(rho_optimum)             # ← then rounding
return rho_optimum.astype(np.float32), opti_steps_history
```
So the design that gets returned is not the design Adam was scoring. When you call `simulate()` on the returned design, the pipeline runs blur+project on something that has already been projected: the second blur smears the binary pattern, the second projection lands on a different ε, and the score drops.
Picture:

```
What history recorded:  rho ─→ blur ─→ project ─→ score = 3.976
What is returned:       rho ─→ project ─→ rint ─→ returned_design
What simulate computes: returned_design ─→ blur ─→ project ─→ score = 1.826
                        ^^^^^^^^^^^^^^^
                        the "extra round" through blur+project causes the gap
```
(`np.rint` itself doesn't cost anything: at high β the projection is already nearly binary, so rounding is a no-op. The damage comes from the extra `project`.)
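A toy check with the assumed `blur`/`project` stand-ins from the sketch above makes both halves of that claim concrete: re-projecting an already-projected field flips essentially no binarised pixels, while the extra blur+project round trip flips a noticeable fraction. The random starting field is illustrative, not an optimised design.

```python
# Toy check, using the same ASSUMED blur/project stand-ins as the sketch above.
import numpy as np
from scipy.ndimage import gaussian_filter


def blur(rho):
    return gaussian_filter(rho, sigma=1.0)


def project(rho, beta=50.0, eta=0.5):
    num = np.tanh(beta * eta) + np.tanh(beta * (rho - eta))
    den = np.tanh(beta * eta) + np.tanh(beta * (1.0 - eta))
    return num / den


rng = np.random.default_rng(0)
rho = rng.uniform(size=(64, 64))

scored   = project(blur(rho))       # what history recorded a score for
returned = np.rint(project(rho))    # what the buggy optimize() returns
repassed = project(blur(returned))  # what simulate() then evaluates

# Binarised pixels flipped by a second *projection* alone: essentially none,
# because project is monotone and keeps every value on its side of eta.
print(np.mean(np.rint(project(scored)) != np.rint(scored)))
# Binarised pixels that differ after the extra blur+project round trip:
# a noticeable fraction, because blurring a binary pattern moves it.
print(np.mean(np.rint(repassed) != np.rint(scored)))
```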
## Fix
Just don't post-process. Return the design Adam was actually optimising:

```python
# v0.py:457–462
rho_optimum = rho_optimum_flat.reshape((num_elems_x, num_elems_y)).astype(np.float32)
return rho_optimum, opti_steps_history
```
Now `simulate()` runs the pipeline once on the same `rho` Adam was scoring, and the numbers agree.
## Proof
Same script as above, with the fix applied:

```
BEFORE: history[-1] = 3.976, simulate() = 1.826, gap = +2.150
AFTER:  history[-1] = 3.976, simulate() = 3.990, gap = +0.014  ← float noise
```
## Regression test
```python
import numpy as np

from engibench.utils.all_problems import BUILTIN_PROBLEMS


def test_optimize_simulate_consistency():
    # Same Photonics2D lookup as the repro script above.
    problem = BUILTIN_PROBLEMS["photonics2d"](seed=0)
    start = np.full(problem.design_space.shape, 0.5, dtype=np.float32)
    design, history = problem.optimize(start, config={"num_optimization_steps": 20})
    sim = problem.simulate(design)
    assert np.isclose(sim[0], history[-1].obj_values[0], rtol=1e-2)
```
## Why this matters for users
Anything that compares "the score the optimiser thought it achieved" to "the score `simulate()` reports" is reading two different numbers. In particular:

- The published dataset was generated with the buggy code. The saved `optimal_design` is the binarised post-projection design, while the saved `optimization_history` was logged on the continuous design. So `dataset["total_overlap"]` (= `simulate(saved_design)`) is much smaller than `|optimization_history[-1]|` for the same row, by exactly the same gap mechanism.
- With the fix, fresh `simulate(returned_design) ≈ history[-1]`: they agree, as you'd expect.
- Existing dataset entries are frozen artifacts of the bug; comparing fresh fixed runs to old saved `total_overlap` will look favourable until the dataset is regenerated.
- A separate, smaller note: the saved `optimization_history` is sign-flipped vs. what current `optimize()` returns (stored as negative, `−(total_overlap − penalty)`; current code returns positive). Worth either regenerating the dataset or adding a sign conversion in the loader; a sketch of such a conversion follows this list.
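A sketch of that loader-side reconciliation. The row-as-dict access pattern and the exact field contents are assumptions (the doc only names `optimal_design`, `optimization_history`, and `total_overlap`; the real dataset schema may store histories as structured records rather than plain floats):

```python
# Sketch: reconcile a saved dataset row with what the fixed code reports.
# ASSUMPTION: rows behave like dicts with the field names quoted above,
# and optimization_history is stored as plain (sign-flipped) floats.
import numpy as np


def history_as_positive(row: dict) -> np.ndarray:
    """Saved histories are stored sign-flipped (−(total_overlap − penalty));
    flip them so they compare directly against current optimize() output."""
    return -np.asarray(row["optimization_history"])


def check_row(problem, row: dict, rtol: float = 1e-2) -> bool:
    """True if simulate(saved design) matches the sign-corrected final
    history value. Expected to FAIL for rows generated with the buggy code."""
    sim = problem.simulate(np.asarray(row["optimal_design"], dtype=np.float32))
    return bool(np.isclose(sim[0], history_as_positive(row)[-1], rtol=rtol))
```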
## Possible same-class bug elsewhere
Worth a sweep of other `Problem` subclasses for `optimize()` methods that post-process a design before returning it without re-recording the final history step on the returned design. A starting point for such a sweep is sketched below.
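This sketch assumes every entry in `BUILTIN_PROBLEMS` accepts `seed=...` and exposes the same `design_space`/`optimize`/`simulate` interface as Photonics2D, with history entries carrying `.obj_values`; problems with expensive optimisers or different start-design conventions will need per-problem handling.

```python
# Sketch of a cross-problem consistency sweep. ASSUMPTIONS: the uniform
# interface described above; real problems may need custom start designs,
# configs, or step budgets.
import numpy as np
import pytest

from engibench.utils.all_problems import BUILTIN_PROBLEMS


@pytest.mark.parametrize("name", sorted(BUILTIN_PROBLEMS))
def test_returned_design_matches_history(name):
    problem = BUILTIN_PROBLEMS[name](seed=0)
    start = np.full(problem.design_space.shape, 0.5, dtype=np.float32)
    design, history = problem.optimize(start, config={"num_optimization_steps": 5})
    sim = problem.simulate(design)
    # The returned design must score what the optimiser's history claims.
    assert np.isclose(sim[0], history[-1].obj_values[0], rtol=1e-2)
```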