feat: add substitutions to rewriters #238

Brendan-Reid1991 · 2025-07-04T09:09:48Z

Description

A mammoth PR. Happy to split this into two or more if necessary.

Cliff Notes

Added an ignore code to flake8. When defining abstract methods with ...: black forces ... commands onto the same line as the def, which flake8 hates. Conversely, having a pass statement instead of ... raises this as dead code in pytest, and contributes to lower test coverage. I think adding this code to be ignored is fine, because all other instances of multiple statements on one line will be caught and corrected by black!
Created and exposed the sympy_rewriter factory method in bartiq/analysis/rewriters. Also, made this folder public.
Rather than an Instructions Enum and trying to make Assumptions and Substitutions fit that format, I made a dummy class that all instructions inherit from, including Assumptions. This makes type checking nicer.
Added substitute and _substitute methods to the base class, and defined _substitute in the SympyExpressionRewriter. Logic here is fairly gnarly!
Updated the focus method such that linked parameters are included. i.e. for an expression with a variable a, if we substitute a with b, and try to focus('a'), it will show us all terms with b. This works for multiple levels of substitution.
Updated tests to reflect all these changes
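The linked-parameter behaviour of focus described above can be sketched roughly as follows. This is only an illustration of following several levels of substitution; the function name `resolve_linked` and the shape of the `links` mapping are assumptions for this example, not the bartiq implementation.

```python
def resolve_linked(symbol: str, links: dict[str, tuple[str, ...]]) -> set[str]:
    """Collect every symbol reachable from `symbol` through recorded substitutions.

    `links` is a hypothetical mapping from a replaced symbol to the symbols
    that appear in its replacement.
    """
    seen: set[str] = set()
    stack = [symbol]
    while stack:
        current = stack.pop()
        for linked in links.get(current, ()):
            if linked not in seen:  # also guards against cyclic substitutions
                seen.add(linked)
                stack.append(linked)
    return seen


# a was replaced by b, and later b was replaced by c + d
links = {"a": ("b",), "b": ("c", "d")}
linked_to_a = resolve_linked("a", links)
```

With this in hand, a focus on `a` could also match terms containing any symbol in `linked_to_a`.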

Please verify that you have completed the following steps

I have self-reviewed my code.
I have included test cases validating introduced feature/fix.
I have updated documentation.

dexter2206 · 2025-07-04T13:40:04Z

.flake8

+    # Ignore multiple statements on one line
+    E704,
 per-file-ignores =
    tests/*:D100,D101,D102,D103,D104
    __init__.py:F401


suggestion: Given that this only affects expression_rewriter.py, I suggest moving this ignore to per-file-ignores section.

Suggested change

-    # Ignore multiple statements on one line
-    E704,
 per-file-ignores =
     tests/*:D100,D101,D102,D103,D104
     __init__.py:F401
+    src/bartiq/analysis/rewriters/expression_rewriter.py:E704

There is one other use of this in sympy_backend (link). I created that protocol during the refactor of the sympy backend, when we added arguments to the parser that broke some of the typing and this was the fix I found.

But, I am adding a per-file-ignore for the lambda functions in _wildcard_substitutions.

IMO it would be better to do it in a particular place where it occurs or just put a noqa in that specific file, rather than here?

I'm happy either way, I only updated this file because I thought having noqa: <> in multiple different places looked a bit messy.

Putting a noqa in the specific file would not work (or maybe I was doing something wrong), so I think we are down to either local noqas or this per-file-ignores entry in the config. I think per-file-ignores makes sense in this particular case, because TBH the way this rule is implemented is messed up. There is no way to exclude an ellipsis on the same line (which is a common formatting practice for protocol methods), and ignoring each one of those separately seems messy.

Ok, sounds good to me!
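For context, the pattern that triggers E704 in this thread looks roughly like the sketch below. The class and method names are hypothetical and are not taken from expression_rewriter.py; the point is only that a stub body of `...` sits on the same line as the `def`.

```python
from abc import ABC, abstractmethod
from typing import Protocol


class SupportsParse(Protocol):
    # Protocol stub: the ellipsis body shares a line with the def (flagged as E704).
    def parse(self, expression: str) -> str: ...


class RewriterBase(ABC):
    # Abstract method written the same way, so it is flagged too.
    @abstractmethod
    def _simplify(self, expression: str) -> str: ...


class UpperRewriter(RewriterBase):
    # A concrete subclass has a real body, so nothing is flagged here.
    def _simplify(self, expression: str) -> str:
        return expression.upper()


result = UpperRewriter()._simplify("a + b")
```

Replacing `...` with `pass` on its own line would avoid the warning, but as the description notes, that line is then reported as uncovered code.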

dexter2206 · 2025-07-04T14:23:21Z

src/bartiq/analysis/rewriters/utils.py

+    symbol_or_expr: str
+    replacement: str
+    backend: SymbolicBackend
+    wild: list[str] = field(default_factory=list)


suggestion: Let's make it a tuple; it fits this frozen dataclass better.

Suggested change

-    wild: list[str] = field(default_factory=list)
+    wild: tuple[str, ...] = field(default_factory=tuple)
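A small sketch of why a tuple suits a frozen dataclass better than a list: `frozen=True` only blocks rebinding of attributes, so a list field can still be mutated in place, and it also makes instances unhashable. The class names and the `"x"` value here are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WithList:
    wild: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class WithTuple:
    wild: tuple[str, ...] = field(default_factory=tuple)


mutable = WithList()
mutable.wild.append("x")  # allowed: the list itself is not frozen

immutable = WithTuple(("x",))
same_hash = hash(immutable) == hash(WithTuple(("x",)))  # tuples keep the instance hashable

try:
    hash(mutable)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False  # a list field makes the generated __hash__ fail
```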

dexter2206 · 2025-07-04T14:40:59Z

src/bartiq/analysis/rewriters/utils.py

-        "positive": ((gt or gte) and value_positive) or None,
-        "negative": ((lt or lte) and value_negative) or None,
-    }
+    return {"positive": ((gt or gte) and value_positive) or None, "negative": ((lt or lte) and value_negative) or None}


issue: I missed this in the previous review, but we don't handle the case of 0 correctly, or at least the naming is incorrect.

If reference_value is 0, both value_negative and value_positive are true.

Depending on the operator, either "positive" or "negative" is set to true, but if gte or lte, the symbol could be actually 0, which is neither positive nor negative.

Maybe we should just call those keys "nonnegative" and "nonpositive" instead of "positive" and "negative"?
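The zero case can be reproduced with a minimal sketch. The dictionary expression is copied from the diff, but the function name, its signature, and the way `value_positive` and `value_negative` are computed are assumptions inferred from the comment that both are true when the reference value is 0.

```python
from typing import Optional


def symbol_properties_sketch(comparator: str, reference_value: float) -> dict[str, Optional[bool]]:
    gt, gte = comparator == ">", comparator == ">="
    lt, lte = comparator == "<", comparator == "<="
    # Assumption: both flags are inclusive of zero, as the review comment implies.
    value_positive = reference_value >= 0
    value_negative = reference_value <= 0
    return {
        "positive": ((gt or gte) and value_positive) or None,
        "negative": ((lt or lte) and value_negative) or None,
    }


strict = symbol_properties_sketch(">", 0)      # x > 0: "positive" is accurate
inclusive = symbol_properties_sketch(">=", 0)  # x >= 0: x may be 0, yet "positive" is still True
```

Both calls return the same dictionary, which is the naming problem raised here: the `>=` case describes a nonnegative symbol, not a positive one.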

How about positive_or_zero and negative_or_zero? I know these aren't user-facing variables anyway, but because nonnegative and nonpositive define something as the negation of something else, it takes me an extra beat to parse the info.

I'm fine either way. I think nonnegative and nonpositive are natural words to use when you're mathematicians, otherwise they take a bit to parse.
I think that this is a place in the code where the choice will not be very consequential.

It would affect the usage of this class in the SympyRewriter. At the moment, I can create a symbol from an assumption like this:

sym = Symbol("sym", **assumption.symbol_properties)

If we moved to nonnegative and nonpositive, which are also sympy predicates, we get this unexpected interaction:

from sympy import Symbol
a = Symbol('a', nonnegative=True)
a.is_positive  # None

So even changing the name to positive_or_zero would require some logic change, i.e.

props = assumption.symbol_properties()
sym = Symbol(
    "sym",
    **{
        "positive": props["nonnegative"],  # or props["positive_or_zero"]
        "negative": props["nonpositive"],  # or props["negative_or_zero"]
    },
)

Wait, I think I didn't get that. What is wrong with this code?

from sympy import Symbol
a = Symbol('a', nonnegative=True)
a.is_positive  # None

To me it tells the truth: it is not known if a is positive.

Also, "nonnegative" and "nonpositive" are correct mathematical words to describe something that is >= or <= 0, but I am not insisting on those. Whatever you decide just make sure that it is precise :)

~~It means that the following would not simplify:~~

rewriter = sympy_rewriter_factory("max(0,a)")
rewriter.assume("a>=0")  # If we use nonnegative predicate
>>> max(0, a)

~~This, to me, is certainly unexpected.~~

Never mind, this is not true!
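The retracted concern can be checked directly in SymPy, independently of the rewriter. This sketch uses only `sympy.Symbol` and `sympy.Max`; it does not go through `sympy_rewriter_factory`, so it says nothing about how the rewriter applies assumptions internally.

```python
from sympy import Max, Symbol

a = Symbol("a", nonnegative=True)

nonneg_flag = a.is_nonnegative  # True: a >= 0 is known
positive_flag = a.is_positive   # None: a could be exactly 0, so positivity is unknown
simplified = Max(0, a)          # the nonnegative assumption is enough to resolve the Max
```

So the nonnegative predicate loses nothing for this kind of simplification, while still being honest about the zero case.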

Brendan-Reid1991 · 2025-07-07T08:15:53Z

src/bartiq/analysis/rewriters/sympy_rewriter.py

@@ -0,0 +1,302 @@
+# Copyright 2025 PsiQuantum, Corp.


I made sure to git mv this file so this exact scenario didn't happen, but here we are.

mstechly · 2025-07-09T09:49:11Z

.flake8

+    # Ignore multiple statements on one line
+    E704,
 per-file-ignores =
    tests/*:D100,D101,D102,D103,D104
    __init__.py:F401


IMO it would be better to do it in a particular place where it occurs or just put a noqa in that specific file, rather than here?

mstechly · 2025-07-09T09:57:44Z

src/bartiq/analysis/rewriters/expression_rewriter.py

+        )

-    def _unwrap_history(self) -> list[tuple[Instruction | str, ExpressionRewriter[T] | None]]:
+    def _unwrap_history(self) -> list[tuple[Instruction, ExpressionRewriter[T] | None]]:


[question] I'm wondering whether having a history like this will slow things down a lot when you have done a lot of tinkering and have a long history, especially for really gnarly expressions.
Have you tried working with an object that has a history of length 100+?
I don't really think it would be big enough to clog memory or degrade performance, as the history is "write-only" most of the time (I think?).
But I'd feel better if you spent 15 minutes checking it :)

I'll answer this because it was me who suggested immutable rewriters, for which the current implementation of history is the only one that makes sense.

Storing the history is almost free in terms of both time and memory, because each new rewriter instance stores only one existing instance (namely, the one it originated from). Therefore, compared to not having a history, the cost is roughly one variable assignment plus passing a variable to the initializer, which is minuscule.

Retrieving the history is of course linear in the number of history items. A quick benchmark that I just performed shows that obtaining the history of a rewriter after 1000 steps takes around 61 microseconds on my M1. In the benchmark, I created an expression summing variables x_1 + ... + x_1000, and then substituted x_i -> y_i one by one (to get this nice 1000-item history). I also measured the memory footprint of a rewriter with such a history, and the whole thing was taking around 1.7MB.
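The storage scheme described here amounts to a singly linked chain of immutable instances. A minimal sketch, assuming a hypothetical `RewriterSketch` class and naive text replacement in place of real symbolic substitution:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RewriterSketch:
    expression: str
    instruction: str = "initial"
    previous: Optional["RewriterSketch"] = None  # the instance this one originated from

    def substitute(self, old: str, new: str) -> "RewriterSketch":
        # Naive str.replace stands in for symbolic substitution (illustration only).
        return replace(
            self,
            expression=self.expression.replace(old, new),
            instruction=f"{old} -> {new}",
            previous=self,
        )

    def history(self) -> list[str]:
        # Walk the chain back to the initial instance: linear in the number of steps.
        steps = []
        node: Optional["RewriterSketch"] = self
        while node is not None:
            steps.append(node.instruction)
            node = node.previous
        return steps[::-1]


start = RewriterSketch("x_1 + x_2")
end = start.substitute("x_1", "y_1").substitute("x_2", "y_2")
```

Each step adds one reference to the previous instance, and the original `start` object is left unchanged.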

Nerding out on the impact of storing the history.
To properly isolate how storing a history in this way impacts overall performance, we can perform a simple benchmark with two classes performing the same operation, except that one stores the history and one does not.

from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass
class Foo:
    x: int
    previous: Foo | None = None

    def next(self):
        # Keep a reference to the instance this one was derived from.
        return replace(self, x=self.x + 1, previous=self)


@dataclass
class Bar:
    x: int

    def next(self):
        # Same update, but no history is stored.
        return replace(self, x=self.x + 1)


def f(cls, n_iters):
    obj = cls(1)
    for _ in range(n_iters):
        obj = obj.next()

Measuring f(Foo, n) and f(Bar, n) on my machine shows that the difference in performance is roughly 0.7 microseconds per 10 iterations, which is negligible.

mstechly · 2025-07-09T10:01:08Z

src/bartiq/analysis/rewriters/expression_rewriter.py

+
+    def substitute(self, symbol_or_expr: str, replace_with: str) -> Self:
+        """Substitute a symbol or subexpression for another symbol or subexpression.
+        By default performs a one-to-one mapping, unless wildcard symbols are implemented.


[minor] "unless wildcard symbols are implemented." -> "unless there are wildcards present in replace_with"? I'm not entirely sure what you meant here.

Also, it looks to me like this docstring (and some others as well) is missing an Args section :)
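For reference, a docstring with an Args section could be laid out along these lines. This is a sketch of the layout only: the class is a hypothetical stand-in, the argument descriptions are guesses from the parameter names, and the wildcard behaviour is deliberately left out because it is not specified in this thread.

```python
class RewriterDocSketch:
    """Hypothetical stand-in used only to show the docstring layout."""

    def substitute(self, symbol_or_expr: str, replace_with: str) -> "RewriterDocSketch":
        """Substitute a symbol or subexpression for another symbol or subexpression.

        Args:
            symbol_or_expr: The symbol or subexpression to be replaced.
            replace_with: The symbol or subexpression to put in its place.

        Returns:
            A new rewriter with the substitution applied.
        """
        raise NotImplementedError("docstring layout only")


doc = RewriterDocSketch.substitute.__doc__
```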

mstechly · 2025-07-09T10:08:52Z

src/bartiq/analysis/rewriters/utils.py


+@dataclass(frozen=True)
+class Substitution(Instruction):
+    """Substitute a symbol with an expression."""


[minor] class attributes lack description. In particular wild, it's unclear to me what it represents.
Also, I personally wouldn't call it symbol_or_expr, but rather just expr – symbol is just a special case of a "single symbol expression", no?

mstechly · 2025-07-09T10:12:30Z

src/bartiq/analysis/rewriters/utils.py

-    At present this only detects positivity/negativity.
-
-    If the properties are unknowable due to lack of information, they are None.
+    At present this only detects positivity/negativity. The two are calculated independently,


[nitpick] this only detects positivity/negativity -> this only detects positivity/negativity of a given symbol/expression ?

mstechly · 2025-07-09T10:22:41Z

src/bartiq/analysis/rewriters/utils.py

+    def __post_init__(self):
+        object.__setattr__(self, "wild", _get_wild_characters(self.symbol_or_expr))
+
+    def _get_linked_parameters(self) -> dict[str, Iterable[str]]:


[issue] I don't like this method, I think there are a couple of reasons:

It's a private method but it's used in expression_rewriter. Maybe it should be public then?

It could be just a function, which takes in Substitution object, doesn't need to be a class method. It's the only Instruction which has some special methods defined (except from_str for Assumption, but that's 100% sensible to me), so it breaks symmetry a bit.

Thoughts?

Possibly a nice middle ground:

Made it a private function that takes in a Substitution

Added a new field to the Substitution dataclass that has init=False

Set the attribute in __post_init__

Logic is outside Substitution, but the attribute is kept there, i.e. the function is no longer needed in expression_rewriter
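The middle ground described above could look roughly like this. It is a sketch under assumptions: the attribute name `linked_params` is invented for the example, and the body of `_get_linked_parameters` is a toy that links the replaced expression to the identifiers found in its replacement, not the actual bartiq logic.

```python
import re
from dataclasses import dataclass, field


def _get_linked_parameters(sub: "Substitution") -> dict[str, tuple[str, ...]]:
    """Toy logic: map the replaced expression to the identifiers in its replacement."""
    names = tuple(sorted(set(re.findall(r"[A-Za-z_]\w*", sub.replacement))))
    return {sub.symbol_or_expr: names}


@dataclass(frozen=True)
class Substitution:
    symbol_or_expr: str
    replacement: str
    # Derived attribute: not accepted by __init__, filled in by __post_init__.
    linked_params: dict[str, tuple[str, ...]] = field(init=False, compare=False)

    def __post_init__(self) -> None:
        # A frozen dataclass blocks normal assignment, hence object.__setattr__.
        object.__setattr__(self, "linked_params", _get_linked_parameters(self))


sub = Substitution("a", "b + c")
```

The logic lives in a module-level function, while callers such as the rewriter only read the precomputed attribute.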

mstechly · 2025-07-09T10:24:50Z

src/bartiq/analysis/rewriters/utils.py

-        "positive": ((gt or gte) and value_positive) or None,
-        "negative": ((lt or lte) and value_negative) or None,
-    }
+    return {"positive": ((gt or gte) and value_positive) or None, "negative": ((lt or lte) and value_negative) or None}


I'm fine either way. I think nonnegative and nonpositive are natural words to use when you're mathematicians, otherwise they take a bit to parse.
I think that this is a place in the code where the choice will not be very consequential.

mstechly · 2025-07-09T10:27:07Z

tests/analysis/rewriters/basic_rewriter_tests.py

+            Initial(),
+            Expand(),
+            Simplify(),
+            Assumption.from_string("beth>0"),


[question] Actually, why is it not just Assumption("beth>0")?

I tried to get that format working a couple of different times, and while it is possible, the logic is very messy.

Unless we want to move to only allowing strings as input, i.e.:

@dataclass(frozen=True)
class Assumption(Instruction):
    assumption_string: str

in order to have the current implementation accept strings as well as the three current arguments, we'd have to override __new__ to handle strings only, and fall back to the init for everything else. And because it's a dataclass, it's annoying.

I did this a few weeks ago when I was writing the Assumptions, and when I finally got it working the class looked like a complete mess. I decided it wasn't worth it, because users most likely won't interact with Assumption directly anyway.
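The trade-off is easier to see with a small sketch of the classmethod approach. The class name, the three field names, and the parsing regex are all assumptions for illustration; the thread only says that Assumption takes three arguments and offers a string-based alternative constructor.

```python
import re
from dataclasses import dataclass

# Longer comparators are listed first so ">=" is not read as ">" followed by "=".
_PATTERN = re.compile(r"^\s*([A-Za-z_]\w*)\s*(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


@dataclass(frozen=True)
class AssumptionSketch:
    symbol_name: str
    comparator: str
    value: float

    @classmethod
    def from_string(cls, text: str) -> "AssumptionSketch":
        match = _PATTERN.match(text)
        if match is None:
            raise ValueError(f"Cannot parse assumption: {text!r}")
        name, comparator, value = match.groups()
        return cls(name, comparator, float(value))


parsed = AssumptionSketch.from_string("beth>0")
```

Keeping the generated `__init__` for the three fields and adding string parsing as a separate classmethod avoids touching `__new__` at all.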

Ok, thank you for the clarification!

dexter2206

My comments have been addressed.

mstechly · 2025-07-10T08:31:45Z

.flake8

+    # Ignore multiple statements on one line
+    E704,
 per-file-ignores =
    tests/*:D100,D101,D102,D103,D104
    __init__.py:F401


Ok, sounds good to me!

mstechly · 2025-07-10T08:33:29Z

tests/analysis/rewriters/basic_rewriter_tests.py

+            Initial(),
+            Expand(),
+            Simplify(),
+            Assumption.from_string("beth>0"),


Ok, thank you for the clarification!

Brendan-Reid1991 and others added 30 commits May 9, 2025 16:54

chore: Updated tutorials with Compilation Flags

473ffd8

refactor: creating analysis submodule.

b7753c1

feat: Began work on sympy manipulator

8ee76d7

feat: added instruction types enum; adding functionality to SM

9c4665f

refactor: split base class + sympyManipulator into separate modules

84c8dfa

chore: removed generic types

b6e14c7

refactor: Moved InstructionsType enum

415d2cf

feat: Added focus method. Generic types added

af32cbf

refactor: Renamed SympyManipulation -> SympyManipulator

2bf8f30

fix: removed testing code

efea576

refactor: renamed analysis.py -> optimization.py

b5dcdf4

refactor: created analysis module

3f8f32a

feat: created rewriters module

b98cf54

feat: added expressionwriter abc

03427b5

feat: subclassed from expressionwriter

e0dd510

chore: improving typing

350d422

feat: Added basic tests for rewriters

c62ec31

chore: improved typing, added docstrings

176b25d

feat: improved typing; added docstrings

122a23c

chore: ran isort

ce196c2

Merge branch 'main' into symbolic_manipulator

2b525c0

chore: fixed typing issues

e071b86

chore: added module level docstrings

e4fba55

chore: removed unused import

2bb75cd

chore: erroneous line introduced

41ab585

chore: added module level docstring

1b270ab

chore: improved typing in update_expression

6b3f738

chore: added module level docstring; removed return section of docstring and improved descriptions

26de84c

chore: Type -> type

29bc2ca

Merge branch 'main' into symbolic_manipulator

e2e43f3

Brendan-Reid1991 added 3 commits July 3, 2025 17:17

improving tests

e0c7a35

tighting typing

53c966c

added tests for linked parameters in focus

3589be9

cla-bot bot added the cla-signed label Jul 4, 2025

Brendan-Reid1991 changed the title ~~Substitutions~~ feat: add substitutions to rewriters Jul 4, 2025

dexter2206 requested changes Jul 4, 2025

Brendan-Reid1991 added 7 commits July 4, 2025 16:44

added per-file ignore code for lambdas

14fe72c

made wild chars stored in tuples rather than list

049a4d3

removed None type from original_expression

8e1419c

removed noqa

a8b22fd

simplified logic after Konrads comments

f55e1f9

minor typing typo (typ-o)

38f50dd

fixing typos in docstring

aa63931

Brendan-Reid1991 commented Jul 7, 2025

mstechly requested changes Jul 9, 2025

dexter2206 approved these changes Jul 9, 2025

Brendan-Reid1991 added 6 commits July 9, 2025 17:56

updating docstrings

5ab93c9

minor factor on substitutions

470690c

updating tests

a2a3d83

fixing attr error

35d078a

fixing tests

1728085

removed testig code

3a917b5

mstechly approved these changes Jul 10, 2025

Brendan-Reid1991 added 4 commits July 10, 2025 09:46

Merge branch 'main' into substitutions

a1b1f09

updated assumption properties derivation; improved docstrings

09c9bee

updated tests

418f453

missing word in docstring

9d5ea9b

Brendan-Reid1991 merged commit 3a65c93 into main Jul 10, 2025
9 checks passed

Brendan-Reid1991 deleted the substitutions branch July 10, 2025 10:35

release-please bot mentioned this pull request Jul 9, 2025

chore(main): release 0.14.0 #218

Merged
