-
Notifications
You must be signed in to change notification settings - Fork 24
feat: add substitutions to rewriters #238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ing and improved descriptions
| # Ignore multiple statements on one line | ||
| E704, | ||
| per-file-ignores = | ||
| tests/*:D100,D101,D102,D103,D104 | ||
| __init__.py:F401 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Given that this only affects expression_rewriter.py, I suggest moving this ignore to per-file-ignores section.
| # Ignore multiple statements on one line | |
| E704, | |
| per-file-ignores = | |
| tests/*:D100,D101,D102,D103,D104 | |
| __init__.py:F401 | |
| per-file-ignores = | |
| tests/*:D100,D101,D102,D103,D104 | |
| __init__.py:F401 | |
| src/bartiq/analysis/rewriters/expression_rewriter.py:E704 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one other use of this in sympy_backend (link). I created that protocol during the refactor of the sympy backend, when we added arguments to the parser that broke some of the typing and this was the fix I found.
But, I am adding a per-file-ignore for the lambda functions in _wildcard_substitutions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would be better to do it in a particular place where it occurs or just put a noqa in that specific file, rather than here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy either way, I only updated this file because I thought having noqa: <> in multiple different places looked a bit messy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting noqa in specific file would not work (or maybe I was doing something wrong), so I think we are down to either local noqas or this per-file-ignores in the config. I think per-file-ignores makes sense in this particular case, because TBH how this rule is implemented is messed up. There is no way to exclude ellipsis on the same line (which is a common formatting practice for protocol methods), and ignoring each one of those separately seems messy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds good to me!
| symbol_or_expr: str | ||
| replacement: str | ||
| backend: SymbolicBackend | ||
| wild: list[str] = field(default_factory=list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Let's make it a tuple, it fits better the this frozen dataclass.
| wild: list[str] = field(default_factory=list) | |
| wild: tuple[str, ...] = field(default_factory=tuple) |
| "positive": ((gt or gte) and value_positive) or None, | ||
| "negative": ((lt or lte) and value_negative) or None, | ||
| } | ||
| return {"positive": ((gt or gte) and value_positive) or None, "negative": ((lt or lte) and value_negative) or None} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: I missed this in the previous review, but we don't handle the case of 0 correctly, or at least the naming is incorrect.
- If
reference_valueis 0, bothvalue_negativeandvalue_positiveare true. - Depending on the operator, either "positive" or "negative" is set to true, but if
gteorlte, the symbol could be actually 0, which is neither positive nor negative.
Maybe we just should call those keys "nonnegative" and "nonpositive" instead of "positive" and "negative"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about positive_or_zero and negative_or_zero? I know these aren't user-facing variables anyway, but because nonnegative and nonpositive are defining something as the negative of something else, it takes me an extra beat to parse the info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine either way. I think nonnegative and nonpositive are natural words to use when you're mathematicians, otherwise they take a bit to parse.
I think that this is a place in the code where the choice will not be very consequential.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would affect the usage of this class in the SympyRewriter. At the moment, I can create a symbol from an assumption like :
sym = Symbol("sym", **assumption.symbol_properties)If we moved to nonnegative and nonpositive, which are also sympy predicates, we get this unexpected interaction:
from sympy import Symbol
a = Symbol('a', nonnegative=True)
a.is_positive # NoneSo even changing the name to positive_or_zero would require some logic change, i.e.
props = assumption.symbol_properties()
sym = Symbol(
"sym",
**{
positive: props["nonnegative" / "positive_or_zero"],
negative: props["nonpositive" / "negative_or_zero"]
}
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, I think I didn't get that. What is wrong with this code?
from sympy import Symbol
a = Symbol('a', nonnegative=True)
a.is_positive # NoneTo me it tells the truth: it is not known if a is positive.
Also, "nonnegative" and "nonpositive" are correct mathematical words to describe something that is >= or <= 0, but I am not insisting on those. Whatever you decide just make sure that it is precise :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that the following would not simplify:
rewriter = sympy_rewriter_factory("max(0,a)")
rewriter.assume("a>=0") # If we use nonnegative predicate
>>> max(0, a) This, to me, is certainly unexpected.
Nevermind, this is not true!
| @@ -0,0 +1,302 @@ | |||
| # Copyright 2025 PsiQuantum, Corp. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made sure to git mv this file so this exact scenario didn't happen, but here we are.
| # Ignore multiple statements on one line | ||
| E704, | ||
| per-file-ignores = | ||
| tests/*:D100,D101,D102,D103,D104 | ||
| __init__.py:F401 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would be better to do it in a particular place where it occurs or just put a noqa in that specific file, rather than here?
| ) | ||
|
|
||
| def _unwrap_history(self) -> list[tuple[Instruction | str, ExpressionRewriter[T] | None]]: | ||
| def _unwrap_history(self) -> list[tuple[Instruction, ExpressionRewriter[T] | None]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] I'm wondering if having a history like this will not slow things down a lot when you have done a lot of tinkering and have a long history, especially for really gnarly expressions.
Have you tried working with an object that has history of length 100+?
I don't really think it would big enough to clog memory or degrade the performance, as history is used as "write-only" for most of the time (I think?).
But I'd feel better if you spent 15 minutes checking it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll answer this because it was me who suggested immutable rewriters, for which the current implementation of history is the only one that makes sense.
Storing of the history is almost free in terms of both time and memory, because each new rewriter instance stores only one existing instance (namely: the one it originated from). Therefore, comparing to not having a history, the cost is roughly around 1 variable assignment + passing variable to the initializer, which is miniscule.
Retrieving history is of course linear in the number of history items. A quick benchmark that I just performed shows that obtaining a history of a rewriter after a 1000 steps takes around 61 microseconds on my M1. In the benchmark, I created an expression summing variables x_1 + ... x_1000, and then substituted x_i -> y_i one by one (to get this nice 1000-items long history). Also, I measured the memory footprint of a rewriter with such history, and the whole thing was talking around 1.7MB.
Nerding out on the impact of storing the history.
To properly isolate how storing a history in this way impacts overall performance, we can perform a simple benchmark with two classes differing the same operation, except one stores the history and one does not.
@dataclass
class Foo:
x: int
previous: Foo | None = None
def next(self):
return replace(self, x=self.x+1, previous=self)
@dataclass
class Bar:
x: int
def next(self):
return replace(self, x=self.x+1)
def f(cls, n_iters):
obj = cls(1)
for _ in range(n_iters):
obj = obj.next()Measuring f(Foo, n) and f(Bar, n) on my machine shows that the difference in performance is roughly 0.7 microseconds per 10 iterations, which is negligible.
|
|
||
| def substitute(self, symbol_or_expr: str, replace_with: str) -> Self: | ||
| """Substitute a symbol or subexpression for another symbol or subexpression. | ||
| By default performs a one-to-one mapping, unless wildcard symbols are implemented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] "unless wildcard symbols are implemented. -> "unless there are wildcards present in replace_with" ? I'm not entirely sure what you meant here.
Also, it looks to me that this docstring (and some other as well) are missing Args section :)
|
|
||
| @dataclass(frozen=True) | ||
| class Substitution(Instruction): | ||
| """Substitute a symbol with an expression.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] class attributes lack description. In particular wild, it's unclear to me what it represents.
Also, I personally wouldn't call it symbol_or_expr, but rather just expr – symbol is just a special case of a "single symbol expression", no?
| At present this only detects positivity/negativity. | ||
| If the properties are unknowable due to lack of information, they are None. | ||
| At present this only detects positivity/negativity. The two are calculated independently, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] this only detects positivity/negativity -> this only detects positivity/negativity of a given symbol/expression ?
| def __post_init__(self): | ||
| object.__setattr__(self, "wild", _get_wild_characters(self.symbol_or_expr)) | ||
|
|
||
| def _get_linked_parameters(self) -> dict[str, Iterable[str]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[issue] I don't like this method, I think there are a couple of reasons:
- It's a private method but it's used in
expression_rewriter. Maybe it should be public then? - It could be just a function, which takes in
Substitutionobject, doesn't need to be a class method. It's the onlyInstructionwhich has some special methods defined (exceptfrom_strforAssumption, but that's 100% sensible to me), so it breaks symmetry a bit.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly a nice middle ground:
- Made it a private function that takes in a
Substitution - Added a new input to the
Substitutiondataclass that hasinit=False - set the attr in the
_post_init_ - Logic is outside
Substitution, but the attribute is kept there. i.e function is no longer needed inexpression_rewriter
| "positive": ((gt or gte) and value_positive) or None, | ||
| "negative": ((lt or lte) and value_negative) or None, | ||
| } | ||
| return {"positive": ((gt or gte) and value_positive) or None, "negative": ((lt or lte) and value_negative) or None} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine either way. I think nonnegative and nonpositive are natural words to use when you're mathematicians, otherwise they take a bit to parse.
I think that this is a place in the code where the choice will not be very consequential.
| Initial(), | ||
| Expand(), | ||
| Simplify(), | ||
| Assumption.from_string("beth>0"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] Actually, why it's not justAssumption("beth>0")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to get that format working a couple of different times, and while it is possible, the logic is very messy.
Unless we want to move to only allowing strings as input, i.e.:
@dataclass(frozen=True)
class Assumption(Instruction):
assumption_string: strin order to have the current implementation also accept strings as well as the three current arguments, we'd have to overwrite __new__ to handle strings only, and fallback to the init for everything else. And because it's a dataclass, it's annoying.
I did this a few weeks ago when I was writing the Assumptions, and when I finally got it working the class looked like a complete mess. I decided it wasn't worth it, because users most likely won't interact with Assumption directly anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thank you for the clarification!
dexter2206
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments have been addressed.
| # Ignore multiple statements on one line | ||
| E704, | ||
| per-file-ignores = | ||
| tests/*:D100,D101,D102,D103,D104 | ||
| __init__.py:F401 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds good to me!
| Initial(), | ||
| Expand(), | ||
| Simplify(), | ||
| Assumption.from_string("beth>0"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thank you for the clarification!
Description
A mammoth PR. Happy to split this into two or more if necessary.
Cliff Notes
...:blackforces...commands onto the same line as thedef, which flake8 hates. Conversely, having apassstatement instead of...raises this as dead code in pytest, and contributes to lower test coverage. I think adding this code to be ignored is fine, because all other instances of multiple statements on one line will be caught and corrected by black!sympy_rewriterfactory method inbartiq/analysis/rewriters. Also, made this folder public.InstructionsEnum and trying to makeAssumptionsandSubstitutionsfit that format, I made a dummy class that all instructions inherit from, includingAssumptions. This makes type checking nicer.substituteand_substitutemethods to the base class, and defined_substitutein theSympyExpressionRewriter. Logic here is fairly gnarly!focusmethod such that linked parameters are included. i.e. for an expression with a variablea, if we substituteawithb, and try tofocus('a'), it will show us all terms withb. This works for multiple levels of substitution.Please verify that you have completed the following steps