Add similarity cuts #1057
Changes from all commits
4433b17
8c785af
0eb5139
9213e08
77c7df4
df01433
d9c6e63
608191c
3f92282
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -359,6 +359,8 @@ class CutsPolicy(enum.Enum): | |
| NOCUTS = "nocuts" | ||
| FROMFIT = "fromfit" | ||
| FROM_CUT_INTERSECTION_NAMESPACE = "fromintersection" | ||
| FROM_SIMILAR_PREDICTIONS_NAMESPACE = "fromsimilarpredictions" | ||
|
|
||
|
|
||
| class Cuts(TupleComp): | ||
| def __init__(self, name, path): | ||
|
|
@@ -396,6 +398,40 @@ def load(self): | |
| self._full = True | ||
| return np.arange(self.ndata) | ||
|
|
||
| class SimilarCuts(TupleComp): | ||
| def __init__(self, inputs, threshold): | ||
| if len(inputs) != 2: | ||
| raise ValueError("Expecting two input tuples") | ||
| firstcuts, secondcuts = inputs[0][0].cuts, inputs[1][0].cuts | ||
| if firstcuts != secondcuts: | ||
| raise ValueError("Expecting cuts to be the same for all datasets") | ||
| self.inputs = inputs | ||
| self.threshold = threshold | ||
| super().__init__(self.inputs, self.threshold) | ||
|
Contributor
@Zaharid just a general question about why we are inheriting from
Contributor
The methods of
>>> from validphys.core import PDF
>>> pdf1 = PDF("foo")
>>> pdf2 = PDF("foo")
>>> pdf1 == pdf2
True
Which is useful because as far as we are concerned in this case a PDF is fully defined by its name (there should only be one
Answering where this gets used: probably by the reportengine resource builder at some point to find unique dependencies (but I don't know - Zahari can answer). However, also all the classes which have their
There is also another double underscore method which means that classes print nicely:
>>> pdf1
PDF(name='foo')
Contributor
Oh and I should say by default the first thing I said isn't the case:
>>> class TestClass:
...     def __init__(self, a, b): self.a = a; self.b = b
...
>>> cls1 = TestClass(1, 2)
>>> cls2 = TestClass(1, 2)
>>> cls1 == cls2
False
Perhaps I misunderstood something but that's my understanding anyway
Contributor
Author
TupleComp is mostly a simpleminded attempt to implement something like dataclasses from the standard library, before they existed (and with only the required properties). I would probably use that nowadays, despite some unwelcome complexity (and in fact this is done in various newer interfaces already such as the fkparser). Note there is #408.
Contributor
Thank you both! |
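To make the thread above concrete, here is a minimal sketch of the behaviour being described: a base class that compares and hashes instances by the tuple of arguments passed to it, next to the `dataclasses` equivalent mentioned in the reply. `TupleCompSketch`, `NamedThing` and `NamedThingDC` are hypothetical names for illustration only; this is not the actual `TupleComp` implementation in validphys, which may differ in detail.

```python
from dataclasses import dataclass


class TupleCompSketch:
    """Hypothetical stand-in: identity is the tuple of constructor arguments."""

    def __init__(self, *args):
        self.comp_tuple = args

    def __eq__(self, other):
        # Equal only if same class and same defining arguments
        return type(self) is type(other) and self.comp_tuple == other.comp_tuple

    def __hash__(self):
        # Hash consistently with __eq__ so instances deduplicate in sets/dicts
        return hash(self.comp_tuple)

    def __repr__(self):
        return f"{type(self).__name__}{self.comp_tuple!r}"


class NamedThing(TupleCompSketch):
    """Illustrative subclass that is fully defined by its name."""

    def __init__(self, name):
        self.name = name
        super().__init__(name)


@dataclass(frozen=True)
class NamedThingDC:
    """Standard-library equivalent: generated __eq__, __hash__ and __repr__."""

    name: str


a, b = NamedThing("foo"), NamedThing("foo")
same_tc = a == b              # compares comp_tuple, not object identity
unique_tc = len({a, b})       # equal hashes collapse to one set element
same_dc = NamedThingDC("foo") == NamedThingDC("foo")
```

Without the custom `__eq__`, two separately constructed instances compare unequal, which is the default behaviour shown with `TestClass` in the thread.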
||
|
|
||
| def load(self): | ||
| # TODO: Update this when a suitable interface becomes available | ||
| from validphys.convolution import central_predictions | ||
| from validphys.commondataparser import load_commondata | ||
| from validphys.covmats import covmat_from_systematics | ||
|
|
||
| first, second = self.inputs | ||
| first_ds = first[0] | ||
| exp_err = np.sqrt( | ||
| np.diag( | ||
| covmat_from_systematics( | ||
| load_commondata(first_ds.commondata).with_cuts(first_ds.cuts) | ||
| ) | ||
| ) | ||
| ) | ||
| # Compute matched predictions | ||
| delta = np.abs( | ||
| (central_predictions(*first) - central_predictions(*second)).squeeze(axis=1) | ||
| ) | ||
| ratio = delta / exp_err | ||
| passed = ratio < self.threshold | ||
| return passed[passed].index | ||
|
|
||
|
|
||
| def cut_mask(cuts): | ||
| """Return an object that will act as the cuts when applied as a slice""" | ||
|
|
||
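The selection rule in `SimilarCuts.load` can be tried out in isolation. The sketch below assumes the two sets of central predictions are already available as pandas Series sharing one index, and that the experimental uncertainty is a plain array in the same order; `similar_points` and the toy numbers are hypothetical and do not call any validphys function.

```python
import numpy as np
import pandas as pd


def similar_points(pred_first, pred_second, exp_err, threshold):
    """Return index labels where |pred_first - pred_second| / exp_err < threshold."""
    delta = (pred_first - pred_second).abs()
    ratio = delta / exp_err
    passed = ratio < threshold
    # Boolean-mask the mask itself to keep only the True entries, then take labels
    return passed[passed].index


idx = pd.Index([0, 1, 2, 3])
first = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)
second = pd.Series([1.1, 2.9, 3.0, 7.0], index=idx)
err = np.array([0.5, 0.5, 0.5, 0.5])

# Points 1 and 3 differ by more than 1.5 sigma and are cut
kept = similar_points(first, second, err, threshold=1.5)
```

The `passed[passed].index` idiom returns index labels rather than positions, so it only gives the expected data points if the prediction index lines up with the cut data.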
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| from validphys.api import API | ||
| from validphys.core import SimilarCuts | ||
| from validphys.tests.conftest import THEORYID, PDF, DATA | ||
|
|
||
|
|
||
|
|
||
| def test_similarity_cuts(): | ||
| plain = [{"dataset": dt["dataset"]} for dt in DATA] | ||
| inp = { | ||
| "theoryid": THEORYID, | ||
| "pdf": PDF, | ||
| "cut_similarity_threshold": 1.5, | ||
| "use_cuts": "fromsimilarpredictions", | ||
| "cuts_intersection_spec": [{"dataset_inputs": DATA}, {"dataset_inputs": plain}], | ||
| "dataset_input": DATA[1], | ||
| } | ||
| ds = API.dataset(**inp) | ||
| assert isinstance(ds.cuts, SimilarCuts) |
can we not just set the "cuts":matched_cuts or does that break something?
seems sub optimal to have to reproduce the exact same set of cuts, especially considering we saved them earlier but I don't know about patching objects into the namespace directly.
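For reference, the `"use_cuts": "fromsimilarpredictions"` entry in the test is the string value of the enum member added in this PR. A minimal sketch of looking a member up by its value follows; `CutsPolicySketch` is a local copy for illustration, and whether validphys parses `use_cuts` in exactly this way is an assumption.

```python
import enum


class CutsPolicySketch(enum.Enum):
    """Local copy of the members shown in the diff, for illustration only."""

    NOCUTS = "nocuts"
    FROMFIT = "fromfit"
    FROM_CUT_INTERSECTION_NAMESPACE = "fromintersection"
    FROM_SIMILAR_PREDICTIONS_NAMESPACE = "fromsimilarpredictions"


# Calling the enum class with a value returns the matching member
policy = CutsPolicySketch("fromsimilarpredictions")

# An unknown string raises ValueError instead of passing through silently
try:
    CutsPolicySketch("notapolicy")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```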