Add support for parsing decay descriptors#573
Conversation
|
Hi @admorris, I am not forgetting to check this. It's just that I have been working on urgent and important suff. Will get back to you very soon. |
|
|
||
| // Particle names start with alphanumeric/underscore and can then include | ||
| // common descriptor suffix symbols such as +, -, *, ', and ~. | ||
| PARTICLE: /[A-Za-z0-9_][A-Za-z0-9_+*'~̄-]*/ |
There was a problem hiding this comment.
This definition does not match that in https://github.com/scikit-hep/decaylanguage/blob/main/src/decaylanguage/data/decfile.lark. Some name may be overlooked, such as those with parentheses?
There was a problem hiding this comment.
Allowing parentheses is causing a headache where it matches the parentheses from a sub-decay as part of the first or last particle name. I am trying to debug it without coming up with grotesque regex patterns.
There was a problem hiding this comment.
Ah, indeed that's not an easy one!
| def sub_decay(self, items: list[Any]) -> DecayChainDict: | ||
| # sub_decay: LPAR decay RPAR | ||
| for item in items: | ||
| if isinstance(item, dict): |
There was a problem hiding this comment.
Just for my understanding - this code here would only ever return the first found dict in the list. Is there some subtlety I'm missing for things to work overall, likely thanks to the way the Lark Transformer works?
There was a problem hiding this comment.
The items parameter is always a list. In this case we should always expect a list of length 1. Maybe I could raise an exception if it's a different lenght.
There was a problem hiding this comment.
Seems better IMO. Basically anything that is not in the expected format should get an exception.
| return cls(mother, decay_modes) | ||
|
|
||
| @classmethod | ||
| def from_string( |
There was a problem hiding this comment.
At some point it would make sense to "synchronise" this from_string function with the existing to_string one, since they should effectively be the "mirror of each other". Else one would name this function to from_descriptor. WDYT?
| descriptor : str | ||
| The decay descriptor string, e.g. | ||
| ``"D*+ -> (D0 -> K+ pi-) pi+"``. | ||
| grammar_file : str or Path, optional |
There was a problem hiding this comment.
This could be a place to state where the default grammar is available? Seems useful and relevant. This would be similar to the docs of edit_model_name_terminals in https://github.com/scikit-hep/decaylanguage/blob/main/src/decaylanguage/dec/dec.py. WDYT?
| cls, | ||
| descriptor: str, | ||
| *, | ||
| grammar_file: str | Path | None = None, |
There was a problem hiding this comment.
You could do similary to https://github.com/scikit-hep/decaylanguage/blob/main/src/decaylanguage/dec/dec.py#L300, meaning provide the name of the default grammar, stating where it is located (I realise the code docstring I am refering to could be improved a bit). Similarly, you could use https://github.com/scikit-hep/decaylanguage/blob/main/src/decaylanguage/dec/dec.py#L313 below?
|
|
||
| // Particle names start with alphanumeric/underscore and can then include | ||
| // common descriptor suffix symbols such as +, -, *, ', and ~. | ||
| PARTICLE: /[A-Za-z0-9_][A-Za-z0-9_+*'~̄-]*/ |
There was a problem hiding this comment.
In any case, why do you need this copy? Then worth having a comment about it in the file, I reckon.
There was a problem hiding this comment.
I see you do test with a double arrow. Yeah, just add a comment about it to be trivial to any reader :).
eduardo-rodrigues
left a comment
There was a problem hiding this comment.
Thank you so much for this, @admorris! It's a really nice enhancement 👍.
I left a few little suggestions but this is looking excellent anway.
I am well aware of the limitation you point out. It does annoy me.
Co-authored-by: Eduardo Rodrigues <eduardo.rodrigues@cern.ch>
Implements the rest of #200
Added
DecayChain.from_stringmethodDecay descriptors are parsed with Lark. A
Transformerclass converts them intoDecayChainDictobjects, which are then used to initialiseDecayChainobjects.Custom descriptor formats can be used by pointing to another
.larkfile in an argument ofDecayChain.from_string. These pretty much only have the freedom to modifyARROW,LPARandRPAR. The rest of the structure is assumed by the Transformer. i.e. I did not find a way to support sub-decays written with the mother outside of braces likeA -> B (-> C D) EOne glaring limitation (which is inherent to
DecayChain/_build_decay_modes) is that sub-decays of identically named particles are not supported: e.g.."B_s0 -> (phi -> K+ K-) (phi -> K+ K-)"will result in an exception. This could possibly be handled by adding internal/hidden uniqueness when duplicates are encountered.