Add support for parsing decay descriptors #573

Open
admorris wants to merge 5 commits into scikit-hep:main from admorris:descriptor_parsing

@admorris
Contributor

Implements the rest of #200

Added DecayChain.from_string method

Decay descriptors are parsed with Lark. A Transformer class converts them into DecayChainDict objects, which are then used to initialise DecayChain objects.
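
To illustrate the shape of this conversion, here is a minimal sketch that uses only the standard library. It is a hand-written recursive-descent parser standing in for the Lark grammar and Transformer, not the PR's implementation. The function names and the simplified `{mother: [daughters]}` output are assumptions made for illustration and do not reproduce the real DecayChainDict layout.

```python
from __future__ import annotations

import re

# One token is an arrow, a parenthesis, or a particle name.
# The name pattern is the ASCII subset of the PARTICLE terminal quoted in this PR.
TOKEN = re.compile(r"\s*(->|\(|\)|[A-Za-z0-9_][A-Za-z0-9_+*'~-]*)")


def tokenize(descriptor: str) -> list[str]:
    """Split a whitespace-separated descriptor into tokens."""
    descriptor = descriptor.rstrip()
    tokens, pos = [], 0
    while pos < len(descriptor):
        match = TOKEN.match(descriptor, pos)
        if match is None:
            raise ValueError(f"Unexpected character {descriptor[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_decay(tokens: list[str], pos: int = 0) -> tuple[dict, int]:
    """Parse `mother -> item item ...` starting at tokens[pos]."""
    if pos >= len(tokens) or tokens[pos] in {"->", "(", ")"}:
        raise ValueError("Expected a mother particle name")
    mother = tokens[pos]
    if pos + 1 >= len(tokens) or tokens[pos + 1] != "->":
        raise ValueError(f"Expected '->' after {mother!r}")
    pos += 2
    daughters: list = []
    while pos < len(tokens) and tokens[pos] != ")":
        if tokens[pos] == "->":
            raise ValueError("Sub-decays must be wrapped in parentheses")
        if tokens[pos] == "(":
            # A parenthesised group is a complete decay with its own mother.
            sub, pos = parse_decay(tokens, pos + 1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("Unbalanced parentheses")
            daughters.append(sub)
        else:
            daughters.append(tokens[pos])
        pos += 1
    if not daughters:
        raise ValueError(f"No decay products given for {mother!r}")
    return {mother: daughters}, pos


def parse_descriptor(descriptor: str) -> dict:
    """Return a nested {mother: [daughters]} tree for a descriptor string."""
    tokens = tokenize(descriptor)
    tree, pos = parse_decay(tokens)
    if pos != len(tokens):
        raise ValueError("Unexpected ')' in descriptor")
    return tree


tree = parse_descriptor("D*+ -> (D0 -> K+ pi-) pi+")
# tree == {"D*+": [{"D0": ["K+", "pi-"]}, "pi+"]}
```

A grammar-based parser keeps the token definitions in a separate file, which is what makes the custom ARROW, LPAR and RPAR described in the next paragraph possible; the sketch hard-codes them instead.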

Custom descriptor formats can be used by passing the path to another .lark file as an argument of DecayChain.from_string. In practice, such grammars only have the freedom to redefine ARROW, LPAR and RPAR; the rest of the structure is assumed by the Transformer. For example, I did not find a way to support sub-decays written with the mother outside the parentheses, like A -> B (-> C D) E

One glaring limitation (which is inherent to DecayChain/_build_decay_modes) is that sub-decays of identically named particles are not supported: e.g. "B_s0 -> (phi -> K+ K-) (phi -> K+ K-)" will result in an exception. This could possibly be handled by adding internal/hidden uniqueness when duplicates are encountered.
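
The following is a simplified, standard-library sketch of why identically named mothers collide and of what "hidden uniqueness" might look like. It assumes, for illustration only, that sub-decays end up in a table keyed by mother name; `flatten`, `add_hidden_labels` and the `#n` suffix are hypothetical and are not part of decaylanguage.

```python
from __future__ import annotations

from collections import Counter


def flatten(tree: dict, table: dict | None = None) -> dict:
    """Flatten a nested {mother: [daughters]} tree into a table keyed by mother."""
    table = {} if table is None else table
    ((mother, daughters),) = tree.items()
    if mother in table:
        # Two sub-decays share one key, so the second cannot be stored.
        raise ValueError(f"Duplicate sub-decay for {mother!r}")
    table[mother] = [next(iter(d)) if isinstance(d, dict) else d for d in daughters]
    for daughter in daughters:
        if isinstance(daughter, dict):
            flatten(daughter, table)
    return table


def add_hidden_labels(tree: dict, seen: Counter | None = None) -> dict:
    """Return a copy where repeated mothers get a hypothetical '#n' suffix."""
    seen = Counter() if seen is None else seen
    ((mother, daughters),) = tree.items()
    seen[mother] += 1
    label = mother if seen[mother] == 1 else f"{mother}#{seen[mother]}"
    return {
        label: [
            add_hidden_labels(d, seen) if isinstance(d, dict) else d
            for d in daughters
        ]
    }


chain = {"B_s0": [{"phi": ["K+", "K-"]}, {"phi": ["K+", "K-"]}]}

labelled = add_hidden_labels(chain)
# labelled == {"B_s0": [{"phi": ["K+", "K-"]}, {"phi#2": ["K+", "K-"]}]}
table = flatten(labelled)
```

Any real fix would also need to strip the hidden suffix again when printing or looking up particle properties, which this sketch does not attempt.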

@eduardo-rodrigues
Member

Hi @admorris, I have not forgotten to check this. It's just that I have been working on urgent and important stuff. Will get back to you very soon.

@eduardo-rodrigues added the enhancement (New feature or request) label on May 1, 2026

// Particle names start with alphanumeric/underscore and can then include
// common descriptor suffix symbols such as +, -, *, ', and ~.
PARTICLE: /[A-Za-z0-9_][A-Za-z0-9_+*'~̄-]*/
Member

This definition does not match the one in https://github.com/scikit-hep/decaylanguage/blob/main/src/decaylanguage/data/decfile.lark. Some names may be missed, such as those containing parentheses?

Contributor Author

Allowing parentheses is causing a headache: the pattern then matches the parentheses of a sub-decay as part of the first or last particle name. I am trying to debug it without coming up with grotesque regex patterns.
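
A small standard-library sketch of the problem described here. `STRICT` is the ASCII subset of the PARTICLE pattern quoted above; `NAIVE` is a hypothetical extension that simply adds parentheses to the allowed characters and is not taken from the PR or from decfile.lark.

```python
import re

# ASCII subset of the PARTICLE terminal under review.
STRICT = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_+*'~-]*")
# Hypothetical naive extension: also allow "(" and ")" after the first character.
NAIVE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_+*'~()-]*")

name = "K*(892)0"
descriptor = "D*+ -> (D0 -> K+ pi-) pi+"

# The strict pattern splits a name that contains parentheses.
strict_name = STRICT.findall(name)        # ['K*', '892', '0']
# The naive pattern keeps the name whole...
naive_name = NAIVE.findall(name)          # ['K*(892)0']
# ...but swallows the closing parenthesis of the sub-decay into 'pi-'.
naive_tokens = NAIVE.findall(descriptor)  # ['D*+', 'D0', 'K+', 'pi-)', 'pi+']
```

In this sketch only the closing parenthesis is affected, because the first character of a name may not be a parenthesis; a pattern that also allows one in the first position would absorb the opening parenthesis in the same way.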

Member

Ah, indeed that's not an easy one!

Comment thread src/decaylanguage/data/descriptor.lark
    def sub_decay(self, items: list[Any]) -> DecayChainDict:
        # sub_decay: LPAR decay RPAR
        for item in items:
            if isinstance(item, dict):
Member

Just for my understanding - this code here would only ever return the first found dict in the list. Is there some subtlety I'm missing for things to work overall, likely thanks to the way the Lark Transformer works?

Contributor Author

The items parameter is always a list. In this case we should always expect a list of length 1. Maybe I could raise an exception if it has a different length.

Member

Seems better IMO. Basically anything that is not in the expected format should get an exception.
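
A minimal sketch of the stricter check discussed in this thread, written as a free function so that it runs without Lark. The exception type and message are assumptions, and the function filters for dicts so that it works whether or not the LPAR and RPAR tokens are passed along with the decay.

```python
from typing import Any


def sub_decay(items: list[Any]) -> dict:
    """Return the single decay dict found for `sub_decay: LPAR decay RPAR`."""
    decays = [item for item in items if isinstance(item, dict)]
    if len(decays) != 1:
        # Anything other than exactly one nested decay is a malformed input.
        raise ValueError(
            f"Expected exactly one decay inside a sub-decay, found {len(decays)}"
        )
    return decays[0]


inner = sub_decay(["(", {"D0": ["K+", "pi-"]}, ")"])
# inner == {"D0": ["K+", "pi-"]}
```

Unlike the loop that returns the first dict it encounters, this version fails loudly instead of silently ignoring extra or missing items.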

Comment thread src/decaylanguage/decay/decay.py
        return cls(mother, decay_modes)

    @classmethod
    def from_string(
Member

At some point it would make sense to "synchronise" this from_string function with the existing to_string one, since they should effectively be the "mirror of each other". Otherwise, this function could be renamed from_descriptor. WDYT?

        descriptor : str
            The decay descriptor string, e.g.
            ``"D*+ -> (D0 -> K+ pi-) pi+"``.
        grammar_file : str or Path, optional
Member

This could be a place to state where the default grammar is available? Seems useful and relevant. This would be similar to the docs of edit_model_name_terminals in https://github.com/scikit-hep/decaylanguage/blob/main/src/decaylanguage/dec/dec.py. WDYT?

        cls,
        descriptor: str,
        *,
        grammar_file: str | Path | None = None,
Member

You could do something similar to https://github.com/scikit-hep/decaylanguage/blob/main/src/decaylanguage/dec/dec.py#L300, that is, provide the name of the default grammar and state where it is located (I realise the docstring I am referring to could be improved a bit). Similarly, you could use https://github.com/scikit-hep/decaylanguage/blob/main/src/decaylanguage/dec/dec.py#L313 below?

Comment thread src/decaylanguage/decay/decay.py Outdated

// Particle names start with alphanumeric/underscore and can then include
// common descriptor suffix symbols such as +, -, *, ', and ~.
PARTICLE: /[A-Za-z0-9_][A-Za-z0-9_+*'~̄-]*/
Member

Ditto

Member

@eduardo-rodrigues May 1, 2026

In any case, why do you need this copy? If it is needed, it is worth adding a comment about it in the file, I reckon.

Member

I see you do test with a double arrow. Yeah, just add a comment about it so that it is obvious to any reader :).

Member

@eduardo-rodrigues left a comment

Thank you so much for this, @admorris! It's a really nice enhancement 👍.

I left a few little suggestions but this is looking excellent anyway.

I am well aware of the limitation you point out. It does annoy me.

admorris and others added 3 commits May 7, 2026 17:18
Co-authored-by: Eduardo Rodrigues <eduardo.rodrigues@cern.ch>