@jthorton
Contributor

@jthorton jthorton commented Sep 3, 2025

Fixes #566 by implementing an isin method on ChemicalSystem that checks whether an instance or a type of Component is present in the system and optionally returns any matches.

Tips

  • Comment "pre-commit.ci autofix" to have pre-commit.ci atomically format your PR.
    Since this will create a commit, it is best to make this comment when you are finished with your work.

Checklist

  • Added a news entry

Developers certificate of origin

@jthorton jthorton requested review from IAlibay and atravitz September 3, 2025 12:00
@jthorton
Contributor Author

jthorton commented Sep 3, 2025

pre-commit.ci autofix

@codecov

codecov bot commented Sep 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.04%. Comparing base (990033a) to head (dac4595).
⚠️ Report is 28 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #608   +/-   ##
=======================================
  Coverage   99.03%   99.04%           
=======================================
  Files          40       40           
  Lines        2394     2410   +16     
=======================================
+ Hits         2371     2387   +16     
  Misses         23       23           

Member

@IAlibay IAlibay left a comment

From a utility standpoint, this does everything we need it to. So I'm giving it my thumbs up with that hat on.

The only thing I'm not sure about is whether the return should be tuple[bool, list[Component]] or just a list (possibly empty) when you ask for return_matches.

matches.append(comp)

if return_matches:
    return len(matches) > 0, matches
Member

@IAlibay IAlibay Sep 3, 2025

Just adding this here so that @atravitz can weigh in.

I'm not 100% sure, but it might be better for this to return just matches. That way the return width is always 1, and len(matches) == 0 simply means no matches.

edit: by not sure, I really do mean it, I can't tell which is more Pythonic.
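
To make the two return shapes under discussion concrete, here is a minimal sketch. It uses stand-in classes rather than the real gufe types, and the function names `isin_tuple` and `find_matches` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


# Stand-in classes for illustration only; these are not the gufe classes.
@dataclass(frozen=True)
class Component:
    name: str


class ProteinComponent(Component):
    pass


class SolventComponent(Component):
    pass


def isin_tuple(components, kind):
    """Option A: return (found, matches), so callers unpack two values."""
    matches = [c for c in components if isinstance(c, kind)]
    return len(matches) > 0, matches


def find_matches(components, kind):
    """Option B: return only the list; an empty list means no matches."""
    return [c for c in components if isinstance(c, kind)]


components = [ProteinComponent("protein"), SolventComponent("water")]

# Option A carries the same information twice: `found` is just bool(matches).
found, matches = isin_tuple(components, ProteinComponent)

# Option B relies on list truthiness for the yes/no question.
only_matches = find_matches(components, ProteinComponent)
if only_matches:
    print(f"found {len(only_matches)} protein component(s)")
```

With option B the boolean is recoverable from the list itself, which is the argument for a return width of 1.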

Contributor

@atravitz atravitz left a comment

as discussed offline, I prefer the suggestion of returning an empty list if there are no matches.

@jthorton jthorton requested a review from atravitz September 4, 2025 09:23
@jthorton
Contributor Author

jthorton commented Sep 4, 2025

pre-commit.ci autofix

@jthorton
Contributor Author

jthorton commented Sep 4, 2025

pre-commit.ci autofix

@atravitz
Contributor

atravitz commented Sep 4, 2025

@jthorton I'd like to step back and understand what problem we're trying to solve with this.

Language Note: I believe this function should be named contains, not isin, since we're querying whether the ChemicalSystem contains the Component. isin might work if we wanted the inverse, e.g. specific_component.isin(ChemicalSystem).

See pandas isin and contains as reference.

1. Checking for a specific Component present in a Chemical System

Currently, to check for the existence of a specific Component in a ChemicalSystem, we can use:

specific_component in my_chemical_system.components.values()

which I personally don't think needs a wrapper function, but I'm open to it, since I believe that's what is requested in #566.

If we want something that's fast on large systems (by exiting after the first match found) and just returns a bool, I think that ChemicalSystem.contains() might be the move here:

class ChemicalSystem:
    ...
    def contains(self, value: type[Component] | Component) -> bool:
        ...

where you don't even build up a list, just return a bool as soon as you find a match.
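
As a rough sketch of that early-exit behavior: the classes below are stand-ins, not the gufe implementation, and the assumption that `components` is a dict of label to Component is taken from the `my_chemical_system.components.values()` usage above. `any()` over a generator stops at the first match, so no list is built.

```python
from __future__ import annotations

from dataclasses import dataclass


# Stand-in classes for illustration only; these are not the gufe classes.
@dataclass(frozen=True)
class Component:
    name: str


class ProteinComponent(Component):
    pass


class SolventComponent(Component):
    pass


class ChemicalSystem:
    def __init__(self, components: dict[str, Component]):
        self.components = components

    def contains(self, value: type[Component] | Component) -> bool:
        """Return True as soon as one component matches `value`."""
        if isinstance(value, type):
            # A class was passed: match any instance of that class.
            return any(isinstance(c, value) for c in self.components.values())
        # An instance was passed: match by equality.
        return any(c == value for c in self.components.values())


system = ChemicalSystem(
    {"protein": ProteinComponent("protein"), "solvent": SolventComponent("water")}
)

has_protein_type = system.contains(ProteinComponent)
has_water = system.contains(SolventComponent("water"))
has_methanol = system.contains(SolventComponent("methanol"))
```

Accepting either a class or an instance keeps one entry point for cases 1 and 2a, at the cost of a branch on the argument's kind.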

2. Checking for any instance of a Component present in a Chemical System

2a)

The ChemicalSystem.contains() function above would work for this as well, as long as we're okay with it taking in either a Component or a type.

2b)

Alternatively, we could add a function ChemicalSystem.get_component_types() (or a property ChemicalSystem.component_types, no preference) that returns a set of all unique types present in the ChemicalSystem, such that you could do:

SolventComponent in my_chemical_system.get_component_types()

which would make the septop code way simpler:

required_component_types = [SolventComponent, ProteinComponent]
for component_type in required_component_types:
    if component_type not in state.get_component_types():
        ...  # handle the missing component type
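
A minimal sketch of option 2b follows, again with stand-in classes rather than the real gufe ones. The method body is an assumption about how `get_component_types()` could work, not the merged implementation.

```python
from dataclasses import dataclass


# Stand-in classes for illustration only; these are not the gufe classes.
@dataclass(frozen=True)
class Component:
    name: str


class ProteinComponent(Component):
    pass


class SolventComponent(Component):
    pass


class ChemicalSystem:
    def __init__(self, components):
        self.components = components

    def get_component_types(self):
        """Return the set of exact classes of the components present."""
        return {type(c) for c in self.components.values()}


# A state that is missing its protein.
state = ChemicalSystem({"solvent": SolventComponent("water")})

required_component_types = [SolventComponent, ProteinComponent]
present = state.get_component_types()
missing = [t for t in required_component_types if t not in present]
```

One caveat of this approach: the set holds exact classes, so a membership test for a base class does not match its subclasses the way an `isinstance` check would.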

3. Retrieving all instances of a Component Type that are present in a Chemical System

I think this makes sense as a separate function, since isin as-is, when passed a specific Component instance, will always return a list of length 1 that contains exactly the Component you pass in. This doesn't seem like a helpful behavior:

(from the tests)

isin, matches = solvated_complex.isin(prot_comp, return_matches=True)
assert isin is True
assert matches == [prot_comp]

In this case, I agree it'd be a helpful utility to have a function that returns all instances of a given type within a chemical system, and this is already used in the get_single_comps() helper function here.

We can flip that around and make it a method of ChemicalSystem, e.g.:

class ChemicalSystem:
    ...
    def get_components_of_type(self, component_type: type[Component]) -> list[Component]:
        # insert logic you already wrote for this
        ...

septop already uses this functionality, so we'd just be polishing it and putting it upstream in gufe.
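
For illustration, such a method might look like the sketch below. The classes are stand-ins (the name SmallMoleculeComponent is borrowed only as a label), and the body is an assumed implementation based on an `isinstance` filter, not the code from septop or the merged PR.

```python
from dataclasses import dataclass


# Stand-in classes for illustration only; these are not the gufe classes.
@dataclass(frozen=True)
class Component:
    name: str


class ProteinComponent(Component):
    pass


class SmallMoleculeComponent(Component):
    pass


class ChemicalSystem:
    def __init__(self, components):
        self.components = components

    def get_components_of_type(self, component_type):
        """Return every component that is an instance of `component_type`."""
        return [
            c for c in self.components.values() if isinstance(c, component_type)
        ]


system = ChemicalSystem(
    {
        "protein": ProteinComponent("protein"),
        "ligand_A": SmallMoleculeComponent("ligand_A"),
        "ligand_B": SmallMoleculeComponent("ligand_B"),
    }
)

ligands = system.get_components_of_type(SmallMoleculeComponent)
proteins = system.get_components_of_type(ProteinComponent)
```

Unlike passing an instance to isin, the query here is a type, so the result can hold zero, one, or several components.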

@ijpulidos
Contributor

Oh, this is great. I also wrote a utility function for this in feflow for the protein mutation protocol. Just in case you find anything useful there, but also great to see we all need this functionality in some way. https://github.com/OpenFreeEnergy/feflow/blob/1bcfdd31d069f166adfa616fdc45faf4aa71085d/feflow/utils/misc.py#L12

I like the way to extract specific types of components instead of getting a list of all the types that we then have to dissect further. The main difference from my approach is that I'm using a set, I don't know, I like sets but that's about it haha.

Contributor

@atravitz atravitz left a comment

Looks great @jthorton! Just a couple things to address and some non-blocking thoughts. And optionally get @hannahbaumann and @ijpulidos feedback since they'll be users.

@atravitz atravitz added this to the Release 1.7 milestone Sep 18, 2025
@jthorton
Contributor Author

pre-commit.ci autofix

@IAlibay IAlibay requested a review from atravitz September 29, 2025 09:01
@atravitz atravitz moved this to Sprint - In Progress in gufe : advancement sprints Oct 1, 2025
@atravitz atravitz moved this from Sprint - In Progress to Done in gufe : advancement sprints Oct 1, 2025
@atravitz atravitz moved this from Done to Sprint - In Review in gufe : advancement sprints Oct 1, 2025
Contributor

@atravitz atravitz left a comment

looks great! I just fixed some docs formatting. merge when ready @jthorton

@jthorton jthorton merged commit 77a55a2 into main Oct 8, 2025
13 checks passed
@jthorton jthorton deleted the isin branch October 8, 2025 08:39
@github-project-automation github-project-automation bot moved this from Sprint - In Review to Done in gufe : advancement sprints Oct 8, 2025
Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

ChemicalSystem "isin" method to check if a Component belongs to a ChemicalSystem

5 participants