Skip to content

Modifications to ITPParser#2408

Merged
richardjgowers merged 15 commits intoMDAnalysis:developfrom
lilyminium:fix-itp
Jan 27, 2020
Merged

Modifications to ITPParser#2408
richardjgowers merged 15 commits intoMDAnalysis:developfrom
lilyminium:fix-itp

Conversation

@lilyminium
Copy link
Member

Changes made in this Pull Request:

  • Types parsed to Bonds/Angles/Dihedrals/Impropers match that of TPRParser

Although, more interactions are treated as bonds by TPRParser/ITPParser than should be, according to the manual. These directives are pretty unusual in an ITP file, though.

Particles are considered bonded when they are connected by “chemical” bonds ([ bonds ] types 1 to 5, 7 or 8) or constraints ([ constraints ] type 1). Type 5 [ bonds ] can be used to create a connection between two atoms without creating an interac- tion. There is a harmonic interaction ([ bonds ] type 6) that does not connect the atoms by a chemical bond. There is also a second constraint type ([ constraints ] type 2) that fixes the distance, but does not connect the atoms by a chemical bond.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@codecov
Copy link

codecov bot commented Nov 20, 2019

Codecov Report

Merging #2408 into develop will increase coverage by 0.05%.
The diff coverage is 95.2%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2408      +/-   ##
===========================================
+ Coverage    90.13%   90.18%   +0.05%     
===========================================
  Files          177      177              
  Lines        22529    22710     +181     
  Branches      2915     2944      +29     
===========================================
+ Hits         20306    20481     +175     
- Misses        1620     1627       +7     
+ Partials       603      602       -1
Impacted Files Coverage Δ
package/MDAnalysis/topology/base.py 95.74% <100%> (+0.39%) ⬆️
package/MDAnalysis/topology/ITPParser.py 95.22% <95.13%> (+1.4%) ⬆️
package/MDAnalysis/topology/tpr/utils.py 74.54% <0%> (-0.9%) ⬇️
package/MDAnalysis/__init__.py 91.89% <0%> (ø) ⬆️
__init__.py 91.89% <0%> (ø) ⬆️
topology/base.py 97.87% <0%> (+0.19%) ⬆️
package/MDAnalysis/topology/TPRParser.py 87.87% <0%> (+0.99%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d0d435a...ce13299. Read the comment docs.

@richardjgowers
Copy link
Member

Related (but ancient) #588

So back in the past @orbeckst said that constraints aren't bonds, but then later wanted his water constraints to show up as bonds :)

@orbeckst
Copy link
Member

orbeckst commented Nov 20, 2019 via email

@jbarnoud
Copy link
Contributor

@jbarnoud might have some insights. I haven’t looked into the gmx topologies in detail in a while.

If I recall correctly, I used the logic quoted from the manual when I rewored the TPR parser.

@lilyminium
Copy link
Member Author

I think bond type 6 (HARMONIC) and constraints type 2 (CONSTRNC) are still included in creating bonds:

if ik_obj.name in ['BONDS', 'G96BONDS', 'MORSE', 'CUBICBONDS',
'CONNBONDS', 'HARMONIC', 'FENEBONDS',
'RESTRAINTPOT', 'CONSTR', 'CONSTRNC',
'TABBONDS', 'TABBONDSNC']:
bonds += list(ik_obj.process(ias))

@jbarnoud
Copy link
Contributor

Hum... They should not. It seems I mistook my "this is what I should do" with a "this is what I have done" (#588 (comment)).

@lilyminium
Copy link
Member Author

[ settles ] is a bit hard to do here since it requires knowing which atoms are Hs to add bonds from the constraints. Below is tip5p from OPLS to illustrate the problem. I'm not sure how GROMACS figures out constraints, unless the 2nd and 3rd atoms are always hydrogen?

After looking at the generic water ITPs they all define bonds if the FLEXIBLE keyword isn't passed in anyway. This means unless the #ifs are treated properly there'll be redundant interactions.

; original reference: [M. W. Mahoney and W. L. Jorgensen, J. Chem. Phys. 112 , 2000]
;
; Note that there are various issues with tip5p and the different forcefields.
; Discussion is here: http://redmine.gromacs.org/issues/1348

[ moleculetype ]
; molname       nrexcl
SOL             2

[ atoms ]
; id    at type res nr  residu name     at name         cg nr   charge
1       opls_118     1       SOL              OW             1       0
2       opls_119     1       SOL             HW1             1       0.241
3       opls_119     1       SOL             HW2             1       0.241
4       opls_120     1       SOL             LP1             1      -0.241
5       opls_120     1       SOL             LP2             1      -0.241

#ifndef FLEXIBLE

[ settles ]
; i     funct   doh     dhh
1       1       0.09572 0.15139

#else

[ bonds ]
; i     j       funct   length  force.c.
1       2       1       0.09572 502416.0 0.09572        502416.0
1       3       1       0.09572 502416.0 0.09572        502416.0

[ angles ]
; i     j       k       funct   angle   force.c.
2       1       3       1       104.52  628.02  104.52  628.02

#endif

[ virtual_sites3 ]
; The position of the virtual site is computed as follows:
;
; The distance from OW to L is 0.07 nm, the geometry is tetrahedral
; (109.47 deg)
; Therefore, a = b = 0.07 * cos (109.47/2) / | xOH1 + xOH2 |
;            c = 0.07 * sin (109.47/2) / | xOH1 X xOH2 |
;
; Using | xOH1 X xOH2 | = | xOH1 | | xOH2 | sin (H1-O-H2)
;       | xOH1 + xOH2 | = 2 | xOH1 | cos (H1-O-H2)
; Vsite pos x4 = x1 + a*x21 + b*x31 + c*(x21 X x31)

; Vsite from                    funct   a       b               c
4       1       2       3       4       -0.344908  -0.344908  -6.4437903493
5       1       2       3       4       -0.344908  -0.344908   6.4437903493

[ exclusions ]
1       2       3       4       5
2       1       3       4       5
3       1       2       4       5
4       1       2       3       5
5       1       2       3       4

@jbarnoud
Copy link
Contributor

This means unless the #ifs are treated properly there'll be redundant interactions.

I do not know how the ITP parser deals with the preprocessor instructions, but it does not make sense to read an ITP file without the set of defined variables.

An ITP file can describe completely different systems depending on these #if macros.

For [settle], the TPR file assumes (or use to assume at least) that the triangle was i, i+1, and i+2. Since the TPR file does not change the order of the atoms, I guess it is safe to make the same assumption for the ITP.

@jbarnoud
Copy link
Contributor

I think bond type 6 (HARMONIC) and constraints type 2 (CONSTRNC) are still included in creating bonds:

I have been working on the TPR parser these last few days and remember why I ended up including all the bonds as bond. By excluding some bonds or constraints that are defined in the TPR from the bond we read, we make it impossible to access these bonds. Since we cannot know why the bonds are there, and what the user wants to analyze, it is safer to read everything. One use case I encountered is the analysis of an elastic network. Such a network can be defined with bonds of type 6 which do not cause an exclusion. If we follow the logic we discussed above, they become impossible to access from MDAnalysis, making their analyses much harder.

The solution would be to store what type of bond each bond is and to expose a way to filter them. Until then, it is better to read everything and not assume the intent of the user.

"""
Skip lines in if condition, *inclusive* of the #endif or #else.
"""
line = next(self.itpfile)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a fun example of where the walrus operator is actually useful (if we could jump straight to 3.8)

@lilyminium lilyminium mentioned this pull request Jan 13, 2020
6 tasks
continue
if line.startswith('#'): #ignore include, ifdefs, etc

if line.startswith('#ifdef'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe being pedantic, are nested ifdefs allowed in itp files? If so I think maybe this approach won't work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am almost sure they are allowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case you need some sort of if stack thing like it looks like was planned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I did something similar to @jbarnoud's _skip_lines_until_endif() thing, although without the IOError catching. There's a test case for nested ifs.

@richardjgowers richardjgowers self-assigned this Jan 14, 2020
@jbarnoud
Copy link
Contributor

I wrote this ITP parser not long ago. It may help with inspiration: https://github.com/jbarnoud/omm-top

@lilyminium
Copy link
Member Author

Thanks @jbarnoud! Today I learned about yield from :-)

Should ITPParser parse #include and #define? I don't think many molecule files have them.

@jbarnoud
Copy link
Contributor

yield from is python 3 only, but it should soon not be a problem 😉

'#include is crucial. A top file will use it to include the force field and the molecule description.

#define is less crucial, but should ultimately be there to be full-featured and avoid surprises to the user.

@lilyminium
Copy link
Member Author

'#include is crucial. A top file will use it to include the force field and the molecule description.

Yes... I can't think of a time I've seen it in an ITP file. I suppose it's not too difficult to put in and make this a generic top/itp parser. I'll look at this tomorrow. And add testing for flawed files (e.g. missing #endifs, etc).

yield from is python 3 only, but it should soon not be a problem 😉

I look forward to being able to use f-strings everywhere someday.

@jbarnoud
Copy link
Contributor

There is no difference between a top and itp file. The difference in file extension is only a matter of convention to show which file are meant to be top level and which ones are meant to be included.

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, @jbarnoud can you take a last check?

@jbarnoud
Copy link
Contributor

I'll have a look over the week-end.

@richardjgowers richardjgowers merged commit 3be5c98 into MDAnalysis:develop Jan 27, 2020
@lilyminium lilyminium deleted the fix-itp branch April 14, 2020 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants