-
Notifications
You must be signed in to change notification settings - Fork 823
Closed
Description
Expected behavior
- Code should only guess when authorized by the user (see Print warning when a property is guessed #2222 ).
- If property cannot be guessed (atom types, bonds, masses) then a proper error message or warning should alert the user and suggest how to enable guessing.
- Users should be able to override guesses.
- Guessers should use the best available approach.
Actual behavior
- Masses are always guessed, sometimes incorrectly (Print warning when a property is guessed #2222); users cannot opt-out of guessing
- Masses are always guessed from
atom_types,, which are either highly forcefield dependent and cannot be used with our simple guesser (failsdef guess_masses(atom_types): ) or we guess the atom type anyway from thedef validate_atom_types(atom_types): atomnameInstead it would make more sense for masses of atoms to first determine the element because this gives a unique mass. If we have to guess then the following might be better:def guess_atom_type(atomname): - guess masses from original atom_types if they exist
- guess any remaining undefined (0) masses from the atom names.
- (optional: also guess other masses from atom names and compare with masses guessed from types; warn if there are discrepancies)
- Bond-guessing is fragile: it may not work if atom_types are not defined and throw a
ValueError, see guess_bonds which guessing types #2143
Existing discussions
- Implement fancy guessers (e.g., force-field aware) #598 for "fancy guessers"
- How to use information in topology files #942 had a long discussion on the different required and optional topology attributes (ordered by categories), summarized in Wiki: Topology attribute handling
- print warnings when guessing (Print warning when a property is guessed #2222) – this is implemented for mass guessers
- bond guessing can fail guess_bonds which guessing types #2143
Code to reproduce the behavior
Example for bad mass guessing from atom_types
import MDAnalysis as mda
from MDAnalysis.tests import datafiles
import numpy as np
u = mda.Universe(datafiles.PDBQT_input)
not_guessed = u.atoms[u.atoms.masses == 0]
print("Missing masses: {}/{}".format(not_guessed.n_atoms, u.atoms.n_atoms))
print("Types without masses:", set(not_guessed.types))
# guessing by name finds all masses
masses_from_name = np.array([mda.topology.guessers.guess_atom_mass(name) for name in not_guessed.names])
print("Guessed masses from missing with name: {}/{}".format(sum(masses_from_name != 0), not_guessed.n_atoms))
# even guessing again from the types improves
masses_from_types_again = np.array([mda.topology.guessers.guess_atom_mass(atype) for atype in not_guessed.types])
print("Guessed masses (again) from missing with type: {}/{}".format(sum(masses_from_types_again != 0), not_guessed.n_atoms))
print("Types without masses:", set(not_guessed[masses_from_types_again == 0].types))gives
~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/topology/guessers.py:80: UserWarning: Failed to guess the mass for the following atom types: A
warnings.warn("Failed to guess the mass for the following atom types: {}".format(atom_type))
~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/topology/guessers.py:80: UserWarning: Failed to guess the mass for the following atom types: HD
warnings.warn("Failed to guess the mass for the following atom types: {}".format(atom_type))
~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/topology/guessers.py:80: UserWarning: Failed to guess the mass for the following atom types: OA
warnings.warn("Failed to guess the mass for the following atom types: {}".format(atom_type))
Missing masses: 726/1805
Types without masses: {'HD', 'OA', 'A'}
Guessed masses from missing with name: 726/726
Guessed masses (again) from missing with type: 580/726
Types without masses: {'A'}
I actually don't understand why on parsing the topology, the types HD, OA, and A cannot be used for guessing the mass but later HD and OA work and only A fails:
u = mda.Universe(datafiles.PDBQT_input)
ng = np.array([mda.topology.guessers.guess_atom_mass(atype) for atype in u.atoms.types]) == 0
print("No masses for types ", set(u.atoms[ng].types))
Again, warnings as above, but then No masses for types {'A'} only.
Currently version of MDAnalysis
0.20.1
Reactions are currently unavailable