
The format of timeseries for hydrogen bond and water bridge analysis #2177

@xiki-tempula

When studying a hydrogen bond from a donor to an acceptor, one often wants to know the name of the heavy atom corresponding to the hydrogen bond donor. Currently, when the hydrogen bond analysis detects a hydrogen bond, only the donor and acceptor are recorded; the heavy atom corresponding to the donor is added later, which causes problems such as #1687.

@orbeckst has suggested that a workaround would be to incorporate the heavy atom information directly into the timeseries object, which saves the trouble of adding it afterwards. I'm also in favour of this option. I have a PR open that adds multi-order water bridge support to the existing water bridge analysis, which will, rather unfortunately, break the existing timeseries attribute of the water bridge analysis.

So the question is how we should reformulate the timeseries attribute. Currently, each entry has this form:

[<atom1 index>, <atom2 index>, (atom1 resname, atom1 resid, atom1 name), (atom2 resname, atom2 resid, atom2 name), <distance>, <angle>]

In hydrogen bond analysis, atom1 is the donor and atom2 is the acceptor. In water bridge analysis, two donors can be joined by a water molecule, so atom1 is instead the atom more closely linked to selection 1 and atom2 is the atom more closely linked to selection 2. So if hydrogen bond acceptor α from selection 1 and hydrogen bond donor β from selection 2 are joined by atoms γ and δ from a water molecule, the timeseries will have this form:

[<α index>, <γ index>, (α resname, α resid, α name), (γ resname, γ resid, γ name), <distance>, <angle>],
[<δ index>, <β index>, (δ resname, δ resid, δ name), (β resname, β resid, β name), <distance>, <angle>],
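As a rough illustration of the current format, the two entries above could look like this in Python. All indices, residue names, atom names, distances and angles here are made up for the example and do not come from a real trajectory.

```python
# Hypothetical values only: alpha is an acceptor in selection 1, gamma and
# delta are atoms of the bridging water, beta is a donor in selection 2.
water_bridge = [
    # [atom1 index, atom2 index, atom1 info, atom2 info, distance, angle]
    [10, 501, ("ASP", 3, "OD1"), ("SOL", 120, "HW1"), 1.9, 165.0],   # alpha, gamma
    [500, 42, ("SOL", 120, "OW"), ("ARG", 7, "HH11"), 2.0, 158.0],   # delta, beta
]

# Each entry unpacks into six fields; the two info fields are 3-tuples.
for atom1_index, atom2_index, atom1, atom2, distance, angle in water_bridge:
    resname1, resid1, name1 = atom1
    resname2, resid2, name2 = atom2
    print(f"{resname1}{resid1}:{name1} - {resname2}{resid2}:{name2} "
          f"distance={distance} angle={angle}")
```

Note that nothing in these entries says which atom of each pair is the donor, nor which heavy atom the donor hydrogen is bonded to.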

My suggestion for incorporating the heavy atom information is to append the heavy atom name as the last element of the donor's tuple, so that the timeseries will have the form:

[<α index>, <γ index>, (α resname, α resid, α name), (γ resname, γ resid, γ name, **γ heavy atom name**), <distance>, <angle>],
[<δ index>, <β index>, (δ resname, δ resid, δ name), (β resname, β resid, β name, **β heavy atom name**), <distance>, <angle>],
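A minimal sketch of how an entry in the current format could be converted to the proposed one. The helper `with_heavy_atom` and its `donor_column` argument are hypothetical names used only for this illustration, and the example values are made up.

```python
def with_heavy_atom(entry, donor_column, heavy_atom_name):
    """Return a copy of entry with heavy_atom_name appended to the donor tuple.

    donor_column is 2 if atom1 is the donor and 3 if atom2 is the donor
    (hypothetical convention for this sketch).
    """
    if donor_column not in (2, 3):
        raise ValueError("donor_column must be 2 or 3")
    new_entry = list(entry)  # shallow copy, the original entry is untouched
    new_entry[donor_column] = tuple(entry[donor_column]) + (heavy_atom_name,)
    return new_entry


# Hypothetical entry: atom2 is a water hydrogen bonded to the water oxygen "OW".
old = [10, 501, ("ASP", 3, "OD1"), ("SOL", 120, "HW1"), 1.9, 165.0]
new = with_heavy_atom(old, donor_column=3, heavy_atom_name="OW")
print(new[3])
```

The acceptor tuple keeps its three elements, so only the donor side of each entry changes.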

For hydrogen bond analysis, instead of the current form, in which the donor precedes the acceptor:

[<donor index>, <acceptor index>, (donor resname, donor resid, donor name), (acceptor resname, acceptor resid, acceptor name), <distance>, <angle>]

we would have the atom from selection 1 precede the atom from selection 2:

[<sele1 index>, <sele2 index>, (sele1 resname, sele1 resid, sele1 name), (sele2 resname, sele2 resid, sele2 name), <distance>, <angle>]

The heavy atom name is added to the tuple of whichever atom is the hydrogen bond donor. If sele1 is the hydrogen bond donor, we have:

[<sele1 index>, <sele2 index>, (sele1 resname, sele1 resid, sele1 name, **sele1 heavy atom name**), (sele2 resname, sele2 resid, sele2 name), <distance>, <angle>]

If sele2 is the hydrogen bond donor, we have:

[<sele1 index>, <sele2 index>, (sele1 resname, sele1 resid, sele1 name), (sele2 resname, sele2 resid, sele2 name, **sele2 heavy atom name**), <distance>, <angle>]
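The two cases above can be sketched as a single function that orders the entry by selection instead of by donor/acceptor role. The function name `make_entry`, its arguments and the example atoms are assumptions made for this illustration, not part of the existing analysis code.

```python
def make_entry(donor, acceptor, donor_in_sele1, distance, angle):
    """Build a proposed-format entry with the selection 1 atom first.

    donor:    (index, resname, resid, name, heavy_atom_name)
    acceptor: (index, resname, resid, name)
    """
    donor_index, *donor_info = donor
    acceptor_index, *acceptor_info = acceptor
    if donor_in_sele1:
        return [donor_index, acceptor_index,
                tuple(donor_info), tuple(acceptor_info), distance, angle]
    return [acceptor_index, donor_index,
            tuple(acceptor_info), tuple(donor_info), distance, angle]


# Hypothetical atoms: the donor belongs to selection 2, so it comes second.
donor = (42, "ARG", 7, "HH11", "NH1")
acceptor = (10, "ASP", 3, "OD1")
entry = make_entry(donor, acceptor, donor_in_sele1=False,
                   distance=1.9, angle=165.0)
print(entry)
```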

Thus, we have a consistent format between hydrogen bond analysis and water bridge analysis, and the two functionalities can be merged. The donor and acceptor can be distinguished relatively easily by the length of the tuples in the third and fourth columns: the donor tuple has a length of four (atom resname, atom resid, atom name, atom heavy atom name) and the acceptor tuple has a length of three (atom resname, atom resid, atom name).
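As a sketch of the length-based check described above, the following hypothetical helper `split_donor_acceptor` recovers the donor and acceptor tuples from an entry in the proposed format. The name and the example values are assumptions for illustration only.

```python
def split_donor_acceptor(entry):
    """Return (donor_tuple, acceptor_tuple) for a proposed-format entry.

    The donor tuple has four elements (resname, resid, name, heavy atom name),
    the acceptor tuple has three (resname, resid, name).
    """
    atom1, atom2 = entry[2], entry[3]
    if len(atom1) == 4 and len(atom2) == 3:
        return atom1, atom2
    if len(atom1) == 3 and len(atom2) == 4:
        return atom2, atom1
    raise ValueError("expected exactly one 4-tuple (donor) and one 3-tuple (acceptor)")


# Hypothetical entry in which the selection 2 atom is the donor.
example = [10, 42, ("ASP", 3, "OD1"), ("ARG", 7, "HH11", "NH1"), 1.9, 165.0]
donor_info, acceptor_info = split_donor_acceptor(example)
print("donor:", donor_info, "acceptor:", acceptor_info)
```

One trade-off of this design is that the role of each atom is encoded implicitly in the tuple length, so an entry in the old format (two 3-tuples) cannot be classified and is rejected here.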

Suggestions are welcome.
