Skip to content

Implemented extraction of chains from DMS formats#2872

Merged
orbeckst merged 13 commits intoMDAnalysis:developfrom
ianmkenney:fix/dms-chains
Jul 29, 2020
Merged

Implemented extraction of chains from DMS formats#2872
orbeckst merged 13 commits intoMDAnalysis:developfrom
ianmkenney:fix/dms-chains

Conversation

@ianmkenney
Copy link
Member

@ianmkenney ianmkenney commented Jul 26, 2020

Used PDBParser as a reference

Addresses #1387

Changes made in this Pull Request:

  • Repeated residues are not squashed
  • Multiple segids are created when multiple chains are present

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks
Copy link

pep8speaks commented Jul 26, 2020

Hello @ianmkenney! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-07-28 23:01:55 UTC

@codecov
Copy link

codecov bot commented Jul 26, 2020

Codecov Report

Merging #2872 into develop will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2872      +/-   ##
===========================================
- Coverage    92.80%   92.78%   -0.02%     
===========================================
  Files          185      185              
  Lines        24205    24214       +9     
  Branches      3133     3137       +4     
===========================================
+ Hits         22463    22467       +4     
- Misses        1696     1701       +5     
  Partials        46       46              
Impacted Files Coverage Δ
package/MDAnalysis/topology/DMSParser.py 90.54% <100.00%> (+1.65%) ⬆️
coordinates/base.py 94.19% <0.00%> (-0.63%) ⬇️
util.py 89.42% <0.00%> (-0.02%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2550d06...28dd7b7. Read the comment docs.

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @ianmkenney looks good but needs a simple test

@ianmkenney
Copy link
Member Author

@richardjgowers I'm creating an altered ADK dms (adk_closed_domains.dms) from the already existing adk_closed.dms. Domain definitions are from:

Seyler SL, Kumar A, Thorpe MF, Beckstein O (2015) Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways. PLoS Comput Biol 11(10): e1004568. https://doi.org/10.1371/journal.pcbi.1004568

Setting the domains as follows from sqlite3:

UPDATE particle SET segid = "CORE";

UPDATE particle 
SET segid = "NMP" 
WHERE resid BETWEEN 30 AND 59;

UPDATE particle 
SET segid = "LID" 
WHERE resid BETWEEN 122 AND 159;

A check on the selections of those domains should be sufficient for tests, right?

["A", "B", "A"] -> ["A", "B"]
Derived from adk_closed.dms and edited in sqlite3

UPDATE particle SET segid = "CORE";

UPDATE particle
SET segid = "NMP"
WHERE resid BETWEEN 30 AND 59;

UPDATE particle
SET segid = "LID"
WHERE resid BETWEEN 122 AND 159;

where the domain definitions can be found in:

Seyler SL, Kumar A, Thorpe MF, Beckstein O (2015) Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways. PLoS Comput Biol 11(10): e1004568. https://doi.org/10.1371/journal.pcbi.1004568
Added new selections in test_dms.py and relabeled previous selections
@ianmkenney
Copy link
Member Author

@richardjgowers should be ready for a review

@orbeckst orbeckst added the Format-Desmond Desmond and Anton (DMS, STK) label Jul 28, 2020
@orbeckst orbeckst linked an issue Jul 28, 2020 that may be closed by this pull request
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments and could you please also look at the PEP8 issues?

EDIT: Please also update CHANGELOG with a Fix entry and include issue #1387 in the message.

vals = cur.fetchall()
except sqlite3.DatabaseError:
errmsg = "Failed reading the atoms from DMS Database"
raise IOError(errmsg) from None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason for removing raise from None? See PR #2357 for rationale.

bonds = cur.fetchall()
except sqlite3.DatabaseError:
errmsg = "Failed reading the bonds from DMS Database"
raise IOError(errmsg) from None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason for removing raise from None?

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy once @orbeckst 's question is answered, thanks @ianmkenney !

@orbeckst orbeckst self-assigned this Jul 28, 2020
@orbeckst orbeckst added this to the 1.0.x milestone Jul 28, 2020
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I have one potential performance comment (see inline).

I changed your CHANGELOG entry to be under 2.0.0.

@orbeckst orbeckst merged commit 2f16a16 into MDAnalysis:develop Jul 29, 2020
@orbeckst
Copy link
Member

Thanks @ianmkenney !

@orbeckst orbeckst mentioned this pull request Jul 29, 2020
4 tasks
cbouy pushed a commit to cbouy/mdanalysis that referenced this pull request Jul 31, 2020
* partially addresses MDAnalysis#1387 for the DMS parser
* Changes made in this Pull Request:
   * Repeated residues are not squashed
   * Multiple segids are created when multiple chains are present
   * Implemented extraction of chains from DMS formats
* Properly split repeating segids: ["A", "B", "A"] -> ["A", "B"]
* tests: New DMS file that includes multiple segids
   Derived from adk_closed.dms and edited in sqlite3
    ```sql
    UPDATE particle SET segid = "CORE";
    
    UPDATE particle
    SET segid = "NMP"
    WHERE resid BETWEEN 30 AND 59;
    
    UPDATE particle
    SET segid = "LID"
    WHERE resid BETWEEN 122 AND 159;
    ```
    where the domain definitions can be found in:

    Seyler SL, Kumar A, Thorpe MF, Beckstein O (2015) Path Similarity Analysis: A Method for Quantifying     
    Macromolecular Pathways. PLoS Comput Biol 11(10): e1004568. https://doi.org/10.1371/journal.pcbi.1004568
* tests: 
   - Modified atom selections in test_dms.py
   - added new selections in test_dms.py and relabeled previous selections
   - Added DMS with no segid
* Updated CHANGELOG
orbeckst pushed a commit that referenced this pull request Aug 12, 2020
* partially addresses #1387 for the DMS parser
* Changes made in this Pull Request:
   * Repeated residues are not squashed
   * Multiple segids are created when multiple chains are present
   * Implemented extraction of chains from DMS formats
* Properly split repeating segids: ["A", "B", "A"] -> ["A", "B"]
* tests: New DMS file that includes multiple segids
   Derived from adk_closed.dms and edited in sqlite3
    ```sql
    UPDATE particle SET segid = "CORE";

    UPDATE particle
    SET segid = "NMP"
    WHERE resid BETWEEN 30 AND 59;

    UPDATE particle
    SET segid = "LID"
    WHERE resid BETWEEN 122 AND 159;
    ```
    where the domain definitions can be found in:

    Seyler SL, Kumar A, Thorpe MF, Beckstein O (2015) Path Similarity Analysis: A Method for Quantifying
    Macromolecular Pathways. PLoS Comput Biol 11(10): e1004568. https://doi.org/10.1371/journal.pcbi.1004568
* tests:
   - Modified atom selections in test_dms.py
   - added new selections in test_dms.py and relabeled previous selections
   - Added DMS with no segid
* Updated CHANGELOG

(cherry picked from commit 2f16a16)
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
* partially addresses MDAnalysis#1387 for the DMS parser
* Changes made in this Pull Request:
   * Repeated residues are not squashed
   * Multiple segids are created when multiple chains are present
   * Implemented extraction of chains from DMS formats
* Properly split repeating segids: ["A", "B", "A"] -> ["A", "B"]
* tests: New DMS file that includes multiple segids
   Derived from adk_closed.dms and edited in sqlite3
    ```sql
    UPDATE particle SET segid = "CORE";
    
    UPDATE particle
    SET segid = "NMP"
    WHERE resid BETWEEN 30 AND 59;
    
    UPDATE particle
    SET segid = "LID"
    WHERE resid BETWEEN 122 AND 159;
    ```
    where the domain definitions can be found in:

    Seyler SL, Kumar A, Thorpe MF, Beckstein O (2015) Path Similarity Analysis: A Method for Quantifying     
    Macromolecular Pathways. PLoS Comput Biol 11(10): e1004568. https://doi.org/10.1371/journal.pcbi.1004568
* tests: 
   - Modified atom selections in test_dms.py
   - added new selections in test_dms.py and relabeled previous selections
   - Added DMS with no segid
* Updated CHANGELOG
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants