Skip to content

Make the TPR parser a bit faster#2804

Merged
richardjgowers merged 3 commits intodevelopfrom
slightly-faster-tpr
Jul 2, 2020
Merged

Make the TPR parser a bit faster#2804
richardjgowers merged 3 commits intodevelopfrom
slightly-faster-tpr

Conversation

@jbarnoud
Copy link
Contributor

@jbarnoud jbarnoud commented Jun 29, 2020

A function in the TPR parser calls list.pop thousands of times, which is
slow. This commit avoids that exansive call.

Taking the TPR from
https://github.com/bioexcel/covid_modelling_simulation_data/tree/master/spike_protein/full_spike/trimer
the parsing time on my computer goes from 25.6s to 9.39s. On a more
pathological TPR file, it goes from 3 minutes to about 6s.

Changes made in this Pull Request:

PR Checklist

- [ ] Tests?
- [ ] Docs?

  • CHANGELOG updated?
  • Issue raised/referenced?

jbarnoud added 2 commits June 29, 2020 17:05
A function in the TPR parser calls list.pop thousands of times, which is
slow. This commit avoids that exansive call.

Taking the TPR from
https://github.com/bioexcel/covid_modelling_simulation_data/tree/master/spike_protein/full_spike/trimer
the parsing time on my computer goes from 25.6s to 9.39s. On a more
pathological TPR file, it goes from 3 minutes to about 6s.
Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbarnoud looks solid, thanks!

@codecov
Copy link

codecov bot commented Jun 29, 2020

Codecov Report

Merging #2804 into develop will decrease coverage by 0.13%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2804      +/-   ##
===========================================
- Coverage    92.22%   92.08%   -0.14%     
===========================================
  Files          184      183       -1     
  Lines        24141    23670     -471     
  Branches      3123     3083      -40     
===========================================
- Hits         22263    21797     -466     
+ Misses        1813     1808       -5     
  Partials        65       65              
Impacted Files Coverage Δ
package/MDAnalysis/topology/tpr/obj.py 96.72% <100.00%> (-0.06%) ⬇️
package/MDAnalysis/auxiliary/base.py 90.75% <0.00%> (-0.57%) ⬇️
package/MDAnalysis/coordinates/chain.py 91.94% <0.00%> (-0.37%) ⬇️
package/MDAnalysis/coordinates/base.py 93.88% <0.00%> (-0.33%) ⬇️
package/MDAnalysis/coordinates/TRZ.py 88.16% <0.00%> (-0.27%) ⬇️
package/MDAnalysis/coordinates/GSD.py 88.63% <0.00%> (-0.26%) ⬇️
package/MDAnalysis/coordinates/chemfiles.py 88.12% <0.00%> (-0.22%) ⬇️
package/MDAnalysis/coordinates/INPCRD.py 93.33% <0.00%> (-0.22%) ⬇️
package/MDAnalysis/coordinates/GMS.py 92.30% <0.00%> (-0.16%) ⬇️
package/MDAnalysis/coordinates/TXYZ.py 93.18% <0.00%> (-0.16%) ⬇️
... and 38 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8314f7d...2fc42b3. Read the comment docs.

@orbeckst
Copy link
Member

Could be backported to 1.0.1 via PR #2798.

@richardjgowers richardjgowers merged commit 61e236d into develop Jul 2, 2020
@richardjgowers richardjgowers deleted the slightly-faster-tpr branch July 2, 2020 17:16
orbeckst pushed a commit that referenced this pull request Jul 3, 2020
* Make the TPR parser a bit faster

A function in the TPR parser calls list.pop thousands of times, which is
slow. This commit avoids that exansive call.

Taking the TPR from
https://github.com/bioexcel/covid_modelling_simulation_data/tree/master/spike_protein/full_spike/trimer
the parsing time on my computer goes from 25.6s to 9.39s. On a more
pathological TPR file, it goes from 3 minutes to about 6s.

* Update changelog for #2804

Co-authored-by: Richard Gowers <richardjgowers@gmail.com>
orbeckst pushed a commit that referenced this pull request Jul 12, 2020
* Make the TPR parser a bit faster

A function in the TPR parser calls list.pop thousands of times, which is
slow. This commit avoids that exansive call.

Taking the TPR from
https://github.com/bioexcel/covid_modelling_simulation_data/tree/master/spike_protein/full_spike/trimer
the parsing time on my computer goes from 25.6s to 9.39s. On a more
pathological TPR file, it goes from 3 minutes to about 6s.

* Update changelog for #2804

Co-authored-by: Richard Gowers <richardjgowers@gmail.com>
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
* Make the TPR parser a bit faster

A function in the TPR parser calls list.pop thousands of times, which is
slow. This commit avoids that exansive call.

Taking the TPR from
https://github.com/bioexcel/covid_modelling_simulation_data/tree/master/spike_protein/full_spike/trimer
the parsing time on my computer goes from 25.6s to 9.39s. On a more
pathological TPR file, it goes from 3 minutes to about 6s.

* Update changelog for MDAnalysis#2804

Co-authored-by: Richard Gowers <richardjgowers@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants