Skip to content

Swap call to repo to use enhanced repocache#240

Merged
jonc125 merged 30 commits intomasterfrom
191-commit-cache
Feb 4, 2020
Merged

Swap call to repo to use enhanced repocache#240
jonc125 merged 30 commits intomasterfrom
191-commit-cache

Conversation

@sroderick-g5sro
Copy link
Copy Markdown
Contributor

@sroderick-g5sro sroderick-g5sro commented Jan 20, 2020

Fixes #191

Changes to use enhanced repocache were possible.

@sroderick-g5sro
Copy link
Copy Markdown
Contributor Author

I have swapped obvious calls from repo to repocache. To proceed further, I am going to need to cache the filenames and filesizes. @jonc125 is this your understanding? and shall I proceed to do that. I guess I will need new entities for these.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Jan 20, 2020

Codecov Report

Merging #240 into master will increase coverage by <.1%.
The diff coverage is 93.7%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master    #240     +/-   ##
========================================
+ Coverage    94.6%   94.7%   +<.1%     
========================================
  Files          65      65             
  Lines        2638    2656     +18     
  Branches      276     278      +2     
========================================
+ Hits         2498    2516     +18     
- Misses        101     102      +1     
+ Partials       39      38      -1
Impacted Files Coverage Δ
weblab/experiments/views.py 98.3% <ø> (ø) ⬆️
weblab/entities/repository.py 97.4% <ø> (+0.5%) ⬆️
weblab/repocache/models.py 99% <100%> (+2.1%) ⬆️
weblab/entities/templatetags/entities.py 100% <100%> (ø) ⬆️
weblab/entities/models.py 94.7% <100%> (ø) ⬆️
...cache/management/commands/populate_entity_cache.py 85.7% <50%> (-6.6%) ⬇️
weblab/entities/views.py 93.7% <91.3%> (-0.3%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 165aa85...385c6f1. Read the comment docs.

@jonc125
Copy link
Copy Markdown
Contributor

jonc125 commented Jan 20, 2020

I'll try to take a look at this first thing tomorrow. Personally I don't think the gain from caching filenames is sufficient to be worth it at this stage.

Copy link
Copy Markdown
Contributor

@jonc125 jonc125 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good for the bits it touches, but there's more we can do just with the repocache additions we have so far. Some examples in entities/views.py, there may be more:

  • EntityVersionMixin.get_commit: in most (but not quite all) places where this is called, a CachedEntityVersion instance would suffice. So perhaps add a new get_version method that returns the cache object, and review calls to see which can change over.
  • Similarly EntityVersionMixin.get_context_data can use the cache object for the version field most (hopefully all!) of the time. This is a little trickier to test, as templates don't give you an error on failed lookups by default. So might need to use something like https://stackoverflow.com/a/7854404 to help catch instances! I suspect author lookups and numfiles should be the only ones that differ between Commit and CachedEntityVersion so there shouldn't be many issues.
  • EntityVersionListView and EntityRunExperimentView hopefully don't need to use repo.get_commit

Comment thread weblab/repocache/models.py Outdated
@sroderick-g5sro
Copy link
Copy Markdown
Contributor Author

I'm just testing changes to EntityVersionMixin. I will commit these shortly.

@sroderick-g5sro
Copy link
Copy Markdown
Contributor Author

sroderick-g5sro commented Jan 27, 2020

I hope this is now ready for re-review. In some of the tests I decided to stick with add_version and the lookup the cached version later. An alterative approach I tried was to switch to add_fake_version. But this causes problems with add_tag. I tried to do an add_version_tag but the repo option has a tags or tag property that the repocache doesn't, this property is required by one of the mixins. Hopefully my approach is acceptable?

@sroderick-g5sro
Copy link
Copy Markdown
Contributor Author

I have had several attempts (see check ins) to add the string_if_invalid option to test.py but it has failed referencing BASE_DIR. I am not sure if there is an advantage of adding this in test. Possible for test systems such as Travis

Comment thread weblab/config/settings/dev.py Outdated
Copy link
Copy Markdown
Contributor

@jonc125 jonc125 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking promising, though I've not tried running it yet, just added some comments from reading through the diff.

Comment thread weblab/entities/models.py Outdated
Comment thread weblab/entities/views.py
Comment thread weblab/entities/views.py Outdated
Comment thread weblab/entities/views.py Outdated
Comment thread weblab/entities/views.py Outdated
Comment thread weblab/entities/views.py Outdated
Comment thread weblab/config/settings/test.py Outdated
@sroderick-g5sro
Copy link
Copy Markdown
Contributor Author

Some tests are failing now with missing filenames so looks like I have been over zealous with the swapping - on it now,.

@jonc125
Copy link
Copy Markdown
Contributor

jonc125 commented Jan 28, 2020

Rather than back out most of the changes as you've done in f5c08eb, you can change the references in templates. The error looks like "Undefined template variable 'commit.filenames' in 'entities/entity_versions.html'". We didn't cache filenames, but we did I think cache numfiles, and most occurrences of filenames in templates pipe it through len, so the number of files is what they actually need, not the names. In other words, change commit.filenames|length to commit.numfiles in templates and we should be OK I think.

@sroderick-g5sro
Copy link
Copy Markdown
Contributor Author

ah yes.. sorry I must having been having a few funny mins. makes sense I will do that now.

@sroderick-g5sro
Copy link
Copy Markdown
Contributor Author

sroderick-g5sro commented Jan 28, 2020

Made that change... I am now somewhat concern that I can't see my files in say the following screen...
image
I can see that we are using Javascript to display these. I will investigate if that code requires something that we are not passing and might be excluded from the fail-on-template-vars test

... In fact I can see this comes from the datasets functionality not the views. so should be ok, don't know why they are mising on my local machine

@jonc125
Copy link
Copy Markdown
Contributor

jonc125 commented Jan 28, 2020

I'll run it here in a bit and check whether I see the same thing.

@sroderick-g5sro
Copy link
Copy Markdown
Contributor Author

thanks it works ok on linux. I'm just seeing if I can spot anything obvious on windows

@jonc125
Copy link
Copy Markdown
Contributor

jonc125 commented Jan 28, 2020

I had to run ./manage.py populate_entity_cache on my local dev setup to fill in the new cache entries (hence the little hack in b12368d!) but everything seems to be displaying correctly. I can also tell that views like run experiments load much faster, which is good!

It might just be that your browser needs to re-fetch the latest JS code into its cache?

Spotted one task remaining: _url_friendly_label in entities/templatetags/entities.py still uses the repo where it could use the cache. After that I think #191 is done!

@sroderick-g5sro
Copy link
Copy Markdown
Contributor Author

sroderick-g5sro commented Jan 29, 2020

I've had a look at replacing the use of tag_dict as suggested. I have come up with the following:

def _url_friendly_label(entity, commit):
    """
    Get URL-friendly version label for a commit

    :param entity: Entity the commit belongs to
    :param commit: `git.Commit` object
    """
    tag = entity.cachedentity.versions.get(sha=commit.sha).tags.last()
    if tag is None or tag.tag in ['new', 'latest']:
        return commit.sha
    return tag.tag

unfortunately. It seems that sometimes the entity passed is a model but the commit was for a protocol. I have gone back to master and this was still the case. It didn't matter because repo.tag_dict presumably has both type of commits.

I cant, at the moment. understand why the passed entity type is wrong. One approach I tried was to query all cached objects with CACHED_VERSION_TYPE_MAP but that seems overkill.

I guess the other option is to maintain our own repocache.tag_dict

I wondered if you had any suggestions please? In the meantime I will try and understand why models are being passed in by the templates for a protocol commit.

......

I may have been overthinking this, I am now looking at just getting tags from the commit object.

@jonc125
Copy link
Copy Markdown
Contributor

jonc125 commented Jan 29, 2020

That's indeed the replacement code I'd expect. The commit not belonging to the entity sounds like a bug! In such circumstances the original code will just return the commit.sha - the entity.repo.tag_dict.get(commit.sha, ['/']) bit returns the supplied default value ['/'], rather than raising an error as the repocache version will do, so the bug has been hidden.

Are there particular tests that hit this bug? Or just in general browsing?

@jonc125
Copy link
Copy Markdown
Contributor

jonc125 commented Jan 29, 2020

Possibly the culprit is

diff --git i/weblab/templates/entities/entity_runexperiments.html w/weblab/templates/entities/entity_runexperiments.html
index 223ae0d..f1f5505 100644
--- i/weblab/templates/entities/entity_runexperiments.html
+++ w/weblab/templates/entities/entity_runexperiments.html
@@ -46,7 +46,7 @@
                        name="model_protocol_list[]"/>
               {% endif %}
               <strong>
-                <a class="entityversionlink" href="{% entity_version_url 'version' entity entity_version.commit %}">
+                <a class="entityversionlink" href="{% entity_version_url 'version' entity_object entity_version.commit %}">
                   {% include "./includes/version_name.html" with tags=entity_version.tags version=entity_version.commit only %}
                 </a>
               </strong>
@@ -84,7 +84,7 @@
                        name="model_protocol_list[]"/>
               {% endif %}
               <strong>
-                <a class="entityversionlink" href="{% entity_version_url 'version' entity entity_version.commit %}">
+                <a class="entityversionlink" href="{% entity_version_url 'version' entity_object entity_version.commit %}">
                   {% include "./includes/version_name.html" with tags=entity_version.tags version=entity_version.commit only %}
                 </a>
               </strong>

In which case, mea culpa!

@@ -134,10 +134,14 @@ def _url_friendly_label(entity, commit):
:param entity: Entity the commit belongs to
:param commit: `git.Commit` object
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring needs updating: we now need a CachedEntityVersion object here.

@jonc125
Copy link
Copy Markdown
Contributor

jonc125 commented Jan 30, 2020

I've done a few experiments to see how much this PR improves the open file handles issue. I ran the entities test_views using the following patch to measure how many files are left open by the end. It only works on Mac or Linux though.

diff --git i/weblab/entities/tests/test_views.py w/weblab/entities/tests/test_views.py
index 8eefff7..c9ca969 100644
--- i/weblab/entities/tests/test_views.py
+++ w/weblab/entities/tests/test_views.py
@@ -23,6 +23,25 @@ from repocache.models import ProtocolInterface
 from repocache.populate import populate_entity_cache


+def get_open_fds():
+    '''
+    return the number of open file descriptors for current process
+
+    .. warning: will only work on UNIX-like os-es.
+    '''
+    import subprocess
+    import os
+
+    pid = os.getpid()
+    procs = subprocess.check_output(
+        ["lsof", '-w', '-Ff', "-p", str(pid)]
+    ).splitlines()
+    print(procs)
+
+    nprocs = len([s for s in procs if s and s.decode()[0] == 'f' and s.decode()[1:].isdigit()])
+    return nprocs
+
+
 @pytest.fixture
 def analysis_task(protocol_with_version):
     """A single AnalysisTask instance with associated Protocol version & repocache set up."""
@@ -2668,3 +2687,8 @@ class TestEntityRunExperiment:
             assert planned_experiment.protocol == protocol
             assert planned_experiment.protocol_version == proto_commit1.sha
             assert (planned_experiment.model, planned_experiment.model_version) in expected_model_versions
+
+
+def test_zzz_open_fds():
+    num_open_files = get_open_fds()
+    assert num_open_files < 3
  • On master there were 81. With the change in 7e91e7c this went down to 57.
  • On this branch there were 57. With 7e91e7c it went down to 39. With 6ce08ec it went down to 27.

So looking promising!

Copy link
Copy Markdown
Contributor

@jonc125 jonc125 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is now ready to merge!

@jonc125 jonc125 merged commit 3a697f0 into master Feb 4, 2020
@jonc125 jonc125 deleted the 191-commit-cache branch February 4, 2020 12:16
MauriceHendrix pushed a commit that referenced this pull request Jul 22, 2021
Swap call to repo to use enhanced repocache
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unnecessary use of 'Commit' objects in views

3 participants