
Speedup PP load by caching rules action results on relevant parts of the field. #589

Closed
pp-mo wants to merge 5 commits into SciTools:master from pp-mo:pp_speedup_memorules

Conversation

@pp-mo
Member

@pp-mo pp-mo commented Jul 1, 2013

Captures field accesses on rules action execution (to work out which bits of the field the rule result depends on).
Then caches the relevant result for later re-use, keyed on the field attributes referred to.
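
The caching idea described above can be sketched roughly as follows. This is a minimal illustration with hypothetical names (AccessLogger, MemoisedAction), not the PR's actual classes:

```python
import copy

class AccessLogger(object):
    """Proxy that records which attributes of a wrapped field are read."""
    def __init__(self, field):
        self._field = field
        self.accessed = set()

    def __getattr__(self, name):
        # Only missing attributes reach __getattr__, so lookups of the
        # wrapped field's elements are logged here.
        self.accessed.add(name)
        return getattr(self._field, name)

class MemoisedAction(object):
    """Cache an action's results, keyed on the field attributes it reads."""
    def __init__(self, action):
        self.action = action
        self.key_names = None   # learned on the first call
        self.cache = {}

    def __call__(self, field):
        if self.key_names is None:
            # First call: run the action on a logging proxy to discover
            # which field elements the result actually depends on.
            probe = AccessLogger(field)
            result = self.action(probe)
            self.key_names = tuple(sorted(probe.accessed))
            self.cache[tuple(getattr(field, n)
                             for n in self.key_names)] = result
        key = tuple(getattr(field, n) for n in self.key_names)
        if key not in self.cache:
            self.cache[key] = self.action(field)
        # Copy on return, so cubes never share components.
        return copy.deepcopy(self.cache[key])
```

Fields whose relevant attributes match a previous field then reuse the cached result instead of re-running the action.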

@pp-mo
Member Author

pp-mo commented Jul 1, 2013

Context:

Analysis of building cubes with rules gave roughly these proportions of the total time for a simple iris.load_raw call:
15.5% : file loading + PPField construction
15.6% : rules evaluation (= condition testing)
37.7% : rule actions execution (= building cube components like coords + attributes)
31.2% : rules result processing (= adding cube components into the cube)

This PR addresses the third of these.

@pp-mo
Member Author

pp-mo commented Jul 1, 2013

Implementation Notes:

  • For sufficient efficiency gains, it was necessary to also accelerate deepcopy operations for a number of object classes used in rules results. This part could very possibly be bypassed in some cases, with significant performance benefit (see below).
  • The PPField classes have a number of changes:
    • support for the 'attribute access logging'
    • changes to ensure that all 'derived attributes' (i.e. properties) make visible which 'basic' elements they are using (i.e. the ones in the PPField itself)
    • changes to ensure that version-2 and version-3 headers behave compatibly for purposes of access logging.
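
On the 'derived attributes' point, one way a property can make its basic elements visible is to declare them explicitly. This is a hedged sketch with hypothetical names (derived, PPFieldLike), not the PR's actual mechanism:

```python
def derived(*basic_names):
    """Wrap a property so it advertises the basic field elements it uses."""
    def decorator(func):
        func.basic_elements = basic_names   # visible to an access logger
        return property(func)
    return decorator

class PPFieldLike(object):
    """Toy stand-in for a PPField with one derived attribute."""
    def __init__(self, lbyr, lbmon):
        self.lbyr = lbyr
        self.lbmon = lbmon

    @derived('lbyr', 'lbmon')
    def t1(self):
        # A 'derived' value: depends only on the declared basic elements.
        return (self.lbyr, self.lbmon)

# An access logger can now record the basic elements rather than 't1':
assert PPFieldLike.t1.fget.basic_elements == ('lbyr', 'lbmon')
```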

@pp-mo
Member Author

pp-mo commented Jul 1, 2013

Test results:

Testing with a standard test-data file...
Test code (in /net/home/h05/itpp/Iris/odds/pp_speedtest_2.sh) :

    #!/usr/bin/env python2.7
    from datetime import datetime as dt
    import shutil
    import tempfile
    import os.path

    import iris
    import iris.tests

    source_filepath = iris.tests.get_data_path(('PP','COLPEX','theta_and_orog_subset.pp'))
    with tempfile.NamedTemporaryFile(suffix='.pp', delete=False) as f:
      temp_local_filename = f.name
    shutil.copyfile(source_filepath, temp_local_filename)

    try:
      for i_try in range(4):
        t1=dt.now()
        iris.load_raw(temp_local_filename)
        t2=dt.now()
        print 'LOAD TOOK:',(t2-t1).total_seconds()
    finally:
      os.remove(temp_local_filename)

Results:

    itpp@eld238: /net/home/h05/itpp/git/iris/iris_main > /home/h05/itpp/Iris/odds/pp_speedtest_2.sh              
    LOAD TOOK: 2.39284
    LOAD TOOK: 2.391867
    LOAD TOOK: 2.356091
    LOAD TOOK: 2.356643
    itpp@eld238: /net/home/h05/itpp/git/iris/iris_main > git checkout pp_speedup_memorules         
    Switched to branch 'pp_speedup_memorules'
    Purging pyc files and empty directories...
    itpp@eld238: /net/home/h05/itpp/git/iris/iris_main > /home/h05/itpp/Iris/odds/pp_speedtest_2.sh
    LOAD TOOK: 2.103705
    LOAD TOOK: 2.076695
    LOAD TOOK: 2.094704
    LOAD TOOK: 2.068886
    itpp@eld238: /net/home/h05/itpp/git/iris/iris_main > 

    >>> t_old = np.mean([
    ... 2.39284,
    ... 2.391867,
    ... 2.356091,
    ... 2.356643])
    >>> 
    >>> t_new = np.mean([
    ... 2.103705,
    ... 2.076695,
    ... 2.094704,
    ... 2.068886])
    >>> 
    >>> 100.0*(t_old-t_new)/t_old
    12.144860915692959

Speedup on bigger file was larger, up to ~15%.

@pp-mo
Member Author

pp-mo commented Jul 1, 2013

Future possibility:
The need to copy the cached results makes this much less efficient than it might be, because the copy operations are still quite slow.
There should be a real possibility of future improvement by just returning the original result for a cube element (i.e. not a copy). That won't work in general, because cubes should not share components, but in many cases this might be sorted out in a "merge" -- e.g., trivially, if result is a single cube for iris.load_cube.
It may also be possible in many cases to have merge use "is" instead of "==" for some comparisons, which could have considerable additional speed benefits.

@cpelley

cpelley commented Jul 11, 2013

For sufficient efficiency gains, it was necessary to also accelerate deepcopy operations for a number of object classes used in rules results.

Can I ask for further clarification of the changes made? Why is specifying your own shallow and/or deep copy operations proving beneficial in your specific case? To be more specific, I have been looking at GeogCS(CoordSystem). It would be good to capture this reasoning on this ticket in any case, for future enhancements or code interpretation.


We need a test for each object with shallow/deep copy overrides, to ensure that any changes to these objects now or in future do not result in a malformed 'copy' which might go unnoticed (copying by reference, or not, when unexpected). An objectA.attribute is not objectB.attribute test for each object attribute should do it, I think, for the deep copy overrides.
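
The suggested test could look something like this. A sketch only: the helper name and the immutable-type whitelist are assumptions, not existing Iris test code:

```python
import copy

def assert_deepcopy_decoupled(obj):
    """Check that a deepcopy shares no mutable attribute objects with the
    original (the 'objectA.attribute is not objectB.attribute' test)."""
    clone = copy.deepcopy(obj)
    for name, value in vars(obj).items():
        if isinstance(value, (int, float, complex, str, bytes,
                              tuple, frozenset, type(None), bool)):
            continue   # immutable values may legitimately be shared
        assert getattr(clone, name) is not value, (
            'attribute %r is shared by reference' % name)
```

Run against each class with a copy override, this would catch a __deepcopy__ that silently passes a mutable component through by reference.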

@rhattersley
Member

@pp-mo - I've posted a question on the iris-dev discussion group to see how this PR might fit into the bigger picture.


I wouldn't bother putting this into a variable.

@cpelley

cpelley commented Jul 15, 2013

This PR requires #614 to be merged first, since the unit tests which are to be written as a result of this PR would likely be affected.


coord_system is not immutable! You're passing by reference here.
I cannot overemphasize the need for a unit test as described above, checking that no object components are references to the original, if you're overriding deepcopy operations. The others work because they have getter-setter properties.


points = self.points
bounds = self.bounds
see comment below for reasoning

@cpelley

cpelley commented Jul 15, 2013

There are a number of identified issues which need to be addressed before I continue this review. Handing back over to you to address them, @pp-mo.

@rhattersley
Member

It'll be interesting to see what the impact is on performance of the fixes.

@pp-mo
Member Author

pp-mo commented Jul 16, 2013

re: "Copy the coord_system in Coord deepcopy." (pp-mo@53613dc)

@cpelley coord_system not immutable! ... your coordinate deepcopy is currently broken

@rhattersley It'll be interesting to see what the impact is on performance of the fixes.

Luckily, it only makes a small difference, presumably as copying coordinate systems was already optimised.
In my test, load-time speedup was reduced from about 17% to 15%.
Presumably, loading rotated-pole data will suffer more, as RotatedGeogCS is slower to copy.

@pp-mo
Member Author

pp-mo commented Jul 16, 2013

@cpelley To be honest, why not just modify the from_coord static method so that it can bypass the points and bounds getters and setters, then call this from your deepcopy:

In offline conversation with @esc24 he expressed some doubts about this approach: his view is that 'from_coord' is really just a (somewhat dodgy) way of converting coordinate types (Aux <-> Dim). It probably shouldn't be used for much else right now, as we might even want to get rid of it in future.

Note: if we do do this, I think we can then implement it at the abstract Coord level. In which case we should also declare from_coord there as an abstract method. The messy almost-identical implementations of from_coord in Dim/Aux would still be needed, but we could at least have just one deepcopy method.

@pp-mo
Member Author

pp-mo commented Jul 16, 2013

@cpelley coord_system not immutable! ... your coordinate deepcopy is currently broken

Just for the complete record.. I speculated that we could consider a CoordSystem 'effectively' immutable, if we disallowed changes to the basic properties (things like 'semi_major_axis', 'longitude_of_prime_meridian' or 'ellipsoid').
We could do that by adding setter methods or a __setattr__ override (though it could still be broken in an inheritor).
GeogCS, at least, would then be virtually like a NamedTuple implementation.
However @esc24 said he himself is already using code that modifies these things on an existing object, which pretty much rules it out.
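
For the record, the rejected 'effectively immutable' idea could have looked something like this. A sketch only, with a hypothetical class name, and noted above as not viable since existing user code mutates these attributes:

```python
class FrozenGeogCS(object):
    """Hypothetical coord system whose basic properties are locked after
    construction, via a __setattr__ override."""
    def __init__(self, semi_major_axis, longitude_of_prime_meridian=0.0):
        self.semi_major_axis = semi_major_axis
        self.longitude_of_prime_meridian = longitude_of_prime_meridian
        self._frozen = True   # all later attribute writes are refused

    def __setattr__(self, name, value):
        if getattr(self, '_frozen', False):
            raise AttributeError(
                '%s is effectively immutable' % type(self).__name__)
        # NOTE: a subclass can still bypass this via object.__setattr__,
        # which is the 'can still be broken in an inheritor' caveat.
        object.__setattr__(self, name, value)
```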

@pp-mo
Member Author

pp-mo commented Jul 16, 2013

"offline conversation with @esc24" ...
( see https://github.com/SciTools/iris/pull/589/files#r5215959 )

More... What we are doing here with deepcopy is not a true deepcopy, but relies on design-contract type details.
Basically, it assumes it knows which attributes can be used as constructor arguments to construct an 'identical' object, and which of these won't need copying because they should always be simple immutable values (or "effectively" so).

For instance, even the simplest GeogCS.deepcopy is assuming ...

  • there are only 4 relevant attributes of the object, whose names we know (and are themselves immutable!)
  • the attribute values are all immutable (in this case, floating point numbers, most likely)

But of course, it is always still possible to attach arbitrary extra attributes (which then won't get copied at all), or define new ones in an inheriting subclass, or replace a 'number' with something else which then might not be immutable (e.g. a numpy scalar??).

One very particular problem with this is that, once you provide a __deepcopy__ method, it is like a method override, and there is no way to revert to the 'normal' copy.deepcopy operation.
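
A toy illustration of this kind of constructor-argument __deepcopy__, and the pitfall with attached attributes. The class is hypothetical, loosely modelled on GeogCS, not Iris's actual implementation:

```python
import copy

class GeogCSLike(object):
    """Hypothetical coord system with a 'design-contract' deepcopy."""
    def __init__(self, semi_major_axis, semi_minor_axis=None):
        self.semi_major_axis = semi_major_axis
        self.semi_minor_axis = semi_minor_axis

    def __deepcopy__(self, memo):
        # Fast path: re-run the constructor on known arguments, assuming
        # they are the only relevant attributes and are all immutable.
        return GeogCSLike(self.semi_major_axis, self.semi_minor_axis)

cs = GeogCSLike(6377563.396, 6356256.909)
cs.extra_note = ['attached later']      # arbitrary extra attribute
cs2 = copy.deepcopy(cs)
assert cs2.semi_major_axis == cs.semi_major_axis
assert not hasattr(cs2, 'extra_note')   # silently dropped by the override
```

And since __deepcopy__ behaves like a method override, there is no simple way for a caller to ask for the 'normal' generic deepcopy instead.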

We should probably discuss this with @rhattersley. I can think of at least 3 possible ways forward...

  • don't worry about it, just accept that we understand and specify the implementation + valid usage of these things, so we know it is ok.
  • use full deepcopy for everything, or at least make a deepcopy of all constructor args.
  • implement the current operation, but call it something different from deepcopy. For the objects we're interested in, this could probably just be 'copy' : That would be consistent with the way we currently use Iris components like Coord and CoordSystem, where we avoid using any references.
    (N.B. I'm not sure what work may be required to ensure this gets called in all the appropriate places).

For reference, I added a full deepcopy of all constructor arguments to all the deepcopy overrides provided in this branch, and tested for speed. The result almost exactly cancels out, to give no net speed advantage over un-memoised code.
( and N.B. this is still not a full generic deepcopy, as it is still assuming the list of relevant constructor arguments )

@esc24
Member

esc24 commented Jul 16, 2013

Good summary @pp-mo. Thanks.

@cpelley

cpelley commented Jul 16, 2013

With regards to the from_coord method: it is currently there, it is not deprecated, and there is no indication of its intended purpose in either the user guide or the docstring. I don't see how it can be acceptable to pretty much copy-paste the exact same functionality of from_coord into the __deepcopy__ special method.

I think we should move the functionality from from_coord into __deepcopy__, and possibly give from_coord a deprecation warning decorator if it's going, pointing it to __deepcopy__ for now, until we actually remove it.

@cpelley

cpelley commented Jul 16, 2013

On the above comment of effectively locking down the coord_system to being more or less immutable: I'm with @esc24 on this. We would have to ensure that this wouldn't impact a significant number of users who already make use of this functionality. I think this would effectively put this PR on hold, and require a Google group discussion, if this route were to be taken.

@pp-mo
Member Author

pp-mo commented Jul 16, 2013

@cpelley On the above comment of effectively locking down the coord_system to being more or less immutable

I'm not proposing it. I just wanted to record why, on reflection, it is not an option.

@cpelley

cpelley commented Jul 16, 2013

I think that if we are using overrides to the deepcopy operation, I would only be happy to do so for private classes, which won't help much here: like you said, a user could very well attach additional information to these objects (it's Python, after all). Your third option sounds like a possibility.
I'd like to know what other people think.

@pp-mo
Member Author

pp-mo commented Jul 16, 2013

Addressed most comments, but not all.
Awaiting discussion of deepcopy usage.
If/when this is resolved, still need to address outstanding problems:

  • whether Coord deepcopy speedup code should be moved to 'from_coord'
  • general ugliness of faking lbsec(d)/day(d) to make pp v2 and v3 fields appear similar in access logging


...also, since these variables are only relevant to the PPField, why not define them within it? (Again, just a thought; it might make things neater, since the structure then describes where these things tie together.) This way we can also refer to these objects from the subclasses of PPField:

class PPField(object):
    _NOCACHE_ELEMENT_NAMES = blabla
    _CACHABLE_ELEMENT_NAMES = blabla
    _V2_V3_HEADER_VARIANT_NAMES = blabla

    def __init__(self):
        .................................


Nice one @cpelley. That looks much neater.

@pp-mo
Member Author

pp-mo commented Jul 17, 2013

Fix copy problem?

Something odd may be happening with Travis. These errors are not happening on the desktop.

@cpelley

cpelley commented Jul 31, 2013

A number of changes have been made by @rhattersley which may remove some of the benefits of this PR.
Might I recommend that you rebase and re-evaluate the benefits of this PR? That would add weight to your PR, if it still offers significant benefit.

@rhattersley
Member

Might I recommend that you rebase and evaluate the benefits of this PR

That's no simple task you're suggesting! The rules system has changed considerably since this PR was raised (and in large part, thanks to this very PR), so an equally considerable modification would be required.

@pp-mo - in the light of #611, #637, and #642, I'd be very interested to hear your opinion on whether there are improvements made in this PR which have yet to be extracted elsewhere.

@rhattersley
Member

@pp-mo - in the light of #611, #637, and #642, I'd be very interested to hear your opinion on whether there are improvements made in this PR which have yet to be extracted elsewhere.

I'd still like to get your opinion - but in the meantime I'm closing this PR.

@rhattersley rhattersley closed this Aug 6, 2013
@pp-mo
Member Author

pp-mo commented Aug 6, 2013

re: @rhattersley

@pp-mo - in the light of #611, #637, and #642, I'd be very interested to hear your opinion on whether there are improvements made in this PR which have yet to be extracted elsewhere.
I'd still like to get your opinion - but in the meantime I'm closing this PR.

I don't think the subsequent PRs really capture the original intent here.
The key idea was to address the obvious fact that, for raw cube construction over a typically large number of input fields, the effort of making cube elements such as coordinates and cube attributes is repeated many, many times, producing duplicate identical results. So it sounds like we should be able to reduce it by a large factor.

The PRs referred to claim to save about 3% and 10% of a load_cubes.
The initial work on this PR, simply using multiple references to the same elements, yielded about 30% improvement on load_raw. That isn't directly comparable, but will probably relate to about 15% overall -- i.e. when cube merge is included.
Unfortunately, the original promise of this PR was seriously degraded (30% --> 15% of load_raw timing) by the need to decouple all the duplicate information (i.e. to "deepcopy" all the cached results).

However, even before that degradation, I always felt that a much better win should be possible if result duplication could also be used to speed up the merge process.
The logic of this would be ...

  • rules operations may return cubes with duplicated (aka "cross-linked") elements
  • the merge operation must then "unlink" these where necessary -- but that can be many times less work, if lots of raw cubes produce only a few merged ones (i.e. the usual case with large file loads)
  • merge should also be able to save a lot of the time spent in equality testing between cube elements, if this is shortcut when the objects are actually identical (i.e. the same object)
    • e.g. (trivially) cube1.coord('x') is cube2.coord('x') ==> cube1.coord('x') == cube2.coord('x')
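
That equality shortcut could be sketched as follows (hypothetical names; not Iris's merge code):

```python
class SlowCoord(object):
    """Stand-in for a coordinate whose __eq__ is expensive."""
    comparisons = 0

    def __init__(self, points):
        self.points = points

    def __eq__(self, other):
        SlowCoord.comparisons += 1   # count the expensive comparisons
        return self.points == other.points

def coords_equal(c1, c2):
    # Identity fast path: cross-linked components from cached rule
    # results are the very same object, so deep comparison is skipped.
    if c1 is c2:
        return True
    return c1 == c2

a = SlowCoord([1, 2, 3])
assert coords_equal(a, a)            # no __eq__ call needed
assert SlowCoord.comparisons == 0
```

With many raw cubes sharing the same coordinate objects, most comparisons in a merge would take the identity fast path.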

So I still think this approach could have very significant benefits, additional to what has already been achieved. But it seems that can only really work if you tackle result construction and merge together. Unfortunately, that wasn't clear when this began (we thought copying would be cheap !!). And, as 'build' and 'merge' were explicitly separated in this investigation, we still don't have even an estimate of what might be achieved that way.

@pp-mo pp-mo deleted the pp_speedup_memorules branch March 14, 2016 12:44