
Speedup PP load by caching rules action results on relevant parts of the field. #589

Closed
pp-mo wants to merge 5 commits into SciTools:master from pp-mo:pp_speedup_memorules

Conversation

@pp-mo
Member

@pp-mo pp-mo commented Jul 1, 2013

Captures field accesses on rules action execution (to work out which bits of the field the rule result depends on).
Then caches the relevant result for later re-use, keyed on the field attributes referred to.
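
The caching idea described above can be sketched roughly as follows. This is a minimal illustration with hypothetical names (AccessLogger, MemoisedAction), not the PR's actual classes:

```python
import copy

class AccessLogger(object):
    """Proxy that records which attributes of a wrapped field are read."""
    def __init__(self, field):
        self._field = field
        self.accessed = set()

    def __getattr__(self, name):
        # Only missing attributes reach __getattr__, so lookups of the
        # wrapped field's elements are logged here.
        self.accessed.add(name)
        return getattr(self._field, name)

class MemoisedAction(object):
    """Cache an action's results, keyed on the field attributes it reads."""
    def __init__(self, action):
        self.action = action
        self.key_names = None   # learned on the first call
        self.cache = {}

    def __call__(self, field):
        if self.key_names is None:
            # First call: run the action on a logging proxy to discover
            # which field elements the result actually depends on.
            probe = AccessLogger(field)
            result = self.action(probe)
            self.key_names = tuple(sorted(probe.accessed))
            self.cache[tuple(getattr(field, n)
                             for n in self.key_names)] = result
        key = tuple(getattr(field, n) for n in self.key_names)
        if key not in self.cache:
            self.cache[key] = self.action(field)
        # Copy on return, so cubes never share components.
        return copy.deepcopy(self.cache[key])
```

Fields whose relevant attributes match a previous field then reuse the cached result instead of re-running the action.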

@pp-mo
Member Author

pp-mo commented Jul 1, 2013

Context:

Analysis of building cubes with rules gave roughly these proportions of the total time for a simple iris.load_raw call:
15.5% : file loading + PPField construction
15.6% : rules evaluation (= condition testing)
37.7% : rule actions execution (= building cube components like coords + attributes)
31.2% : rules result processing (= adding cube components into the cube)

This PR addresses the third of these.

@pp-mo
Member Author

pp-mo commented Jul 1, 2013

Implementation Notes:

  • For sufficient efficiency gains, it was necessary to also accelerate deepcopy operations for a number of object classes used in rules results. This part could very possibly be bypassed in some cases, with significant performance benefit (see below).
  • The PPField classes have a number of changes:
    • support for the 'attribute access logging'
    • changes to ensure that all 'derived attributes' (i.e. properties) make visible which 'basic' elements they are using (i.e. the ones in the PPField itself)
    • changes to ensure that version-2 and version-3 headers behave compatibly for purposes of access logging.
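
On the 'derived attributes' point, one way a property can make its basic elements visible is to declare them explicitly. This is a hedged sketch with hypothetical names (derived, PPFieldLike), not the PR's actual mechanism:

```python
def derived(*basic_names):
    """Wrap a property so it advertises the basic field elements it uses."""
    def decorator(func):
        func.basic_elements = basic_names   # visible to an access logger
        return property(func)
    return decorator

class PPFieldLike(object):
    """Toy stand-in for a PPField with one derived attribute."""
    def __init__(self, lbyr, lbmon):
        self.lbyr = lbyr
        self.lbmon = lbmon

    @derived('lbyr', 'lbmon')
    def t1(self):
        # A 'derived' value: depends only on the declared basic elements.
        return (self.lbyr, self.lbmon)

# An access logger can now record the basic elements rather than 't1':
assert PPFieldLike.t1.fget.basic_elements == ('lbyr', 'lbmon')
```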

@pp-mo
Member Author

pp-mo commented Jul 1, 2013

Test results:

Testing with a standard test-data file...
Test code (in /net/home/h05/itpp/Iris/odds/pp_speedtest_2.sh) :

    #!/usr/bin/env python2.7
    from datetime import datetime as dt
    import shutil
    import tempfile
    import os.path

    import iris
    import iris.tests

    source_filepath = iris.tests.get_data_path(('PP','COLPEX','theta_and_orog_subset.pp'))
    with tempfile.NamedTemporaryFile(suffix='.pp', delete=False) as f:
      temp_local_filename = f.name
    shutil.copyfile(source_filepath, temp_local_filename)

    try:
      for i_try in range(4):
        t1=dt.now()
        iris.load_raw(temp_local_filename)
        t2=dt.now()
        print 'LOAD TOOK:',(t2-t1).total_seconds()
    finally:
      os.remove(temp_local_filename)

Results:

    itpp@eld238: /net/home/h05/itpp/git/iris/iris_main > /home/h05/itpp/Iris/odds/pp_speedtest_2.sh              
    LOAD TOOK: 2.39284
    LOAD TOOK: 2.391867
    LOAD TOOK: 2.356091
    LOAD TOOK: 2.356643
    itpp@eld238: /net/home/h05/itpp/git/iris/iris_main > git checkout pp_speedup_memorules         
    Switched to branch 'pp_speedup_memorules'
    Purging pyc files and empty directories...
    itpp@eld238: /net/home/h05/itpp/git/iris/iris_main > /home/h05/itpp/Iris/odds/pp_speedtest_2.sh
    LOAD TOOK: 2.103705
    LOAD TOOK: 2.076695
    LOAD TOOK: 2.094704
    LOAD TOOK: 2.068886
    itpp@eld238: /net/home/h05/itpp/git/iris/iris_main > 

    >>> t_old = np.mean([
    ... 2.39284,
    ... 2.391867,
    ... 2.356091,
    ... 2.356643])
    >>> 
    >>> t_new = np.mean([
    ... 2.103705,
    ... 2.076695,
    ... 2.094704,
    ... 2.068886])
    >>> 
    >>> 100.0*(t_old-t_new)/t_old
    12.144860915692959

Speedup on bigger file was larger, up to ~15%.

@pp-mo
Member Author

pp-mo commented Jul 1, 2013

Future possibility:
The need to copy the cached results makes this much less efficient than it might be, because the copy operations are still quite slow.
There should be a real possibility of future improvement by just returning the original result for a cube element (i.e. not a copy). That won't work in general, because cubes should not share components, but in many cases this might be sorted out in a "merge" -- e.g., trivially, if result is a single cube for iris.load_cube.
It may also be possible in many cases to have merge use "is" instead of "==" for some comparisons, which could have considerable additional speed benefits.

@cpelley

cpelley commented Jul 11, 2013

For sufficient efficiency gains, it was necessary to also accelerate deepcopy operations for a number of object classes used in rules results.

Can I ask for further clarification of the changes made? Why is specifying your own shallow and/or deep copy operations proving beneficial in your specific case? To be more specific, I have been looking at GeogCS(CoordSystem). It would be good to capture this reasoning on this ticket in any case, for future enhancements or code interpretation.


We need a test for each object with shallow/deep copy overrides, to ensure that any changes to these objects now or in future do not result in a malformed 'copy' which might go unnoticed (copying by reference, or not, when unexpected). An objectA.attribute is not objectB.attribute test for each object attribute should do it, I think, for the deep copy overrides.
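
The suggested test could look something like this. A sketch only: the helper name and the immutable-type whitelist are assumptions, not existing Iris test code:

```python
import copy

def assert_deepcopy_decoupled(obj):
    """Check that a deepcopy shares no mutable attribute objects with the
    original (the 'objectA.attribute is not objectB.attribute' test)."""
    clone = copy.deepcopy(obj)
    for name, value in vars(obj).items():
        if isinstance(value, (int, float, complex, str, bytes,
                              tuple, frozenset, type(None), bool)):
            continue   # immutable values may legitimately be shared
        assert getattr(clone, name) is not value, (
            'attribute %r is shared by reference' % name)
```

Run against each class with a copy override, this would catch a __deepcopy__ that silently passes a mutable component through by reference.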

@rhattersley
Member

@pp-mo - I've posted a question on the iris-dev discussion group to see how this PR might fit into the bigger picture.


I wouldn't bother putting this into a variable.

@cpelley

cpelley commented Jul 15, 2013

This PR requires #614 to be merged first, since the unit tests which are to be written as a result of this PR would likely be affected.


coord_system is not immutable! You're passing by reference here.
I cannot overemphasize the need for a unit test as described above, checking that no object components are references to the original, if you're overriding deepcopy operations. The others work because they have getter-setter properties.


points = self.points
bounds = self.bounds
see comment below for reasoning

@cpelley

cpelley commented Jul 15, 2013

There are a number of identified issues which need to be addressed before I continue this review. Handing back over to you to address them, @pp-mo.

@rhattersley
Member

It'll be interesting to see what the impact is on performance of the fixes.

@pp-mo
Member Author

pp-mo commented Jul 16, 2013

re: "Copy the coord_system in Coord deepcopy." (pp-mo@53613dc)

@cpelley coord_system not immutable! ... your coordinate deepcopy is currently broken

@rhattersley It'll be interesting to see what the impact is on performance of the fixes.

Luckily, it only makes a small difference, presumably as copying coordinate systems was already optimised.
In my test, load-time speedup was reduced from about 17% to 15%.
Presumably, loading rotated-pole data will suffer more, as RotatedGeogCS is slower to copy.

@pp-mo
Member Author

pp-mo commented Jul 16, 2013

@cpelley To be honest, why not just modify the from_coord static method so that it can bypass the points and bounds getters and setters, then call this from your deepcopy:

In offline conversation with @esc24 he expressed some doubts about this approach: his view is that 'from_coord' is really just a (somewhat dodgy) way of converting coordinate types (Aux <-> Dim). It probably shouldn't be used for much else right now, as we might even want to get rid of it in future.

Note: if we do do this, I think we can then implement it at the abstract Coord level. In which case we should also declare from_coord there as an abstract method. The messy almost-identical implementations of from_coord in Dim/Aux would still be needed, but we could at least have just one deepcopy method.

@pp-mo
Member Author

pp-mo commented Jul 16, 2013

@cpelley coord_system not immutable! ... your coordinate deepcopy is currently broken

Just for the complete record.. I speculated that we could consider a CoordSystem 'effectively' immutable, if we disallowed changes to the basic properties (things like 'semi_major_axis', 'longitude_of_prime_meridian' or 'ellipsoid').
We could do that by adding setter methods or a __setattr__ override (though it could still be broken in an inheritor).
GeogCS, at least, would then be virtually like a NamedTuple implementation.
However @esc24 said he himself is already using code that modifies these things on an existing object, which pretty much rules it out.
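
For the record, the rejected 'effectively immutable' idea could have looked something like this. A sketch only, with a hypothetical class name, and noted above as not viable since existing user code mutates these attributes:

```python
class FrozenGeogCS(object):
    """Hypothetical coord system whose basic properties are locked after
    construction, via a __setattr__ override."""
    def __init__(self, semi_major_axis, longitude_of_prime_meridian=0.0):
        self.semi_major_axis = semi_major_axis
        self.longitude_of_prime_meridian = longitude_of_prime_meridian
        self._frozen = True   # all later attribute writes are refused

    def __setattr__(self, name, value):
        if getattr(self, '_frozen', False):
            raise AttributeError(
                '%s is effectively immutable' % type(self).__name__)
        # NOTE: a subclass can still bypass this via object.__setattr__,
        # which is the 'can still be broken in an inheritor' caveat.
        object.__setattr__(self, name, value)
```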

@pp-mo
Member Author

pp-mo commented Jul 16, 2013

"offline conversation with @esc24" ...
( see https://github.com/SciTools/iris/pull/589/files#r5215959 )

More... What we are doing here with deepcopy is not a true deepcopy, but relies on design-contract type details.
Basically, it assumes it knows which attributes can be used as constructor arguments to construct an 'identical' object, and which of these won't need copying because they should always be simple immutable values (or "effectively" so).

For instance, even the simplest GeogCS.deepcopy is assuming ...

  • there are only 4 relevant attributes of the object, whose names we know (and are themselves immutable!)
  • the attribute values are all immutable (in this case, floating point numbers, most likely)

But of course, it is always still possible to attach arbitrary extra attributes (which then won't get copied at all), or define new ones in an inheriting subclass, or replace a 'number' with something else which then might not be immutable (e.g. a numpy scalar??).

One very particular problem with this is that, once you provide a __deepcopy__ method, it is like a method override, and there is no way to revert to the 'normal' copy.deepcopy operation.
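
A toy illustration of this kind of constructor-argument __deepcopy__, and the pitfall with attached attributes. The class is hypothetical, loosely modelled on GeogCS, not Iris's actual implementation:

```python
import copy

class GeogCSLike(object):
    """Hypothetical coord system with a 'design-contract' deepcopy."""
    def __init__(self, semi_major_axis, semi_minor_axis=None):
        self.semi_major_axis = semi_major_axis
        self.semi_minor_axis = semi_minor_axis

    def __deepcopy__(self, memo):
        # Fast path: re-run the constructor on known arguments, assuming
        # they are the only relevant attributes and are all immutable.
        return GeogCSLike(self.semi_major_axis, self.semi_minor_axis)

cs = GeogCSLike(6377563.396, 6356256.909)
cs.extra_note = ['attached later']      # arbitrary extra attribute
cs2 = copy.deepcopy(cs)
assert cs2.semi_major_axis == cs.semi_major_axis
assert not hasattr(cs2, 'extra_note')   # silently dropped by the override
```

And since __deepcopy__ behaves like a method override, there is no simple way for a caller to ask for the 'normal' generic deepcopy instead.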

We should probably discuss this with @rhattersley. I can think of at least 3 possible ways forward...

  • don't worry about it, just accept that we understand and specify the implementation + valid usage of these things, so we know it is ok.
  • use full deepcopy for everything, or at least make a deepcopy of all constructor args.
  • implement the current operation, but call it something different from deepcopy. For the objects we're interested in, this could probably just be 'copy' : That would be consistent with the way we currently use Iris components like Coord and CoordSystem, where we avoid using any references.
    (N.B. I'm not sure what work may be required to ensure this gets called in all the appropriate places).

For reference, I added a full deepcopy of all constructor arguments to all the deepcopy overrides provided in this branch, and tested for speed. The result almost exactly cancels out, to give no net speed advantage over un-memoised code.
( and N.B. this is still not a full generic deepcopy, as it is still assuming the list of relevant constructor arguments )

@esc24
Member

esc24 commented Jul 16, 2013

Good summary @pp-mo. Thanks.

@cpelley

cpelley commented Jul 16, 2013

With regards to the from_coord method: it is currently there, it is not deprecated, and there is no indication of its intended purpose in either the user guide or the docstring. I don't see how it can be acceptable to pretty much copy-paste the exact same functionality of from_coord into the __deepcopy__ special method.

I think we should move the functionality from from_coord into __deepcopy__, and possibly give from_coord a deprecation warning decorator if it's going, pointing it to __deepcopy__ for now, until we actually remove it.

@cpelley

cpelley commented Jul 16, 2013

On the above comment of effectively locking down the coord_system to being more or less immutable: I'm with @esc24 on this. We would have to ensure that this wouldn't impact a significant number of users who already make use of this functionality. I think this would effectively put this PR on hold, and require a Google group discussion, if this route were to be taken.

@pp-mo
Member Author

pp-mo commented Jul 16, 2013

@cpelley On the above comment of effectively locking down the coord_system to being more or less immutable

I'm not proposing it. I just wanted to record why, on reflection, it is not an option.

@cpelley

cpelley commented Jul 16, 2013

I think that if we are using overrides to the deepcopy operation, I would only be happy to do so for private classes, which won't help much here: like you said, a user could very well attach additional information to these objects (it's Python, after all). Your third option sounds like a possibility.
I'd like to know what other people think.

@pp-mo
Member Author

pp-mo commented Jul 16, 2013

Addressed most comments, but not all.
Awaiting discussion of deepcopy usage.
If/when this is resolved, still need to address outstanding problems:

  • whether Coord deepcopy speedup code should be moved to 'from_coord'
  • general ugliness of faking lbsec(d)/day(d) to make pp v2 and v3 fields appear similar in access logging


...also, since these variables are only relevant to the PPField, why not define them within it? (Again, just a thought; it might make things neater, since the structure then describes where these things tie together.) This way we can also refer to these objects from the subclasses of PPField:

class PPField(object):
    _NOCACHE_ELEMENT_NAMES = blabla
    _CACHABLE_ELEMENT_NAMES = blabla
    _V2_V3_HEADER_VARIANT_NAMES = blabla

    def __init__(self):
        .................................


Nice one @cpelley. That looks much neater.

@pp-mo
Member Author

pp-mo commented Jul 17, 2013

Fix copy problem?

Something odd may be happening with Travis. These errors are not happening on the desktop.

@cpelley

cpelley commented Jul 31, 2013

A number of changes have been made by @rhattersley which may remove some of the benefits of this PR.
Might I recommend that you rebase and re-evaluate the benefits of this PR? That would add weight to your PR, if it still offers significant benefit.

@rhattersley
Member

Might I recommend that you rebase and evaluate the benefits of this PR

That's no simple task you're suggesting! The rules system has changed considerably since this PR was raised (and in large part, thanks to this very PR), so an equally considerable modification would be required.

@pp-mo - in the light of #611, #637, and #642, I'd be very interested to hear your opinion on whether there are improvements made in this PR which have yet to be extracted elsewhere.

@rhattersley
Member

@pp-mo - in the light of #611, #637, and #642, I'd be very interested to hear your opinion on whether there are improvements made in this PR which have yet to be extracted elsewhere.

I'd still like to get your opinion - but in the meantime I'm closing this PR.

@rhattersley rhattersley closed this Aug 6, 2013
@pp-mo
Member Author

pp-mo commented Aug 6, 2013

re: @rhattersley

@pp-mo - in the light of #611, #637, and #642, I'd be very interested to hear your opinion on whether there are improvements made in this PR which have yet to be extracted elsewhere.
I'd still like to get your opinion - but in the meantime I'm closing this PR.

I don't think the subsequent PRs really capture the original intent here.
The key idea was to address the obvious fact that, for raw cube construction over a typically large number of input fields, the effort of making cube elements such as coordinates and cube attributes is repeated many, many times, producing duplicate identical results. So it sounds like we should be able to reduce it by a large factor.

The PRs referred to claim to save about 3% and 10% of a load_cubes.
The initial work on this PR, simply using multiple references to the same elements, yielded about 30% improvement on load_raw. That isn't directly comparable, but will probably relate to about 15% overall -- i.e. when cube merge is included.
Unfortunately, the original promise of this PR was seriously degraded (30% --> 15% of load_raw timing) by the need to decouple all the duplicate information (i.e. to "deepcopy" all the cached results).

However, even before that degradation, I always felt that a much better win should be possible if result duplication could also be used to speed up the merge process.
The logic of this would be ...

  • rules operations may return cubes with duplicated (aka "cross-linked") elements
  • the merge operation must then "unlink" these where necessary -- but that can be many times less work, if lots of raw cubes produce only a few merged ones (i.e. the usual case with large file loads)
  • merge should also be able to save a lot of the time spent in equality testing between cube elements, if this is shortcut when the objects are actually identical (i.e. the same object)
    • e.g. (trivially) cube1.coord('x') is cube2.coord('x') ==> cube1.coord('x') == cube2.coord('x')
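
That equality shortcut could be sketched as follows (hypothetical names; not Iris's merge code):

```python
class SlowCoord(object):
    """Stand-in for a coordinate whose __eq__ is expensive."""
    comparisons = 0

    def __init__(self, points):
        self.points = points

    def __eq__(self, other):
        SlowCoord.comparisons += 1   # count the expensive comparisons
        return self.points == other.points

def coords_equal(c1, c2):
    # Identity fast path: cross-linked components from cached rule
    # results are the very same object, so deep comparison is skipped.
    if c1 is c2:
        return True
    return c1 == c2

a = SlowCoord([1, 2, 3])
assert coords_equal(a, a)            # no __eq__ call needed
assert SlowCoord.comparisons == 0
```

With many raw cubes sharing the same coordinate objects, most comparisons in a merge would take the identity fast path.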

So I still think this approach could have very significant benefits, additional to what has already been achieved. But it seems that can only really work if you tackle result construction and merge together. Unfortunately, that wasn't clear when this began (we thought copying would be cheap !!). And, as 'build' and 'merge' were explicitly separated in this investigation, we still don't have even an estimate of what might be achieved that way.

@pp-mo pp-mo deleted the pp_speedup_memorules branch March 14, 2016 12:44