Skip to content

Conversation

@poke1024
Copy link
Contributor

Currently, Definitions.last_changed does a full tree traversal each time the expression timestamp is checked. This is bad if the expression is huge and deep, as caching speed is dependent on the expression size.

This PR introduces a set of symbols that can be efficiently checked without the need for a full tree traversal; usually the number of symbols even in a big nested list is small and thus can be regarded a small constant.

The main target for this PR is faster operations with big lists, e.g. using list = Nest[Partition[#, 2]&, Range[2 ^ 14], 14]:

without this PR:
In[2]:= Timing[Length[list]] Out[2]= {0.334763, 1}

with this PR:
In[2]:= Timing[Length[list]] Out[2]= {0.209859, 1}

The remaining runtime stems not from Definitions.last_changed but from other effects that are targeted in a separate PR.

Apart from this targeted effect, this PR seems to improve the overall benchmarks (see attached files).

opt5-before.txt

opt5-after.txt

@poke1024 poke1024 changed the title Faster expression timestamp checks Constant time expression timestamp checks Sep 23, 2016
@poke1024 poke1024 changed the title Constant time expression timestamp checks Roughly constant time expression timestamp checks Sep 23, 2016
@sn6uv
Copy link
Member

sn6uv commented Sep 28, 2016

I'm wondering, if the expression changes will the list of cached symbols be invalid? e.g. I'm imagining the sequence of tree transformations Plus[a, b] -> Plus[a, f[1, y]] -> Plus[a, f[1, 2]] where at the final stage f does not appear insymbols and thus f[1,2] won't be evaluated.

@poke1024
Copy link
Contributor Author

poke1024 commented Sep 28, 2016

After reading your comment, I looked into the implications of this change again, and I need to say I underestimated the side effects some current code might have. I had assumed that usually all new Expressions are generated via the Expression constructor, and that virtually no patching on the leaves list happens; this is not always right.

So, to fix this (and also to some cases of #577 that I didn't see before due to a lack of proper analysis), I suggest we make leaves and head in Expression immutable and read-only from the outside. This way, new leaves enter Expression either via the Expression constructor or via a small set of defined mutation operators/methods in Expression. This way it's easy to ensure the cache invariants.

With the latest commits, I introduced the following changes:

  • Expression.leaves is now always a tuple, so no reordering or changing elements there anymore (this might also help memory consumption).
  • If someone wants to change the leaves of an Expression, s/he needs to either construct a new Expression, or to use the new methods set_leaves or set_head. They take care of updating all necessary caches.
  • The attributes for Expression.leaves and Expression.head are no longer writable (see previous point).

All these changes enforce and check stuff at runtime. So, trying to set some expr.leaves[2] = None will bail out. So, this should also be a safe solution for new code to come.

Even though there is a certain overhead with the new property based encapsulation of leaves and head (i.e. redirecting them to Expression._leaves and Expression._head), and sometimes copying tuples into lists, things look good in terms of performance. The general performance is not slower than before, yet the special speedup cases for big are also intact. For a performance overview with the latest changes compared to the current master, see:

bench_before.txt
bench_after.txt

EDIT This also gives us the necessary foothold to implement packed arrays later on (#561).

@poke1024
Copy link
Contributor Author

poke1024 commented Oct 4, 2016

The benefit of this PR can be seen when doing a basic Fold:

With this PR:

In[1]:= Timing[Fold[#1+#2&,Range[5000]]]
Out[1]= {20.6233, 12502500}

Without this PR:

In[2]:= Timing[Fold[#1+#2&,Range[5000]]]
Out[2]= {118.623, 12502500}

With this PR, Mathics is linear:

with-575

Without this PR, Mathics is quadratic:

without-575

@poke1024 poke1024 mentioned this pull request Oct 5, 2016
@poke1024
Copy link
Contributor Author

continued in #619

@poke1024 poke1024 closed this Oct 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants