@poke1024
Contributor

@poke1024 poke1024 commented Oct 5, 2016

Speeds up Partition[] by a factor of 2 (measured with First[Timing[Partition[Range[50000], 15, 1]]], which goes from roughly 14s to 7s). If merged together with #575, it's a factor of 8, as then the last_evaluated optimization really works.

@sn6uv
Member

sn6uv commented Oct 7, 2016

Looks like a good optimisation. I'm not entirely happy with the additional complexity introduced here (e.g. relying on sequences) though.

@poke1024
Contributor Author

poke1024 commented Oct 7, 2016

I'm not really happy with the complexity of the sequence part either, but the optimization really relies on not reevaluating all these leaves and on reusing the reduced sequence information that's already there.

For example, Partition[x, 20, 1] will generate a list with about 20 times as many elements as x. We do not want to rescan all these leaves for Sequence heads if we already have this information for x.
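To make the idea concrete, here is a minimal Python sketch of deriving the per-window Sequence positions from the positions already known for x, instead of rescanning every window. It is not the code in this PR: the function name `partition_with_sequence_info` and the plain-list representation of leaves and positions are assumptions made for illustration.

```python
import bisect


def partition_with_sequence_info(leaves, sequence_positions, n, d):
    """Hypothetical helper, not part of Mathics.

    Builds sliding windows of length n with offset d over `leaves`.
    `sequence_positions` holds the indices in `leaves` that are known to
    carry a Sequence head. For each window, the local Sequence positions
    are obtained by slicing that sorted index list, so the window's
    leaves are never inspected again.
    """
    positions = sorted(sequence_positions)
    windows = []
    for start in range(0, len(leaves) - n + 1, d):
        # Find the known positions that fall inside [start, start + n).
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_left(positions, start + n)
        local = [p - start for p in positions[lo:hi]]
        windows.append((leaves[start:start + n], local))
    return windows
```

Each window costs two binary searches plus the number of Sequence leaves it contains, rather than a scan over all n of its leaves.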

I wonder if it can be encapsulated in a cleaner way.

@poke1024
Contributor Author

poke1024 commented Oct 7, 2016

I've added the same method to Most and Rest now, which starts making these functions non-quadratic. Should add it to Part as well (not done yet). Example code for testing the change:

MyTotal[{}] := 0;
MyTotal[x_] := Head[x] + MyTotal[Rest[x]];
$RecursionLimit=512;
First[Timing[MyTotal[Range[250]]]]

Yields 1.22303 before, and 0.751989 after the change. Will be much smaller (and linear in n) with #575. Without #575, the runtime is still quadratic in n due to the highly inefficient cache check in evaluate().

Concerning the complexity of the _sequences stuff: getting rid of it here and changing this PR to set _sequences to [] would still leave a large improvement, as the most important part of this optimization is setting last_evaluated for expressions that consist of leaves from evaluated expressions.

In the long term, though, I believe we will end up with a data structure that is sort of a hybrid of the sequence cache and #575, something like a dict of heads, where for each head a list of occurrence indices is stored. There are a lot of iterations in Mathics that need not go over all leaves, but only over those with specific heads (esp. in the pattern matcher). So Expression.filter_leaves should be independent of the total number of leaves. If a data structure knows those heads, and it ensures that those indices are transferred along the elementary list operations like Take, Partition, ..., it can save a lot of time in all parts of Mathics.
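A rough Python sketch of such a dict of heads might look as follows. The class `HeadIndex` and its methods are hypothetical names for illustration and do not exist in Mathics; heads are represented as plain strings here.

```python
import bisect


class HeadIndex:
    """Hypothetical index mapping each head to its sorted leaf positions."""

    def __init__(self, length, by_head):
        self.length = length
        self.by_head = by_head

    @classmethod
    def from_heads(cls, heads):
        # One full scan, done once when the expression is first built.
        by_head = {}
        for i, head in enumerate(heads):
            by_head.setdefault(head, []).append(i)
        return cls(len(heads), by_head)

    def positions(self, head):
        # Lookup cost does not depend on the total number of leaves.
        return self.by_head.get(head, [])

    def take(self, start, stop):
        # Transfer the index to the slice [start, stop) without
        # looking at the leaves themselves.
        by_head = {}
        for head, positions in self.by_head.items():
            lo = bisect.bisect_left(positions, start)
            hi = bisect.bisect_left(positions, stop)
            if lo < hi:
                by_head[head] = [p - start for p in positions[lo:hi]]
        return HeadIndex(stop - start, by_head)
```

With an index like this, a filter over leaves with a given head only touches the stored positions, and a Take-like operation hands a shifted copy of the index to its result.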

EDIT: In C, a nice structure to accomplish the latter would be a sort of skip list:

struct Leaf { struct Leaf *next; struct Leaf *next_same_head; };
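The same linking idea can be sketched in Python to show how the second pointer lets an iteration skip leaves with other heads. The `head` and `value` fields and the helpers `link_leaves` and `iter_head` are assumptions added for illustration; only `next` and `next_same_head` come from the struct.

```python
class Leaf:
    def __init__(self, head, value):
        self.head = head
        self.value = value
        self.next = None            # next leaf in document order
        self.next_same_head = None  # next leaf that has the same head


def link_leaves(items):
    """Link (head, value) pairs; return (first leaf, first leaf per head)."""
    first = None
    prev = None
    first_by_head = {}
    last_by_head = {}
    for head, value in items:
        leaf = Leaf(head, value)
        if prev is None:
            first = leaf
        else:
            prev.next = leaf
        prev = leaf
        if head in last_by_head:
            last_by_head[head].next_same_head = leaf
        else:
            first_by_head[head] = leaf
        last_by_head[head] = leaf
    return first, first_by_head


def iter_head(first_by_head, head):
    """Yield values of leaves with the given head, skipping all others."""
    leaf = first_by_head.get(head)
    while leaf is not None:
        yield leaf.value
        leaf = leaf.next_same_head
```

Iterating over one head then visits only as many nodes as there are leaves with that head.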

@poke1024
Contributor Author

continued in #619

@poke1024 poke1024 closed this Oct 13, 2016

2 participants