trappy: Speed up trappy by caching trace parsing #247
Conversation
CC @bjackman
bjackman
left a comment
This looks awesome, can't wait to start using it :)
A few comments from me:
- We need a way to disable this for the unit tests, otherwise we won't really test the trace parsing code when we run them in a dirty environment. Or we could just have the tests delete all the relevant cache files before they run. I guess I'd prefer the former since that could also be useful when testing interactively while hacking trappy.
- Should add some unit tests for the caching functionality itself.
trappy/ftrace.py
Outdated
```python
def _trace_cache_path(trace_file):
    cache_dir = '.' + os.path.basename(trace_file) + '.cache'
    cache_path = os.path.dirname(trace_file) + '/' + cache_dir
```

`os.path.join` - this seems to be broken in the Travis CI, I'm not sure why though
yeah will check why :(
Also, please use os.path.join()
found why it broke them, because I wasn't using abspath causing mkdir to fail, fixed it in next rev.
```python
md5sum = hashlib.md5(open(trace_file, 'rb').read()).hexdigest()
open(md5file, 'w').write(md5sum)
```
How come these aren't methods of GenericFTrace or BareTrace?
I felt these are just utility methods that don't as such belong to those classes, so I'm just as happy keeping them in the module rather than within the classes.
I think I can put them in GenericFTrace, that'll be fine.
Cool, doesn't really matter, was just curious.
trappy/ftrace.py
Outdated
```python
try:
    # Read csv into frames
    for trace_class in self.trace_classes:
        csv_file = cache_path + '/' + trace_class.unique_word[:-1] + '.csv'
```

- `os.path.join`
- I'm not too comfortable with the use of `unique_word[:-1]` - I wonder if we could just use `trace_class.__name__` or something like that?
yes, great idea. I'll try to use the class name. The only issue is if the class name changes between trappy versions for some reason. But I guess that's rare.
Yeah, I don't think it's too much of a problem if the cache gets "invalidated" (especially since the cache parsing could theoretically change anyway). That actually got me thinking: maybe we should also add a ".cache_version" file in the cache directory. Then we could write "0" into that file and check it matches when reading back, so that if we decide to change the layout/format we can just bump that 0 and not worry about compatibility. Worth the effort?
trappy/ftrace.py
Outdated
```python
for trace_class in self.trace_classes:
    csv_file = cache_path + '/' + trace_class.unique_word[:-1] + '.csv'
    trace_class.read_csv(csv_file)
except:
```
Maybe we should log (with the warnings module) a warning here? Would be annoying if we later broke this code and didn't notice for ages because an AttributeError was getting thrown away or something.
ok sure
trappy/ftrace.py
Outdated
```python
cache_good = False

if cache_good:
    return
```
If I'm not mistaken you can get rid of cache_good:
```python
if _check_trace_cache(self.trace_path):
    try:
        for trace_class in self.trace_classes:
            csv_file = cache_path + '/' + trace_class.unique_word[:-1] + '.csv'
            trace_class.read_csv(csv_file)
        return
    except:
        pass
```
Also what if we e.g. successfully parse the CSV for trace class A (so its data_frame member is populated), then fail while parsing the CSV for class B? It looks like all the events for A will be duplicated in __parse_trace_file below.
Duplicated how?
`data_frame` will just be overwritten in that case:
https://github.com/joelagnel/trappy/blob/97a29442e2c19bc5c09f73bee83f08d35406c907/trappy/base.py#L225
So I don't see how this is an issue
Ahhh yes you're quite right, I misunderstood the code, I thought it would keep appending to .data_frame
trappy/ftrace.py
Outdated
```python
_create_trace_cache(self.trace_path)
for trace_class in self.trace_classes:
    trace_class.write_csv(cache_path + '/' + trace_class.unique_word[:-1] + '.csv')
```

- `os.path.join`
- The creation of the CSV path is duplicated here, should probably put it in a method.
- This hunk should be in a `try`/`except` too IMO (again maybe with a warning).
Ok.
Actually, I don't agree with the last point: why should this be a `try`/`except`? If we fail to write to the cache for some reason, I'd rather treat it as a failure and consider it a bug. Are you worried about the system running out of space?
JaviMerino
left a comment
CSV is unnecessarily complicated. Can't we use `cPickle` or `pickle` instead? It would be much faster.
Taking it to another level, why not pickle the whole class (the GenericFTrace derivative)? This may be harder to do in code, but it would be more efficient.
trappy/ftrace.py
Outdated
```python
import os
import re
import pandas as pd
import hashlib, shutil
```
One import per line please.
trappy/ftrace.py
Outdated
```python
def _trace_cache_path(trace_file):
    cache_dir = '.' + os.path.basename(trace_file) + '.cache'
    cache_path = os.path.dirname(trace_file) + '/' + cache_dir
```
Also, please use os.path.join()
trappy/ftrace.py
Outdated
```python
def _check_trace_cache(trace_file):
    cache_path = _trace_cache_path(trace_file)
    md5file = cache_path + '/md5sum'
```
`os.path.join()`
All great suggestions, will fix up. About pickle, I still prefer CSV because it's easier to investigate the cache in clear text, and I think it's also more immune to the internal format of pandas objects than a binary representation would be. Also, I'm not sure how much faster we can get, because we're already so fast with CSV.
Also, from this link it looks like csv is faster than pickle:
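A small illustration of the trade-off being argued here (hypothetical example data): both formats round-trip a DataFrame, but only the CSV cache can be inspected as plain text with cat/grep:

```python
import pickle
from io import StringIO

import pandas as pd

df = pd.DataFrame({'ts': [0.1, 0.2], 'cpu': [0, 1]})

csv_text = df.to_csv(index=False)   # plain text, human-readable
pickled = pickle.dumps(df)          # opaque bytes tied to pandas internals

restored = pd.read_csv(StringIO(csv_text))
# Column names and values survive the CSV round trip.
assert list(restored.columns) == ['ts', 'cpu']
assert csv_text.startswith('ts,cpu')
assert isinstance(pickled, bytes)
```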
So this is the new version with all review comments addressed.
tests/test_caching.py
Outdated
```
@@ -0,0 +1,52 @@
# Copyright 2015-2017 ARM Limited
```
I'm not a lawyer but this doesn't look right to me 😛
Hehe, will fix. In fact, is it OK to make this "ARM Limited, Google and Contributors" even?
That sounds fine to my inexpert ears - actually it looks like the tests require "ARM Limited" to be in the copyright line so you'll need to change that test if you just want to mark something as solely Google copyright (which would also be fine IIUC).
ah ok, there's even a Google test just after that that checks
https://github.com/ARM-software/trappy/blob/master/tests/test_copyright.py#L48
So probably some files already have Google in their copyrights, I'll double check
trappy/ftrace.py
Outdated
```python
    trace_class.read_csv(csv_file)
except:
    warnings.warn("TRAPpy: Caching disabled due to cache parsing error")
    cache_good = False
```
You can still get rid of `cache_good` (see my previous comment here).
Yes, sorry will do.
```python
for trace_class in self.trace_classes:
    csv_file = self._get_csv_path(trace_class)
    trace_class.write_csv(csv_file)
```
[Continuing from previous thread, which Github won't let me reply to any more]
> why should this be a try/except? If we fail to write to the cache for some reason, I'd rather treat it as a failure and consider it a bug. Are you worried about the system running out of space?

I was actually thinking of something like a permissions error - e.g. sometimes at ARM people share files via NFS mounts where only they have write permissions. Just in general, you intuitively expect to be able to parse traces from read-only places.
Maybe just catch `OSError`s?
Ok, I'll catch OSError and print a warning in this case instead of just failing.
```python
for trace_class in self.trace_classes:
    csv_file = self._get_csv_path(trace_class)
    trace_class.write_csv(csv_file)
```
That actually got me thinking - would Trappy be a bad OS citizen by dropping these ".cache" directories all around people's filesystems? E.g. on Linux should we better use /var/cache? That would introduce platform-specific stuff so maybe it's not worth the hassle.
Yes, I think management of that cache directory (configuring its path etc.) would be a pain; I'd much rather just drop them into the current directory. CSVs are also much smaller than the actual trace, so it's not a big space consumer in my experience.
```python
trace_file = os.path.basename(trace_path)
cache_dir = '.' + trace_file + '.cache'

self.assertFalse(cache_dir in os.listdir(trace_dir))
```
Would be good to also test:
- That the data parsed from a valid cache is exactly the same as the data from the corresponding trace (I've noticed it's quite easy to do things like mess up column names when doing DataFrame <-> CSV).
- That the checksum test really works, i.e. that you don't get bogus data when there's an invalid cache.
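On the first point: a quick demonstration (hypothetical data, not trappy code) of how a DataFrame <-> CSV round trip can silently grow a column, and how `assert_frame_equal` catches such mismatches:

```python
from io import StringIO

import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame({'ts': [0.1, 0.2], 'load': [10, 20]})

# Easy mistake: writing the index and reading it back as a data column.
bad = pd.read_csv(StringIO(df.to_csv()))  # index=True by default
assert 'Unnamed: 0' in bad.columns        # the old index leaked in

# An explicit round trip keeps the frames identical, dtypes and all.
good = pd.read_csv(StringIO(df.to_csv(index=False)))
assert_frame_equal(good, df)
```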
I'll add these. Could you please share what troubles you had when converting CSV to DataFrame and vice versa?
Sorry, I can't actually remember 😖 I just had a post-it note scribbled on the inside of my brain that said "double check pd.read_csv did what you thought it did".
I added the test for data, but I'm afraid for the checksum test I'm a bit worn out (I worked every night the last few nights on this) - appreciate any help if you want to add more test cases. I think for now the tests I have, which check that the cached/uncached data are the same, that the column names match, and that caching can be disabled, give most of the coverage. I am sure more things can be creatively added here.
Yeah this looks good to me, thanks for adding it. I think it's really useful to have coverage like this.
Hopefully this one is merge-ready.
bjackman
left a comment
Thanks a lot @joelagnel, all of my (very demanding) comments are addressed! Looking forward to a SNAPpy TRAPpy (sorry..)
I pushed an extra test, along with some microscopic fixes for stuff I noticed while writing it, to the branch on my repo here: https://github.com/bjackman/trappy/tree/for-trappy-cache So I guess feel free to cherry-pick/squash that into your branch? Or you can leave it and I'll submit it separately once this is merged (I'm ill this week and cannot think clearly at all, so the code may be nonsense).
Squashed @bjackman's fixups except his new test case (which IMO can be added after this set is merged; I had to drop it because the test case itself had a bug). Pushed a new version. @JaviMerino @bjackman does it look good to you?
This is good from my point of view, let's see what others think.
I updated my branch (https://github.com/bjackman/trappy/tree/for-trappy-cache) with the missing file for the test I added. I also added another test for something I was confused about, and an extra check in `_check_trace_cache`, to avoid a warning if the set of parsed events has changed since the cache was written. Also, AFAICT this currently breaks the edit: Does it also break the
Maybe it's safest to just store & check the whole parameter list for
CC @sinkap
@bjackman so I dropped "ftrace: Add explicit check for presence of each .csv file" and added a new commit which reads the cache partially. Your new test case for dynamic events added after caching passes with that.
Force-pushed 6f40d93 to a84dd23.
@bjackman I will review your branch today, since it seems to have some more fixes. @joelagnel
@bjackman I think you didn't update your branch (the last update was June 8 to https://github.com/bjackman/trappy/tree/for-trappy-cache and I already picked up those fixes). Also, could you rebase on top of master when you push again? Thanks!
Eek.. you're right, I didn't push, and I can't find the commits on my desktop... I will have to check when I get to my laptop 😨
@bjackman feel free to ping when you have the updated commits.
@bjackman not having caching is killing me, can't wait to have this goodness in with your fixes ;D
I think @joelagnel and @bjackman addressed Javi's change request, so I am going to dismiss that review to remove the big red cross 👍
The request to consider pickle instead of csv has been considered and addressed.
thanks @credp
So where are we with this PR? Looks just about ready pending a push from @bjackman? (cc @joelagnel)
Yep, we have a sync meeting today. Let's finalize it there.
Hey guys, really sorry for causing delays here, I forgot about it and then went on holiday. Will update tonight when I get home (where the lost commits are on my computer).
OK, I've pushed the commits now: https://github.com/bjackman/trappy/tree/for-trappy-cache Forgotten the details, so I'll do a self-review tomorrow :)
OK, I updated the branch. The rebase was pretty awkward, so I think the patch stack is messed up, but TBH I think we should squash them all anyway instead of having "add X", "fix X" in a single PR.
Pandas is extremely fast at parsing csv to data frames. Astonishingly, it takes < 1s to serialize/deserialize 100MB worth of traces with 430000 events to/from csv. We leverage this and write out the data frames into csv files when they are created for the first time; the next time, we read them back in if they exist. To make sure the cache isn't stale, we take the md5sum of the trace file and also ensure all CSVs exist before reading from the cache. I get a speed-up from 16s to 1s when parsing a 100MB trace.

Co-developed-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Store a JSON file with the parameters used to parse a trace in the cache. If, on retrieval, we find a mismatch then simply invalidate the cache. A richer implementation might detect whether we can still use the cache (e.g. if the requested window is a subset of the cached one), but for now I don't think that's worth the effort/complexity.
Add test for invalid cache.
Add test for caching with extra events.
Add test for normalize_time and window parameters.

Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Thanks @bjackman, squashed as many as made sense to squash. HAPpy SNAPpy TRAPpy!
Merged!