trappy: Speed up trappy by caching trace parsing #247
Conversation
CC @bjackman
bjackman
left a comment
This looks awesome, can't wait to start using it :)
A few comments from me:
- We need a way to disable this for the unit tests, otherwise we won't really test the trace parsing code when we run them in a dirty environment. Or we could just have the tests delete all the relevant cache files before they run. I guess I'd prefer the former since that could also be useful when testing interactively while hacking trappy.
- Should add some unit tests for the caching functionality itself.
trappy/ftrace.py
Outdated
```python
def _trace_cache_path(trace_file):
    cache_dir = '.' + os.path.basename(trace_file) + '.cache'
    cache_path = os.path.dirname(trace_file) + '/' + cache_dir
```

`os.path.join` - this seems to be broken in the Travis CI, I'm not sure why though
yeah will check why :(
Also, please use os.path.join()
found why it broke them, because I wasn't using abspath causing mkdir to fail, fixed it in next rev.
```python
md5sum = hashlib.md5(open(trace_file, 'rb').read()).hexdigest()
open(md5file, 'w').write(md5sum)
```
How come these aren't methods of GenericFTrace or BareTrace?
I felt these are just utility methods that don't as such belong to those classes, so I'm just as happy keeping them in the module rather than within the classes.
I think I can put them in GenericFTrace, that'll be fine.
Cool, doesn't really matter, was just curious.
trappy/ftrace.py
Outdated
```python
try:
    # Read csv into frames
    for trace_class in self.trace_classes:
        csv_file = cache_path + '/' + trace_class.unique_word[:-1] + '.csv'
```

- `os.path.join`
- I'm not too comfortable with the use of `unique_word[:-1]` - I wonder if we could just use `trace_class.__name__` or something like that?
yes, great idea. I'll try to use the class name. The only issue is if the class name changes between trappy versions for some reason. But I guess that's rare.
Yeah, I don't think it's too much of a problem if the cache gets "invalidated" (especially since the cache parsing could theoretically change anyway). That actually got me thinking: maybe we should also add a ".cache_version" file in the cache directory. Then we could write "0" into that file and check it matches when reading back, so that if we decide to change the layout/format we can just bump that 0 and not worry about compatibility. Worth the effort?
trappy/ftrace.py
Outdated
```python
for trace_class in self.trace_classes:
    csv_file = cache_path + '/' + trace_class.unique_word[:-1] + '.csv'
    trace_class.read_csv(csv_file)
except:
```
Maybe we should log (with the warnings module) a warning here? Would be annoying if we later broke this code and didn't notice for ages because an AttributeError was getting thrown away or something.
ok sure
trappy/ftrace.py
Outdated
```python
cache_good = False

if cache_good:
    return
```
If I'm not mistaken you can get rid of cache_good:
```python
if _check_trace_cache(self.trace_path):
    try:
        for trace_class in self.trace_classes:
            csv_file = cache_path + '/' + trace_class.unique_word[:-1] + '.csv'
            trace_class.read_csv(csv_file)
        return
    except:
        pass
```
Also what if we e.g. successfully parse the CSV for trace class A (so its data_frame member is populated), then fail while parsing the CSV for class B? It looks like all the events for A will be duplicated in __parse_trace_file below.
Duplicated how?
`data_frame` will just be overwritten in that case:
https://github.com/joelagnel/trappy/blob/97a29442e2c19bc5c09f73bee83f08d35406c907/trappy/base.py#L225
So I don't see how this is an issue
Ahhh yes you're quite right, I misunderstood the code, I thought it would keep appending to .data_frame
trappy/ftrace.py
Outdated
```python
_create_trace_cache(self.trace_path)
for trace_class in self.trace_classes:
    trace_class.write_csv(cache_path + '/' + trace_class.unique_word[:-1] + '.csv')
```

- `os.path.join`
- The creation of the CSV path is duplicated here, should probably put it in a method.
- This hunk should be in a `try`/`except` too IMO (again maybe with a warning).
Ok.
Actually, I don't agree with the last point: why should this be a `try`/`except`? If we fail to write to the cache for some reason, I'd rather treat it as a failure and consider it a bug. Are you worried about the system running out of space?
JaviMerino
left a comment
CSV is unnecessarily complicated. Can't we use `cPickle` or `pickle` instead? It would be much faster.
Taking it to another level, why not pickle the whole class (the GenericFTrace derivative)? This may be harder to do in code, but it would be more efficient.
trappy/ftrace.py
Outdated
```python
import os
import re
import pandas as pd
import hashlib, shutil
```
One import per line please.
trappy/ftrace.py
Outdated
```python
def _trace_cache_path(trace_file):
    cache_dir = '.' + os.path.basename(trace_file) + '.cache'
    cache_path = os.path.dirname(trace_file) + '/' + cache_dir
```
Also, please use os.path.join()
trappy/ftrace.py
Outdated
```python
def _check_trace_cache(trace_file):
    cache_path = _trace_cache_path(trace_file)
    md5file = cache_path + '/md5sum'
```
`os.path.join()`
All great suggestions, will fix up. About pickle, I still prefer CSV because it's easier to investigate the cache in clear text, and I think it's also more immune to the internal format of pandas objects than a binary representation would be. Also, I'm not sure how much faster we can get, because we're already so fast with CSV.
Also, from this link it looks like csv is faster than pickle:
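A small illustration of the trade-off being argued here (hypothetical example data): both formats round-trip a DataFrame, but only the CSV cache can be inspected as plain text with cat/grep:

```python
import pickle
from io import StringIO

import pandas as pd

df = pd.DataFrame({'ts': [0.1, 0.2], 'cpu': [0, 1]})

csv_text = df.to_csv(index=False)   # plain text, human-readable
pickled = pickle.dumps(df)          # opaque bytes tied to pandas internals

restored = pd.read_csv(StringIO(csv_text))
# Column names and values survive the CSV round trip.
assert list(restored.columns) == ['ts', 'cpu']
assert csv_text.startswith('ts,cpu')
assert isinstance(pickled, bytes)
```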
So this is the new version with all review comments addressed.
tests/test_caching.py
Outdated
```
@@ -0,0 +1,52 @@
# Copyright 2015-2017 ARM Limited
```
I'm not a lawyer but this doesn't look right to me 😛
Hehe, will fix. In fact, is it OK to make this "ARM Limited, Google and Contributors" even?
That sounds fine to my inexpert ears - actually it looks like the tests require "ARM Limited" to be in the copyright line so you'll need to change that test if you just want to mark something as solely Google copyright (which would also be fine IIUC).
ah ok, there's even a Google test just after that that checks
https://github.com/ARM-software/trappy/blob/master/tests/test_copyright.py#L48
So probably some files already have Google in their copyrights, I'll double check
trappy/ftrace.py
Outdated
```python
    trace_class.read_csv(csv_file)
except:
    warnings.warn("TRAPpy: Caching disabled due to cache parsing error")
    cache_good = False
```
You can still get rid of `cache_good` (see my previous comment here).
Yes, sorry will do.
```python
for trace_class in self.trace_classes:
    csv_file = self._get_csv_path(trace_class)
    trace_class.write_csv(csv_file)
```
[Continuing from previous thread, which Github won't let me reply to any more]
> why should this be a try/except? If we fail to write to the cache for some reason, I'd rather treat it as a failure and consider it a bug. Are you worried about the system running out of space?

I was actually thinking of something like a permissions error - e.g. sometimes at ARM people share files via NFS mounts where only they have write permissions. Just in general, you intuitively expect to be able to parse traces from read-only places.
Maybe just catch `OSError`s?
Ok, I'll catch OSError and print a warning in this case instead of just failing.
```python
for trace_class in self.trace_classes:
    csv_file = self._get_csv_path(trace_class)
    trace_class.write_csv(csv_file)
```
That actually got me thinking - would Trappy be a bad OS citizen by dropping these ".cache" directories all around people's filesystems? E.g. on Linux should we better use /var/cache? That would introduce platform-specific stuff so maybe it's not worth the hassle.
Yes, I think management of that cache directory (configuring its path etc.) would be a pain; I'd much rather just drop them into the current directory. CSVs are also much smaller than the actual trace, so it's not a big space consumer in my experience.
```python
trace_file = os.path.basename(trace_path)
cache_dir = '.' + trace_file + '.cache'

self.assertFalse(cache_dir in os.listdir(trace_dir))
```
Would be good to also test:
- That the data parsed from a valid cache is exactly the same as the data from the corresponding trace (I've noticed it's quite easy to do things like mess up column names when doing DataFrame <-> CSV).
- That the checksum test really works, i.e. that you don't get bogus data when there's an invalid cache.
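On the first point: a quick demonstration (hypothetical data, not trappy code) of how a DataFrame <-> CSV round trip can silently grow a column, and how `assert_frame_equal` catches such mismatches:

```python
from io import StringIO

import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame({'ts': [0.1, 0.2], 'load': [10, 20]})

# Easy mistake: writing the index and reading it back as a data column.
bad = pd.read_csv(StringIO(df.to_csv()))  # index=True by default
assert 'Unnamed: 0' in bad.columns        # the old index leaked in

# An explicit round trip keeps the frames identical, dtypes and all.
good = pd.read_csv(StringIO(df.to_csv(index=False)))
assert_frame_equal(good, df)
```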
I'll add these. Could you please share what troubles you had when converting CSV to DataFrame and vice versa?
Sorry, I can't actually remember 😖 I just had a post-it note scribbled on the inside of my brain that said "double check pd.read_csv did what you thought it did".
I added the test for data, but I'm afraid for the checksum test I'm a bit worn out (I worked every night the last few nights on this) - appreciate any help if you want to add more test cases. I think for now the tests I have, which check that the cached/uncached data are the same, that the column names match, and that caching can be disabled, give most of the coverage. I am sure more things can be creatively added here.
Yeah this looks good to me, thanks for adding it. I think it's really useful to have coverage like this.
Hopefully this one is merge-ready.
bjackman
left a comment
Thanks a lot @joelagnel, all of my (very demanding) comments are addressed! Looking forward to a SNAPpy TRAPpy (sorry..)
I pushed an extra test, along with some microscopic fixes for stuff I noticed while writing it, to the branch on my repo here: https://github.com/bjackman/trappy/tree/for-trappy-cache So I guess feel free to cherry-pick/squash that into your branch? Or you can leave it and I'll submit it separately once this is merged (I'm ill this week and cannot think clearly at all, so the code may be nonsense).
Squashed @bjackman's fixups except his new test case (which IMO can be added after this set is merged; I had to drop it because the test case itself had a bug). Pushed a new version. @JaviMerino @bjackman does it look good to you?
This is good from my point of view, let's see what others think.
I updated my branch (https://github.com/bjackman/trappy/tree/for-trappy-cache) with the missing file for the test I added. I also added another test for something I was confused about, and an extra check in `_check_trace_cache`, to avoid a warning if the set of parsed events has changed since the cache was written. Also, AFAICT this currently breaks the edit: Does it also break the
Maybe it's safest to just store & check the whole parameter list for
CC @sinkap
@bjackman so I dropped "ftrace: Add explicit check for presence of each .csv file" and added a new commit which reads the cache partially. Your new test case for dynamic events added after caching passes with that.
Force-pushed 6f40d93 to a84dd23.
@bjackman I will review your branch today, since it seems to have some more fixes. @joelagnel
@bjackman I think you didn't update your branch (the last update was June 8 to https://github.com/bjackman/trappy/tree/for-trappy-cache and I already picked up those fixes). Also, could you rebase on top of master when you push again? Thanks!
Eek.. you're right, I didn't push, and I can't find the commits on my desktop... I will have to check when I get to my laptop 😨
@bjackman feel free to ping when you have the updated commits.
@bjackman not having caching is killing me, can't wait to have this goodness in with your fixes ;D
I think @joelagnel and @bjackman addressed Javi's change request, so I am going to dismiss that review to remove the big red cross 👍
The request to consider pickle instead of csv has been considered and addressed.
thanks @credp
So where are we with this PR? Looks just about ready pending a push from @bjackman? (cc @joelagnel)
Yep, we have a sync meeting today. Let's finalize it there.
Hey guys, really sorry for causing delays here, I forgot about it and then went on holiday. Will update tonight when I get home (where the lost commits are on my computer).
OK, I've pushed the commits now: https://github.com/bjackman/trappy/tree/for-trappy-cache Forgotten the details, so I'll do a self-review tomorrow :)
OK, I updated the branch. The rebase was pretty awkward, so I think the patch stack is messed up, but TBH I think we should squash them all anyway instead of having "add X", "fix X" in a single PR.
Pandas is extremely fast at parsing csv to data frames. Astonishingly, it takes < 1s to serialize/deserialize 100MB worth of traces with 430000 events to/from csv. We leverage this and write out the data frames into csv files when they are created for the first time; the next time, we read them back in if they exist. To make sure the cache isn't stale, we take the md5sum of the trace file and also ensure all CSVs exist before reading from the cache. I get a speed-up from 16s to 1s when parsing a 100MB trace.

Co-developed-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Store a JSON file with the parameters used to parse a trace in the cache. If, on retrieval, we find a mismatch then simply invalidate the cache. A richer implementation might detect whether we can still use the cache (e.g. if the requested window is a subset of the cached one), but for now I don't think that's worth the effort/complexity.
Add test for invalid cache.
Add test for caching with extra events.
Add test for normalize_time and window parameters.

Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Thanks @bjackman, squashed as many as made sense to squash. HAPpy SNAPpy TRAPpy!
Merged!