profiler categories: deterministic (cProfile, yappi), sampling (vmprof, pyflame)
deterministic:
- measures "own time", "total time" (own time + called children), and call counts
- I'm seeing dubious call counts from cProfile
- clock resolution is important
- clock type is important (wall time, CPU time, thread time)
- both cProfile and yappi support wall time and CPU time
- CPU time is preferred (though in a Trio program nothing should be using real sleep except the scheduler)
- there may be a tradeoff between clock resolution and type. E.g. on OS X I observe 100 ns resolution for `perf_counter()`, but only 1 µs resolution for `process_time()`.
- yappi is supposed to handle multiple OS threads (though Trio programs often avoid threads)
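To make the clock points concrete, here is a minimal sketch that prints the advertised resolution of the three clocks via `time.get_clock_info()` and then runs cProfile with `time.process_time` as its timer, so it records CPU time rather than wall time. `busy()` is a hypothetical workload, and the advertised resolutions vary by OS and may be finer than what you measure empirically.

```python
import cProfile
import pstats
import time

# Advertised resolution of each clock type discussed above (varies by OS).
for name in ("perf_counter", "process_time", "thread_time"):
    info = time.get_clock_info(name)
    print(f"{name:>13}: resolution={info.resolution:.1e} s ({info.implementation})")


def busy(n: int) -> int:
    # Hypothetical CPU-bound workload.
    return sum(i * i for i in range(n))


# cProfile accepts a custom timer function; time.process_time reports
# CPU time for the whole process instead of wall-clock time.
profiler = cProfile.Profile(time.process_time)
profiler.enable()
busy(200_000)
profiler.disable()

stats = pstats.Stats(profiler).sort_stats(pstats.SortKey.TIME)
stats.print_stats(5)
```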
sampling:
- measures hits per function (and optionally per code line)
- infers "own time" and "total time" from the hits
- much less overhead than deterministic
- sampling rate is important
- effective sample rate of vmprof is apparently < 300 Hz
- vmprof can sample native functions
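The sampling idea can be illustrated with a toy profiler. This is not how vmprof or pyflame work internally (as I understand it, vmprof samples from a signal handler and pyflame attaches from outside the process with ptrace); it is a hypothetical pure-Python `SamplingProfiler` that uses a background thread and `sys._current_frames()` to snapshot one thread's stack every `interval` seconds. "Own time" is inferred from how often a function is on top of the stack, "total time" from how often it appears anywhere on it.

```python
import collections
import sys
import threading
import time


class SamplingProfiler:
    """Toy statistical profiler: periodically snapshots one thread's stack."""

    def __init__(self, interval=0.001):
        self.interval = interval                 # requested seconds between samples
        self.samples = 0                         # snapshots actually taken
        self.own_hits = collections.Counter()    # function was on top of the stack
        self.total_hits = collections.Counter()  # function was anywhere on the stack
        self._stop = threading.Event()
        self._target = None
        self._thread = None

    def _run(self):
        while not self._stop.is_set():
            frame = sys._current_frames().get(self._target)
            if frame is not None:
                self.samples += 1
                self.own_hits[frame.f_code.co_name] += 1
                # Count each function at most once per sample, even if recursive.
                on_stack = set()
                while frame is not None:
                    on_stack.add(frame.f_code.co_name)
                    frame = frame.f_back
                self.total_hits.update(on_stack)
            time.sleep(self.interval)

    def __enter__(self):
        self._target = threading.get_ident()  # profile the calling thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
        return False

    def report(self):
        if not self.samples:
            print("no samples taken")
            return
        # Fractions of samples approximate fractions of elapsed time.
        for name, total in self.total_hits.most_common(5):
            own = self.own_hits[name]
            print(f"{name:>10}: own {own / self.samples:6.1%}  total {total / self.samples:6.1%}")


def leaf(n):
    acc = 0
    for i in range(n):
        acc += i * i
    return acc


def parent():
    for _ in range(20):
        leaf(200_000)


with SamplingProfiler() as prof:
    parent()
prof.report()
```

The sampler thread needs the GIL to take a snapshot, so while the target thread is CPU-bound the effective rate is roughly capped by `sys.getswitchinterval()` (5 ms by default) rather than by `interval`. That is one concrete way an effective sample rate ends up well below the requested one.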
the bad news: none of these profilers appear to be coroutine-aware
Neither type of profiler will correctly report "total time" in the context of cooperative multitasking. E.g. a function awaits `trio.sleep()`, which yields to the scheduler, which resumes another task, and so on, and all of that is incorrectly attributed as children of the original function.
The profilers should somehow use `yield`, or else details of the particular event loop (asyncio, Trio, etc.), to stop attributing that time to the parent function. They apparently already do this for `select`, etc.
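A minimal sketch of the "stop attributing suspended time" idea, assuming we can wrap the coroutine we care about: the hypothetical `TimedCoroutine` drives an inner coroutine with `send()`/`throw()` and only accumulates time between a resumption and its next `yield`, so time spent suspended (while the scheduler runs other tasks) is excluded. It uses asyncio from the standard library so it runs without dependencies; I have not checked it against Trio's scheduler. It is a per-coroutine timer rather than a profiler, and child coroutines awaited inside still count toward the wrapped coroutine's time, which is what "total time" ought to mean here.

```python
import asyncio
import time


class TimedCoroutine:
    """Hypothetical wrapper that accumulates the time a coroutine spends
    actually running, excluding time it sits suspended in the event loop."""

    def __init__(self, coro):
        self._coro = coro
        self.running_time = 0.0  # seconds spent inside send()/throw()
        self.resumptions = 0     # how many times the coroutine was (re)entered

    def __await__(self):
        value, exc = None, None
        while True:
            self.resumptions += 1
            start = time.perf_counter()
            try:
                if exc is None:
                    yielded = self._coro.send(value)
                else:
                    yielded = self._coro.throw(exc)
            except StopIteration as stop:
                return stop.value  # coroutine finished; pass its result on
            finally:
                # Runs on every exit path: yield, return, or exception.
                self.running_time += time.perf_counter() - start
            # Suspended: hand what the coroutine yielded up to the event loop,
            # then forward whatever the loop sends or throws back into it.
            try:
                value, exc = (yield yielded), None
            except BaseException as e:
                value, exc = None, e


async def work():
    deadline = time.perf_counter() + 0.05
    while time.perf_counter() < deadline:
        pass                    # ~50 ms of real CPU work
    await asyncio.sleep(0.2)    # suspended; the loop is free to run other tasks
    return "done"


async def main():
    timed = TimedCoroutine(work())
    start = time.perf_counter()
    result = await timed
    wall = time.perf_counter() - start
    return result, timed.running_time, wall


result, running, wall = asyncio.run(main())
print(f"{result}: running {running:.3f} s, wall {wall:.3f} s")
```

The running time should come out near 50 ms while the wall time is over 250 ms; a profiler that followed the same rule would not charge `work()` for the 200 ms it spent parked in `asyncio.sleep()`.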
It seems like a ripe area to improve tooling for async / await / Trio. I can't find any prior work for Python addressing this.
the good news: "own time" and call counts are still useful in the context of coroutines
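As a sketch of pulling those two numbers out in practice, here is cProfile plus `pstats` on a small asyncio program (asyncio standing in for Trio to avoid a dependency; `worker` and `main` are hypothetical). "Own time" is pstats' `tottime`, so sorting by it surfaces hot spots. One caveat when reading `ncalls` for coroutines: as far as I understand CPython's profiling hooks, each resumption of a suspended coroutine is reported as a new call, so `worker` may show more than its 2 real calls. Treat that as an assumption to verify on your Python version.

```python
import asyncio
import cProfile
import pstats
import time


async def worker(n: int) -> None:
    for _ in range(n):
        await asyncio.sleep(0)  # yield to the event loop each iteration


async def main() -> None:
    await asyncio.gather(worker(3), worker(3))


profiler = cProfile.Profile(time.process_time)  # CPU time, as preferred above
profiler.enable()
asyncio.run(main())
profiler.disable()

stats = pstats.Stats(profiler)
# tottime ("own time") excludes time spent in callees.
stats.sort_stats(pstats.SortKey.TIME).print_stats(10)

# Each stats entry is (primitive calls, total calls, tottime, cumtime, callers).
for (filename, lineno, funcname), (cc, nc, tt, ct, callers) in stats.stats.items():
    if funcname == "worker":
        print(f"worker: ncalls={nc}, own={tt:.6f}s, cumulative={ct:.6f}s")
```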