[GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm #8807

@@ -20,7 +20,8 @@
 import os
 import ctypes
 import struct
-from collections import namedtuple
+from typing import Sequence
+import numpy as np

 import tvm._ffi
 from tvm._ffi.base import _LIB, check_call, c_str, string_types, _RUNTIME_ONLY
@@ -30,8 +31,69 @@
 from . import _ffi_api

-# profile result of time evaluator
-ProfileResult = namedtuple("ProfileResult", ["mean", "results"])
+class BenchmarkResult:
+    """Runtimes from benchmarking"""
+
+    def __init__(self, results: Sequence[float]):
+        """Construct a new BenchmarkResult from a sequence of runtimes.
+
+        Parameters
+        ----------
+        results : Sequence[float]
+            Raw times from benchmarking
+
+        Attributes
+        ----------
+        min : float
+            Minimum runtime in seconds of all results.
+        mean : float
+            Mean runtime in seconds of all results. If py:meth:`Module.time_evaluator` or
+            `benchmark` is called with `number` > 0, then each result is already the mean of a
+            `number` of runtimes, so this becomes the mean of means.
+        median : float
+            Median runtime in seconds of all results. If py:meth:`Module.time_evaluator` is called
+            with `number` > 0, then each result is already the mean of a `number` of runtimes, so
+            this becomes the median of means.
+        max : float
+            Maximum runtime in seconds of all results. If py:meth:`Module.time_evaluator` is called
+            with `number` > 0, then each result is already the mean of a `number` of runtimes, so
+            this becomes the maximum of those means.
+        std : float
+            Standard deviation in seconds of runtimes. If py:meth:`Module.time_evaluator` is called
+            with `number` > 0, then each result is already the mean of a `number` of runtimes, so
+            this becomes the standard deviation of means.
+        results : Sequence[float]
+            The collected runtimes (in seconds). This may be a series of mean runtimes if
+            py:meth:`Module.time_evaluator` or `benchmark` was run with `number` > 1.

Contributor

I think this needs more explanation. Currently a BenchmarkResult object contains no information on the benchmark parameters that were used, and it would be best (IMHO) to avoid there being surprises in terms of the interpretation of the results based on how the object was created. My recommendation would be to either fully document the behavior of what it means to benchmark with `number > 1` or ensure that the BenchmarkResult object itself contains the benchmark parameters used.
| """ | ||
| self.results = results | ||
| self.mean = np.mean(self.results) | ||
| self.std = np.std(self.results) | ||
| self.median = np.median(self.results) | ||
| self.min = np.min(self.results) | ||
| self.max = np.max(self.results) | ||
|
|
||
| def __repr__(self): | ||
| return "BenchmarkResult(min={}, mean={}, median={}, max={}, std={}, results={})".format( | ||
| self.min, self.mean, self.median, self.max, self.std, self.results | ||
| ) | ||
|
|
||
| def __str__(self): | ||
| return """Execution time summary: | ||
| {:^12} {:^12} {:^12} {:^12} {:^12} | ||
| {:^12.4f} {:^12.4f} {:^12.4f} {:^12.4f} {:^12.4f} | ||
| """.format( | ||
| "mean (ms)", | ||
| "median (ms)", | ||
| "max (ms)", | ||
| "min (ms)", | ||
| "std (ms)", | ||
| self.mean * 1000, | ||
| self.median * 1000, | ||
| self.max * 1000, | ||
| self.min * 1000, | ||
| self.std * 1000, | ||
| ) | ||
|
|
||
|
|
||
| class Module(object): | ||
|
|
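One way to read the "mean of means" caveat discussed above: when `number` > 1, each value handed to BenchmarkResult is itself already an average over `number` consecutive runs, so the reported statistics describe the spread across repeats rather than across individual runs. Below is an editorial sketch of that behavior, not part of the diff; the numbers are made up, and the import path assumes the class lands in tvm.runtime.module as shown here.

```python
from tvm.runtime.module import BenchmarkResult

# Suppose time_evaluator (or benchmark) was called with repeat=3, number=10.
# The runtime then reports one value per repeat, each already averaged over
# 10 consecutive runs.
per_repeat_means = [0.0031, 0.0029, 0.0035]  # seconds, one mean per repeat

res = BenchmarkResult(per_repeat_means)
print(res.mean)  # mean of the 3 per-repeat means, not of 30 raw runtimes
print(res.std)   # spread across repeats; within-repeat variance is not visible
```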
@@ -209,7 +271,7 @@ def time_evaluator(self, func_name, dev, number=10, repeat=1, min_repeat_ms=0, f
         Returns
         -------
         ftimer : function
-            The function that takes same argument as func and returns a ProfileResult.
+            The function that takes same argument as func and returns a BenchmarkResult.
             The ProfileResult reports `repeat` time costs in seconds.
         """
         try:
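As a quick usage illustration (an editorial sketch, not part of the diff; the module `mod`, device `dev`, and the function name "run" are hypothetical), the evaluator created by `time_evaluator` now hands back a BenchmarkResult that can be printed or inspected directly:

```python
# Hypothetical sketch: assumes an already-loaded runtime module `mod` that
# exposes a packed function named "run", and a TVM device `dev`.
ftimer = mod.time_evaluator("run", dev, number=10, repeat=3)

res = ftimer()                        # returns a BenchmarkResult instead of a ProfileResult
print(res)                            # formatted min/mean/median/max/std summary (in ms)
print(res.mean, res.median, res.std)  # individual statistics, in seconds
```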
@@ -230,12 +292,11 @@ def evaluator(*args):
                 blob = feval(*args)
                 fmt = "@" + ("d" * repeat)
                 results = struct.unpack(fmt, blob)
-                mean = sum(results) / float(repeat)
-                return ProfileResult(mean=mean, results=results)
+                return BenchmarkResult(results)
Contributor

Does time_evaluator itself have a unit test that needs to be updated? If not, would it make sense to add a quick one?

Contributor (Author)

time_evaluator was not changed, so it makes no sense to create/modify a unit test for it.

Contributor

Um, unless I'm reading this diff incorrectly (sorry if I'm getting confused by github diffs!) this is a change to time_evaluator. Am I confused?

Contributor (Author)

Oh yes, this is the python binding for time_evaluator. Output is essentially the same, so there is no need to modify the tests.

Contributor

Well, OK, but the existing tests for time_evaluator seem kind of weak. Here we're adding a new class (BenchmarkResult) which calculates a number of new statistics, but there's no tests for it ... meaning ... a regression in this behavior (either a change to time_evaluator or BenchmarkResult) would not be caught by CI. At minimum I'd recommend a unit test for BenchmarkResult() itself, which is easy to write. Not trying to be too nitpicky but I do feel like we should be constantly improving our test coverage and adding a new class without corresponding tests tends to be a red flag for me :-) WDYT?

Contributor (Author)

To clarify
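Along the lines of the reviewer's suggestion, a minimal unit test for BenchmarkResult could look like the sketch below (editorial illustration only; the test name, file placement, and import path are assumptions, not part of this PR):

```python
import numpy as np
import tvm.runtime


def test_benchmark_result_statistics():
    # Feed a fixed set of runtimes into BenchmarkResult and check the
    # derived statistics against numpy reference values.
    times = [0.002, 0.004, 0.006, 0.008]
    res = tvm.runtime.module.BenchmarkResult(times)

    assert res.results == times
    assert res.min == min(times)
    assert res.max == max(times)
    np.testing.assert_allclose(res.mean, np.mean(times))
    np.testing.assert_allclose(res.median, np.median(times))
    np.testing.assert_allclose(res.std, np.std(times))
```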

             return evaluator
         except NameError:
-            raise NameError("time_evaluate is only supported when RPC is enabled")
+            raise NameError("time_evaluator is only supported when RPC is enabled")

     def _collect_from_import_tree(self, filter_func):
         """Helper function to collect modules from the tree matching a filter_func, then return it.
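For context on the PR title: the diff above covers only the runtime Module/BenchmarkResult side. The companion `benchmark` helpers on the graph executor and VM are not shown here, but usage would presumably look roughly like the sketch below (an editorial illustration; the model, the parameter values, and the exact defaults are assumptions):

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Assumed setup: compile a tiny Relay function for the local CPU.
x = relay.var("x", shape=(1, 16), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))
lib = relay.build(mod, target="llvm")

dev = tvm.cpu()
gmod = graph_executor.GraphModule(lib["default"](dev))
gmod.set_input("x", np.random.rand(1, 16).astype("float32"))

# The new helper wraps time_evaluator and returns a BenchmarkResult.
res = gmod.benchmark(dev, repeat=5, number=10)  # parameter values assumed
print(res)          # formatted execution time summary
print(res.results)  # raw per-repeat (mean) runtimes in seconds
```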