NonMaxSuppression is the biggest performance bottleneck of the SSD-MobileNet object detection models on Android mobile phone #1609

@futurely

Description

System information

  • What is the top-level directory of the model you are using:
    Standalone.

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    No.

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Android 6.0

  • TensorFlow installed from (source or binary):
    Source.

  • TensorFlow version (use command below):
    master 1f82b7a.

  • Bazel version (if compiling from source):
    0.4.5.

  • CUDA/cuDNN version:
    N.A.

  • GPU model and memory:
    CPU: HiSilicon Kirin 935, 3GB
    GPU: ARM Mali-T624

  • Exact command to reproduce:

# https://stackoverflow.com/a/43627334
# In tensorflow/core/framework/register_types.h, change
#     #define TF_CALL_bool(m)
# to
#     #define TF_CALL_bool(m) m(bool)
# so that the bool kernels required by the detection graph are registered
# in the Android build.

bazel build -c opt \
  --crosstool_top=//external:android/crosstool \
  --cpu=armeabi-v7a \
  --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
  tensorflow/tools/benchmark:benchmark_model

adb push bazel-bin/tensorflow/tools/benchmark/benchmark_model /data/local/tmp

bazel build tensorflow/tools/graph_transforms:transform_graph

bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017/frozen_inference_graph.pb \
--out_graph=tensorflow/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017/transformed_inference_graph.pb \
--inputs='image_tensor' \
--outputs='detection_boxes,detection_scores,detection_classes,num_detections' \
--transforms='
  add_default_attributes
  strip_unused_nodes(type=float)
  remove_nodes(op=CheckNumerics)
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  fuse_resize_pad_and_conv
  fuse_pad_and_conv
  fuse_resize_and_conv
  quantize_weights
  quantize_nodes
  strip_unused_nodes
  sort_by_execution_order'

adb push frozen_inference_graph.pb /data/local/tmp

adb push transformed_inference_graph.pb /data/local/tmp

adb shell /data/local/tmp/benchmark_model \
 --graph=/data/local/tmp/frozen_inference_graph.pb \
 --input_layer=image_tensor:0 \
 --input_layer_shape=1,224,224,3 \
 --input_layer_type=uint8 \
 --output_layer=detection_boxes:0,detection_scores:0,detection_classes:0,num_detections:0 \
 > frozen_inference_graph.benchmark

adb shell /data/local/tmp/benchmark_model \
 --graph=/data/local/tmp/transformed_inference_graph.pb \
 --input_layer=image_tensor:0 \
 --input_layer_shape=1,224,224,3 \
 --input_layer_type=uint8 \
 --output_layer=detection_boxes:0,detection_scores:0,detection_classes:0,num_detections:0 \
 > transformed_inference_graph.benchmark

Describe the problem

NonMaxSuppression dominates both benchmarks: it takes nearly twice as long as all Conv2D ops combined in the frozen graph, and roughly three times as long as QuantizedConv2D in the transformed (quantized) graph.
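For context on why this op is so expensive on CPU: NonMaxSuppression is a greedy, sequential pairwise-IoU loop, and the summary below shows it being invoked 90 times (once per class) per inference. A minimal Python sketch of the algorithm follows; this is an illustration, not the TensorFlow kernel, and the (y1, x1, y2, x2) box format and thresholds are assumptions:

```python
# Greedy NMS sketch: keep the highest-scoring boxes, suppress any box
# whose IoU with an already-kept box exceeds the threshold. The inner
# `all(...)` makes the loop quadratic in the number of candidate boxes,
# and it runs once per class -- 90 times for this COCO model.

def iou(a, b):
    """Intersection-over-union of two (y1, x1, y2, x2) boxes."""
    y1, x1 = max(a[0], b[0]), max(a[1], b[1])
    y2, x2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, y2 - y1) * max(0.0, x2 - x1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.6, max_output=100):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Compare against every box kept so far -- the O(n^2) hot loop.
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
            if len(keep) == max_output:
                break
    return keep
```

With the ~2000 anchor boxes an SSD head produces, this pairwise comparison repeated per class plausibly accounts for the large share of runtime seen in the summaries below.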

Inference Graph                 Node Type           Average Time %
frozen_inference_graph.pb       NonMaxSuppression   48.239
frozen_inference_graph.pb       Conv2D              25.395
transformed_inference_graph.pb  NonMaxSuppression   40.856
transformed_inference_graph.pb  QuantizedConv2D     13.807
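The percentages above come from the "Summary by node type" sections of the benchmark logs. A small hypothetical helper like this can pull them out of a saved `*.benchmark` file, assuming the tab-separated layout `benchmark_model` prints:

```python
# Parse "Summary by node type" rows from benchmark_model output into
# a {node_type: avg_percent} dict. The column layout assumed here is
# the tab-separated one shown in the logs:
# [Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]

def op_percentages(report: str) -> dict:
    out = {}
    in_summary = False
    for line in report.splitlines():
        if "Summary by node type" in line:
            in_summary = True
            continue
        if not in_summary:
            continue
        fields = [f.strip() for f in line.strip().split("\t")]
        # Data rows have a percentage in the [avg %] column; the header
        # row ("[avg %]") and unrelated lines are skipped.
        if len(fields) >= 4 and fields[3].endswith("%"):
            out[fields[0]] = float(fields[3].rstrip("%"))
    return out
```

Run against the two logs, it reproduces the table above (e.g. `op_percentages(open("frozen_inference_graph.benchmark").read())["NonMaxSuppression"]` would give 48.239).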

Source code / logs

frozen_inference_graph.pb

native : benchmark_model.cc:382 Graph: [/data/local/tmp/frozen_inference_graph.pb]
native : benchmark_model.cc:383 Input layers: [image_tensor:0]
native : benchmark_model.cc:384 Input shapes: [1,224,224,3]
native : benchmark_model.cc:385 Input types: [uint8]
native : benchmark_model.cc:386 Output layers: [detection_boxes:0,detection_scores:0,detection_classes:0,num_detections:0]
native : benchmark_model.cc:387 Num runs: [50]
native : benchmark_model.cc:388 Inter-run delay (seconds): [-1.0]
native : benchmark_model.cc:389 Num threads: [-1]
native : benchmark_model.cc:390 Benchmark name: []
native : benchmark_model.cc:391 Output prefix: []
native : benchmark_model.cc:392 Show sizes: [0]
native : benchmark_model.cc:393 Warmup runs: [2]
native : benchmark_model.cc:53 Loading TensorFlow.
native : benchmark_model.cc:60 Got config, 0 devices
can't determine number of CPU cores: assuming 4
can't determine number of CPU cores: assuming 4
native : benchmark_model.cc:258 Running benchmark for 2 iterations without detailed stat logging:
native : benchmark_model.cc:286 count=2 first=3273186 curr=1668712 min=1668712 max=3273186 avg=2.47095e+06 std=802237
native : benchmark_model.cc:258 Running benchmark for 50 iterations without detailed stat logging:
native : benchmark_model.cc:286 count=50 first=1687558 curr=1682345 min=1615775 max=1802978 avg=1.69049e+06 std=41851
native : benchmark_model.cc:258 Running benchmark for 50 iterations with detailed stat logging:
============================== Top by Computation Time ==============================
	             [node type]	  [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	                  Conv2D	  623.929	   57.614	   60.054	  3.120%	  3.120%	   409.600	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_13_pointwise/convolution
	                  Conv2D	  580.451	   31.382	   34.441	  1.789%	  4.909%	   409.600	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_12_pointwise/convolution
	                  Conv2D	   49.120	   33.424	   33.211	  1.725%	  6.634%	  2880.000	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/convolution
	                  Conv2D	  502.062	   24.714	   28.450	  1.478%	  8.112%	   739.328	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_10_pointwise/convolution
	                  Conv2D	  463.825	   23.862	   27.747	  1.441%	  9.554%	   739.328	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_9_pointwise/convolution
	                  Conv2D	  540.994	   23.782	   27.632	  1.435%	 10.989%	   739.328	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_11_pointwise/convolution
	                  Conv2D	  388.112	   22.603	   27.427	  1.425%	 12.414%	   739.328	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_7_pointwise/convolution
	                  Conv2D	  426.231	   25.427	   27.163	  1.411%	 13.825%	   739.328	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_8_pointwise/convolution
	                  Conv2D	  247.387	   23.893	   26.293	  1.366%	 15.191%	  2880.000	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_pointwise/convolution
	   DepthwiseConv2dNative	   95.498	   26.500	   24.778	  1.287%	 16.478%	  2937.600	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/depthwise
============================== Top by Memory Use ==============================
	             [node type]	  [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	                  Conv2D	  132.991	   20.703	   20.848	  1.083%	  1.083%	  5760.000	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_pointwise/convolution
	   DepthwiseConv2dNative	  219.853	   16.056	   16.815	  0.874%	  1.957%	  3225.600	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_depthwise/depthwise
	   DepthwiseConv2dNative	   95.498	   26.500	   24.778	  1.287%	  3.244%	  2937.600	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/depthwise
	                  Conv2D	  247.387	   23.893	   26.293	  1.366%	  4.610%	  2880.000	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_pointwise/convolution
	                  Conv2D	  193.214	   14.730	   16.011	  0.832%	  5.442%	  2880.000	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_2_pointwise/convolution
	                  Conv2D	   49.120	   33.424	   33.211	  1.725%	  7.167%	  2880.000	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/convolution
	   DepthwiseConv2dNative	  314.590	    7.253	    7.887	  0.410%	  7.577%	  1653.760	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_5_depthwise/depthwise
	   DepthwiseConv2dNative	  174.672	   11.146	   12.409	  0.645%	  8.221%	  1483.776	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_2_depthwise/depthwise
	                  Conv2D	  327.950	   21.425	   23.446	  1.218%	  9.439%	  1478.656	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_5_pointwise/convolution
	                  Conv2D	  293.922	   14.347	   15.109	  0.785%	 10.224%	  1478.656	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_4_pointwise/convolution
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	       NonMaxSuppression	       90	   927.182	    48.239%	    48.239%	    36.000	       90
	                  Conv2D	       34	   488.102	    25.395%	    73.633%	 23526.797	       34
	   DepthwiseConv2dNative	       13	   104.880	     5.457%	    79.090%	 16711.424	       13
	                     Mul	      130	    96.828	     5.038%	    84.127%	     0.000	      130
	                   Slice	       91	    50.248	     2.614%	    86.742%	  1380.240	       91
	                   Split	      180	    38.533	     2.005%	    88.746%	  5464.640	      180
	                     Add	      131	    35.987	     1.872%	    90.619%	     0.004	      131
	                ConcatV2	      107	    33.857	     1.761%	    92.380%	  3934.164	      107
	                  Gather	      546	    27.859	     1.449%	    93.830%	  7229.200	      546
	                   Const	     1979	    23.447	     1.220%	    95.050%	     0.000	     1979
	                   Relu6	       35	    21.407	     1.114%	    96.163%	     0.000	       35
	                 Minimum	      451	     9.308	     0.484%	    96.648%	     0.000	      451
	                   Where	      180	     8.533	     0.444%	    97.092%	  2733.760	      180
	                 Maximum	      360	     7.390	     0.384%	    97.476%	     0.000	      360
	                 Greater	      183	     5.793	     0.301%	    97.777%	   343.303	      183
	                     Sub	      192	     5.494	     0.286%	    98.063%	     0.020	      192
	                    Cast	      182	     5.345	     0.278%	    98.341%	  1968.276	      182
	          ResizeBilinear	        1	     4.663	     0.243%	    98.584%	  1080.000	        1
	                 Reshape	      282	     4.459	     0.232%	    98.816%	     0.000	      282
	            StridedSlice	      102	     3.532	     0.184%	    99.000%	     0.392	      102
	     TensorArrayGatherV3	        1	     2.704	     0.141%	    99.140%	  1080.000	        1
	                 BiasAdd	       12	     2.260	     0.118%	    99.258%	     0.000	       12
	                 Squeeze	       97	     2.234	     0.116%	    99.374%	     0.000	       97
	                 Sigmoid	        1	     1.754	     0.091%	    99.465%	     0.000	        1
	               ZerosLike	       90	     1.734	     0.090%	    99.556%	    36.000	       90
	                   Shape	       99	     1.513	     0.079%	    99.634%	     0.784	       99
	                  Unpack	        5	     1.495	     0.078%	    99.712%	   751.464	        5
	                  TopKV2	        1	     1.259	     0.066%	    99.778%	    72.000	        1
	                    NoOp	        1	     0.689	     0.036%	    99.813%	     0.000	        1
	    TensorArrayScatterV3	        1	     0.631	     0.033%	    99.846%	   602.112	        1
	               Transpose	        2	     0.438	     0.023%	    99.869%	    61.344	        2
	                 RealDiv	        8	     0.296	     0.015%	    99.884%	    15.336	        8
	                  Switch	       20	     0.288	     0.015%	    99.899%	     0.000	       22
	                   Merge	        8	     0.217	     0.011%	    99.911%	     0.032	       10
	                  Assert	        5	     0.210	     0.011%	    99.922%	     0.000	        5
	                Identity	       15	     0.189	     0.010%	    99.932%	     0.000	       15
	                   Enter	        6	     0.179	     0.009%	    99.941%	     0.000	        6
	                    Pack	        6	     0.157	     0.008%	    99.949%	    30.672	        6
	                     Exp	        2	     0.132	     0.007%	    99.956%	     0.000	        2
	              ExpandDims	        7	     0.131	     0.007%	    99.963%	     0.000	        7
	                   Range	        5	     0.113	     0.006%	    99.969%	     0.424	        5
	           TensorArrayV3	        2	     0.112	     0.006%	    99.974%	     0.104	        2
	      TensorArrayWriteV3	        1	     0.056	     0.003%	    99.977%	     0.000	        1
	                    Less	        1	     0.056	     0.003%	    99.980%	     0.001	        2
	                    _Arg	        1	     0.053	     0.003%	    99.983%	     0.000	        1
	           NextIteration	        2	     0.049	     0.003%	    99.986%	     0.000	        2
	                    Fill	        3	     0.049	     0.003%	    99.988%	     0.000	        3
	       TensorArrayReadV3	        1	     0.048	     0.002%	    99.991%	     0.000	        1
	                    Rank	        2	     0.041	     0.002%	    99.993%	     0.008	        2
	                 _Retval	        4	     0.040	     0.002%	    99.995%	     0.000	        4
	                LoopCond	        1	     0.024	     0.001%	    99.996%	     0.000	        2
	       TensorArraySizeV3	        1	     0.022	     0.001%	    99.997%	     0.004	        1
	                   Equal	        1	     0.022	     0.001%	    99.998%	     0.001	        1
	                    Size	        1	     0.016	     0.001%	    99.999%	     0.004	        1
	                    Exit	        1	     0.016	     0.001%	   100.000%	     0.000	        1
Timings (microseconds): count=50 first=1745329 curr=1670092 min=1670092 max=2170221 avg=1.92491e+06 std=189978
Memory (bytes): count=50 curr=67058518(all same)
5683 nodes observed

transformed_inference_graph.pb

native : benchmark_model.cc:382 Graph: [/data/local/tmp/transformed_inference_graph.pb]
native : benchmark_model.cc:383 Input layers: [image_tensor:0]
native : benchmark_model.cc:384 Input shapes: [1,224,224,3]
native : benchmark_model.cc:385 Input types: [uint8]
native : benchmark_model.cc:386 Output layers: [detection_boxes:0,detection_scores:0,detection_classes:0,num_detections:0]
native : benchmark_model.cc:387 Num runs: [50]
native : benchmark_model.cc:388 Inter-run delay (seconds): [-1.0]
native : benchmark_model.cc:389 Num threads: [-1]
native : benchmark_model.cc:390 Benchmark name: []
native : benchmark_model.cc:391 Output prefix: []
native : benchmark_model.cc:392 Show sizes: [0]
native : benchmark_model.cc:393 Warmup runs: [2]
native : benchmark_model.cc:53 Loading TensorFlow.
native : benchmark_model.cc:60 Got config, 0 devices
can't determine number of CPU cores: assuming 4
can't determine number of CPU cores: assuming 4
native : benchmark_model.cc:258 Running benchmark for 2 iterations without detailed stat logging:
native : benchmark_model.cc:286 count=2 first=2688688 curr=1373990 min=1373990 max=2688688 avg=2.03134e+06 std=657349
native : benchmark_model.cc:258 Running benchmark for 50 iterations without detailed stat logging:
native : benchmark_model.cc:286 count=50 first=1405345 curr=1400253 min=1246255 max=1466356 avg=1.37303e+06 std=33711
native : benchmark_model.cc:258 Running benchmark for 50 iterations with detailed stat logging:
============================== Top by Computation Time ==============================
	             [node type]	  [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	                  Conv2D	   33.710	   36.773	   30.203	  1.931%	  1.931%	  2880.000	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_1
	   DepthwiseConv2dNative	   85.930	   26.702	   23.704	  1.516%	  3.447%	  2937.600	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/depthwise
	         QuantizedConv2D	  140.843	   21.529	   20.738	  1.326%	  4.773%	  5760.008	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_pointwise/BatchNorm/batchnorm/mul_1/eightbit
	         QuantizedConv2D	  656.673	   19.257	   19.940	  1.275%	  6.048%	   409.608	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_13_pointwise/BatchNorm/batchnorm/mul_1/eightbit
	   DepthwiseConv2dNative	  274.615	   20.146	   17.493	  1.119%	  7.167%	  3225.600	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_depthwise/depthwise
	         QuantizedConv2D	  323.094	   14.819	   16.560	  1.059%	  8.226%	  2880.008	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_pointwise/BatchNorm/batchnorm/mul_1/eightbit
	         QuantizedConv2D	  537.240	   13.757	   14.516	  0.928%	  9.154%	   739.336	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_9_pointwise/BatchNorm/batchnorm/mul_1/eightbit
	         QuantizedConv2D	  473.869	   13.103	   14.275	  0.913%	 10.067%	   739.336	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_7_pointwise/BatchNorm/batchnorm/mul_1/eightbit
	         QuantizedConv2D	  569.591	   13.382	   14.089	  0.901%	 10.968%	   739.336	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_10_pointwise/BatchNorm/batchnorm/mul_1/eightbit
	         QuantizedConv2D	  505.947	   13.133	   13.899	  0.889%	 11.856%	   739.336	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_8_pointwise/BatchNorm/batchnorm/mul_1/eightbit
============================== Top by Memory Use ==============================
	             [node type]	  [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	            QuantizedAdd	  175.293	   12.215	   11.851	  0.758%	  0.758%	  5760.008	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_pointwise/BatchNorm/batchnorm/add_1/eightbit
	         QuantizedConv2D	  140.843	   21.529	   20.738	  1.326%	  2.084%	  5760.008	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_pointwise/BatchNorm/batchnorm/mul_1/eightbit
	              Dequantize	  203.684	    5.234	    5.386	  0.344%	  2.428%	  5760.000	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_pointwise/Relu6
	   DepthwiseConv2dNative	  274.615	   20.146	   17.493	  1.119%	  3.547%	  3225.600	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_depthwise/depthwise
	   DepthwiseConv2dNative	   85.930	   26.702	   23.704	  1.516%	  5.063%	  2937.600	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/depthwise
	            QuantizedAdd	  345.853	    5.552	    4.671	  0.299%	  5.361%	  2880.008	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_pointwise/BatchNorm/batchnorm/add_1/eightbit
	         QuantizedConv2D	  323.094	   14.819	   16.560	  1.059%	  6.420%	  2880.008	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_pointwise/BatchNorm/batchnorm/mul_1/eightbit
	            QuantizedAdd	  309.606	    5.842	    5.277	  0.337%	  6.758%	  2880.008	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_depthwise/BatchNorm/batchnorm/add_1/eightbit
	            QuantizedMul	  298.146	    4.004	    4.408	  0.282%	  7.039%	  2880.008	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_depthwise/BatchNorm/batchnorm/mul_1/eightbit
	            QuantizedAdd	  257.952	    5.945	    5.208	  0.333%	  7.372%	  2880.008	        1	FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_2_pointwise/BatchNorm/batchnorm/add_1/eightbit
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	       NonMaxSuppression	       90	   637.594	    40.856%	    40.856%	    36.000	       90
	         QuantizedConv2D	       33	   215.470	    13.807%	    54.664%	 20647.061	       33
	   DepthwiseConv2dNative	       13	   106.792	     6.843%	    61.507%	 16711.424	       13
	     RequantizationRange	      283	    93.028	     5.961%	    67.468%	     2.264	      283
	              Requantize	      283	    79.127	     5.070%	    72.538%	 18539.414	      283
	            QuantizedAdd	      130	    71.310	     4.569%	    77.108%	 36965.137	      130
	                   Slice	       91	    36.322	     2.327%	    79.435%	  1380.240	       91
	                   Split	      180	    31.280	     2.004%	    81.440%	  4799.104	      180
	                  Conv2D	        1	    30.202	     1.935%	    83.375%	  2880.000	        1
	                ConcatV2	      107	    27.631	     1.771%	    85.146%	  3601.396	      107
	              QuantizeV2	      386	    27.177	     1.741%	    86.887%	  5043.348	      386
	              Dequantize	      307	    26.689	     1.710%	    88.597%	 25504.588	      307
	                  Gather	      546	    25.502	     1.634%	    90.231%	  6120.800	      546
	            QuantizedMul	      108	    23.188	     1.486%	    91.717%	 15810.112	      108
	          QuantizedRelu6	       35	    17.087	     1.095%	    92.812%	  9224.536	       35
	                     Min	      386	    15.680	     1.005%	    93.817%	     1.544	      386
	                     Max	      386	    15.260	     0.978%	    94.795%	     1.544	      386
	                   Const	      629	    10.047	     0.644%	    95.439%	     0.000	      629
	                   Where	      180	     8.200	     0.525%	    95.964%	  2290.400	      180
	                 Minimum	      451	     8.197	     0.525%	    96.489%	     0.000	      451
	                 Reshape	      566	     7.537	     0.483%	    96.972%	     0.000	      566
	                 Maximum	      360	     6.321	     0.405%	    97.377%	     0.000	      360
	                    Cast	      182	     5.764	     0.369%	    97.747%	  1746.596	      182
	                 Greater	      183	     5.025	     0.322%	    98.069%	   322.505	      183
	                     Sub	      192	     4.745	     0.304%	    98.373%	     0.016	      192
	          ResizeBilinear	        1	     4.461	     0.286%	    98.659%	  1080.000	        1
	            StridedSlice	      100	     3.158	     0.202%	    98.861%	     0.384	      100
	        QuantizedReshape	      102	     2.275	     0.146%	    99.007%	     0.816	      102
	     TensorArrayGatherV3	        1	     2.033	     0.130%	    99.137%	  1080.000	        1
	                 Squeeze	       97	     1.766	     0.113%	    99.250%	     0.000	       97
	               ZerosLike	       90	     1.544	     0.099%	    99.349%	    36.000	       90
	                 Sigmoid	        1	     1.410	     0.090%	    99.439%	     0.000	        1
	        QuantizedBiasAdd	       12	     1.406	     0.090%	    99.530%	   728.556	       12
	                   Shape	       99	     1.394	     0.089%	    99.619%	     0.784	       99
	                  Unpack	        5	     1.381	     0.088%	    99.707%	   751.464	        5
	    TensorArrayScatterV3	        1	     1.003	     0.064%	    99.772%	   602.112	        1
	                  TopKV2	        1	     0.968	     0.062%	    99.834%	    72.000	        1
	               Transpose	        2	     0.344	     0.022%	    99.856%	    61.344	        2
	                  Switch	       20	     0.272	     0.017%	    99.873%	     0.000	       22
	                   Merge	        8	     0.203	     0.013%	    99.886%	     0.032	       10
	                   Enter	        6	     0.191	     0.012%	    99.898%	     0.000	        6
	                    NoOp	        1	     0.184	     0.012%	    99.910%	     0.000	        1
	                Identity	       15	     0.175	     0.011%	    99.921%	     0.000	       15
	                 RealDiv	        6	     0.143	     0.009%	    99.931%	     0.000	        6
	                    Pack	        6	     0.133	     0.009%	    99.939%	    30.672	        6
	           TensorArrayV3	        2	     0.114	     0.007%	    99.946%	     0.104	        2
	              ExpandDims	        7	     0.111	     0.007%	    99.953%	     0.000	        7
	                   Range	        5	     0.097	     0.006%	    99.960%	     0.424	        5
	                     Exp	        2	     0.090	     0.006%	    99.965%	     0.000	        2
	                  Assert	        4	     0.075	     0.005%	    99.970%	     0.000	        4
	      TensorArrayWriteV3	        1	     0.054	     0.003%	    99.974%	     0.000	        1
	                    Less	        1	     0.051	     0.003%	    99.977%	     0.001	        2
	           NextIteration	        2	     0.045	     0.003%	    99.980%	     0.000	        2
	       TensorArrayReadV3	        1	     0.044	     0.003%	    99.983%	     0.000	        1
	                    Fill	        3	     0.044	     0.003%	    99.986%	     0.000	        3
	                    _Arg	        1	     0.042	     0.003%	    99.988%	     0.000	        1
	                 _Retval	        4	     0.037	     0.002%	    99.991%	     0.000	        4
	                    Rank	        2	     0.028	     0.002%	    99.992%	     0.008	        2
	                   Equal	        1	     0.026	     0.002%	    99.994%	     0.001	        1
	                LoopCond	        1	     0.023	     0.001%	    99.996%	     0.000	        2
	                     Add	        1	     0.021	     0.001%	    99.997%	     0.004	        1
	       TensorArraySizeV3	        1	     0.020	     0.001%	    99.998%	     0.004	        1
	                    Exit	        1	     0.015	     0.001%	    99.999%	     0.000	        1
	                    Size	        1	     0.014	     0.001%	   100.000%	     0.004	        1
Timings (microseconds): count=50 first=1463739 curr=1459978 min=1392397 max=1906618 avg=1.56387e+06 std=182975
Memory (bytes): count=50 curr=176072750(all same)
6723 nodes observed
