Implement SpatialPyramidPoolingLayer with the Split, Pooling, Flatten & Concat layers #560
Conversation
|
I appreciate how quickly this contribution has appeared, but this should almost certainly be done by composition and not copy-paste. For example, consider the within-channel LRN https://github.com/BVLC/caffe/blob/master/src/caffe/layers/lrn_layer.cpp#L31 |
|
Tests passed but the gradient checks were slow. |
|
@bhack, would you like to review if the implementation is consistent with the algorithm described in the section 2.3 of the SPP-net paper? |
|
@kloudkl I hope that I can do it this evening or tomorrow. |
|
(Sorry posted this before seeing recent changes, please ignore my previous post - deleted) |
|
@kloudkl I've not compiled the code to try it in depth, but it seems that you have simply handled the concatenation of the pooling layers' outputs and the accumulation of the loss. |
|
I don't think multi-size training is blocked by the transformation layers. In the paper, the authors simulated multi-size training with multiple fixed-size networks. As the output vectors of the conv5 layers are pooled into fixed-length features by the SpatialPyramidPooling layer, the networks of different sizes can share the same fully-connected layers as their last layers. I prefer to follow the path of @moskewcz's #308 DenseNet feature pyramid computation, but their code seems too heavyweight to integrate with the SPP. More likely, I will implement the Caffe version of Torch7's PyramidPacker and PyramidUnpacker to extract features for multiple scales of an image as discussed in #189. |
|
@kloudkl Right |
While this is written in Kaiming's paper, I guess there will be some problems with this pooling approach. For example, if image_side_length == 17 and spatial_bin == 6, then you have kernel_size == 3 and stride == 2, so you actually get 8×8 bins instead of 6×6 bins. @kloudkl Could you tell me whether I am right?
Hi, @kloudkl I emailed Dr. Kaiming He for details, and he told me that this is how they perform spatial pyramid pooling:
Denote the width and height of the conv5 feature maps (which can be the full image or a window) as w and h. For a pyramid level with n×n bins, the (i,j)-th bin spans the range [floor((i-1)*w/n), ceil(i*w/n)] × [floor((j-1)*h/n), ceil(j*h/n)].
I copied this PR and am currently trying to implement a PyramidLevelLayer that performs this pooling behavior, based on the rectangular pooling in #614.
Yes, you are right. I realized the problem when I wrote the test cases.
Thank you for contacting the authors for clarification!
And I think the range above includes the left border but excludes the right border, i.e. [0, 3] contains 0, 1, 2 but not 3.
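To make the 17 / 6 example above concrete, here is a minimal standalone sketch (not part of the PR code) that contrasts the fixed kernel/stride grid with the per-bin floor/ceil ranges from Dr. He's clarification, using the half-open interpretation noted in the previous comment:

```cpp
// Sketch only: compares the fixed kernel/stride pooling grid with the
// per-bin formula above. Assumes a square feature map of side `a` and a
// pyramid level with n x n bins; bin ranges are treated as half-open,
// i.e. [start, end).
#include <cmath>
#include <cstdio>

int main() {
  const int a = 17;  // feature map side length (example from the discussion)
  const int n = 6;   // number of bins along each side

  // Fixed scheme: kernel = ceil(a/n), stride = floor(a/n).
  const int kernel = static_cast<int>(std::ceil(static_cast<double>(a) / n));
  const int stride = a / n;  // integer division == floor for positive values
  const int pooled = (a - kernel) / stride + 1;
  std::printf("fixed kernel=%d stride=%d -> %dx%d bins (wanted %dx%d)\n",
              kernel, stride, pooled, pooled, n, n);

  // Per-bin ranges from the authors' clarification (1-based bin index i):
  // [floor((i-1)*a/n), ceil(i*a/n)) along each axis -> exactly n bins.
  for (int i = 1; i <= n; ++i) {
    const int start = static_cast<int>(std::floor((i - 1.0) * a / n));
    const int end = static_cast<int>(std::ceil(1.0 * i * a / n));
    std::printf("bin %d: [%d, %d)\n", i, start, end);
  }
  return 0;
}
```

For a = 17, n = 6 the fixed scheme reports 8×8 bins, while the per-bin ranges cover [0, 17) with exactly six (overlapping) bins per axis, matching the discussion above.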
|
To be more faithful to the implementation of the authors of the SPP network paper, the pooling layer is extended to support floating point height and width of the kernel and stride. The 36 test cases of the pooling layer all pass. The spatial pyramid pooling layer is also tested on both the CPU and the GPU. |
|
Classification accuracy on the VOC 2012 dataset:
The spatial pyramid pooling layer consists of four pyramid levels, each of which splits the images evenly into 1, 2, 3, and 6 patches along both the vertical and horizontal directions. |
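As a side note, here is a small sketch of the fixed output length this {1, 2, 3, 6} pyramid produces, assuming max pooling and 256 conv5 feature maps as in the reference ImageNet model (an assumption, not taken from this PR); this fixed length is what allows the fully connected layers to be shared across input sizes, as mentioned earlier in the thread:

```cpp
// Sketch: the output length of a {1, 2, 3, 6} pyramid is fixed regardless of
// the input size. Assumes n x n bins per level and `channels` conv5 maps.
#include <cstdio>

int main() {
  const int levels[] = {1, 2, 3, 6};
  const int channels = 256;  // assumed: conv5 of the reference ImageNet model
  int bins = 0;
  for (int n : levels) {
    bins += n * n;  // each level contributes n*n bins per channel
  }
  std::printf("%d bins per channel, %d-dimensional SPP output\n",
              bins, bins * channels);  // 50 bins, 12800 dimensions
  return 0;
}
```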
|
The SPP-net performed worse because the fully connected layer after the last convolution layer has larger dimensions than in the reference ImageNet model. Its parameters were randomly initialized, which caused over-fitting on the relatively small VOC 2012 dataset. If it were first fine-tuned on a much larger dataset, its performance would certainly be superior, as described in the paper. |
Force-pushed from 4278286 to c01f07a
I am wondering: the VOC 2012 classification task has multiple labels per image, so how should the leveldb be prepared?
|
What's going on with this? Can I help? |
|
This algorithm involves some very complicated corner cases. For example, a candidate region in the original image may be mapped into a very small region with the width or height equal to or smaller than 1 pixel. It's very hard to detect objects whose sizes are small relative to the image. GoogLeNet combined with RCNN is a much more robust but much slower solution. In practice, you may find the object detectors included in the latest OpenCV quite handy for most use cases if you are required to quickly complete a project. |
|
@kloudkl I'm interested in helping. Maybe we can chat about what is holding up this PR. How can we do that? |
|
Closing since this PR is abandoned and the code is non-compositional. This is better achieved through layer composition. There is an expected replacement: spatial pyramid pooling has been given to a student as Caffe practice. |
|
@shelhamer Can you please update us on the current status of this? |
|
See #2177 for spatial pyramid pooling. |
The spatial pyramid pooling layer [1] mentioned in #548 is the combination of the existing PoolingLayer and ConcatLayer. It automatically computes the sliding window sizes and strides for the multiple pyramid levels, applies the PoolingLayer on each level, and finally concatenates the outputs of all the levels into fixed-size vectors to feed into classifiers or fully connected layers.
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. The 13th European Conference on Computer Vision (ECCV), 2014
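For illustration, a minimal sketch of the per-level window and stride computation described above, assuming the simple kernel = ceil(side/n), stride = floor(side/n) scheme from the paper (see the review discussion earlier in the thread for its caveat on certain feature map sizes); the 13×13 conv5 size is an assumed example, not a value taken from this PR:

```cpp
// Sketch of the per-level kernel/stride computation for an SPP pyramid.
// Each level with n x n bins pools the h x w conv5 maps with
// kernel = ceil(side / n) and stride = floor(side / n) along each axis.
#include <cmath>
#include <cstdio>

int main() {
  const int levels[] = {1, 2, 3, 6};
  const int height = 13, width = 13;  // assumed conv5 size for a 224x224 input
  for (int n : levels) {
    const int kernel_h = static_cast<int>(std::ceil(1.0 * height / n));
    const int kernel_w = static_cast<int>(std::ceil(1.0 * width / n));
    const int stride_h = height / n;  // integer division == floor
    const int stride_w = width / n;
    std::printf("level %dx%d: kernel %dx%d, stride %dx%d\n",
                n, n, kernel_h, kernel_w, stride_h, stride_w);
  }
  return 0;
}
```

With a 13×13 map this scheme yields exactly 1×1, 2×2, 3×3, and 6×6 pooled outputs per level, whose flattened results the ConcatLayer joins into the fixed-size vector described above.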