Average pooling with padding - behavior on edges #6296
oscarriddle wants to merge 205 commits into BVLC:master from
Conversation
Copy of @oscarriddle's comment in another thread:
I will take a closer look at this soon. In the meantime, please review your code again - there are several tests that this change fails to pass.
Previous implementation caused FP overflow for x less than -90
… the gradient computation robust, much like the SoftmaxWithLoss layer (see: http://stackoverflow.com/a/34917052/1714410 for more information). (2) supporting loss along axis
The instructions say that MKL is free for students, but as of 8/2015, MKL is free for everyone with community licensing.
As recommended by @longjon, this will allow `caffe.io.array_to_datum` to handle, for example, numpy.float32 arrays. It might be worth noting that `datum.float_data` is stored as protobuf type 2, which is float32, as opposed to protobuf type 1, which is float64. It is a little unintuitive that caffe currently requires data to be passed in as float64 but then writes float32 to LMDB. To demonstrate this:
```python
datum = caffe.io.array_to_datum(np.array([[[0.9]]]))
caffe.io.datum_to_array(datum)
# array([[[ 0.9]]])
datum_str = datum.SerializeToString()
new_datum = caffe.proto.caffe_pb2.Datum()
new_datum.ParseFromString(datum_str)
caffe.io.datum_to_array(new_datum)
# array([[[ 0.89999998]]])
```
This behavior is somewhat hidden because `datum_to_array` returns type float64, even though the data doesn't actually have that resolution if it has been stored as protobuf text anywhere (for example in LMDB). Alternative solutions:
* Require and return float32, consistent with the protobuf representation.
* Change the protobuf to allow float32 or float64 and update surrounding code to support this.
With reference to commit f1a8470: this fix changes some EXPECT_EQ into EXPECT_FLOAT_EQ.
Imported from Debian Package caffe (1.0.0~rc3+20160715-g42cd785-2).
…ted caffe target. This is the first step towards a "modern" IMPORTED-targets-only CMake setup. The find_package modules still need to be rewritten and upstreamed in the form of config exports where possible.
Although Caffe itself does not use OpenMP, explicit linking to OpenMP is needed when one statically links to a BLAS library that uses OpenMP internally and does not provide proper CMake imported targets with proper dependencies (which nobody does so far).
A few layers make use of otherwise unused diffs to accumulate results, but unless those diffs are cleared in forward, they contaminate the gradients when these layers share a bottom and their backward pass is skipped.
Loading weights is moved from caffe.exe to the solver class, so the new "weights" solver parameter is used not only from the command line but also when Caffe is used as a library (including Python). Corrected formatting; fixed line length; more formatting corrections.
…points to a directory. See issue BVLC#6110, proposed improvement No. 2.
draw_net.py refactoring and optional LR visualization
* refactoring `get_layer_label`: rewrote the function body to make it more streamlined; does not affect inputs and outputs
* optionally visualize LR when drawing the network: adds an option to `python/draw_net.py` that allows visualizing information about the learning rate multiplier (if relevant) when drawing the network's graph.
…nto fix_ave_pool
Hi, I checked the Travis CI failure log. The compilation completes successfully, but the run fails at the AVE pooling layer test; it looks like the output values aren't equal to the expected values.
Failed log:
Besides that, I'm not quite sure what happened to the LRNLayer test. Thanks,
Hi, with this fix I successfully converted a TensorFlow-trained InceptionV3 model to a Caffe model and got exactly the same final inference prediction result as TensorFlow. Now I believe that a TensorFlow model can definitely be converted to Caffe with almost no accuracy sacrifice (loss around 0.0001%). In the meantime, the InceptionV3 network also runs on Caffe - I did it! Next, I will try to evaluate the inference performance difference between TensorFlow and Caffe. Thanks,
The LRN layer creates some layers internally, among them the PoolingLayer, so if we break one, the other fails too. Looks like you're right: the relevant tests would have to be redesigned as well - not only for the Pooling layer, but for the LRN layer too. At this point the question arises whether such a change would break existing models. Maybe this should come as an optional behavior, with a boolean switch defined in caffe.proto, to let users choose the layer's behavior at the edges? See #6282 for the same idea applied to a different problem. PS: this PR's commit history became quite weird: it looks like you're trying to merge back some 200 commits that are already in master.
OK, that makes sense. I'm not very familiar with pull requests; I ran git pull on this branch, so those existing commits somehow got merged into it. Maybe I should start over from a clean branch. That is indeed a good question; an optional toggle seems good for now. Let me have a try in my local repo, and let's talk again when everything goes smoothly. Thanks,
Closing, we'll continue in #6303.
Hi,
I'm currently working on converting a TensorFlow-trained InceptionV3 model to a Caffe model, because I want to test its inference performance under Caffe. I've almost got there; along the way I encountered several interesting issues, and here is one of them.
The average pooling layer calculates a different output than TensorFlow when padding is enabled.
Observation:
If padded elements fall within the current pooling kernel, Caffe computes the average over all inputs, including the padded values (which are zeros). Hence the result gets smaller and is severely affected by those invalid zeros.
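To make the effect concrete, here is a minimal, self-contained sketch (my own illustration, not Caffe code) of a 3x3 average at the top-left corner of an all-ones feature map with pad 1: dividing by the full kernel area yields about 0.44 instead of the expected 1.0.
```cpp
#include <cstdio>

// Illustration only: a 3x3 window at the top-left corner of an all-ones
// input with pad = 1 covers 4 valid pixels and 5 padded zeros.
int main() {
  const int valid_count = 4;             // pixels actually inside the feature map
  const int kernel_area = 9;             // 3x3 window, padded cells included
  const double sum = valid_count * 1.0;  // padded zeros add nothing to the sum

  std::printf("divide by kernel area : %f\n", sum / kernel_area);  // ~0.444444
  std::printf("divide by valid count : %f\n", sum / valid_count);  // 1.000000
  return 0;
}
```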
Root cause:
The root cause is that, regardless of whether padding falls within the current pooling window, the pool size stays fixed at the value implied by the kernel_size parameter, e.g. 9 for a 3x3 kernel.
Here is part of the debug output from each iteration of AVE pooling:
(Input_size 35x35, kernel_size 3x3, padding 1)
pooled_height_:35, pooled_width_:35 stride_h_:1, stride_w_:1 pad_h_:1, pad_h_:1 height_:35, width_:35 hstart:0, hend:2 wstart:0, wend:2 pool_size:9 <-Here-> top_data:0.730075
pooled_height_:35, pooled_width_:35 stride_h_:1, stride_w_:1 pad_h_:1, pad_h_:1 height_:35, width_:35 hstart:0, hend:2 wstart:0, wend:3 pool_size:9 <-Here-> top_data:0.913562
pooled_height_:35, pooled_width_:35 stride_h_:1, stride_w_:1 pad_h_:1, pad_h_:1 height_:35, width_:35 hstart:0, hend:2 wstart:1, wend:4 pool_size:9 <-Here-> top_data:0.615629
pooled_height_:35, pooled_width_:35 stride_h_:1, stride_w_:1 pad_h_:1, pad_h_:1 height_:35, width_:35 hstart:0, hend:2 wstart:2, wend:5 pool_size:9 top_data:0.370382
Note that hstart, hend, wstart, and wend are updated every iteration, but pool_size stays at 9.
Analysis:
Generally speaking, average pooling shouldn't take invalid padded values (like 0) into account, because they pollute the feature map and downgrade the significance of the edge elements.
So I dove into pooling_layer.cpp and found the code below that calculates the variable pool_size.
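Since the snippet itself is not reproduced above, here is a simplified sketch of what that loop looks like - an assumption based on the AVE branch of PoolingLayer::Forward_cpu, reduced to a single channel and with Caffe's member variables turned into function parameters, so the exact upstream code may differ in detail:
```cpp
#include <algorithm>
#include <vector>

// Simplified sketch of Caffe's AVE pooling forward pass (single channel).
// Key point: pool_size is computed from the window extent *before* the
// window is clipped to the valid input region, so padded cells are counted.
void ave_pool_forward(const std::vector<float>& bottom, std::vector<float>& top,
                      int height, int width, int pooled_height, int pooled_width,
                      int kernel_h, int kernel_w, int stride_h, int stride_w,
                      int pad_h, int pad_w) {
  using std::max;
  using std::min;
  for (int ph = 0; ph < pooled_height; ++ph) {
    for (int pw = 0; pw < pooled_width; ++pw) {
      int hstart = ph * stride_h - pad_h;
      int wstart = pw * stride_w - pad_w;
      int hend = min(hstart + kernel_h, height + pad_h);
      int wend = min(wstart + kernel_w, width + pad_w);
      // Divisor fixed here; it still includes the padded area.
      int pool_size = (hend - hstart) * (wend - wstart);
      hstart = max(hstart, 0);
      wstart = max(wstart, 0);
      hend = min(hend, height);
      wend = min(wend, width);
      float sum = 0.f;
      for (int h = hstart; h < hend; ++h) {
        for (int w = wstart; w < wend; ++w) {
          sum += bottom[h * width + w];
        }
      }
      top[ph * pooled_width + pw] = sum / pool_size;
    }
  }
}
```
With a 3x3 kernel, stride 1, and pad 1 on a 35x35 input, the extent measured before clipping is always 3x3, so pool_size is always 9, which matches the pool_size:9 seen in the log above.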
Obviously, the variable pool_size determines the number of valid elements in each iteration. I can see the designer's intent was to update pool_size as the kernel slides rather than fix it to a constant. However, it is initialized before hstart, hend, wstart, and wend are updated (clipped to the valid input range), so in the end pool_size never reflects the number of valid elements.
Solution:
My modification is quite simple, but I think it's very important.
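The exact patch is not shown here; as an illustration only (my sketch, not necessarily the code in this PR), the kind of change being described is to derive pool_size from the window after it has been clipped to the valid input region, relative to the loop sketched above:
```cpp
// Sketch of the fix, relative to the loop sketched above: clip the window
// to the valid input region first, then compute the divisor, so padded
// zeros are no longer counted.
hstart = max(hstart, 0);
wstart = max(wstart, 0);
hend = min(hend, height);
wend = min(wend, width);
int pool_size = (hend - hstart) * (wend - wstart);  // valid elements only
```
With this ordering, the corner example above would divide by 4 instead of 9, which matches the TensorFlow behavior described in this report.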
Hope to hear some comments from your side.
Thanks,
Xiaolun Cao