Skip to content
This repository was archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-411] Add ROI Align#10852

Merged
piiswrong merged 20 commits intoapache:masterfrom
zhanghang1989:roi
May 23, 2018
Merged

[MXNET-411] Add ROI Align#10852
piiswrong merged 20 commits intoapache:masterfrom
zhanghang1989:roi

Conversation

@zhanghang1989
Copy link
Copy Markdown
Contributor

@zhanghang1989 zhanghang1989 commented May 8, 2018

Description

(Brief description on what this PR is about)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • ROI Align from Caffe2
  • docs and unit test

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@piiswrong
Copy link
Copy Markdown
Contributor

Please add tests and documentation

@xinyu-intel
Copy link
Copy Markdown
Contributor

Hi, can you enable openmp on the cpu implementation of roialign. This can achieve a better performance. You can reference my pr in #9958.

@chinakook
Copy link
Copy Markdown
Contributor

caffe2 has a cpp test. I think Caffe2’s roi align op is written by Kaiming He.

@wkcn
Copy link
Copy Markdown
Member

wkcn commented May 12, 2018

The ROIAlign in this PR is ROIAlign(max).
And the ROIAlign in Caffe2(forward, backward) is ROIAlign(mean)

@zhanghang1989 zhanghang1989 requested a review from szha as a code owner May 15, 2018 23:42
@wkcn
Copy link
Copy Markdown
Member

wkcn commented May 16, 2018

Hi. I wrote a ROIAlign forward/backward test. It may be useful.
https://github.com/wkcn/MobulaOP/blob/master/tests/test_roi_align_op.py

And I found the implement of CPU/GPU is different. I think it's better to add cpu/gpu consistence test.

@zhanghang1989
Copy link
Copy Markdown
Contributor Author

@szha szha requested review from zhreshold and removed request for szha May 16, 2018 19:08
@zhanghang1989
Copy link
Copy Markdown
Contributor Author

@piiswrong @zhreshold Could you please review this PR for adapting ROI Aligh from Caffe2. Thanks!

Comment thread src/operator/contrib/roi_align.cu Outdated
i += blockDim.x * gridDim.x)

// The number of cuda threads to use. 512 is used for backward compatibility
constexpr int ROI_CUDA_NUM_THREADS = 512;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use mshadow::cuda::kMaxThreadsPerBlock might provide better perf on newer opus possibly?
Use mshadow::cuda::CheckLaunchParam to help check the launch limits

Comment thread src/operator/contrib/roi_align.cu Outdated
}

/*
template <typename T>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove these

check_numeric_gradient(sym=test, location=[x1, x2],
grad_nodes={'data':'add', 'rois':'null'},
numeric_eps=1e-4, rtol=1e-1, atol=1E-4)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a forward result check in addition to gradient check

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, will a forward result check soon 👍

@zhanghang1989
Copy link
Copy Markdown
Contributor Author

@piiswrong @zhreshold I have added the unit tests. Please see the updates. Thanks!


#define START_IND(a, b, c) static_cast<int>(floor(static_cast<float>(a * c) / b))
#define END_IND(a, b, c) static_cast<int>(ceil(static_cast<float>((a + 1) * c) / b))
#define START_IND(a, b, c) static_cast<int>(floor(static_cast<real>(a * c) / b))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Comment thread src/operator/contrib/roi_align.cc Outdated
T roi_start_h = offset_bottom_rois[1] * spatial_scale;
T roi_end_w = offset_bottom_rois[2] * spatial_scale;
T roi_end_h = offset_bottom_rois[3] * spatial_scale;
// T roi_start_w = round(offset_bottom_rois[0] * spatial_scale);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

int rois_cols) {
DCHECK(rois_cols == 4 || rois_cols == 5);

for (int index = 0; index < nthreads; index++) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use openmp?

Copy link
Copy Markdown
Contributor Author

@zhanghang1989 zhanghang1989 May 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to use single threading in backward, due to no atomic add using CPU. We could assume no one would use cpu to train the model :)

// (n, c, ph, pw) is an element in the pooled output
// can be parallelized using omp
// #pragma omp parallel for num_threads(32)
for (int n = 0; n < n_rois; n++) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openmp?

Comment thread src/operator/contrib/roi_align.cu Outdated
const DType *bottom_rois = in_data[0].dptr<DType>();
DType *grad_in = outputs[0].dptr<DType>();

if (kAddTo == req[roialign::kData] || kWriteTo == req[roialign::kData]) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return if NullOp before the switch?

Comment thread src/operator/contrib/roi_align-inl.h Outdated
DMLC_DECLARE_PARAMETER(ROIAlignParam) {
DMLC_DECLARE_FIELD(pooled_size)
.set_expect_ndim(2).enforce_nonzero()
.describe("fix pooled size: (h, w)");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's pooled_size?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output roi feature sizes. Name is compatible with ROIPooling

Comment thread src/operator/contrib/roi_align-inl.h Outdated
};


struct ROIAlignGrad {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this struct. Use lambda directly at set_attr

Comment thread src/operator/contrib/roi_align.cc Outdated
// (n, c, ph, pw) is an element in the pooled output
// can be parallelized using omp
int n;
#pragma omp parallel for private(n) \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding omp. Regarding to NUM_OF_ROIS and CHANNELS, I found laster is more and more bigger than former in ROIPooling, so I just apply omp on channels to achieve better application performance. Can you help benchmark the performance based on the usually roi_align size?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting removing this omp?

&pre_calc);

int c;
#pragma omp parallel for private(c) \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keeping this ?

@zhanghang1989
Copy link
Copy Markdown
Contributor Author

Remove OMP in backward, due to no atomic add in cpu.

@zhreshold
Copy link
Copy Markdown
Member

Gathering status, did anyone have unresolved issues?
Trying to merge this PR ASAP as it is the required operator for gluon-cv.

Comment thread src/operator/contrib/roi_align-inl.h Outdated
DMLC_DECLARE_PARAMETER(ROIAlignParam) {
DMLC_DECLARE_FIELD(pooled_size)
.set_expect_ndim(2).enforce_nonzero()
.describe("ROI Align output roi featuremap height and width: (h, w)");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

featuremap -> feature map

Comment thread src/operator/contrib/roi_align.cc Outdated

NNVM_REGISTER_OP(_contrib_ROIAlign)
.describe(R"code(
ROI Align Layer
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superfluous. Remove this line

@zhanghang1989
Copy link
Copy Markdown
Contributor Author

zhanghang1989 commented May 22, 2018

I should have addressed most of the reviews. Please let me know if there are any further comments. Thanks! Related Gluon-CV PR dmlc/gluon-cv#140

@zhanghang1989
Copy link
Copy Markdown
Contributor Author

Finally it passes the CI :) @piiswrong

@piiswrong piiswrong merged commit 98e671e into apache:master May 23, 2018
width,
pooled_height,
pooled_width,
-1,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sampling_rate missing in ROIAlignParam

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, it uses the adaptive size. I will make it as an option

jinhuang415 pushed a commit to jinhuang415/incubator-mxnet that referenced this pull request May 29, 2018
* add roi align

* lint

* cpu gpu forward consistent

* roi align from caffe2

* rois and unit-test

* for cpplint

* use pointer instead of reference for lint

* fix

* add docs

* fix vector

* more unit test

* using mshadow

* omp

* omp on channels

* remove omp due to no cpu atomic add

* use lambda func for grads

* knullop return
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
* add roi align

* lint

* cpu gpu forward consistent

* roi align from caffe2

* rois and unit-test

* for cpplint

* use pointer instead of reference for lint

* fix

* add docs

* fix vector

* more unit test

* using mshadow

* omp

* omp on channels

* remove omp due to no cpu atomic add

* use lambda func for grads

* knullop return
@zhanghang1989 zhanghang1989 deleted the roi branch June 5, 2018 00:51
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
* add roi align

* lint

* cpu gpu forward consistent

* roi align from caffe2

* rois and unit-test

* for cpplint

* use pointer instead of reference for lint

* fix

* add docs

* fix vector

* more unit test

* using mshadow

* omp

* omp on channels

* remove omp due to no cpu atomic add

* use lambda func for grads

* knullop return
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants