New tutorial on python datalayers and multilabel classification#3471

Merged
shelhamer merged 3 commits into BVLC:master from beijbom:clean-datalayer-tutorial
Mar 1, 2016
Conversation

@beijbom
Contributor

@beijbom beijbom commented Dec 21, 2015

The goal of this tutorial is twofold. First, it illustrates how to write Python data layers, both synchronous and asynchronous. Second, it clarifies a common misconception that Caffe "doesn't support multi-label classification". It does, and the correct loss layer is SigmoidCrossEntropyLoss.
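(For readers arriving here via search: SigmoidCrossEntropyLoss works for multi-label targets because it treats each output unit as an independent binary classifier. As a rough numpy sketch of the quantity it computes — this is the commonly used numerically stable formulation, not Caffe's actual C++ code:)

```python
import numpy as np

def sigmoid_cross_entropy_loss(scores, targets):
    """Mean over the batch of the summed per-label binary cross-entropy.

    scores  : (N, L) raw predictions (logits), one column per label
    targets : (N, L) binary ground-truth label vectors
    """
    # Stable form of -t*log(sigmoid(x)) - (1-t)*log(1-sigmoid(x)):
    #   max(x, 0) - x*t + log(1 + exp(-|x|))
    loss = np.maximum(scores, 0) - scores * targets + np.log1p(np.exp(-np.abs(scores)))
    return loss.sum(axis=1).mean()
```

With all-zero scores every label contributes log(2) regardless of the target, which is a handy sanity check for the initial loss of a multi-label net.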

@wangg12

wangg12 commented Dec 21, 2015

What about evaluation metrics for multi-label classification, such as a multi-label version of accuracy?

@beijbom
Contributor Author

beijbom commented Dec 21, 2015

hey @wangg12, I compute the Hamming distance at test time directly in the notebook, but did not put it in a Python layer. Is that what you are referring to?
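(For context: the Hamming distance being discussed is just the fraction of label bits on which the thresholded predictions disagree with the ground truth. A minimal numpy sketch — the exact thresholding used in the notebook may differ:)

```python
import numpy as np

def hamming_distance(gt, predictions, threshold=0.0):
    """Fraction of label bits where thresholded predictions disagree with gt.

    gt          : (N, L) binary ground-truth matrix
    predictions : (N, L) raw scores; a score > threshold counts as a positive label
    """
    pred_bits = predictions > threshold
    return np.mean(gt.astype(bool) != pred_bits)
```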


You should cite the URL (https://github.com/rbgirshick/fast-rcnn) and tech report (http://arxiv.org/abs/1504.08083) that accompanies Fast R-CNN. The current citation isn't quite enough.

Contributor Author


citation updated. thanks for the pointer!

@wangg12

wangg12 commented Dec 22, 2015

Hi, @beijbom

According to the Wikipedia article on multi-label classification, the Hamming loss and the multi-label version of accuracy are equivalent to some extent.

In my view, the Hamming distance cares more about the difference between prediction and ground truth, while the multi-label version of accuracy focuses more on how much the ground truth and the top-k predictions agree. (Correct me if I'm wrong.)

In your implementation, the Hamming distance is computed over all predictions larger than 0, which means we need to choose a proper threshold (e.g. 0) for the positive labels. Maybe a multi-label top-k accuracy would be more flexible, like the single-label one.

So if I want to use the multi-label version accuracy during test, could you give me some instructions on how to write this with python layer?

Thanks!
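(A top-k multi-label accuracy along the lines described above could be sketched as below. This is not an existing Caffe layer, just an illustrative numpy function that a Python layer's `forward` could wrap; how ties and the choice of k are handled is up to you:)

```python
import numpy as np

def topk_multilabel_accuracy(gt, scores, k):
    """Fraction of the top-k predicted labels that appear in the ground truth,
    averaged over the batch.

    gt     : (N, L) binary ground-truth matrix
    scores : (N, L) raw prediction scores
    k      : number of top-scoring labels to treat as positive predictions
    """
    topk = np.argsort(-scores, axis=1)[:, :k]    # indices of the k highest scores
    hits = np.take_along_axis(gt, topk, axis=1)  # 1 where a top-k label is correct
    return hits.mean()
```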

@kli-casia
Contributor

Great, I have been looking for a python data layer example for a long time.
Thank you very much. @beijbom

@lukeyeager
Contributor

lukeyeager commented Jan 26, 2016

This tutorial may also be useful. It's not a data layer, but it's a bit more beginner-friendly.

https://github.com/NVIDIA/DIGITS/blob/digits-3.0/examples/python-layer/README.md

@kli-casia
Contributor

@lukeyeager Thank you very much, I will check it out. o(^▽^)o

By the way, is there an example of a python layer with parameters? For example,
how would one reimplement a fully connected layer using a python layer?
Thank you.
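(Setting Caffe's Python layer plumbing aside, the core math such a parameterized layer's `forward`/`backward` would have to implement is just an affine map and its gradients. A plain numpy sketch — how the parameter blobs are registered inside an actual `caffe.Layer` subclass is a separate question:)

```python
import numpy as np

class FullyConnected(object):
    """Minimal affine layer: forward y = x W^T + b, with matching gradients."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.RandomState(seed)
        self.W = rng.randn(out_dim, in_dim) * 0.01  # weight matrix
        self.b = np.zeros(out_dim)                  # bias vector

    def forward(self, x):
        self.x = x                                  # cache input for backward
        return x.dot(self.W.T) + self.b

    def backward(self, grad_top):
        self.dW = grad_top.T.dot(self.x)            # gradient w.r.t. weights
        self.db = grad_top.sum(axis=0)              # gradient w.r.t. bias
        return grad_top.dot(self.W)                 # gradient w.r.t. input
```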

Member


I think this would be at home as caffe/examples/pycaffe/tools.py to make it more clearly related to pycaffe examples.

Contributor Author


sounds good. I just moved it.

from threading import Thread
from PIL import Image

from pascal_multilabel_with_datalayer_tools import SimpleTransformer
Contributor


Should this not now be "from tools import SimpleTransformer"?

Contributor Author


You are right, I'll fix that. Thanks!

Contributor


@beijbom I have submitted a pull request to your personal branch that also addresses this. Please have a look.

By the way, the tutorial is great! I was able to reproduce it with the multi-label data layer that I am working on for #3653 as well (still need to push the latest changes).

Contributor Author


Thanks a ton Evan, very nicely done. I'm not sure the new layer code is ideal from a tutorial perspective though. While certainly more professional, I think it obscures things a bit, and the reader has to jump back and forth to follow the line of execution. I tend to favour showing people the easiest, dumbest, most straightforward way of doing things, and letting them take it from there. Does that make sense? What do other folks think? @shelhamer?

Contributor


Now that you mention it, I might have gotten a bit carried away.

I was thinking about it again this morning. I don't think a base class is necessarily required, but I still think that using the BatchLoader in both the sync and async cases is a good idea: it clarifies what the steps in the process are, and users can "drill down" further if they want to see what actually happens when loading an image.
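(The sync/async split under discussion boils down to whether batch loading happens inline in `forward` or on a background thread. A stdlib-only sketch of the prefetching pattern — illustrative only, not the tutorial's actual BatchLoader class:)

```python
import queue
import threading

class PrefetchingLoader(object):
    """Fills a bounded queue on a background thread so the consumer rarely blocks."""

    def __init__(self, load_batch, max_prefetch=2):
        self._load_batch = load_batch  # callable returning one batch
        self._queue = queue.Queue(maxsize=max_prefetch)
        worker = threading.Thread(target=self._worker, daemon=True)
        worker.start()

    def _worker(self):
        while True:
            # blocks when the queue already holds max_prefetch batches
            self._queue.put(self._load_batch())

    def next_batch(self):
        # blocks only if prefetching has fallen behind the consumer
        return self._queue.get()
```

A Python data layer's `forward` would then just call `next_batch()` and copy the result into the top blobs; note that because of CPython's GIL, this mostly helps when loading is I/O-bound.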

Contributor Author


Ok, sure, that makes sense. That's a good compromise. :) Do you want to make a new pull request, or should I edit yours?

Contributor


I don't think that I'm going to get much of a chance to have a look this weekend. If it can wait until Monday, then I don't mind doing it myself. Otherwise, feel free to edit it.

shelhamer added a commit that referenced this pull request Mar 1, 2016
[example] tutorial on python data layers and multilabel classification
@shelhamer shelhamer merged commit 358b60c into BVLC:master Mar 1, 2016
@shelhamer
Member

Thanks for the multi-label tutorial Oscar!

Potential follow-ups for the willing:

  • train the defined multi-label CaffeNet classifier to convergence on PASCAL to have a more exciting conclusion and yet another model for the zoo
  • switch to proto solver definition like in the fine-tuning notebook
  • add more explanation of classification/multi-class classification/multi-label classification with pointers to the relevant losses
  • include a pre-fetching Python data layer as an IO optimization (this should be another data layer to make this easier on beginners)

@beijbom
Contributor Author

beijbom commented Mar 1, 2016

tnx Evan!

I'll revisit some of the follow-ups after ECCV.

The tutorial doesn't appear right on http://caffe.berkeleyvision.org/. Do
we need to take further action to make that happen? Perhaps I missed
setting one of the ipython notebook metadata fields?

I did set:
"include_in_docs": true,

but are there others?

cheers


@shelhamer
Member

I had to push the new site from the latest master. It's there now!

@kli-casia
Contributor

Hi @beijbom, why did you remove the asynchronous layer?

In my opinion, it's beneficial to prefetch the batch for the next iteration while caffe is running.

With the synchronous layer, caffe has to wait for batch loading, right?

Thanks!

@beijbom
Contributor Author

beijbom commented Apr 19, 2016

hi @kli-nlpr! We decided to remove it because we didn't see any significant time savings (I agree that it SHOULD help, it just didn't :). It was unclear whether it was an implementation or a hardware issue. Did you notice any time savings?

@kli-casia
Contributor

Hi @beijbom, I didn't test it, I was just wondering why you deleted it.
Does this mean that when writing a python data layer, one doesn't have to consider prefetching?

@elezar
Contributor

elezar commented Apr 21, 2016

@kli-nlpr If you are really interested in performance, you can have a look at #3653, which adds multi-label support to the standard image data layer. This already implements batch prefetching and is quite a bit faster than the python layers here. Some testing and feedback would be welcome, and if there is more interest from the community, then maybe the PR could get merged sooner. I tried to reproduce @beijbom's pascal tutorial there too, so it should be simple enough to get started.

Note: I am not trying to knock @beijbom's layers. They are great examples of some of the more advanced uses of Caffe's python interface!

@kli-casia
Contributor

@elezar Thank you very much, I will check it out.

In fact, I am much more comfortable with python than C++.
I think it's much easier to do complex data preparation using a python layer,
but the speed is much slower than with C++ layers.
I am looking for methods which can speed up python data layers. ^_^

@kli-casia
Contributor

Is there a pre-fetching Python data layer right now? Thanks

@beijbom
Contributor Author

beijbom commented Aug 31, 2016

We have experimented with pre-fetching Python data layers, but were unable to see a consistent speed-up so far. Let me know if you get it to work.

fxbit pushed a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016
[example] tutorial on python data layers and multilabel classification