New tutorial on python datalayers and multilabel classification #3471
shelhamer merged 3 commits into BVLC:master from beijbom:clean-datalayer-tutorial
Conversation
How about evaluation metrics for multi-label classification, like a multi-label version of accuracy?
Hey Wangg12, I use the Hamming distance during test time directly in the notebook, but did not put it in a Python layer. Is that what you are referring to?
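For reference, a rough sketch of what that test-time check can look like. This is my own minimal version, not the notebook's exact code; thresholding raw scores at 0 assumes pre-sigmoid activations, where a score of 0 corresponds to probability 0.5:

```python
import numpy as np

def hamming_agreement(gt, est):
    # Despite the name "Hamming distance" in the discussion, what is
    # usually reported is the fraction of label positions where the
    # binary ground truth and the binarized estimate agree.
    return np.mean(np.asarray(gt) == np.asarray(est))

# Binarize raw scores by thresholding at 0 (assumes pre-sigmoid scores).
gt = np.array([1, 0, 1, 0])
est = np.array([0.7, -1.2, 0.3, 2.1]) > 0
print(hamming_agreement(gt, est))  # 0.75
```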
You should cite the URL (https://github.com/rbgirshick/fast-rcnn) and tech report (http://arxiv.org/abs/1504.08083) that accompanies Fast R-CNN. The current citation isn't quite enough.
citation updated. thanks for the pointer!
Hi @beijbom. According to Multi-label_classification, Hamming loss and the multi-label version of accuracy are equivalent to some extent. In my view, Hamming distance cares more about the difference between the prediction and the ground truth, while multi-label accuracy focuses more on how much the ground truth and the top-k predictions overlap (correct me if I'm wrong). In your implementation, the Hamming distance is taken over all predictions larger than 0, which means we need to choose a proper threshold (e.g. 0) for the positive labels. Maybe a multi-label top-k accuracy is more flexible, like the single-label one. So if I want to use multi-label accuracy during testing, could you give me some instructions on how to write it as a Python layer? Thanks!
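For what it's worth, here is a minimal sketch of what such a metric layer could look like. The class name, the use of `param_str` to pass k, and the recall-style definition (fraction of ground-truth labels recovered among the top k) are my assumptions, not anything from this PR:

```python
import numpy as np
import caffe

class MultilabelTopKAccuracyLayer(caffe.Layer):
    """Hypothetical test-time metric: the fraction of ground-truth
    labels that appear among the top-k scoring predictions."""

    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception("Needs two bottoms: scores and binary labels.")
        # k arrives via the prototxt, e.g. python_param { param_str: "5" }
        self.k = int(self.param_str) if self.param_str else 5

    def reshape(self, bottom, top):
        top[0].reshape(1)  # a single scalar output

    def forward(self, bottom, top):
        scores, labels = bottom[0].data, bottom[1].data
        hits, total = 0.0, 0.0
        for score_vec, label_vec in zip(scores, labels):
            top_k = np.argsort(-score_vec)[:self.k]
            positives = np.where(label_vec > 0)[0]
            hits += len(np.intersect1d(top_k, positives))
            total += len(positives)
        top[0].data[0] = hits / max(total, 1.0)

    def backward(self, top, propagate_down, bottom):
        pass  # metric only; no gradient to propagate
```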
Great, I have been looking for a Python data layer for a long time.
This tutorial may also be useful. It's not a data layer, but it's a bit more beginner-friendly: https://github.com/NVIDIA/DIGITS/blob/digits-3.0/examples/python-layer/README.md
@lukeyeager Thank you very much, I will check it out. o(^▽^)o By the way, is there an example of a Python layer with parameters, for example,
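One common pattern is to pass everything through `param_str` and parse it in `setup`. The following is a generic sketch, not code from this PR; the layer name, JSON encoding, and parameters are made up:

```python
import json
import caffe

class ToyParamLayer(caffe.Layer):
    """Hypothetical layer illustrating parameter passing. The prototxt
    side would look something like:

        python_param {
          module: "toy_param_layer"
          layer: "ToyParamLayer"
          param_str: '{"batch_size": 128, "num_classes": 20}'
        }
    """

    def setup(self, bottom, top):
        # param_str is delivered as a plain string; JSON is one safe
        # way to encode a dict of parameters in it.
        params = json.loads(self.param_str)
        self.batch_size = params["batch_size"]
        self.num_classes = params["num_classes"]

    def reshape(self, bottom, top):
        top[0].reshape(self.batch_size, self.num_classes)

    def forward(self, bottom, top):
        top[0].data[...] = 0  # a real layer would fill in data here

    def backward(self, top, propagate_down, bottom):
        pass
```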
I think this would be at home as caffe/examples/pycaffe/tools.py to make it more clearly related to pycaffe examples.
sounds good. I just moved it.
```python
from threading import Thread
from PIL import Image

from pascal_multilabel_with_datalayer_tools import SimpleTransformer
```
Should this not now be "from tools import SimpleTransformer"?
You are right, I'll fix that. Thanks!
Thanks a ton Evan, very nicely done. I'm not sure the new layer code is ideal from a tutorial perspective, though. While certainly more professional, I think it obscures things a bit, and the reader has to jump back and forth to follow the line of execution. I tend to favour showing people the easiest, dumbest, most straightforward way of doing things, and letting them take it from there. Does that make sense? What do other folks think? @shelhamer?
Now that you mention it, I might have gotten a bit carried away.
I was thinking about it again this morning, and I don't think a base class is necessarily required. I would still argue that using the BatchLoader in both the sync and async cases is a good idea, as it clarifies what the steps in the process are, and users can "drill down" further if they need more detail on what actually happens when an image is loaded. Something like the sketch below.
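A minimal sketch of the idea (the class and attribute names here are assumed, and it is simplified relative to the PR's actual layers): one small class owns the image list and the transformer, and both the synchronous layer's forward and an async prefetch thread just call load_next_image:

```python
import numpy as np
from PIL import Image

class BatchLoader(object):
    """Hypothetical helper: loads one preprocessed image at a time, so
    a synchronous layer (calling it from forward) and an asynchronous
    one (calling it from a prefetch thread) can share the same code."""

    def __init__(self, image_paths, transformer):
        self.image_paths = image_paths
        self.transformer = transformer
        self.cur = 0  # index of the next image to load

    def load_next_image(self):
        if self.cur == len(self.image_paths):
            self.cur = 0  # start over once an epoch is done
        im = np.asarray(Image.open(self.image_paths[self.cur]))
        self.cur += 1
        return self.transformer.preprocess(im)
```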
Ok, sure, that makes sense. That's a good compromise. :) Do you want to make a new pull request, or should I edit yours?
I don't think that I'm going to get much of a chance to have a look this weekend. If it can wait until Monday, then I don't mind doing it myself. Otherwise, feel free to edit it.
Fix some typos. Correct imports. Refactor data layers. Apply PEP8 formatting.
[example] tutorial on python data layers and multilabel classification
Thanks for the multi-label tutorial Oscar! Potential follow-ups for the willing:
Thanks Evan! I'll revisit some of the follow-ups after ECCV. The tutorial doesn't appear right on http://caffe.berkeleyvision.org/, though. I did set: but are there others? Cheers
I had to push the new site from the latest master. It's there now!
Hi @beijbom, why did you remove the asynchronous layer? In my opinion, it's beneficial to prefetch a batch for the next iteration while Caffe is running. With the synchronous layer, Caffe has to wait for batch loading, right? Thanks!
Hi @kli-nlpr! We decided to remove it because we didn't see any significant time savings (I agree that it SHOULD make sense, it just didn't :). It was unclear whether this was an implementation or a hardware issue. Did you notice any time savings?
Hi @beijbom, I didn't test it; I was just wondering why you deleted it.
@kli-nlpr If you are really interested in performance, have a look at #3653, which adds multi-label support to the standard image data layer. It already implements batch prefetching and is quite a bit faster than the Python layers here. Some testing and feedback would be welcome, and if there is more interest from the community, maybe the PR could get merged sooner. I tried to reproduce @beijbom's PASCAL tutorial there too, so it should be simple enough to get started. Note: I am not trying to knock @beijbom's layers. They are great examples of some of the more advanced uses of Caffe's Python interface!
@elezar Thank you very much, I will check it out. In fact, I am much more comfortable with Python than C++.
Is there a pre-fetching Python data layer right now? Thanks
We have experimented with pre-fetching Python data layers, but were unable to see a consistent speed-up so far. Let me know if you get it to work.
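For anyone who wants to try, a minimal sketch of the threading approach. The queue size, shapes, and the `load_batch` helper are all placeholders, not actual code from this PR:

```python
import threading
try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2
import caffe

class PrefetchingDataLayer(caffe.Layer):
    """Hypothetical async data layer: a daemon thread keeps a small
    queue of ready batches so forward() only needs to dequeue."""

    def setup(self, bottom, top):
        self.batch_size = 128        # placeholder; use param_str in practice
        self.queue = queue.Queue(maxsize=4)
        loader = threading.Thread(target=self.prefetch)
        loader.daemon = True         # don't keep the process alive on exit
        loader.start()

    def prefetch(self):
        while True:
            # load_batch() is a stand-in for whatever actually reads
            # and preprocesses a (batch_size, 3, H, W) array of images.
            self.queue.put(load_batch(self.batch_size))

    def reshape(self, bottom, top):
        top[0].reshape(self.batch_size, 3, 227, 227)

    def forward(self, bottom, top):
        top[0].data[...] = self.queue.get()  # blocks if the loader lags

    def backward(self, top, propagate_down, bottom):
        pass
```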
[example] tutorial on python data layers and multilabel classification
The goal of this tutorial is twofold. First, it illustrates how to write Python data layers, both synchronous and asynchronous. Second, it clarifies a common misconception that Caffe "doesn't support multi-label classification". It does, and the appropriate loss layer is SigmoidCrossEntropyLoss.
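To make the second point concrete, here is a minimal NetSpec sketch of a multi-label head. The module and layer names, class count, and batch size are placeholders, not the tutorial's actual definitions; the key point is that the label blob is a binary vector per image and the pre-sigmoid scores feed SigmoidCrossEntropyLoss directly:

```python
import caffe
from caffe import layers as L

n = caffe.NetSpec()
# A Python data layer producing images and binary multi-label vectors.
n.data, n.label = L.Python(module='my_datalayer', layer='MyDataLayer',
                           ntop=2, param_str=str(dict(batch_size=128)))
n.score = L.InnerProduct(n.data, num_output=20)  # one score per class
# SigmoidCrossEntropyLoss applies the sigmoid internally, so `score`
# should be raw (pre-sigmoid) activations.
n.loss = L.SigmoidCrossEntropyLoss(n.score, n.label)
print(n.to_proto())  # emits the prototxt for this net
```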