Add support for multiple labels to be specified in the image_data_layer by elezar · Pull Request #3653 · BVLC/caffe

elezar · 2016-02-09T14:31:13Z

Although there are a number of pending pull requests in which multi-label support has been discussed, many of them seem to have been abandoned.

For reference, see for example #523 and #3268.

The latter may still be active (last updated in December 2015), but they also deal with the loss or accuracy layers, and not with the data layers themselves.

This pull request adds the ability to specify multiple labels directly in a text file, e.g.:

images/cat.jpg 0
images/fish-bike.jpg 1 9

instead of having to maintain separate labels and data sets.

It is also possible to specify a list of labels to ignore following the labels per item:

images/cat.jpg 0;9
images/fish-bike.jpg 1 9

Ignoring a label sets the corresponding entries in the label top blob to a predefined value (-1). Just as multiple labels can be specified, multiple ignore labels can also be specified as a list. A semicolon (;) has been chosen as the separator between the label list and the ignore list.

The following points should be noted:

If a single label is specified per line in the input file, the current behavior is maintained. That is, the label top for the layer is an (N, 1) dimensional blob whose entries equal the label of the image.
If multiple labels (including ignore labels) are specified per line, the label top is an (N, K) dimensional blob, with K = max_label + 1. In this case an entry in the blob is 1 if the label is present in the line and 0 if it is not. max_label is determined when reading the file for the first time.
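
The file format and label-blob rules described above can be sketched in plain Python. This is only an illustration, not the C++ code in this PR: the names `parse_line` and `encode_labels` are hypothetical, nested lists stand in for the top blob, and two details are assumptions rather than statements from the description (that any line without exactly one label switches the whole file to the (N, K) encoding, and that max_label is taken over both labels and ignore labels).

```python
IGNORE_VALUE = -1  # value the description gives for ignored entries


def parse_line(line):
    """Split 'path l1 l2;i1 i2' into (path, labels, ignore_labels)."""
    head, _, ignore_part = line.strip().partition(";")
    fields = head.split()
    path = fields[0]
    labels = [int(f) for f in fields[1:]]
    ignore = [int(f) for f in ignore_part.split()]
    return path, labels, ignore


def encode_labels(entries):
    """Return one label row per image (nested lists standing in for the blob)."""
    # Assumption: any line that does not carry exactly one label, or that has
    # an ignore list, switches the whole file to the (N, K) encoding.
    multi = any(len(labels) != 1 or ignore for _, labels, ignore in entries)
    if not multi:
        # Single-label case: (N, 1), each entry is the class index itself.
        return [[labels[0]] for _, labels, _ in entries]
    # Assumption: max_label is taken over both labels and ignore labels.
    max_label = max(
        (l for _, labels, ignore in entries for l in labels + ignore),
        default=-1,
    )
    k = max_label + 1
    rows = []
    for _, labels, ignore in entries:
        row = [0] * k
        for l in labels:
            row[l] = 1
        # Written last, so an ignore entry overrides a label with the same index.
        for l in ignore:
            row[l] = IGNORE_VALUE
        rows.append(row)
    return rows


lines = ["images/cat.jpg 0;9", "images/fish-bike.jpg 1 9"]
rows = encode_labels([parse_line(line) for line in lines])
```

For the two example lines, `rows` holds two rows of length 10 (K = 9 + 1). A file with exactly one label on every line and no ignore lists falls back to the (N, 1) form.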

Some extensions that I would like to add:

For this to be useful, the loss and accuracy layers should also be extended. The best approach would most likely be to work in conjunction with other pull requests, such as the multi-label softmax support in #3268. @xdshang, any thoughts on this? What other accuracy or loss layers make sense to extend to the multi-label case? The sigmoid cross entropy loss, for example.
The separators (between individual labels) should be flexible. Files may be comma-separated or space-separated, for example.

I would also like to look at adapting #3471 to use the new data layer.

Thoughts and comments would be welcome.

Note, I had to add std::endl outputs to the unit tests. I assume that this makes more sense in any case.

shenyunhang · 2016-02-29T05:52:53Z

This is exactly what I need. Have you tested it on the PASCAL VOC dataset?

elezar · 2016-02-29T07:20:32Z

I have, and have been working on a simple tutorial. I will push it soon.

elezar · 2016-03-01T13:05:56Z

@shenyunhang Please see the latest push. I have included an example that uses the PASCAL VOC dataset with the multi-label image data layer. The example can be run both as an iPython notebook and from the script.

The format of the iPython notebook is "borrowed" from @beijbom's PR #3471 (I will update it further once that PR has been accepted and merged).

shenyunhang · 2016-03-01T15:04:20Z

@elezar Thank you very much. I am trying it out.
However, I have some data with the following format:

filename ;ignore1 ignore2

Here I don't know which labels are associated with the data line.
It seems image_data_layer can't work with such a format. But that is OK; I will modify a few places to make it work for me.

elezar · 2016-03-01T16:28:46Z

@shenyunhang I will look at adding a unit test for your example tomorrow. There should not be a specific reason that this does not work, but I may have missed something in the implementation. You are also welcome to submit a pull request to the feature branch at zalando/caffe if you get it working before then.

elezar · 2016-03-31T09:52:01Z

@shenyunhang were you able to get the layer working for you? If so, do you mind sharing your changes with me?

elezar · 2016-03-31T12:37:03Z

@shenyunhang I have updated the image data layer to support ignore labels when no labels have been specified.

…data_layer.

This adds support for lines in the image_data_layer with the following format:

filename label1 label2 label3;ignore1 ignore2

Here multiple labels (label 1, 2, and 3) are associated with the data line, and labels to ignore (i.e. not taken into account when calculating loss or accuracy) can also be specified. The separator for the two label lists can be specified by the user, as can the separator for the individual labels. By default, labels to be ignored are assigned a value of -1, but this value can also be specified.

The SigmoidCrossEntropy layer has also been changed to allow for a particular label to be ignored when calculating the loss or back-propagating the diff.

A PASCAL VOC2012 example has been included to show the use of the code.
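
The ignore handling that the commit message describes for the SigmoidCrossEntropy layer can be illustrated with a small pure-Python sketch. It is not the layer's C++ implementation: the function name `sigmoid_cross_entropy` and its signature are hypothetical, and the loss is returned as an unnormalized sum because the commit message does not say how the layer normalizes once entries are ignored.

```python
import math


def sigmoid_cross_entropy(logits, targets, ignore_value=-1):
    """Return (loss, grads), skipping entries whose target equals ignore_value."""
    loss = 0.0
    grads = []
    for x, t in zip(logits, targets):
        if t == ignore_value:
            # Ignored entry: contributes nothing to the loss or to the diff.
            grads.append(0.0)
            continue
        # Numerically stable form of -[t*log(p) + (1-t)*log(1-p)], p = sigmoid(x).
        loss += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
        # Stable sigmoid: avoid exp() of a large positive argument.
        if x >= 0:
            p = 1.0 / (1.0 + math.exp(-x))
        else:
            p = math.exp(x) / (1.0 + math.exp(x))
        # Gradient of the per-entry loss with respect to the logit.
        grads.append(p - t)
    return loss, grads


# One row of logits against a label row containing an ignored entry.
loss, grads = sigmoid_cross_entropy([0.0, 2.0, -1.0], [1, -1, 0])
```

In this call the middle entry has target -1, so it is left out of the loss and its gradient is zero regardless of the logit.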

elezar · 2016-04-01T14:44:06Z

@beijbom Since you worked on the Python implementation of the example, would you mind looking at the notebook that I have included in this PR? @shelhamer This also contains a prototxt version of the network which could be used to address some of the TODOs mentioned in #3471 (comment)

beijbom · 2016-04-01T22:54:36Z

hey @elezar, I think the notebook could use some more explanation & comments, but it's getting there! :) It will be very useful to have multi-label support in the image data layer, as this is a very common use case, and the python data layers are slow for production stuff.

cheers

elezar · 2016-04-04T14:45:09Z

@beijbom Thanks. I have tried to improve the notebook (once again using your notebook as reference) and pushed the changes.

elezar · 2016-05-10T08:58:38Z

@shelhamer could you perhaps have a look at this PR and let me know what work is still required?

wuhang · 2016-05-11T08:47:31Z

@elezar
I'm new to caffe. How do I classify a new image with multiple labels? Could you give me a complete classification example? Thanks.

elezar · 2016-05-11T14:46:45Z

@wuhang I think a good place to start is @beijbom's tutorial. There is an iPython notebook included in master.

elezar · 2018-08-13T17:52:56Z

Closing this PR as stale.

This was referenced Feb 16, 2016

Allow EuclideanLossLayer to ignore labels #3677

Closed

New tutorial on python datalayers and multilabel classification #3471

Merged

elezar force-pushed the feature/mulit_label_image_data_layer branch from 3c3ff57 to 5dfbc9e Compare February 24, 2016 15:23

elezar force-pushed the feature/mulit_label_image_data_layer branch 2 times, most recently from 0afa9a5 to 3cef9db Compare March 1, 2016 10:10

elezar force-pushed the feature/mulit_label_image_data_layer branch from cfdddeb to 4c4f3f0 Compare March 31, 2016 12:36

elezar force-pushed the feature/mulit_label_image_data_layer branch from b31cb70 to 02868e1 Compare April 1, 2016 10:21

Evan Lezar added 2 commits April 4, 2016 14:41

Add comments to ipython notebook.

a97e725

Add better documentation to the image datalayer pascal example.

19e35c4

elezar mentioned this pull request May 24, 2016

Add support for specifying a separator in the image data layer #4203

Closed

elezar closed this Aug 13, 2018
