Use regular expressions to parse image data text files. by erictzeng · Pull Request #1971 · BVLC/caffe

erictzeng · 2015-02-25T23:03:19Z

Fixes #1951.

This pull request consists of two changes:

1. Rather than the brittle ifstream method of parsing image data files, this pull request uses regular expressions for more robust matching.
2. Previously, the parsing code was duplicated across two files, tools/convert_imageset.cpp and src/caffe/layers/image_data_layer.cpp. This pull request pulls that common code out into a new function in src/caffe/util/io.cpp for ease of maintenance.
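
To illustrate why stream-based parsing is brittle, here is a minimal sketch. It assumes the old code read each filename/label pair with whitespace-delimited `operator>>` extraction. The function name `ReadPairsWithExtraction` is hypothetical and is not part of Caffe.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Illustrative only: mimics whitespace-delimited stream extraction,
// the style of parsing this pull request replaces.
std::vector<std::pair<std::string, int> > ReadPairsWithExtraction(
    const std::string& text) {
  std::istringstream in(text);
  std::vector<std::pair<std::string, int> > lines;
  std::string filename;
  int label;
  // operator>> stops at the first whitespace, so "file name.jpg 1" yields
  // filename == "file" and then fails to read "name.jpg" as an int.
  // The failed read ends the loop, silently dropping the remaining lines.
  while (in >> filename >> label) {
    lines.push_back(std::make_pair(filename, label));
  }
  return lines;
}
```

Given the three lines `a.jpg 0`, `file name.jpg 1`, and `b.jpg 2`, this sketch keeps only the first entry: the space in the second filename puts the stream into a failed state, and nothing after it is read.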

More details follow.

Each line of the input text file is matched against the following regular expression:

\h*("?)(.+?)\1\h+(\d+)\h*

Feel free to play around with an interactive version to test it out and see what it matches. This regular expression handles many cases that would have been difficult to handle with the previous naive approach. It allows whitespace within a filename, and it allows filenames to be quoted in case, for some insane reason, you have a space at the beginning of a file name.

Some concrete examples of really degenerate cases that will parse correctly:

file name with spaces.jpg 1
" file_name_with_leading_space.jpg" 2
file_name_with_"_symbol.jpg 3
" really disgusting " file  ""name  .jpg" 4
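
The matching above can be sketched as follows. This is a rough illustration, not the function added in this pull request: the name `ParseImageLine` is hypothetical, and it uses `std::regex` instead of Boost.Regex so that it needs only the standard library. Because the ECMAScript grammar used by `std::regex` has no `\h`, the sketch substitutes `[ \t]`, which approximates horizontal whitespace as spaces and tabs only.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not the function added in this pull request.
// Group 1 is an optional opening quote, group 2 the filename (lazy),
// \1 requires the matching closing quote, and group 3 is the label.
bool ParseImageLine(const std::string& line, std::string* filename,
                    int* label) {
  // Ordinary string literal: "\t" is a literal tab, "\\1" and "\\d" are
  // regex escapes. [ \t] stands in for \h from the original expression.
  static const std::regex kLineRegex(
      "[ \t]*(\"?)(.+?)\\1[ \t]+(\\d+)[ \t]*");
  std::smatch match;
  // regex_match requires the whole line to match, not just a prefix.
  if (!std::regex_match(line, match, kLineRegex)) {
    return false;
  }
  *filename = match[2].str();
  // std::stoi throws std::out_of_range if the label does not fit in an int.
  *label = std::stoi(match[3].str());
  return true;
}
```

Because the filename group is lazy and the whole line must match, the filename extends up to the last run of whitespace before the trailing digits. That is why embedded spaces and quote characters end up inside the filename rather than ending it.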

One drawback is that this introduces boost_regex as an additional dependency. However, since we already require Boost, this seems like an acceptable tradeoff.

Implementation-wise, this pull request should be complete, though it's lacking tests, which I will get around to writing at some point in the near future.

shelhamer · 2015-03-07T06:22:05Z

@erictzeng this looks right -- thanks for fixing the brittle format -- but I think you need to update the travis script to install boost regex: https://github.com/BVLC/caffe/blob/master/scripts/travis/travis_install.sh.

bchu · 2016-03-30T00:01:29Z

Any updates on this?

Use regular expressions to parse image data text files.

801d217

shelhamer added the in progress label Mar 7, 2015

bchu mentioned this pull request Dec 14, 2015

Fix ImageDataLayer's silent failure on file paths with spaces #3433

Closed

bchu mentioned this pull request May 16, 2016

handle spaces in image file names #4059

Merged

malreddysid mentioned this pull request Jun 1, 2016

Resolve SIGSEGV error in image_data_layer.cpp #4218

Merged

erictzeng wants to merge 1 commit into BVLC:master from erictzeng:convert_imageset_spaces
