
Optimized accuracy calculation #5692

Closed

Noiredd wants to merge 3 commits into BVLC:master from Noiredd:master

Conversation

Noiredd (Member) commented on Jun 14, 2017

Note: sorry about potential confusion arising from multiple edits to this PR. It is complete now, unless someone finds room for more improvement.

Motivation: accuracy calculation, particularly for segmentation tasks with top_k: 1, can be extremely slow. Note the two nested for loops (over the batch and over image pixels) with partial_sort inside; the main culprit is the need to copy all the data into a new container before we can sort it.
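For reference, a minimal sketch of that pattern (not the actual AccuracyLayer code; the function name, the flat `scores` vector, and the types here are simplified assumptions):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Copy-then-partial_sort pattern: every class score is copied into a
// temporary vector of (score, class) pairs, which is then partially
// sorted just to find the k largest entries.
// Assumes top_k <= scores.size().
bool LabelInTopKByPartialSort(const std::vector<float>& scores,
                              std::size_t label, std::size_t top_k) {
  std::vector<std::pair<float, std::size_t> > pairs;
  pairs.reserve(scores.size());
  for (std::size_t c = 0; c < scores.size(); ++c) {
    pairs.push_back(std::make_pair(scores[c], c));
  }
  std::partial_sort(pairs.begin(), pairs.begin() + top_k, pairs.end(),
                    std::greater<std::pair<float, std::size_t> >());
  for (std::size_t k = 0; k < top_k; ++k) {
    if (pairs[k].second == label) return true;
  }
  return false;
}
```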

My proposal is to replace that with a dynamically updated priority_queue: instead of copying everything first and thinking later, we iterate just once and only copy an element when it is larger than the smallest of the k current best ones (which the automatically sorted container provides).
Initially I thought of having a separate case for top_k: 1, where we would make a single pass and remember only the single best element, but it is only marginally faster than the queue approach (see below), so I abandoned the idea for the sake of code clarity.
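A minimal sketch of the single-pass idea, under the same simplified assumptions as above (illustrative function name and types; ties may be broken differently than by partial_sort):

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Single-pass top-k check: a min-heap keeps only the k best (score, class)
// pairs seen so far, so an element is copied into the heap only when it
// beats the current k-th best score.
bool LabelInTopKByHeap(const std::vector<float>& scores,
                       std::size_t label, std::size_t top_k) {
  typedef std::pair<float, std::size_t> Entry;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > best;
  for (std::size_t c = 0; c < scores.size(); ++c) {
    if (best.size() < top_k) {
      best.push(std::make_pair(scores[c], c));
    } else if (scores[c] > best.top().first) {
      best.pop();
      best.push(std::make_pair(scores[c], c));
    }
  }
  // The prediction counts as correct if the true label survived among
  // the k best entries.
  while (!best.empty()) {
    if (best.top().second == label) return true;
    best.pop();
  }
  return false;
}
```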

Benchmark settings:

  • net: FCN_AlexNet, batch size 1,
  • dataset: PASCAL_VOC (21 classes, 1464 images in training set, 1449 for validation, images are about 360x550) - about 250M top_k searches per epoch,
  • solver setup: max_iter: 14640, test_interval: 1464, test_iter: 1449,
  • hardware: Titan Z, 2x Xeon E5620 (2.4 GHz),
  • software: Ubuntu 14.04, CUDA 8.0, cuDNN 5.1, ATLAS, DIGITS 5.1-dev.

Each build was run 4 times (the Titan Z can run two nets in parallel, so this was effectively 2 times 2 runs); times were measured from job initialization to completion (as reported by DIGITS).

Results:

| top_k | current master | priority_queue | optimal search |
|-------|----------------|----------------|----------------|
| 1     | 90m 00s        | 58m 57s        | 58m 34s        |
| 5     | 98m 49s        | 88m 42s        | N/A            |

For me this is about 10% faster for top-5 accuracy, and over 30% faster for top-1.
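(Computed from the table above: 90m 00s = 5400 s vs. 58m 57s = 3537 s gives a (5400 − 3537)/5400 ≈ 34.5% reduction in wall time for top-1; 98m 49s = 5929 s vs. 88m 42s = 5322 s gives ≈ 10.2% for top-5.)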

Task list:

  • optimize the top_k==1 case
  • optimize the general case
  • provide a simple benchmark
  • simplify to an insignificantly slower but much clearer approach.

@Noiredd Noiredd changed the title Faster accuracy calculation for top-1 prediction [do not merge] Faster accuracy calculation for top-1 prediction (WIP) Jun 15, 2017
@Noiredd Noiredd changed the title [do not merge] Faster accuracy calculation for top-1 prediction (WIP) Optimized accuracy calculation Jun 19, 2017
@Noiredd Noiredd force-pushed the master branch 2 times, most recently from 96ba397 to 9670e4f on June 20, 2017 at 12:54
Noiredd (Member, Author) commented on Jun 21, 2017

Is there any way I can get to the details of the Travis build error? My make runtest passed without problems, but on Travis I got a "100% mismatch", which is confusing because the visible parts of the x and y arrays look identical.

EDIT: this is not the only PR to fail for this reason: #5595 produced a similar line, as did #5558, and the same holds for the current state of #5676 (log).

Noiredd (Member, Author) commented on Jul 3, 2017

I guess I force-pushed too much and broke Travis. Going to close this and open a new PR.

@Noiredd Noiredd closed this Jul 3, 2017