Out-of-memory on g2.8xlarge #34

Description

@lukeyeager

See NVIDIA/DIGITS#310.

/cc @ajsander

I've trained a couple of models (AlexNet and GoogLeNet) successfully using DIGITS, with test and validation accuracy shown, but when I try to classify a single image using the web interface I get the following error:

WARNING: Logging before InitGoogleLogging() is written to STDERR
F0915 14:10:45.809661 98789 common.cpp:266] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***

When I check nvidia-smi, it appears that memory usage increases by around 100MB, but it's still nowhere near the card's full capacity of 3GB.
NVIDIA/DIGITS#310 (comment)
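
To watch that growth, I poll nvidia-smi while the classification request runs. A minimal sketch (the query fields below are standard nvidia-smi options, but exactly what gets reported depends on the driver version):

```python
# Minimal sketch: poll nvidia-smi once per second while a single-image
# classification runs, to see how much memory the inference process grabs.
# Assumes nvidia-smi is on PATH.
import subprocess
import time

def gpu_memory():
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=index,memory.used,memory.total",
        "--format=csv,noheader",
    ]).decode()
    return out.strip().splitlines()

if __name__ == "__main__":
    while True:
        print(" | ".join(gpu_memory()))
        time.sleep(1)
```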

Here is some information about his system:

Running on an Amazon g2.8xlarge
GPUs: 4x GRID K520
CUDA 7.0
cuDNN 7.0
Caffe version 0.12 NVIDIA fork
DIGITS 2.1

Both AlexNet and GoogLeNet experienced the same problem
NVIDIA/DIGITS#310 (comment)

Here's how I reproduced it:

  1. Start up an Ubuntu 14.04 g2.8xlarge EC2 instance
  2. Install the 346 driver
  3. Install DIGITS 2.0 and Caffe 0.13.1 (with CNMeM) using the web installer
  4. Create a small dataset of 256x256 images
  5. Train AlexNet on it
  6. Try to classify an image (a rough pycaffe equivalent of this step is sketched below)
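
Step 6 goes through the DIGITS web UI, which launches a separate caffe process for the forward pass. Here is a rough standalone equivalent in pycaffe, assuming the NVIDIA fork's Python bindings are available; the model, weights, and image paths are placeholders for whatever snapshot DIGITS produced. The failure would hit at net.forward():

```python
import caffe

caffe.set_mode_gpu()
caffe.set_device(0)

# Placeholder paths: the deploy prototxt and weights snapshot DIGITS produced.
net = caffe.Net('deploy.prototxt', 'snapshot_iter_1000.caffemodel', caffe.TEST)

# Load and preprocess one image the way the deploy network expects.
image = caffe.io.load_image('test.jpg')            # HxWx3 float in [0, 1]
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))       # HWC -> CHW
net.blobs['data'].data[0] = transformer.preprocess('data', image)

out = net.forward()    # the cudaMalloc "out of memory" above would fire here
print(out)
```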

The big question

Why would we run out of memory during inference but not while training?
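
One way to start digging: check which processes are actually holding GPU memory at the moment the classification job launches. A sketch using pynvml (assumes the nvidia-ml-py package is installed; this is just a diagnostic idea, not something DIGITS does):

```python
# List which processes hold GPU memory when the inference process starts.
# If the training caffe processes (or a CNMeM pool) are still resident,
# a short-lived inference process could fail even though the total shown
# by nvidia-smi looks far from 3GB at the instant we sample it.
from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
                    nvmlDeviceGetHandleByIndex,
                    nvmlDeviceGetComputeRunningProcesses)

nvmlInit()
try:
    for i in range(nvmlDeviceGetCount()):
        handle = nvmlDeviceGetHandleByIndex(i)
        for proc in nvmlDeviceGetComputeRunningProcesses(handle):
            mb = proc.usedGpuMemory / (1024 ** 2) if proc.usedGpuMemory else 0
            print("GPU %d: pid %d uses %.0f MiB" % (i, proc.pid, mb))
finally:
    nvmlShutdown()
```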
