Out-of-memory on g2.8xlarge #34

Description

@lukeyeager

See NVIDIA/DIGITS#310.

/cc @ajsander

I've trained a couple of models (AlexNet and GoogLeNet) successfully using DIGITS, with test and validation accuracy shown, but when I try to classify a single image using the web interface I get the following error:

WARNING: Logging before InitGoogleLogging() is written to STDERR
F0915 14:10:45.809661 98789 common.cpp:266] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***

When I check nvidia-smi, it appears that memory usage increases by around 100MB, but it's still nowhere near the card's full capacity of 3GB.
NVIDIA/DIGITS#310 (comment)
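
To watch that growth, I poll nvidia-smi while the classification request runs. A minimal sketch (the query fields below are standard nvidia-smi options, but exactly what gets reported depends on the driver version):

```python
# Minimal sketch: poll nvidia-smi once per second while a single-image
# classification runs, to see how much memory the inference process grabs.
# Assumes nvidia-smi is on PATH.
import subprocess
import time

def gpu_memory():
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=index,memory.used,memory.total",
        "--format=csv,noheader",
    ]).decode()
    return out.strip().splitlines()

if __name__ == "__main__":
    while True:
        print(" | ".join(gpu_memory()))
        time.sleep(1)
```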

Here is some information about his system:

Running on an Amazon g2.8xlarge
GPUs: 4x GRID K520
CUDA 7.0
cuDNN 7.0
Caffe version 0.12 NVIDIA fork
DIGITS 2.1

Both AlexNet and GoogLeNet experienced the same problem
NVIDIA/DIGITS#310 (comment)

Here's how I reproduced it:

  1. Start up an Ubuntu 14.04 g2.8xlarge EC2 instance
  2. Install the 346 driver
  3. Install DIGITS 2.0 and Caffe 0.13.1 (with CNMeM) using the web installer
  4. Create a small dataset of 256x256 images
  5. Train AlexNet on it
  6. Try to classify an image (a rough pycaffe equivalent of this step is sketched below)
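
Step 6 goes through the DIGITS web UI, which launches a separate caffe process for the forward pass. Here is a rough standalone equivalent in pycaffe, assuming the NVIDIA fork's Python bindings are available; the model, weights, and image paths are placeholders for whatever snapshot DIGITS produced. The failure would hit at net.forward():

```python
import caffe

caffe.set_mode_gpu()
caffe.set_device(0)

# Placeholder paths: the deploy prototxt and weights snapshot DIGITS produced.
net = caffe.Net('deploy.prototxt', 'snapshot_iter_1000.caffemodel', caffe.TEST)

# Load and preprocess one image the way the deploy network expects.
image = caffe.io.load_image('test.jpg')            # HxWx3 float in [0, 1]
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))       # HWC -> CHW
net.blobs['data'].data[0] = transformer.preprocess('data', image)

out = net.forward()    # the cudaMalloc "out of memory" above would fire here
print(out)
```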

The big question

Why would we run out of memory during inference but not while training?
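
One way to start digging: check which processes are actually holding GPU memory at the moment the classification job launches. A sketch using pynvml (assumes the nvidia-ml-py package is installed; this is just a diagnostic idea, not something DIGITS does):

```python
# List which processes hold GPU memory when the inference process starts.
# If the training caffe processes (or a CNMeM pool) are still resident,
# a short-lived inference process could fail even though the total shown
# by nvidia-smi looks far from 3GB at the instant we sample it.
from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
                    nvmlDeviceGetHandleByIndex,
                    nvmlDeviceGetComputeRunningProcesses)

nvmlInit()
try:
    for i in range(nvmlDeviceGetCount()):
        handle = nvmlDeviceGetHandleByIndex(i)
        for proc in nvmlDeviceGetComputeRunningProcesses(handle):
            mb = proc.usedGpuMemory / (1024 ** 2) if proc.usedGpuMemory else 0
            print("GPU %d: pid %d uses %.0f MiB" % (i, proc.pid, mb))
finally:
    nvmlShutdown()
```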
