I compared 8-GPU Caffe training with and without cuDNN. Surprisingly, cuDNN reduces training speed in the 8-GPU case, even though it speeds up single-GPU training. Has anybody else seen this?
Here are some details:
OS: RHEL 6.5
CUDA: 7.5
cuDNN: 5.1
GPUs: 8 Tesla K80
Caffe model: CaffeNet reference model
Data set: ImageNet.
Speed:
1 GPU with cuDNN = 1.7x the speed of 1 GPU without cuDNN.
8 GPUs with cuDNN = 0.86x the speed of 8 GPUs without cuDNN.
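
To put the two ratios together, here is a small back-of-the-envelope calculation in Python. It uses only the two numbers above; the variable names are just for illustration. If "scaling" means 8-GPU speed divided by 1-GPU speed, then the scaling with cuDNN relative to the scaling without cuDNN is the 8-GPU ratio divided by the 1-GPU ratio. This says nothing about the cause, it only restates the measurements.

```python
# Measured ratios: speed with cuDNN divided by speed without cuDNN.
ratio_1gpu = 1.7   # single-GPU case
ratio_8gpu = 0.86  # 8-GPU case

# scaling = (8-GPU speed) / (1-GPU speed), so
# scaling_with_cudnn / scaling_without_cudnn = ratio_8gpu / ratio_1gpu
relative_scaling = ratio_8gpu / ratio_1gpu

print(f"1-to-8 GPU scaling with cuDNN is {relative_scaling:.2f}x "
      f"the scaling without cuDNN")
```

So going from 1 to 8 GPUs gains only about half as much with cuDNN as it does without it.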
I can provide more information if needed.