GitHub - novasearch/cuda

.. highlight:: rest

CUDA Roll

Introduction

This roll installs:

NVIDIA CUDA Toolkit 9.2.148 + Patch 1
NVIDIA Driver 396.54
NVIDIA CUDA Deep Neural Network library (cuDNN) 7.4.2.24

For more information about the NVIDIA CUDA Toolkit, please see the official NVIDIA developer website.

Requirements

To build and install this roll, download the CUDA toolkit and driver source files (*.run format) and place them in their respective directories in src/.

The toolkit distribution is about 1 GB. Make sure / has enough free space (about 1.5 GB) when building the roll.
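
The free-space requirement can be checked before starting a build. The following is a minimal sketch, not part of the roll: the helper name ``has_free_kb`` is hypothetical, and the 1.5 GB threshold is simply the figure quoted above converted to kilobytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the roll): succeed if the
# filesystem holding path $1 has at least $2 kilobytes available.
has_free_kb() {
    local path=$1 required_kb=$2 avail_kb
    # df -Pk prints POSIX-format output in 1 KB blocks;
    # field 4 of the second line is the available space.
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$required_kb" ]
}

# 1.5 GB expressed in kilobytes (1.5 * 1024 * 1024).
required_kb=1572864
if has_free_kb / "$required_kb"; then
    echo "/ has enough free space to build the roll"
else
    echo "/ has less than 1.5 GB free" >&2
fi
```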

Building

To build the roll, execute:

# make 2>&1 | tee build.log

A successful build creates a cuda-*.x86_64*.iso file.
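
To confirm that the build produced an ISO, the file pattern above can be tested directly. This is only a sketch; the helper name ``find_roll_iso`` is hypothetical, and it assumes the ISO is written to the directory where ``make`` was run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every roll ISO found in directory $1
# (default: current directory) and fail if there is none.
find_roll_iso() {
    local dir=${1:-.} iso found=1
    for iso in "$dir"/cuda-*.x86_64*.iso; do
        # An unmatched glob is left as a literal string,
        # so check that the file really exists.
        if [ -f "$iso" ]; then
            echo "$iso"
            found=0
        fi
    done
    return "$found"
}

find_roll_iso . || echo "no roll ISO found; check build.log" >&2
```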

Installing

To add this roll to an existing cluster, execute these instructions on a Rocks frontend node:

# rocks add roll *.iso
# rocks enable roll cuda
# cd /export/rocks/install
# rocks create distro
# rocks run roll cuda > add-roll.sh

Then, on the login node, execute the resulting add-roll.sh:

# bash add-roll.sh 2>&1 | tee add-roll.out

Reinstall the compute nodes (GPU-enabled nodes only):

# rocks set host attr compute-X-Y cuda true
# rocks set host boot compute-X-Y action=install
# rocks run host compute-X-Y reboot
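
When several GPU nodes need reinstalling, the three commands above can be generated per host. The sketch below only prints the commands so they can be reviewed first; the function name ``print_reinstall_cmds`` and the host names ``compute-0-0`` and ``compute-0-1`` are placeholders, not names from this roll.

```shell
#!/usr/bin/env bash
# Print (do not run) the reinstall commands for each host given
# as an argument. Nothing is executed on the cluster.
print_reinstall_cmds() {
    local host
    for host in "$@"; do
        echo "rocks set host attr $host cuda true"
        echo "rocks set host boot $host action=install"
        echo "rocks run host $host reboot"
    done
}

# Placeholder host names; substitute the GPU-enabled nodes of the cluster.
print_reinstall_cmds compute-0-0 compute-0-1
```

After checking the printed commands, the output can be piped to ``bash`` on the frontend to run them.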

After the compute node comes up, reboot it again to initiate the driver installation and loading.

In addition to the software, the roll installs CUDA environment module files in:

/opt/modulefiles/applications/cuda

To use the modules:

% module load cuda

What is installed

The CUDA roll installs the following:

/opt/cuda/driver - NVIDIA driver
/etc/init.d/nvidia - NVIDIA startup/shutdown script (disabled on login node)
/opt/cuda - toolkit (without samples on compute nodes)
/opt/modulefiles/applications/cuda - module environment

On login nodes:

/opt/cuda/samples - code samples
/var/www/html/cuda - link to the CUDA HTML documentation

Testing

Run the test commands on GPU-enabled nodes.

To find information about the installed GPU card, execute:

nvidia-smi
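
``nvidia-smi`` can also report just the driver version, which makes it possible to check that the driver listed in the introduction (396.54) is the one loaded. The following is a sketch under that assumption; the helper name ``all_lines_equal`` is hypothetical, and the check is skipped when ``nvidia-smi`` is not on the PATH.

```shell
#!/usr/bin/env bash
# Expected driver version, taken from the roll description.
expected_driver=396.54

# Hypothetical helper: succeed if stdin has at least one line and
# every line equals $1. The query below prints one line per GPU.
all_lines_equal() {
    local expected=$1 line count=0
    while IFS= read -r line; do
        [ "$line" = "$expected" ] || return 1
        count=$((count + 1))
    done
    [ "$count" -gt 0 ]
}

# Only query the driver when nvidia-smi is available on this node.
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=driver_version --format=csv,noheader |
        all_lines_equal "$expected_driver"; then
        echo "driver $expected_driver is loaded on all GPUs"
    else
        echo "driver version mismatch, or no GPU reported" >&2
    fi
fi
```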

Run the GPU device tests:

% /opt/cuda/bin/deviceQuery
% /opt/cuda/bin/deviceQueryDrv
% /opt/cuda/bin/bandwidthTest
% /opt/cuda/bin/p2pBandwidthLatencyTest
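
The four tests can be run in one pass with a small wrapper. This is a sketch, not part of the roll: the function name ``run_gpu_tests`` is hypothetical, and it assumes each sample binary exits with a non-zero status on failure, so inspect the per-test logs rather than relying on the summary alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run each sample binary found in directory $1,
# write its output to <name>.log in directory $2 (default: current
# directory), and report PASS or FAIL from the exit status.
# Binaries that are missing or not executable are reported as SKIP.
run_gpu_tests() {
    local bindir=$1 logdir=${2:-.} name failures=0
    for name in deviceQuery deviceQueryDrv bandwidthTest p2pBandwidthLatencyTest; do
        if [ ! -x "$bindir/$name" ]; then
            echo "SKIP $name"
        elif "$bindir/$name" >"$logdir/$name.log" 2>&1; then
            echo "PASS $name"
        else
            echo "FAIL $name"
            failures=$((failures + 1))
        fi
    done
    # Succeed only if no test failed.
    [ "$failures" -eq 0 ]
}

run_gpu_tests /opt/cuda/bin
```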

CUDA and SGE

Some users report increased virtual memory use when using CUDA. See the following links for additional information.

Useful commands:

pmap -x PID
more /proc/PID/smaps
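
For a quick comparison of virtual and resident memory, the totals can be read from ``/proc`` instead of scanning the full ``pmap`` or ``smaps`` listing. The helper name ``vm_summary`` below is hypothetical, and the sketch assumes a Linux ``/proc`` filesystem.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the virtual (VmSize) and resident (VmRSS)
# memory of process $1, as reported in /proc/<pid>/status on Linux.
vm_summary() {
    local pid=$1
    awk '/^(VmSize|VmRSS):/ { print $1, $2, $3 }' "/proc/$pid/status"
}

# Example: inspect the current shell; replace $$ with the PID of a CUDA job.
vm_summary $$
```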

GPU monitoring plugin for gmond

https://github.com/ganglia/gmond_python_modules/tree/master/gpu/nvidia
