CUDA 9 and CuDNN 7 with PyTorch

I’ve been looking for ways to speed up training of my autonomous driving networks, since training currently takes about 12 hours per epoch. One of my most recent efforts has been upgrading to the newly released CUDA 9.0 RC and CuDNN 7 packages from Nvidia. While these are optimized for Nvidia’s new Volta architecture, Nvidia claims they also speed up operations on Pascal GPUs like the 1080 Tis in my lab.

To get CUDA 9 and CuDNN 7 working with PyTorch, the deep learning framework all of my group’s research code is written in, I had to clone Pull Request #2263 from the PyTorch GitHub repository. It was written by an Nvidia engineer to add CUDA 9 and CuDNN 7 support to PyTorch. That alone was not enough, though: the build also needed a workaround for NCCL. Here are the steps I had to follow to get everything working:

  1. Download and install CUDA 9
  2. Download and install CuDNN 7
  3. Download and install NCCL
  4. Download and install Anaconda for Python 3.6
  5. Run the following workaround for NCCL:
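    # Create a private prefix with the usual include/ and lib/ layout
    mkdir -p ~/nccl/include ~/nccl/lib
    # Link the system-wide NCCL header and library into it
    ln -s /usr/include/nccl.h ~/nccl/include/
    ln -s /usr/lib/libnccl.so ~/nccl/lib/
    # Use $HOME here: a "~" inside double quotes is not expanded by the shell
    export CMAKE_PREFIX_PATH="$HOME/nccl:$CMAKE_PREFIX_PATH"
    export LIBRARY_PATH="$HOME/nccl/lib:$LIBRARY_PATH"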
    
  6. Clone the CUDA 9 branch for PyTorch:
    git clone https://github.com/csarofeen/pytorch
    cd pytorch
    git checkout cuda9
    
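  7. Compile PyTorch:
    # Prepend the conda root while keeping any prefix set earlier (such as the NCCL one from step 5)
    export CMAKE_PREFIX_PATH="$(dirname "$(which conda)")/../${CMAKE_PREFIX_PATH:+:$CMAKE_PREFIX_PATH}"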
    conda install numpy pyyaml mkl setuptools cmake gcc cffi
    conda install -c soumith magma-cuda80
    python setup.py install
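
If the build fails to find NCCL, a quick check of the step 5 workaround can help narrow things down. The sketch below is not part of the original procedure: `check_nccl_prefix` is a hypothetical helper name, and it assumes the `~/nccl` layout created in step 5. It only verifies that the header and library links resolve to real files and that the prefix is listed in `CMAKE_PREFIX_PATH`. It says nothing about whether the NCCL version itself is compatible.

```shell
# Hypothetical helper (not from the original steps): sanity-check the NCCL prefix.
# Usage: check_nccl_prefix [prefix]   (prefix defaults to $HOME/nccl)
check_nccl_prefix() {
    local prefix="${1:-$HOME/nccl}"
    local status=0
    local f
    for f in "$prefix/include/nccl.h" "$prefix/lib/libnccl.so"; do
        # -e follows symlinks, so a dangling link is reported as missing
        if [ -e "$f" ]; then
            echo "ok: $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    # The prefix has to be one of the colon-separated CMAKE_PREFIX_PATH entries
    case ":${CMAKE_PREFIX_PATH:-}:" in
        *":$prefix:"*) echo "ok: $prefix is in CMAKE_PREFIX_PATH" ;;
        *) echo "missing: $prefix is not in CMAKE_PREFIX_PATH"; status=1 ;;
    esac
    return "$status"
}
```

Running `check_nccl_prefix` in the same shell session used for the build prints one line per check and returns a non-zero status if anything is missing.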
    

Once I finally got it working, I ran a speed test: the PyTorch MNIST example on an AWS p2.xlarge instance with an Nvidia Tesla K80, using all of the default settings in the example code. Unfortunately, the test didn’t show CUDA 9 speeding up training in this case. On an instance with CUDA 8 and CuDNN 6, the MNIST example took 88 seconds to train 10 epochs. On an instance set up with the steps above for CUDA 9 and CuDNN 7, it took 89 seconds. More experimentation is needed to see whether extra performance can be squeezed out of CUDA 9 and CuDNN 7.
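
Since the two results differ by only one second, repeating each measurement a few times gives a better sense of the run-to-run spread. The following is a minimal sketch of such a timer, not the method used for the numbers above. `time_runs` is a hypothetical helper name, and it measures whole seconds of wall-clock time with `date +%s`.

```shell
# Hypothetical helper (not from the original post).
# Usage: time_runs N command [args...]
# Runs the command N times and prints each run's wall-clock seconds, one per line.
time_runs() {
    local n="$1"
    shift
    local i=1 start end
    while [ "$i" -le "$n" ]; do
        start=$(date +%s)
        # Send the command's own output to stderr so stdout carries only timings
        "$@" >&2
        end=$(date +%s)
        echo "$((end - start))"
        i=$((i + 1))
    done
}
```

Assuming the MNIST example is started with something like `python main.py` (the script name is an assumption about the example's layout), `time_runs 3 python main.py` would print three timings that can be compared between the CUDA 8 and CUDA 9 instances.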


One thought on “CUDA 9 and CuDNN 7 with PyTorch”

  1. Hello, great blog post thanks! Was wondering, if you don’t want to use conda, are you still able to follow this tutorial?
