In this article I describe how to set up supported Nvidia hardware to work with Debian. By the end, you should be able to run a TensorFlow tutorial on the video card. I ran this tutorial on an Nvidia GTX 1060 6GB.
This tutorial uses the CUDA toolkit packaged by the Debian project, not the one Nvidia provides for Ubuntu.
This is part of the talk I gave at the August 2019 GTALUG meeting. I kept going back to the video for various bits of configuration, so having it written down should be more useful.
The following are needed to run tensorflow-gpu (see the TensorFlow website for an up-to-date list and details):
- NVIDIA GPU drivers
- CUDA toolkit
- CUPTI (ships with the CUDA toolkit)
- cuDNN SDK
Enable the contrib and non-free components in your /etc/apt/sources.list if you haven't already:
deb http://ftp.ca.debian.org/debian/ buster main contrib non-free
deb-src http://ftp.ca.debian.org/debian/ buster main contrib non-free
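If you are not sure whether non-free is already enabled, the following sketch checks sources.list text in the classic one-line format shown above. It is only an illustration: the function names are made up, it looks at the buster suite by default, and it ignores deb822-style .sources files and anything under /etc/apt/sources.list.d/.

```python
from pathlib import Path


def enabled_components(text, suite="buster"):
    """Return the components enabled for `suite` in one-line sources.list text."""
    components = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        fields = line.split()
        if not fields or fields[0] not in ("deb", "deb-src"):
            continue
        fields = fields[1:]
        # Skip an optional "[key=value ...]" options block after deb/deb-src.
        if fields and fields[0].startswith("["):
            while fields and not fields[0].endswith("]"):
                fields.pop(0)
            fields = fields[1:]
        # What remains is: URI suite component [component ...]
        if len(fields) >= 3 and fields[1] == suite:
            components.update(fields[2:])
    return components


def read_components(path="/etc/apt/sources.list", suite="buster"):
    """Convenience wrapper that reads the file before parsing it."""
    return enabled_components(Path(path).read_text(), suite)
```

For example, `"non-free" in read_components()` should be True once the two lines above are in place.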
Run apt update and apt upgrade as root:
# apt update
# apt upgrade
Install the CUDA packages and the Nvidia driver:
# apt install nvidia-cuda-dev nvidia-cuda-toolkit nvidia-driver
Install the cuDNN package
Unfortunately, the cuDNN license doesn't allow it to be redistributed by third parties such as the Debian repositories, so you will have to register and fill out a survey on the Nvidia website (https://developer.nvidia.com/cudnn) to download the package.
A detailed discussion of the issues preventing it from being packaged can be found at https://bugs.debian.org/862524.
Once it is downloaded, install the package with the following command:
# dpkg -i libcudnn6_6.0.20-1_cuda8.0_amd64.deb
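cuDNN packages are built for a specific CUDA release, and Nvidia's .deb file names encode it (for example, cuda8.0 in the file name above), so it may be worth checking that it matches the installed toolkit before running dpkg. The sketch below extracts both versions. It assumes the cudaX.Y token in the file name and the "release X.Y" phrase that `nvcc --version` prints; the function names are hypothetical.

```python
import re
import subprocess


def cuda_version_from_deb(filename):
    """Return the CUDA release encoded in a cuDNN .deb file name, or None."""
    match = re.search(r"cuda(\d+\.\d+)", filename)
    return match.group(1) if match else None


def nvcc_release(output):
    """Extract X.Y from the 'release X.Y' phrase in `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Ask nvcc (shipped in nvidia-cuda-toolkit) which CUDA release is installed."""
    result = subprocess.run(["nvcc", "--version"],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return nvcc_release(result.stdout)
```

Usage: compare `cuda_version_from_deb("libcudnn6_6.0.20-1_cuda8.0_amd64.deb")` with `installed_cuda_release()`; if they differ, download the cuDNN build for your CUDA version instead.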
Create a Python 3 virtual environment, then install the tensorflow and tensorflow-gpu packages into it with the following command:
$ pip install tensorflow tensorflow-gpu
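The virtual environment itself can be created with `python3 -m venv`, or from Python with the standard-library venv module, as in this small sketch. On Debian, pip inside a new environment comes from ensurepip, which Debian ships separately in the python3-venv package, so `with_pip=True` may fail until that package is installed. The function name and the location you pass it are illustrative.

```python
import venv
from pathlib import Path


def create_env(path, with_pip=True):
    """Create a virtual environment at `path` and return its python executable."""
    venv.EnvBuilder(with_pip=with_pip).create(path)
    return Path(path) / "bin" / "python"
```

For example, `create_env(Path.home() / "tf-env")`, followed by `source ~/tf-env/bin/activate` in your shell before running pip.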
Clone the TensorFlow models repository and try to run the classify_image.py tutorial script:
$ git clone https://github.com/tensorflow/models.git
$ cd models/tutorials/image/imagenet
$ python3 classify_image.py
In theory, the classification script should work at this point; in practice, it doesn't.
You will be greeted with errors about missing libraries. These errors are fairly straightforward to fix: the libraries are installed, but not under the file names (version numbers) that the tensorflow package looks for. Creating a few symlinks works around this, as described below.
The first run of classify_image.py fails with the following error:
Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
Similar errors follow for libcublas.so.10.0, libcufft.so.10.0, libcurand.so.10.0, libcusolver.so.10.0 and libcusparse.so.10.0.
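Rather than rerunning the classifier after every fix, you can ask the dynamic loader directly. This sketch uses Python's ctypes to dlopen each library named in the errors above, which should approximate the lookup TensorFlow performs; the function name is hypothetical and the list is copied from the error output.

```python
import ctypes

# Sonames from the "Could not dlopen library" errors.
REQUIRED = (
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
)


def missing_libraries(names=REQUIRED):
    """Return the names the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)  # calls dlopen() with the bare name, like the error suggests
        except OSError:
            missing.append(name)
    return missing
```

An empty list from `missing_libraries()` means every library in the list can be loaded.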
The closest match, libcudart.so.10.1, is provided by the libcudart10.1 package. To find which package provides a particular file, I use the apt-file utility, which searches the file lists of all available packages.
Attempting to install the package reveals that it is already present:
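When several libraries need looking up, the `apt-file search` step can be scripted. This sketch assumes apt-file is installed and its index has been downloaded with `apt-file update`, and relies on its "package: /path" output format; the function names are made up for this example.

```python
import subprocess


def parse_apt_file(output):
    """Map each package name to the matching paths listed for it."""
    packages = {}
    for line in output.splitlines():
        # apt-file prints one "package: /path/to/file" pair per line.
        package, sep, path = line.partition(": ")
        if sep:
            packages.setdefault(package, []).append(path.strip())
    return packages


def apt_file_search(pattern):
    """Run `apt-file search PATTERN` and parse its output."""
    result = subprocess.run(["apt-file", "search", pattern],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return parse_apt_file(result.stdout)
```

For example, `apt_file_search("libcudart.so")` returns a dictionary whose keys are the packages shipping a matching file.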
# apt install libcudart10.1
Reading package lists... Done
Building dependency tree
Reading state information... Done
libcudart10.1 is already the newest version (10.1.105-2).
libcudart10.1 set to manually installed.
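Using apt install only to see whether a package is present has a side effect: as the last line of the output shows, apt marks the package as manually installed. A read-only alternative is to query dpkg's status database, as in this sketch; the function name is hypothetical, and it treats a missing dpkg-query as "not installed".

```python
import subprocess


def is_installed(package):
    """Return True if dpkg reports `package` with status '... installed'."""
    try:
        result = subprocess.run(
            ["dpkg-query", "-W", "--showformat=${Status}", package],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # hide "no packages found matching ..."
            universal_newlines=True,
        )
    except FileNotFoundError:  # not a dpkg-based system
        return False
    # An installed package reports "install ok installed".
    return result.returncode == 0 and result.stdout.strip().endswith(" installed")
```

For example, `is_installed("libcudart10.1")` answers the same question without changing the package's auto/manual flag.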
The workaround is to create a symlink named after the version TensorFlow requires (10.0) that points to the installed version (10.1).
If you re-run the classifier, you will hit the same error for a different library. The commands below should fix all errors of this kind:
# cd /usr/lib/x86_64-linux-gnu
# ln -s libcudart.so.10.1 libcudart.so.10.0
# ln -s libcublas.so libcublas.so.10.0
# ln -s libcufft.so libcufft.so.10.0
# ln -s libcurand.so libcurand.so.10.0
# ln -s libcusolver.so libcusolver.so.10.0
# ln -s libcusparse.so libcusparse.so.10.0
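If you prefer something you can re-run safely, the same links can be created from Python. This is a sketch, not a replacement for the commands above: the link pairs mirror those ln -s commands, the function name is hypothetical, it skips names that already exist and targets that are absent, and it must run as root to write to /usr/lib/x86_64-linux-gnu.

```python
import os
from pathlib import Path

# Name TensorFlow looks for -> existing file it should point at.
LINKS = {
    "libcudart.so.10.0": "libcudart.so.10.1",
    "libcublas.so.10.0": "libcublas.so",
    "libcufft.so.10.0": "libcufft.so",
    "libcurand.so.10.0": "libcurand.so",
    "libcusolver.so.10.0": "libcusolver.so",
    "libcusparse.so.10.0": "libcusparse.so",
}


def create_links(libdir="/usr/lib/x86_64-linux-gnu", links=LINKS, dry_run=False):
    """Create missing compatibility symlinks; return the names created (or that would be)."""
    created = []
    for name, target in links.items():
        link = Path(libdir) / name
        if link.exists() or link.is_symlink():
            continue  # leave existing files and links alone
        if not (Path(libdir) / target).exists():
            continue  # nothing to point at
        if not dry_run:
            os.symlink(target, link)  # relative target, like running ln -s inside libdir
        created.append(name)
    return created
```

Calling `create_links(dry_run=True)` first lists what would be created without touching the filesystem.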
classify_image.py should now run successfully and produce the expected result:
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00779)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00296)
custard apple (score = 0.00147)
earthstar (score = 0.00117)
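A successful run does not by itself prove that the GPU was used, because TensorFlow can fall back to the CPU. A quick check, assuming TensorFlow 1.14 or later (where `tf.config.experimental.list_physical_devices` is available), is to list the GPUs it can see; the helper name is hypothetical.

```python
def gpu_devices():
    """Return the GPUs TensorFlow can see, or None if TensorFlow isn't importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.config.experimental.list_physical_devices("GPU")
```

Run inside the virtual environment, a non-empty list (one entry for the GTX 1060 here) indicates the CUDA libraries were found.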
Here is the video presentation of the material.