Installing CUDA and cuDNN on Debian 11

Published 2023-02-07


Preparation

Install the driver

See the previous post.

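Switch GCC back to GCC 10

The TensorFlow release used here targets CUDA 11.2, and CUDA 11.2 does not support GCC 12, so the compiler has to be downgraded. On Debian 11 the default gcc and g++ packages are GCC 10:

aptitude install gcc g++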

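Mind the TensorFlow GPU version mapping

This is the mapping from the official TensorFlow documentation (see the official page for the complete list):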

Version                Python version  Compiler   Build tool    cuDNN  CUDA
tensorflow-2.6.0       3.6-3.9         GCC 7.3.1  Bazel 3.7.2   8.1    11.2
tensorflow-2.5.0       3.6-3.9         GCC 7.3.1  Bazel 3.7.2   8.1    11.2
tensorflow-2.4.0       3.6-3.8         GCC 7.3.1  Bazel 3.1.0   8.0    11.0
tensorflow-2.3.0       3.5-3.8         GCC 7.3.1  Bazel 3.1.0   7.6    10.1
tensorflow-2.2.0       3.5-3.8         GCC 7.3.1  Bazel 2.0.0   7.6    10.1
tensorflow-2.1.0       2.7, 3.5-3.7    GCC 7.3.1  Bazel 0.27.1  7.6    10.1
tensorflow-2.0.0       2.7, 3.3-3.7    GCC 7.3.1  Bazel 0.26.1  7.4    10.0
tensorflow_gpu-1.15.0  2.7, 3.3-3.7    GCC 7.3.1  Bazel 0.26.1  7.4    10.0
tensorflow_gpu-1.14.0  2.7, 3.3-3.7    GCC 4.8    Bazel 0.24.1  7.4    10.0
tensorflow_gpu-1.13.1  2.7, 3.3-3.7    GCC 4.8    Bazel 0.19.2  7.4    10.0
tensorflow_gpu-1.12.0  2.7, 3.3-3.6    GCC 4.8    Bazel 0.15.0  7      9
tensorflow_gpu-1.11.0  2.7, 3.3-3.6    GCC 4.8    Bazel 0.15.0  7      9
tensorflow_gpu-1.10.0  2.7, 3.3-3.6    GCC 4.8    Bazel 0.15.0  7      9
tensorflow_gpu-1.9.0   2.7, 3.3-3.6    GCC 4.8    Bazel 0.11.0  7      9
tensorflow_gpu-1.8.0   2.7, 3.3-3.6    GCC 4.8    Bazel 0.10.0  7      9
tensorflow_gpu-1.7.0   2.7, 3.3-3.6    GCC 4.8    Bazel 0.9.0   7      9
tensorflow_gpu-1.6.0   2.7, 3.3-3.6    GCC 4.8    Bazel 0.9.0   7      9
tensorflow_gpu-1.5.0   2.7, 3.3-3.6    GCC 4.8    Bazel 0.8.0   7      9
tensorflow_gpu-1.4.0   2.7, 3.3-3.6    GCC 4.8    Bazel 0.5.4   6      8
tensorflow_gpu-1.3.0   2.7, 3.3-3.6    GCC 4.8    Bazel 0.4.5   6      8
tensorflow_gpu-1.2.0   2.7, 3.3-3.6    GCC 4.8    Bazel 0.4.5   5.1    8
tensorflow_gpu-1.1.0   2.7, 3.3-3.6    GCC 4.8    Bazel 0.4.2   5.1    8
tensorflow_gpu-1.0.0   2.7, 3.3-3.6    GCC 4.8    Bazel 0.4.2   5.1    8
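
If you would rather look the pairing up in a script than scan the table, something like the following might do. It is only a sketch: the dict copies a few rows from the table above, and the names TF_GPU_REQUIREMENTS and required_cuda are made up for this post, not part of TensorFlow.

```python
# A few rows copied from the table above: TensorFlow version -> (cuDNN, CUDA).
# Hypothetical helper for illustration; extend the dict with the rows you need.
TF_GPU_REQUIREMENTS = {
    "2.6.0": ("8.1", "11.2"),
    "2.5.0": ("8.1", "11.2"),
    "2.4.0": ("8.0", "11.0"),
    "2.3.0": ("7.6", "10.1"),
}


def required_cuda(tf_version):
    """Return (cudnn, cuda) for a TensorFlow version, or None if not listed."""
    return TF_GPU_REQUIREMENTS.get(tf_version)


print(required_cuda("2.6.0"))
```

A version missing from the dict returns None, which is a hint to check the official table rather than guess.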

Install CUDA 11.2

CUDA download page: https://developer.nvidia.com/cuda-toolkit-archive

I chose to install CUDA 11.2.

wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run
sudo sh cuda_11.2.0_460.27.04_linux.run

Then apply the configuration suggested in the installer's final output:

vim ~/.bashrc
# append at the end
export CUDA_HOME=/usr/local/cuda-11.2
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-11.2/bin:$PATH
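
To confirm that a shell actually picked up these variables, a small check along these lines could help. It is a sketch under the assumption that CUDA lives in /usr/local/cuda-11.2 as in this post; missing_cuda_paths is a hypothetical helper name.

```python
import os

CUDA_HOME = "/usr/local/cuda-11.2"  # install prefix used in this post


def missing_cuda_paths(env, cuda_home=CUDA_HOME):
    """Return the names of variables that lack the expected CUDA directory."""
    expected = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = []
    for name, directory in expected.items():
        # Both variables are colon-separated lists of directories.
        entries = env.get(name, "").split(":")
        if directory not in entries:
            missing.append(name)
    return missing


# An empty list means both variables contain the CUDA directories.
print(missing_cuda_paths(dict(os.environ)))
```

Run it from a new shell (or after source ~/.bashrc); any name it prints still needs fixing in ~/.bashrc.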

Test CUDA 11.2

Command-line test

root@debian:~# source ~/.bashrc
root@debian:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0
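
If you want to check the toolkit version from a script, the release number can be pulled out of that output with a regular expression. This is a minimal sketch; parse_nvcc_release is a name I made up, and the sample string is the line from the output above.

```python
import re


def parse_nvcc_release(output):
    """Extract the release string (e.g. '11.2') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


sample = "Cuda compilation tools, release 11.2, V11.2.67"
print(parse_nvcc_release(sample))
```

In real use the text would come from running nvcc --version, for example through subprocess.run with capture_output=True and text=True.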

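Run the samples bundled with CUDA

The CUDA installer asks whether to install the samples. I answered y, and it created a NVIDIA_CUDA-11.2_Samples directory in my home directory.

Build them first:

cd NVIDIA_CUDA-11.2_Samples
make all -j6
# the build creates a bin directory
cd bin/x86_64/linux/release/
./deviceQuery  # the other samples can be run the same way

Output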

root@debian:~/NVIDIA_CUDA-11.2_Samples/bin/x86_64/linux/release# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA P106-100"
  CUDA Driver Version / Runtime Version          12.0 / 11.2
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 6075 MBytes (6369902592 bytes)
  (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
  GPU Max Clock rate:                            1709 MHz (1.71 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.0, CUDA Runtime Version = 11.2, NumDevs = 1
Result = PASS
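
The two things worth reading off this output are the final PASS and the version pair: the driver's CUDA version (12.0 here) should be at least the runtime version (11.2). A rough way to check both from a script, assuming the summary line keeps the format shown above (parse_device_query and version_tuple are hypothetical names):

```python
import re


def parse_device_query(output):
    """Pull driver/runtime versions and the PASS flag from deviceQuery output."""
    driver = re.search(r"CUDA Driver Version = (\d+\.\d+)", output)
    runtime = re.search(r"CUDA Runtime Version = (\d+\.\d+)", output)
    return {
        "driver": driver.group(1) if driver else None,
        "runtime": runtime.group(1) if runtime else None,
        "passed": "Result = PASS" in output,
    }


def version_tuple(text):
    """Turn '11.2' into (11, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# The last two lines of the output above.
sample = (
    "deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.0, "
    "CUDA Runtime Version = 11.2, NumDevs = 1\n"
    "Result = PASS\n"
)
info = parse_device_query(sample)
driver_ok = version_tuple(info["driver"]) >= version_tuple(info["runtime"])
print(info, driver_ok)
```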

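Install cuDNN

About cuDNN
cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU-accelerated library for deep neural networks. CUDA itself works without it, but deep learning frameworks normally rely on it, and TensorFlow's GPU support lists it as a requirement.

Choosing and installing a cuDNN version
Pick the cuDNN release that matches your CUDA version from the official download page.
Downloading requires registering and logging in. Several packages are offered; only cuDNN Library for Linux is needed. The file I got was cudnn-11.2-linux-x64-v8.1.0.77.tgz.

Extracting it gives the following: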

root@debian:/st/download# tree cuda
cuda
├── include
│   ├── cudnn_adv_infer.h
│   ├── cudnn_adv_infer_v8.h
│   ├── cudnn_adv_train.h
│   ├── cudnn_adv_train_v8.h
│   ├── cudnn_backend.h
│   ├── cudnn_backend_v8.h
│   ├── cudnn_cnn_infer.h
│   ├── cudnn_cnn_infer_v8.h
│   ├── cudnn_cnn_train.h
│   ├── cudnn_cnn_train_v8.h
│   ├── cudnn.h
│   ├── cudnn_ops_infer.h
│   ├── cudnn_ops_infer_v8.h
│   ├── cudnn_ops_train.h
│   ├── cudnn_ops_train_v8.h
│   ├── cudnn_v8.h
│   ├── cudnn_version.h
│   └── cudnn_version_v8.h
├── lib64
│   ├── libcudnn_adv_infer.so -> libcudnn_adv_infer.so.8
│   ├── libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.1.0
│   ├── libcudnn_adv_infer.so.8.1.0
│   ├── libcudnn_adv_train.so -> libcudnn_adv_train.so.8
│   ├── libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.1.0
│   ├── libcudnn_adv_train.so.8.1.0
│   ├── libcudnn_cnn_infer.so -> libcudnn_cnn_infer.so.8
│   ├── libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.1.0
│   ├── libcudnn_cnn_infer.so.8.1.0
│   ├── libcudnn_cnn_train.so -> libcudnn_cnn_train.so.8
│   ├── libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.1.0
│   ├── libcudnn_cnn_train.so.8.1.0
│   ├── libcudnn_ops_infer.so -> libcudnn_ops_infer.so.8
│   ├── libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.1.0
│   ├── libcudnn_ops_infer.so.8.1.0
│   ├── libcudnn_ops_train.so -> libcudnn_ops_train.so.8
│   ├── libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.1.0
│   ├── libcudnn_ops_train.so.8.1.0
│   ├── libcudnn.so -> libcudnn.so.8
│   ├── libcudnn.so.8 -> libcudnn.so.8.1.0
│   ├── libcudnn.so.8.1.0
│   ├── libcudnn_static.a
│   └── libcudnn_static_v8.a -> libcudnn_static.a
└── NVIDIA_SLA_cuDNN_Support.txt

2 directories, 42 files

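cuDNN 8 splits its API across several headers that cudnn.h includes, so copy all of them along with the libraries:

sudo cp cuda/include/cudnn*.h   /usr/local/cuda-11.2/include/
sudo cp -d cuda/lib64/libcudnn* /usr/local/cuda-11.2/lib64
sudo chmod a+r /usr/local/cuda-11.2/include/cudnn*.h   /usr/local/cuda-11.2/lib64/libcudnn*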
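
To double-check which cuDNN version you have, you can read the version macros from cudnn_version.h (one of the headers in the extracted cuda/include directory listed above). A minimal sketch, with parse_cudnn_version as a made-up helper and a sample string standing in for the header's contents:

```python
import re


def parse_cudnn_version(header_text):
    """Read CUDNN_MAJOR/MINOR/PATCHLEVEL from the text of cudnn_version.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


# Stand-in for the relevant lines of cudnn_version.h from cuDNN 8.1.0.
sample = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 0
"""
print(parse_cudnn_version(sample))
```

For the real file, pass in the result of open("cuda/include/cudnn_version.h").read(), adjusting the path to wherever the header sits on your machine.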

Removing cuDNN (if needed)
To remove cuDNN, run:

sudo rm /usr/local/cuda-11.2/include/cudnn*.h
sudo rm /usr/local/cuda-11.2/lib64/libcudnn*

Test TensorFlow

pip install tensorflow==2.6.0

Then, in a Python session:

import tensorflow as tf
print(tf.__version__)
print(tf.test.gpu_device_name())

Output

2.6.0
/device:GPU:0
2023-02-07 23:30:16.175483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:0 with 5375 MB memory:  -> device: 0, name: NVIDIA P106-100, pci bus id: 0000:01:00.0, compute capability: 6.1
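
Another way to ask TensorFlow what it sees is tf.config.list_physical_devices, which lists devices without creating one. The wrapper below is just a sketch (list_gpus is my own name); it returns None when TensorFlow is not installed so it can run anywhere.

```python
def list_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


# None: TensorFlow not installed. Empty list: installed, but no GPU visible.
print(list_gpus())
```

An empty list usually points at a CUDA or cuDNN library that TensorFlow could not load; its startup log normally names the missing file.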

A minor error

  1. successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

Fix

# find the GPU's PCI address (0000:01:00.0 on my machine)
lspci -D | grep NVIDIA
# list all PCI devices
ls /sys/bus/pci/devices/
Output
0000:00:00.0  0000:00:16.0  0000:00:1b.0  0000:00:1c.4  0000:00:1f.0  0000:00:1f.3  0000:01:00.0
0000:00:01.0  0000:00:1a.0  0000:00:1c.0  0000:00:1d.0  0000:00:1f.2  0000:00:1f.5  0000:03:00.0
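# read the GPU's NUMA node
cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
-1 means no NUMA node is assigned
# assign node 0 (sysfs values are reset at reboot, so repeat this after each boot)
echo 0 | sudo tee /sys/bus/pci/devices/0000\:01\:00.0/numa_node
# check again
cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
0 means the node is set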
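
The same check can be done from Python, which is handy if you want to test it before launching a training script. This is a sketch, not a fix: read_numa_node is a hypothetical helper, and the demo runs against a throwaway directory that mimics the sysfs layout so it works on any Linux machine.

```python
import os
import tempfile


def read_numa_node(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the numa_node value for a PCI device, or None if unreadable."""
    path = os.path.join(sysfs_root, pci_address, "numa_node")
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


# Demo: fake sysfs tree with one device whose numa_node is -1.
with tempfile.TemporaryDirectory() as fake_root:
    device_dir = os.path.join(fake_root, "0000:01:00.0")
    os.makedirs(device_dir)
    with open(os.path.join(device_dir, "numa_node"), "w") as handle:
        handle.write("-1\n")
    demo_value = read_numa_node("0000:01:00.0", sysfs_root=fake_root)

print(demo_value)
```

On the real machine, call read_numa_node("0000:01:00.0") with the default sysfs_root; a result of -1 means the warning above will show up.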
