Preparation
Install the driver
See the previous post.
Switch GCC back to GCC-10
The TensorFlow release used here supports CUDA up to 11.2, and CUDA 11.2 does not support GCC-12, so GCC has to be downgraded.
    aptitude install gcc g++
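Check the TensorFlow GPU version compatibility
The table below comes from the official TensorFlow documentation; see the official docs for the complete list.
Version | Python version | Compiler | Build tools | cuDNN | CUDA |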
---|---|---|---|---|---|
tensorflow-2.6.0 | 3.6-3.9 | GCC 7.3.1 | Bazel 3.7.2 | 8.1 | 11.2 |
tensorflow-2.5.0 | 3.6-3.9 | GCC 7.3.1 | Bazel 3.7.2 | 8.1 | 11.2 |
tensorflow-2.4.0 | 3.6-3.8 | GCC 7.3.1 | Bazel 3.1.0 | 8.0 | 11.0 |
tensorflow-2.3.0 | 3.5-3.8 | GCC 7.3.1 | Bazel 3.1.0 | 7.6 | 10.1 |
tensorflow-2.2.0 | 3.5-3.8 | GCC 7.3.1 | Bazel 2.0.0 | 7.6 | 10.1 |
tensorflow-2.1.0 | 2.7, 3.5-3.7 | GCC 7.3.1 | Bazel 0.27.1 | 7.6 | 10.1 |
tensorflow-2.0.0 | 2.7, 3.3-3.7 | GCC 7.3.1 | Bazel 0.26.1 | 7.4 | 10.0 |
tensorflow_gpu-1.15.0 | 2.7, 3.3-3.7 | GCC 7.3.1 | Bazel 0.26.1 | 7.4 | 10.0 |
tensorflow_gpu-1.14.0 | 2.7, 3.3-3.7 | GCC 4.8 | Bazel 0.24.1 | 7.4 | 10.0 |
tensorflow_gpu-1.13.1 | 2.7, 3.3-3.7 | GCC 4.8 | Bazel 0.19.2 | 7.4 | 10.0 |
tensorflow_gpu-1.12.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.15.0 | 7 | 9 |
tensorflow_gpu-1.11.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.15.0 | 7 | 9 |
tensorflow_gpu-1.10.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.15.0 | 7 | 9 |
tensorflow_gpu-1.9.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.11.0 | 7 | 9 |
tensorflow_gpu-1.8.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.10.0 | 7 | 9 |
tensorflow_gpu-1.7.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.9.0 | 7 | 9 |
tensorflow_gpu-1.6.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.9.0 | 7 | 9 |
tensorflow_gpu-1.5.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.8.0 | 7 | 9 |
tensorflow_gpu-1.4.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.5.4 | 6 | 8 |
tensorflow_gpu-1.3.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.4.5 | 6 | 8 |
tensorflow_gpu-1.2.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.4.5 | 5.1 | 8 |
tensorflow_gpu-1.1.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.4.2 | 5.1 | 8 |
tensorflow_gpu-1.0.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.4.2 | 5.1 | 8 |
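
If you would rather look the pairing up in a script than scan the table, something like the following could work. It is only a sketch: the dictionary copies a few rows from the table above, and the names `TF_GPU_REQUIREMENTS` and `required_cuda_stack` are made up for this example.

```python
# A few rows copied from the compatibility table above:
# TensorFlow version -> (cuDNN version, CUDA version).
# Extend the dictionary with further rows as needed.
TF_GPU_REQUIREMENTS = {
    "2.6.0": ("8.1", "11.2"),
    "2.5.0": ("8.1", "11.2"),
    "2.4.0": ("8.0", "11.0"),
    "2.3.0": ("7.6", "10.1"),
}


def required_cuda_stack(tf_version):
    """Return (cuDNN, CUDA) for a TensorFlow version, or None if it is not listed."""
    return TF_GPU_REQUIREMENTS.get(tf_version)


if __name__ == "__main__":
    cudnn, cuda = required_cuda_stack("2.6.0")
    print(f"tensorflow-2.6.0 needs cuDNN {cudnn} and CUDA {cuda}")
```

For the setup in this post, the lookup for 2.6.0 gives cuDNN 8.1 and CUDA 11.2, which is why those versions are installed below.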
Install CUDA 11.2
CUDA download page: https://developer.nvidia.com/cuda-toolkit-archive
I chose CUDA 11.2.
    wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run
    sudo sh cuda_11.2.0_460.27.04_linux.run
Then configure the environment as suggested by the installer's closing output.
    vim ~/.bashrc
    # Append at the end of the file:
    export CUDA_HOME=/usr/local/cuda-11.2
    export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH
    export PATH=/usr/local/cuda-11.2/bin:$PATH
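
To double-check that the variables took effect in a new shell, a small script along these lines may help. It is a sketch, not part of the CUDA tooling: `missing_cuda_paths` is a hypothetical helper, and the default `cuda_home` assumes the install prefix used in this post.

```python
import os


def missing_cuda_paths(env, cuda_home="/usr/local/cuda-11.2"):
    """Return the names of variables that lack the expected CUDA directory."""
    expected = {
        "PATH": f"{cuda_home}/bin",
        "LD_LIBRARY_PATH": f"{cuda_home}/lib64",
    }
    missing = []
    for name, directory in expected.items():
        # Both variables are colon-separated lists of directories.
        entries = env.get(name, "").split(":")
        if directory not in entries:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_cuda_paths(os.environ)
    print("OK" if not problems else f"Not set up: {', '.join(problems)}")
```

Passing the environment in as a mapping keeps the check easy to try against a hand-written dictionary as well as `os.environ`.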
Test CUDA 11.2
Test from the command line
    root@debian:~# source ~/.bashrc
    root@debian:~# nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Mon_Nov_30_19:08:53_PST_2020
    Cuda compilation tools, release 11.2, V11.2.67
    Build cuda_11.2.r11.2/compiler.29373293_0
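
The release number can also be pulled out of that output programmatically. The sketch below assumes the `release X.Y` wording shown above; `parse_nvcc_release` is a name invented for this example.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract the release string (such as '11.2') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH")
    else:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True
        )
        print("CUDA release:", parse_nvcc_release(result.stdout))
```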
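Run the samples shipped with CUDA
The CUDA installer asks whether to install the samples. I answered y, which created the directory NVIDIA_CUDA-11.2_Samples in my home directory.
Compile them first:
    cd NVIDIA_CUDA-11.2_Samples
    make all -j6
    # The build creates a bin directory
    cd bin/x86_64/linux/release/
    ./deviceQuery   # any of the other samples can be run as well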
Output
    root@debian:~/NVIDIA_CUDA-11.2_Samples/bin/x86_64/linux/release# ./deviceQuery
    ./deviceQuery Starting...

     CUDA Device Query (Runtime API) version (CUDART static linking)

    Detected 1 CUDA Capable device(s)

    Device 0: "NVIDIA P106-100"
      CUDA Driver Version / Runtime Version          12.0 / 11.2
      CUDA Capability Major/Minor version number:    6.1
      Total amount of global memory:                 6075 MBytes (6369902592 bytes)
      (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
      GPU Max Clock rate:                            1709 MHz (1.71 GHz)
      Memory Clock rate:                             4004 Mhz
      Memory Bus Width:                              192-bit
      L2 Cache Size:                                 1572864 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total shared memory per multiprocessor:        98304 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
      Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
      Run time limit on kernels:                     Yes
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      Yes
      Device supports Managed Memory:                Yes
      Device supports Compute Preemption:            Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.0, CUDA Runtime Version = 11.2, NumDevs = 1
    Result = PASS
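
The parts of this output that matter most are the last two lines: the driver and runtime versions and the PASS result. A rough way to extract them, assuming the summary wording shown above (`parse_device_query` is a hypothetical helper, not something shipped with the samples):

```python
import re


def parse_device_query(output):
    """Pull driver version, runtime version and pass/fail from deviceQuery output."""
    driver = re.search(r"CUDA Driver Version = (\d+\.\d+)", output)
    runtime = re.search(r"CUDA Runtime Version = (\d+\.\d+)", output)
    return {
        "driver": driver.group(1) if driver else None,
        "runtime": runtime.group(1) if runtime else None,
        "passed": "Result = PASS" in output,
    }


if __name__ == "__main__":
    summary = (
        "deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.0, "
        "CUDA Runtime Version = 11.2, NumDevs = 1\n"
        "Result = PASS\n"
    )
    print(parse_device_query(summary))
```

Here the driver reports 12.0 while the runtime is 11.2, which is the combination shown in the run above.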
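Install cuDNN
About cuDNN
cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU-accelerated library for deep neural networks. Training a model on a GPU does not strictly require cuDNN, but it is generally used, and each TensorFlow build in the table above is tied to a specific cuDNN version.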
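Choosing and installing a cuDNN version
Pick the cuDNN release that matches your CUDA version from the official download page.
Downloading requires registering and logging in. Several packages are offered; only cuDNN Library for Linux is needed. The file I got is named cudnn-11.2-linux-x64-v8.1.0.77.tgz.
After extracting it, the contents look like this: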
    root@debian:/st/download# tree cuda
    cuda
    ├── include
    │   ├── cudnn_adv_infer.h
    │   ├── cudnn_adv_infer_v8.h
    │   ├── cudnn_adv_train.h
    │   ├── cudnn_adv_train_v8.h
    │   ├── cudnn_backend.h
    │   ├── cudnn_backend_v8.h
    │   ├── cudnn_cnn_infer.h
    │   ├── cudnn_cnn_infer_v8.h
    │   ├── cudnn_cnn_train.h
    │   ├── cudnn_cnn_train_v8.h
    │   ├── cudnn.h
    │   ├── cudnn_ops_infer.h
    │   ├── cudnn_ops_infer_v8.h
    │   ├── cudnn_ops_train.h
    │   ├── cudnn_ops_train_v8.h
    │   ├── cudnn_v8.h
    │   ├── cudnn_version.h
    │   └── cudnn_version_v8.h
    ├── lib64
    │   ├── libcudnn_adv_infer.so -> libcudnn_adv_infer.so.8
    │   ├── libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.1.0
    │   ├── libcudnn_adv_infer.so.8.1.0
    │   ├── libcudnn_adv_train.so -> libcudnn_adv_train.so.8
    │   ├── libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.1.0
    │   ├── libcudnn_adv_train.so.8.1.0
    │   ├── libcudnn_cnn_infer.so -> libcudnn_cnn_infer.so.8
    │   ├── libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.1.0
    │   ├── libcudnn_cnn_infer.so.8.1.0
    │   ├── libcudnn_cnn_train.so -> libcudnn_cnn_train.so.8
    │   ├── libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.1.0
    │   ├── libcudnn_cnn_train.so.8.1.0
    │   ├── libcudnn_ops_infer.so -> libcudnn_ops_infer.so.8
    │   ├── libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.1.0
    │   ├── libcudnn_ops_infer.so.8.1.0
    │   ├── libcudnn_ops_train.so -> libcudnn_ops_train.so.8
    │   ├── libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.1.0
    │   ├── libcudnn_ops_train.so.8.1.0
    │   ├── libcudnn.so -> libcudnn.so.8
    │   ├── libcudnn.so.8 -> libcudnn.so.8.1.0
    │   ├── libcudnn.so.8.1.0
    │   ├── libcudnn_static.a
    │   └── libcudnn_static_v8.a -> libcudnn_static.a
    └── NVIDIA_SLA_cuDNN_Support.txt

    2 directories, 42 files
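Then copy the files into the CUDA installation. cuDNN 8 splits its API across several headers (see the listing above) that cudnn.h includes, so copy every cudnn*.h file rather than cudnn.h alone:
    sudo cp cuda/include/cudnn*.h /usr/local/cuda-11.2/include/
    sudo cp -d cuda/lib64/lib* /usr/local/cuda-11.2/lib64
    sudo chmod a+r /usr/local/cuda-11.2/include/cudnn*.h /usr/local/cuda-11.2/lib64/libcudnn*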
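
One way to confirm which cuDNN version ended up in place is to read the version macros from cudnn_version.h. The sketch below assumes that header (listed in the extracted archive above) has been copied to /usr/local/cuda-11.2/include/ and that it defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`, as cuDNN 8 headers do; `parse_cudnn_version` is a name made up for this example.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Return 'major.minor.patch' from the text of cudnn_version.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    # Assumed location, matching the install prefix used in this post.
    header = Path("/usr/local/cuda-11.2/include/cudnn_version.h")
    if header.is_file():
        print("cuDNN version:", parse_cudnn_version(header.read_text()))
    else:
        print(f"{header} not found")
```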
Removing cuDNN (optional)
To remove cuDNN, run:
    sudo rm /usr/local/cuda-11.2/include/cudnn*.h
    sudo rm -r /usr/local/cuda-11.2/lib64/libcudnn*
Test TensorFlow
Install TensorFlow:
    pip install tensorflow==2.6.0
Then run the following in Python:
    import tensorflow as tf
    print(tf.__version__)
    print(tf.test.gpu_device_name())
Output
    2.6.0
    /device:GPU:0
    2023-02-07 23:30:16.175483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:0 with 5375 MB memory: -> device: 0, name: NVIDIA P106-100, pci bus id: 0000:01:00.0, compute capability: 6.1
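
As an alternative check, `tf.config.list_physical_devices("GPU")` lists the GPUs TensorFlow can see without creating a device. The wrapper below is only a sketch; `list_gpus` is a hypothetical helper, and it returns None when TensorFlow is not installed so the script does not crash on a machine without it.

```python
def list_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus is None:
        print("TensorFlow is not installed")
    elif not gpus:
        print("TensorFlow is installed but sees no GPU")
    else:
        print("GPUs visible to TensorFlow:", gpus)
```

An empty list usually points back to the CUDA or cuDNN steps above rather than to TensorFlow itself.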
A minor error
- successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Fix
    # Find the GPU's PCI ID
    lspci -D | grep NVIDIA
    # List all PCI devices
    ls /sys/bus/pci/devices/
    # Output:
    0000:00:00.0 0000:00:16.0 0000:00:1b.0 0000:00:1c.4 0000:00:1f.0 0000:00:1f.3 0000:01:00.0
    0000:00:01.0 0000:00:1a.0 0000:00:1c.0 0000:00:1d.0 0000:00:1f.2 0000:00:1f.5 0000:03:00.0
    # Check the GPU's NUMA node; -1 means none is assigned
    cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
    # Assign node 0
    echo 0 | sudo tee -a /sys/bus/pci/devices/0000\:01\:00.0/numa_node
    # Check again; 0 means the change took effect
    cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
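
The same check can be scripted. This is a sketch that only reads the value: `read_numa_node` is a hypothetical helper, and the PCI ID 0000:01:00.0 is the one from my machine, so substitute your own.

```python
from pathlib import Path


def read_numa_node(pci_id, sysfs_root="/sys/bus/pci/devices"):
    """Return the NUMA node recorded for a PCI device, or None if the file is missing."""
    node_file = Path(sysfs_root) / pci_id / "numa_node"
    if not node_file.is_file():
        return None
    return int(node_file.read_text().strip())


if __name__ == "__main__":
    # PCI ID taken from this post; replace it with the ID lspci reports for your GPU.
    node = read_numa_node("0000:01:00.0")
    if node is None:
        print("numa_node file not found for this device")
    elif node < 0:
        print("No NUMA node assigned (value is negative)")
    else:
        print(f"NUMA node: {node}")
```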