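Preparation
Install the driver
See the previous post.
Switch GCC back to GCC-10
TensorFlow supports CUDA up to 11.2, and CUDA 11.2 does not support GCC-12, so GCC has to be downgraded.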
aptitude install gcc-10 g++-10
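To confirm which compiler is active after the switch, a small check along these lines may help. It is only a sketch: the helper names `gcc_major` and `is_supported` are made up for illustration, and the limit of 10 is an assumption taken from this section rather than from NVIDIA's documentation.

```python
import re

# Assumed upper bound, taken from this post (GCC-12 is rejected, GCC-10 works).
MAX_GCC_MAJOR = 10


def gcc_major(version_text):
    """Extract the major version from the output of `gcc -dumpversion`."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError(f"unrecognized GCC version: {version_text!r}")
    return int(match.group(1))


def is_supported(version_text):
    """Return True if the GCC major version is within the assumed limit."""
    return gcc_major(version_text) <= MAX_GCC_MAJOR


print(is_supported("10.2.1"))  # True
print(is_supported("12"))      # False
```

On a real machine the input would come from something like `subprocess.run(["gcc", "-dumpversion"], capture_output=True, text=True).stdout`.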
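Check the TensorFlow GPU version compatibility
The table below is taken from the official TensorFlow documentation, which has the complete list.
Version | Python version | Compiler | Build tool | cuDNN | CUDA |
---|---|---|---|---|---|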
tensorflow-2.6.0 | 3.6-3.9 | GCC 7.3.1 | Bazel 3.7.2 | 8.1 | 11.2 |
tensorflow-2.5.0 | 3.6-3.9 | GCC 7.3.1 | Bazel 3.7.2 | 8.1 | 11.2 |
tensorflow-2.4.0 | 3.6-3.8 | GCC 7.3.1 | Bazel 3.1.0 | 8.0 | 11.0 |
tensorflow-2.3.0 | 3.5-3.8 | GCC 7.3.1 | Bazel 3.1.0 | 7.6 | 10.1 |
tensorflow-2.2.0 | 3.5-3.8 | GCC 7.3.1 | Bazel 2.0.0 | 7.6 | 10.1 |
tensorflow-2.1.0 | 2.7, 3.5-3.7 | GCC 7.3.1 | Bazel 0.27.1 | 7.6 | 10.1 |
tensorflow-2.0.0 | 2.7, 3.3-3.7 | GCC 7.3.1 | Bazel 0.26.1 | 7.4 | 10.0 |
tensorflow_gpu-1.15.0 | 2.7, 3.3-3.7 | GCC 7.3.1 | Bazel 0.26.1 | 7.4 | 10.0 |
tensorflow_gpu-1.14.0 | 2.7, 3.3-3.7 | GCC 4.8 | Bazel 0.24.1 | 7.4 | 10.0 |
tensorflow_gpu-1.13.1 | 2.7, 3.3-3.7 | GCC 4.8 | Bazel 0.19.2 | 7.4 | 10.0 |
tensorflow_gpu-1.12.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.15.0 | 7 | 9 |
tensorflow_gpu-1.11.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.15.0 | 7 | 9 |
tensorflow_gpu-1.10.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.15.0 | 7 | 9 |
tensorflow_gpu-1.9.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.11.0 | 7 | 9 |
tensorflow_gpu-1.8.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.10.0 | 7 | 9 |
tensorflow_gpu-1.7.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.9.0 | 7 | 9 |
tensorflow_gpu-1.6.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.9.0 | 7 | 9 |
tensorflow_gpu-1.5.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.8.0 | 7 | 9 |
tensorflow_gpu-1.4.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.5.4 | 6 | 8 |
tensorflow_gpu-1.3.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.4.5 | 6 | 8 |
tensorflow_gpu-1.2.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.4.5 | 5.1 | 8 |
tensorflow_gpu-1.1.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.4.2 | 5.1 | 8 |
tensorflow_gpu-1.0.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.4.2 | 5.1 | 8 |
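Looking up the matching cuDNN and CUDA versions can be sketched as a simple dictionary lookup. The snippet below copies only a few rows from the table; the names `COMPAT` and `required_versions` are illustrative and not part of TensorFlow.

```python
# A few rows copied from the table above: TensorFlow version -> (cuDNN, CUDA).
COMPAT = {
    "2.6.0": ("8.1", "11.2"),
    "2.5.0": ("8.1", "11.2"),
    "2.4.0": ("8.0", "11.0"),
    "2.3.0": ("7.6", "10.1"),
}


def required_versions(tf_version):
    """Return (cudnn, cuda) for a TensorFlow version, or None if not listed."""
    return COMPAT.get(tf_version)


print(required_versions("2.6.0"))  # ('8.1', '11.2')
```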
Install CUDA 11.2
CUDA download page: https://developer.nvidia.com/cuda-toolkit-archive
I chose to install CUDA 11.2.
wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run
sudo sh cuda_11.2.0_460.27.04_linux.run
Follow the hints printed when the installer finishes and set up the environment:
vim ~/.bashrc
# append at the end of the file
export CUDA_HOME=/usr/local/cuda-11.2
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-11.2/bin:$PATH
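After reloading the shell, it can be worth checking that both directories really ended up in the search paths. The following is a minimal sketch; `has_entry` is a hypothetical helper, and the `:` separator assumes Linux.

```python
import os


def has_entry(path_value, directory):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    wanted = os.path.normpath(directory)
    # Entries are separated by ':' on Linux; empty entries are skipped.
    entries = [entry for entry in path_value.split(":") if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


cuda_home = "/usr/local/cuda-11.2"
print(has_entry(os.environ.get("PATH", ""), cuda_home + "/bin"))
print(has_entry(os.environ.get("LD_LIBRARY_PATH", ""), cuda_home + "/lib64"))
```

Both lines should print `True` in a shell where the exports above are in effect.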
Test CUDA 11.2
Command-line test
root@debian:~# source ~/.bashrc
root@debian:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0
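If this check needs to be scripted, the release number can be pulled out of the `nvcc --version` output with a regular expression. This is a sketch only; `nvcc_release` is an illustrative name, and the pattern assumes the `release X.Y` wording shown above.

```python
import re


def nvcc_release(output):
    """Return the CUDA release (e.g. '11.2') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


sample = "Cuda compilation tools, release 11.2, V11.2.67"
print(nvcc_release(sample))  # 11.2
```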
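Run the samples shipped with CUDA
The CUDA installer asks whether to install the samples. I chose y, and it created the directory NVIDIA_CUDA-11.2_Samples in my home directory.
Compile them first: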
cd NVIDIA_CUDA-11.2_Samples
make all -j6
# the build creates a bin directory
cd bin/x86_64/linux/release/
./deviceQuery # the other samples can be run the same way
Output
root@debian:~/NVIDIA_CUDA-11.2_Samples/bin/x86_64/linux/release# ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA P106-100"
CUDA Driver Version / Runtime Version 12.0 / 11.2
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6075 MBytes (6369902592 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1709 MHz (1.71 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 98304 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.0, CUDA Runtime Version = 11.2, NumDevs = 1
Result = PASS
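The summary line reports a driver version (12.0) that is newer than the runtime version (11.2), which is the expected arrangement: the driver has to support at least the CUDA version of the toolkit. A small sketch of that comparison follows; the function name is illustrative, and the pattern assumes the summary format printed above.

```python
import re


def driver_and_runtime(line):
    """Parse the deviceQuery summary line into two (major, minor) tuples."""
    match = re.search(
        r"CUDA Driver Version = (\d+)\.(\d+), "
        r"CUDA Runtime Version = (\d+)\.(\d+)",
        line,
    )
    if match is None:
        return None
    d_major, d_minor, r_major, r_minor = map(int, match.groups())
    return (d_major, d_minor), (r_major, r_minor)


summary = (
    "deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.0, "
    "CUDA Runtime Version = 11.2, NumDevs = 1"
)
driver, runtime = driver_and_runtime(summary)
print(driver >= runtime)  # True
```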
Install cuDNN
About cuDNN
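cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU-accelerated library for deep neural networks. Training on a GPU does not strictly require cuDNN in general, but it is normally used, and TensorFlow lists it among the requirements for GPU support.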
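Choosing and installing a cuDNN version
Pick the cuDNN release that matches your CUDA version from the official download page.
Downloading requires registering and logging in. Several packages are offered; only cuDNN Library for Linux is needed. The file I downloaded is cudnn-11.2-linux-x64-v8.1.0.77.tgz.
Extracting the archive gives the following: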
root@debian:/st/download# tree cuda
cuda
├── include
│ ├── cudnn_adv_infer.h
│ ├── cudnn_adv_infer_v8.h
│ ├── cudnn_adv_train.h
│ ├── cudnn_adv_train_v8.h
│ ├── cudnn_backend.h
│ ├── cudnn_backend_v8.h
│ ├── cudnn_cnn_infer.h
│ ├── cudnn_cnn_infer_v8.h
│ ├── cudnn_cnn_train.h
│ ├── cudnn_cnn_train_v8.h
│ ├── cudnn.h
│ ├── cudnn_ops_infer.h
│ ├── cudnn_ops_infer_v8.h
│ ├── cudnn_ops_train.h
│ ├── cudnn_ops_train_v8.h
│ ├── cudnn_v8.h
│ ├── cudnn_version.h
│ └── cudnn_version_v8.h
├── lib64
│ ├── libcudnn_adv_infer.so -> libcudnn_adv_infer.so.8
│ ├── libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.1.0
│ ├── libcudnn_adv_infer.so.8.1.0
│ ├── libcudnn_adv_train.so -> libcudnn_adv_train.so.8
│ ├── libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.1.0
│ ├── libcudnn_adv_train.so.8.1.0
│ ├── libcudnn_cnn_infer.so -> libcudnn_cnn_infer.so.8
│ ├── libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.1.0
│ ├── libcudnn_cnn_infer.so.8.1.0
│ ├── libcudnn_cnn_train.so -> libcudnn_cnn_train.so.8
│ ├── libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.1.0
│ ├── libcudnn_cnn_train.so.8.1.0
│ ├── libcudnn_ops_infer.so -> libcudnn_ops_infer.so.8
│ ├── libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.1.0
│ ├── libcudnn_ops_infer.so.8.1.0
│ ├── libcudnn_ops_train.so -> libcudnn_ops_train.so.8
│ ├── libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.1.0
│ ├── libcudnn_ops_train.so.8.1.0
│ ├── libcudnn.so -> libcudnn.so.8
│ ├── libcudnn.so.8 -> libcudnn.so.8.1.0
│ ├── libcudnn.so.8.1.0
│ ├── libcudnn_static.a
│ └── libcudnn_static_v8.a -> libcudnn_static.a
└── NVIDIA_SLA_cuDNN_Support.txt
2 directories, 42 files
Then run:
sudo cp cuda/include/cudnn*.h /usr/local/cuda-11.2/include/
sudo cp -d cuda/lib64/lib* /usr/local/cuda-11.2/lib64
sudo chmod a+r /usr/local/cuda-11.2/include/cudnn*.h /usr/local/cuda-11.2/lib64/libcudnn*
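To double-check which cuDNN version is in place, the version macros in `cudnn_version.h` can be read back. The sketch below parses the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` defines from the header text. The sample string is a hypothetical excerpt, and the helper name is illustrative; on a real system the text would be read from the extracted `cuda/include/cudnn_version.h`, or from `/usr/local/cuda-11.2/include/` if that header was copied there.

```python
import re


def cudnn_version(header_text):
    """Build 'major.minor.patch' from the #define lines in cudnn_version.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


# Hypothetical excerpt standing in for the real header file.
sample = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 0
"""
print(cudnn_version(sample))  # 8.1.0
```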
Remove cuDNN (if needed)
To remove cuDNN, run:
sudo rm /usr/local/cuda-11.2/include/cudnn*.h
sudo rm -r /usr/local/cuda-11.2/lib64/libcudnn*
Test TensorFlow
pip install tensorflow==2.6.0
import tensorflow as tf
print(tf.__version__)
print(tf.test.gpu_device_name())
Output
2.6.0
/device:GPU:0
2023-02-07 23:30:16.175483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:0 with 5375 MB memory: -> device: 0, name: NVIDIA P106-100, pci bus id: 0000:01:00.0, compute capability: 6.1
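For a slightly more detailed check, TensorFlow can also report the GPUs it sees and the CUDA and cuDNN versions it was built against. The following is a sketch rather than a required step; `gpu_summary` is an illustrative name, and it returns `None` when TensorFlow is not installed.

```python
def gpu_summary():
    """Return TensorFlow's view of the GPU setup, or None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    build = tf.sysconfig.get_build_info()
    return {
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
        # These keys are only present in CUDA builds, hence .get().
        "cuda": build.get("cuda_version"),
        "cudnn": build.get("cudnn_version"),
    }


print(gpu_summary())
```

An empty `gpus` list usually means the CUDA or cuDNN libraries could not be loaded, in which case the TensorFlow log messages name the missing library.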
A minor error
- successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Fix
# find the GPU's PCI ID
lspci -D | grep NVIDIA
Using the ID it returns (0000:01:00.0 here), check the device's NUMA node:
cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
# list all PCI devices
ls /sys/bus/pci/devices/
Output
0000:00:00.0 0000:00:16.0 0000:00:1b.0 0000:00:1c.4 0000:00:1f.0 0000:00:1f.3 0000:01:00.0
0000:00:01.0 0000:00:1a.0 0000:00:1c.0 0000:00:1d.0 0000:00:1f.2 0000:00:1f.5 0000:03:00.0
# check the GPU's NUMA node
cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
A value of -1 means no NUMA node is assigned.
# assign node 0
echo 0 | sudo tee -a /sys/bus/pci/devices/0000\:01\:00.0/numa_node
# check again
cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
A value of 0 means the fix worked.
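The same check can be scripted. The sketch below reads the `numa_node` file for a given PCI ID; `read_numa_node` and its `root` parameter are illustrative, with `root` defaulting to the sysfs path used above.

```python
from pathlib import Path


def read_numa_node(pci_id, root="/sys/bus/pci/devices"):
    """Return the NUMA node of a PCI device as an int, or None if unreadable."""
    try:
        return int((Path(root) / pci_id / "numa_node").read_text().strip())
    except (OSError, ValueError):
        return None


node = read_numa_node("0000:01:00.0")
print("needs fix" if node == -1 else f"numa_node = {node}")
```

Values written under /sys do not survive a reboot, so a check like this can be run at startup to see whether the setting has to be applied again.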