Previous page: docker200322

Install docker on a machine with an NVIDIA card and run GPU computations in its containers.

docker/run
docker/Dockerfile

Install the docker repository

The target machine looks like this:

[root@docker ~]# cat /etc/redhat-release
Rocky Linux release 9.2 (Blue Onyx)
 
 
[root@docker ~]# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.104.05  Sat Aug 19 01:15:15 UTC 2023
GCC version:  gcc version 11.3.1 20221121 (Red Hat 11.3.1-4) (GCC)
 
[root@docker ~]# ls -l /usr/local/cuda
ls: cannot access /usr/local/cuda: No such file or directory   <-- the CUDA libraries are not installed
[root@docker ~]#
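
The driver check above can also be scripted before installing anything. The following is a minimal sketch and not part of the original procedure: the function name `nvidia_driver_version` is hypothetical, and it assumes the `NVRM version:` line has the format shown above, with the version in the field right after the word `Module`.

```shell
# Hypothetical helper: print the NVIDIA kernel driver version.
# $1 is the file to read; it defaults to the procfs path used above.
nvidia_driver_version() {
    file="${1:-/proc/driver/nvidia/version}"
    [ -r "$file" ] || return 1
    # Assumption: the version is the field following "Module" on the NVRM line.
    awk '/^NVRM version:/ {
        for (i = 1; i < NF; i++)
            if ($i == "Module") { print $(i + 1); exit }
    }' "$file"
}

# Usage: print the version, or report that the driver is not loaded.
nvidia_driver_version || echo "NVIDIA kernel module is not loaded" >&2
```

If the file is missing, the kernel module is not loaded and the container tests further down this page cannot succeed.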

First, install docker from the repository provided by Docker rather than the one provided by the OS.

[root@docker ~]# dnf install yum-utils
[root@docker ~]# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

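The contents of the repo file are recorded at docker/repository

Checking with yum, "docker.x86_64" appears to be the docker provided by the OS repository,
so this time we install "docker-ce.x86_64". That one appears to be the package provided by the "docker-ce" repository.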

[root@docker ~]# dnf install docker-ce
(containerd.io, docker-ce-cli, docker-ce-rootless-extras, docker-compose-plugin and docker-buildx-plugin are installed along with it)
 
[root@docker ~]# systemctl start docker

Check the version, just in case

[root@docker ~]# docker --version
Docker version 24.0.6, build ed223bc
[root@docker ~]#

Next, install the "NVIDIA Container Toolkit"

Install the NVIDIA Container Toolkit (formerly known as NVIDIA Docker? nvidia-docker2?).

Upstream: https://github.com/NVIDIA/nvidia-docker
Now add the NVIDIA toolkit on top of docker.

[root@docker ~]# curl -s -o /etc/yum.repos.d/nvidia-docker.repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
[root@docker ~]# cat /etc/yum.repos.d/nvidia-docker.repo
[nvidia-container-toolkit]
name=nvidia-container-toolkit
baseurl=https://nvidia.github.io/libnvidia-container/stable/rpm/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
 
[nvidia-container-toolkit-experimental]
name=nvidia-container-toolkit-experimental
baseurl=https://nvidia.github.io/libnvidia-container/experimental/rpm/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=0
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
 
[root@docker ~]#
[root@docker ~]# dnf install -y nvidia-container-toolkit
("libnvidia-container-tools", "libnvidia-container1" and "nvidia-container-toolkit-base" are installed along with it)
 
 
[root@docker ~]# systemctl restart docker

The contents of nvidia-docker.repo are also recorded at docker/NVIDIAContainerToolkit

A quick test at this point

[root@docker ~]# nvidia-container-cli info
NVRM version:   535.104.05
CUDA version:   12.2
 
Device Index:   0
Device Minor:   0
Model:          NVIDIA RTX A2000
Brand:          NvidiaRTX
GPU UUID:       GPU-23cc3ee7-31d3-a068-2f61-5aa00052d084
Bus Location:   00000000:13:00.0
Architecture:   8.6
[root@docker ~]#

If there are no problems, the output looks like the above.
If you instead get "nvidia-container-cli: initialization error: nvml error: driver not loaded", enabling "persistence mode" as described in NVIDIA#o13e41e5 seems to avoid it.
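
The referenced page is not reproduced here, so the following is only a sketch of one common way to turn persistence mode on: `nvidia-smi -pm 1`, run as root. The wrapper name `enable_persistence_mode` is hypothetical, and the page this note points to may use a different method.

```shell
# Hypothetical helper: enable persistence mode on all GPUs (needs root).
enable_persistence_mode() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found" >&2
        return 1
    fi
    # -pm 1 switches persistence mode on, keeping the driver initialized
    # even when no client is attached to the GPU.
    nvidia-smi -pm 1
}
```

After running `enable_persistence_mode` as root, `nvidia-container-cli info` can be retried. The setting made this way does not survive a reboot.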

Then run a GPU test through docker

[root@docker ~]# docker run --gpus all --rm  nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi
Unable to find image 'nvidia/cuda:11.8.0-runtime-ubuntu22.04' locally
11.8.0-runtime-ubuntu22.04: Pulling from nvidia/cuda
6b851dcae6ca: Pull complete
 :
 :
Mon Sep 11 18:32:25 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A2000               Off | 00000000:13:00.0 Off |                  Off |
| 30%   41C    P8               6W /  70W |      2MiB /  6138MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
 
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
[root@docker ~]# docker images
REPOSITORY    TAG                          IMAGE ID       CREATED        SIZE
nvidia/cuda   11.8.0-runtime-ubuntu22.04   af0cef3d3ee9   2 months ago   2.65GB
[root@docker ~]#

Allow a specific user to run docker

Simply add the user to the "docker" group.

[root@docker ~]# id illya
id: ‘illya’: no such user
 
[root@docker ~]# cat /etc/subuid
[root@docker ~]#
[root@docker ~]# useradd -m illya
[root@docker ~]# passwd illya
Changing password for user illya.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
[root@docker ~]# id illya
uid=1000(illya) gid=1000(illya) groups=1000(illya)
[root@docker ~]#
[root@docker ~]# cat /etc/subuid
illya:100000:65536
[root@docker ~]# cat /etc/subgid
illya:100000:65536
[root@docker ~]#

Creating the user adds entries to /etc/subuid and /etc/subgid.

[root@docker ~]# su - illya
 
[illya@docker ~]$ id
uid=1000(illya) gid=1000(illya) groups=1000(illya) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
 
[illya@docker ~]$ docker run --gpus all --rm nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi -L
docker: permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/create": dial unix /var/run/docker.sock: connect: permission denied.
See 'docker run --help'.
[illya@docker ~]$

As it stands, the user has no permission, so add the user to the docker group.

[root@docker ~]# usermod -aG docker illya
[root@docker ~]# id illya
uid=1000(illya) gid=1000(illya) groups=1000(illya),986(docker)
 
[root@docker ~]# su - illya
 
[illya@docker ~]$ docker run --gpus all --rm nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi -L
 
==========
== CUDA ==
==========
 
CUDA Version 11.8.0
 
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
 
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
 
GPU 0: NVIDIA RTX A2000 (UUID: GPU-23cc3ee7-31d3-a068-2f61-5aa00052d084)
 
[illya@docker ~]$
[illya@docker ~]$ docker images
REPOSITORY    TAG                          IMAGE ID       CREATED        SIZE
nvidia/cuda   11.8.0-runtime-ubuntu22.04   af0cef3d3ee9   2 months ago   2.65GB
[illya@docker ~]$

And now docker can be used by this user.
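
The group membership that makes this work can be checked from a script. This is a minimal sketch; the helper name `in_group` is hypothetical and it only wraps the `id` command already used above.

```shell
# Hypothetical helper: succeed if user $1 belongs to group $2.
in_group() {
    # id -nG prints the group names separated by spaces; split them
    # into one per line and look for an exact match.
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}
```

For example, `in_group illya docker && echo ok`. A membership added with `usermod -aG` only applies to new login sessions, which is why the transcript above runs `su - illya` again. Note that membership in the docker group is effectively root-level access to the host.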

rootless

Try running docker as your own process instead of as root. In the previous section the docker daemon was running under the root account.
Stop this root-owned docker daemon first.

[root@docker ~]# systemctl stop docker.socket
[root@docker ~]# systemctl disable docker.socket
 
[root@docker ~]# vigr   <--- remove illya from the docker group
 
[root@docker ~]# reboot  <--- so that "/var/run/docker/" and "/var/run/docker.sock" get removed
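
The same steps can be written non-interactively. This is a sketch under assumptions, not the procedure used above: it uses `systemctl disable --now` on both units and `gpasswd -d` in place of editing with `vigr`, and the helper names `run` and `stop_rootful_docker` are hypothetical. Setting `DRY_RUN` prints the commands without executing them.

```shell
# Hypothetical helper: print the command when DRY_RUN is set, else run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop and disable the system-wide daemon, then remove user $1
# from the docker group (needs root when not in dry-run mode).
stop_rootful_docker() {
    run systemctl disable --now docker.service docker.socket &&
    run gpasswd -d "$1" docker
}
```

Running `DRY_RUN=1 stop_rootful_docker illya` first shows what would be executed. This sketch does not replace the reboot used above to clear the leftover socket.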

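Here:
The "docker-ce-rootless-extras" package was installed along with "dnf install docker-ce", and it is what we use.
The "/etc/subuid" and "/etc/subgid" entries are added when the user account is created. On a NIS client or similar, though, they probably have to be handled by hand.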
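
For accounts that do not get these entries automatically, the line to append can be computed from the existing file. The following is a sketch under assumptions: the helper name `next_subid_line` is hypothetical, and it assumes a 65536-wide range starting at 100000 or after the highest range already allocated, matching the entries shown earlier on this page.

```shell
# Hypothetical helper: print the subordinate-ID line to append for user $1.
# $2 is the file to inspect (default /etc/subuid). Fails if the user
# already has an entry.
next_subid_line() {
    awk -F: -v u="$1" '
        $1 == u { exists = 1 }
        # Track the first ID after the highest allocated range.
        NF >= 3 { end = $2 + $3; if (end > start) start = end }
        END {
            if (exists) exit 1
            if (start < 100000) start = 100000
            printf "%s:%d:65536\n", u, start
        }' "${2:-/etc/subuid}"
}
```

As root, something like `line=$(next_subid_line illya /etc/subuid) && echo "$line" >> /etc/subuid` would add the entry, and the same applies to /etc/subgid.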

Now set it up. Just run "dockerd-rootless-setuptool.sh".

[illya@docker ~]$ dockerd-rootless-setuptool.sh --help
Usage: /usr/bin/dockerd-rootless-setuptool.sh [OPTIONS] COMMAND
 
A setup tool for Rootless Docker (dockerd-rootless.sh).
 
Documentation: https://docs.docker.com/go/rootless/
 
Options:
  -f, --force                Ignore rootful Docker (/var/run/docker.sock)
      --skip-iptables        Ignore missing iptables
 
Commands:
  check        Check prerequisites
  install      Install systemd unit (if systemd is available) and show how to manage the service
  uninstall    Uninstall systemd unit
 
[illya@docker ~]$
[illya@docker ~]$ lsmod |grep ip_tables | wc -l 
0
[illya@docker ~]$
[illya@docker ~]$ dockerd-rootless-setuptool.sh --skip-iptables install
[INFO] Creating /home/illya/.config/systemd/user/docker.service
[INFO] starting systemd service docker.service
+ systemctl --user start docker.service
+ sleep 3
+ systemctl --user --no-pager --full status docker.service
● docker.service - Docker Application Container Engine (Rootless)
     Loaded: loaded (/home/illya/.config/systemd/user/docker.service; disabled; preset: disabled)
     Active: active (running) since Wed 2023-09-13 03:35:24 JST; 3s ago
       Docs: https://docs.docker.com/go/rootless/
   Main PID: 1037 (rootlesskit)
      Tasks: 39
     Memory: 176.2M
        CPU: 173ms
     CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/docker.service
             ├─1037 rootlesskit --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /usr/bin/dockerd-rootless.sh --iptables=false
             ├─1049 /proc/self/exe --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /usr/bin/dockerd-rootless.sh --iptables=false
             ├─1070 slirp4netns --mtu 65520 -r 3 --disable-host-loopback --enable-sandbox --enable-seccomp 1049 tap0
             ├─1077 dockerd --iptables=false
             └─1096 containerd --config /run/user/1000/docker/containerd/containerd.toml
+ DOCKER_HOST=unix:///run/user/1000/docker.sock
+ /usr/bin/docker version
Client: Docker Engine - Community
 Version:           24.0.6
 API version:       1.43
 Go version:        go1.20.7
 Git commit:        ed223bc
 Built:             Mon Sep  4 12:33:18 2023
 OS/Arch:           linux/amd64
 Context:           default
 
Server: Docker Engine - Community
 Engine:
  Version:          24.0.6
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.7
  Git commit:       1a79695
  Built:            Mon Sep  4 12:31:49 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
 rootlesskit:
  Version:          1.1.1
  ApiVersion:       1.1.1
  NetworkDriver:    slirp4netns
  PortDriver:       builtin
  StateDir:         /tmp/rootlesskit3872238327
 slirp4netns:
  Version:          1.2.0
  GitCommit:        656041d45cfca7a4176f6b7eed9e4fe6c11e8383
+ systemctl --user enable docker.service
Created symlink /home/illya/.config/systemd/user/default.target.wants/docker.service → /home/illya/.config/systemd/user/docker.service.
[INFO] Installed docker.service successfully.
[INFO] To control docker.service, run: `systemctl --user (start|stop|restart) docker.service`
[INFO] To run docker.service on system startup, run: `sudo loginctl enable-linger illya`
 
[INFO] Creating CLI context "rootless"
Successfully created context "rootless"
[INFO] Using CLI context "rootless"
Current context is now "rootless"
 
[INFO] Make sure the following environment variable(s) are set (or add them to ~/.bashrc):
export PATH=/usr/bin:$PATH
 
[INFO] Some applications may require the following environment variable too:
export DOCKER_HOST=unix:///run/user/1000/docker.sock
 
[illya@docker ~]$
[illya@docker ~]$ export DOCKER_HOST=unix:///run/user/1000/docker.sock
[illya@docker ~]$ echo "export DOCKER_HOST=unix:///run/user/1000/docker.sock" >> ~/.bashrc        <-- environment setup
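
The socket path above hard-codes uid 1000. As a sketch, the same value can be derived per user, assuming the rootless daemon keeps its default socket location of `$XDG_RUNTIME_DIR/docker.sock`. The helper name `rootless_docker_host` is hypothetical.

```shell
# Hypothetical helper: build the rootless socket URL for the current user.
# Falls back to /run/user/<uid> when XDG_RUNTIME_DIR is not set.
rootless_docker_host() {
    echo "unix://${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/docker.sock"
}

DOCKER_HOST="$(rootless_docker_host)"
export DOCKER_HOST
```

Placed in ~/.bashrc, this gives the same result as the fixed line for illya and also works for other accounts. As the setup tool's output notes, `sudo loginctl enable-linger illya` is needed if the user daemon should start at boot.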

The docker command itself is the one installed in /usr/bin.

Docker is now running with user privileges.

[illya@docker ~]$ systemctl --user status docker
● docker.service - Docker Application Container Engine (Rootless)
     Loaded: loaded (/home/illya/.config/systemd/user/docker.service; enabled; preset: disabled)
     Active: active (running) since Wed 2023-09-13 03:35:24 JST; 1min 14s ago
       Docs: https://docs.docker.com/go/rootless/
   Main PID: 1037 (rootlesskit)
      Tasks: 39
     Memory: 176.8M
        CPU: 428ms
     CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/docker.service
             ├─1037 rootlesskit --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver>
             ├─1049 /proc/self/exe --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-dri>
             ├─1070 slirp4netns --mtu 65520 -r 3 --disable-host-loopback --enable-sandbox --enable-seccomp 1049 tap0
             ├─1077 dockerd --iptables=false
             └─1096 containerd --config /run/user/1000/docker/containerd/containerd.toml
[illya@docker ~]$

Docker images and related data are stored under "~/.local/share/docker/".

[illya@docker ~]$ ls -l ~/.local/share/docker/
total 48
drwx--x--x. 4 illya illya 4096 Sep 13 03:35 buildkit
drwx--x--x. 3 illya illya 4096 Sep 13 03:35 containerd
drwx--x---. 2 illya illya 4096 Sep 13 03:35 containers
-rw-------. 1 illya illya   36 Sep 13 03:35 engine-id
drwx--x---. 3 illya illya 4096 Sep 13 03:35 fuse-overlayfs
drwx------. 3 illya illya 4096 Sep 13 03:35 image
drwxr-x---. 3 illya illya 4096 Sep 13 03:35 network
drwx------. 4 illya illya 4096 Sep 13 03:35 plugins
drwx------. 2 illya illya 4096 Sep 13 03:35 runtimes
drwx------. 2 illya illya 4096 Sep 13 03:35 swarm
drwx------. 2 illya illya 4096 Sep 13 03:35 tmp
drwx-----x. 2 illya illya 4096 Sep 13 03:35 volumes
[illya@docker ~]$

Now test it

[illya@docker ~]$ which docker
/usr/bin/docker
 
[illya@docker ~]$ docker --version
Docker version 24.0.6, build ed223bc
 
[illya@docker ~]$ docker run --gpus all --rm nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi -L
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
[illya@docker ~]$

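Unfortunately this fails. Apparently it breaks as soon as "--gpus all" is added to use the GPU.

The workaround in https://github.com/NVIDIA/nvidia-docker/issues/1447 looked promising, but it did not seem to work.
Booting with "systemd.unified_cgroup_hierarchy=0" did not help either.
Changing "#no-cgroups = false" to "no-cgroups = true" in "/etc/nvidia-container-runtime/config.toml" got around the error.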
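
That edit can be scripted. The following is a minimal sketch assuming GNU sed; the helper name `set_no_cgroups` is hypothetical, and it only rewrites the `no-cgroups` line, whether it is still commented out or explicitly set to false.

```shell
# Hypothetical helper: set "no-cgroups = true" in the given config file.
# $1 defaults to the path mentioned above; the original is kept as <file>.bak.
set_no_cgroups() {
    conf="${1:-/etc/nvidia-container-runtime/config.toml}"
    [ -w "$conf" ] || { echo "cannot write $conf" >&2; return 1; }
    # Matches both "#no-cgroups = false" and "no-cgroups = false".
    sed -E -i.bak \
        's/^[[:space:]]*#?[[:space:]]*no-cgroups[[:space:]]*=[[:space:]]*false/no-cgroups = true/' \
        "$conf"
}
```

Run `set_no_cgroups` as root. The file lives under /etc, so the change applies to every user of this host, not only to illya.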

[illya@docker ~]$ docker run --gpus all --rm nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi -L
 :
 :
GPU 0: NVIDIA RTX A2000 (UUID: GPU-23cc3ee7-31d3-a068-2f61-5aa00052d084)
[illya@docker ~]$
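
To turn this final check into something a script can act on, the `nvidia-smi -L` output can be counted. This is a minimal sketch; the helper name `count_gpus` is hypothetical, and it assumes the "GPU <index>: ..." line format shown above.

```shell
# Hypothetical helper: count "GPU <n>: ..." lines read from standard input.
# The CUDA banner printed by the image is ignored.
count_gpus() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -Ec '^GPU [0-9]+: ' || true
}
```

For example, `docker run --gpus all --rm nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi -L | count_gpus` prints the number of GPUs visible inside the container.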
Last-modified: 2023-09-14 (Thu) 01:13:20