Previous version of this page: docker200322
Install docker on a machine that has an NVIDIA card and run GPU computations inside its containers.
docker/run
docker/Dockerfile
Running a docker image/container you built on another machine: docker/export-import
The machine used for the installation looks like this:
[root@rockylinux9 ~]# cat /etc/redhat-release
Rocky Linux release 9.6 (Blue Onyx)
[root@rockylinux9 ~]# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX Open Kernel Module for x86_64 570.181 Release Build (dvs-builder@U22-I3-AF02-20-5) Wed Jul 30 18:41:07 UTC 2025
GCC version: gcc version 11.5.0 20240719 (Red Hat 11.5.0-5) (GCC)
[root@rockylinux9 ~]# ls -l /usr/local/cuda
ls: cannot access /usr/local/cuda: No such file or directory <-- the CUDA libraries are not installed
[root@rockylinux9 ~]#
First, install docker from the repository provided by Docker rather than from the OS repositories.
[root@rockylinux9 ~]# dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
[root@rockylinux9 ~]# ls -l /etc/yum.repos.d/
total 24
-rw-r--r--. 1 root root 1919 Aug 13 01:55 docker-ce.repo <--- added
-rw-r--r--. 1 root root 6610 May 17 12:07 rocky-addons.repo
-rw-r--r--. 1 root root 1165 May 17 12:07 rocky-devel.repo
-rw-r--r--. 1 root root 2387 May 17 12:07 rocky-extras.repo
-rw-r--r--. 1 root root 3417 May 17 12:07 rocky.repo
[root@rockylinux9 ~]#
Checking with yum, "docker.x86_64" appears to be the docker that comes from the OS repositories;
this time we install "docker-ce.x86_64", which appears to be the package provided by the docker-ce repository.
[root@rockylinux9 ~]# dnf install docker-ce docker-ce-cli containerd.io
(docker-buildx-plugin, docker-ce-rootless-extras and docker-compose-plugin are installed along with them)
[root@rockylinux9 ~]# systemctl enable docker --now
[root@rockylinux9 ~]# dnf config-manager --disable docker-ce-stable
Check the version, just to be sure:
[root@rockylinux9 ~]# docker --version
Docker version 28.3.3, build 980b856
[root@rockylinux9 ~]#
Next, install the "NVIDIA Container Toolkit" so that docker can use the NVIDIA GPU.
Official documentation: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html
Installation guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
(Install the repository)
[root@rockylinux9 ~]# curl -s -o /etc/yum.repos.d/nvidia-container-toolkit.repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
(Install the NVIDIA Container Toolkit)
[root@rockylinux9 ~]# dnf install nvidia-container-toolkit
("libnvidia-container-tools", "libnvidia-container1" and "nvidia-container-toolkit-base" are installed along with it)
[root@rockylinux9 ~]# nvidia-ctk runtime configure --runtime=docker
INFO[0000] Config file does not exist; using empty config
INFO[0000] Wrote updated config to /etc/docker/daemon.json
INFO[0000] It is recommended that docker daemon be restarted.
[root@rockylinux9 ~]# systemctl restart docker
[root@rockylinux9 ~]# dnf config-manager --disable nvidia-container-toolkit
The contents of nvidia-docker.repo are shown at docker/NVIDIAContainerToolkit
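The "nvidia-ctk runtime configure" step above reports that it wrote /etc/docker/daemon.json. A minimal sketch for confirming that the file registers a runtime named "nvidia" is shown here. The helper name has_nvidia_runtime is made up for illustration, and the check assumes the usual layout with a "nvidia" key under "runtimes"; it is a plain text match, not a JSON parser.

```shell
# has_nvidia_runtime FILE (hypothetical helper):
# succeed if FILE is readable and contains a key named "nvidia".
# Assumption: daemon.json has the layout "runtimes": { "nvidia": { ... } }.
has_nvidia_runtime() {
    [ -r "$1" ] && grep -q '"nvidia"[[:space:]]*:' "$1"
}

if has_nvidia_runtime /etc/docker/daemon.json; then
    echo "nvidia runtime is registered in /etc/docker/daemon.json"
else
    echo "nvidia runtime not found in /etc/docker/daemon.json"
fi
```

If the runtime is not found, rerunning "nvidia-ctk runtime configure --runtime=docker" and restarting docker, as in the transcript above, is the first thing to try.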
A quick test at this point:
[root@rockylinux9 ~]# nvidia-container-cli info
NVRM version: 570.181
CUDA version: 12.8
Device Index: 0
Device Minor: 0
Model: NVIDIA RTX A2000
Brand: NvidiaRTX
GPU UUID: GPU-23cc3ee7-31d3-a068-2f61-5aa00052d084
Bus Location: 00000000:06:10.0
Architecture: 8.6
[root@rockylinux9 ~]#
If everything is fine, the output looks like the above.
If you get "nvidia-container-cli: initialization error: nvml error: driver not loaded", enabling "persistence mode" (see NVIDIA#o13e41e5) seems to work around it.
Then a GPU test through docker:
[root@rockylinux9 ~]# docker run --gpus all --rm nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi
:
Tue Aug 12 17:29:20 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.181 Driver Version: 570.181 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX A2000 Off | 00000000:06:10.0 Off | Off |
| 30% 59C P0 26W / 70W | 0MiB / 6138MiB | 6% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
[root@rockylinux9 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
nvidia/cuda 11.8.0-runtime-ubuntu22.04 d8fb74ecc8b2 21 months ago 2.65GB
[root@rockylinux9 ~]#
To let an ordinary user run docker, simply add the user to the "docker" group.
[root@rockylinux9 ~]# grep docker /etc/group
docker:x:983:
[root@rockylinux9 ~]# usermod -aG docker saber
[root@rockylinux9 ~]# id saber
uid=1000(saber) gid=1000(saber) groups=1000(saber),983(docker)
[root@rockylinux9 ~]# grep docker /etc/group
docker:x:983:saber
[root@rockylinux9 ~]# su - saber
[saber@rockylinux9 ~]$ id
uid=1000(saber) gid=1000(saber) groups=1000(saber),983(docker) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[saber@rockylinux9 ~]$
[saber@rockylinux9 ~]$ docker run --gpus all --rm nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi -L
:
GPU 0: NVIDIA RTX A2000 (UUID: GPU-23cc3ee7-31d3-a068-2f61-5aa00052d084)
[saber@rockylinux9 ~]$
[saber@rockylinux9 ~]$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
nvidia/cuda 11.8.0-runtime-ubuntu22.04 d8fb74ecc8b2 21 months ago 2.65GB
[saber@rockylinux9 ~]$
If the user is not in the docker group, errors like the following appear.
[root@rockylinux9 ~]# usermod -G `id -ng saber` saber
[root@rockylinux9 ~]# id saber
uid=1000(saber) gid=1000(saber) groups=1000(saber)
[root@rockylinux9 ~]# grep docker /etc/group
docker:x:983:
[root@rockylinux9 ~]# su - saber
[saber@rockylinux9 ~]$ docker run --gpus all --rm nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi -L
docker: permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Head "http://%2Fvar%2Frun%2Fdocker.sock/_ping": dial unix /var/run/docker.sock: connect: permission denied
Run 'docker run --help' for more information
[saber@rockylinux9 ~]$ docker images
permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Head "http://%2Fvar%2Frun%2Fdocker.sock/_ping": dial unix /var/run/docker.sock: connect: permission denied
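[saber@rockylinux9 ~]$
The account has no permission to access "/var/run/docker.sock", so docker cannot be used. That is why usermod is used to add the account to the docker group.
To add: "usermod -aG docker <account>". To remove: "usermod -G `id -ng <account>` <account>".
Note that the removal command resets the supplementary group list to the primary group alone, so any other supplementary groups are dropped as well; "gpasswd -d <account> docker" removes only the docker membership.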
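Before and after running usermod it can be handy to check whether an account is listed in the docker group. The following is a minimal sketch; in_group is a hypothetical helper name, and it only looks at the member list (4th field) of a group(5)-format file, so primary groups and group sources other than the local file are not covered.

```shell
# in_group USER GROUP [FILE] (hypothetical helper):
# succeed if USER appears in the member list of GROUP in a group(5)-format
# file (default: /etc/group). Only the 4th field is checked.
in_group() {
    awk -F: -v u="$1" -v g="$2" '
        $1 == g {
            n = split($4, m, ",")
            for (i = 1; i <= n; i++) if (m[i] == u) found = 1
        }
        END { exit !found }
    ' "${3:-/etc/group}"
}

if in_group "$(id -un)" docker; then
    echo "$(id -un) is listed in the docker group"
else
    echo "$(id -un) is not listed in the docker group"
fi
```

A running login session keeps its old group list, so a new login (as with "su - saber" in the transcript above) is needed before a change takes effect.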
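The previous section adds the user's account to the docker group in order to use docker, but the docker daemon itself still runs with root privileges.
Rootless docker runs docker with the user's own account privileges.
To use rootless docker, the user's account must have entries in "/etc/subuid" and "/etc/subgid". They are written automatically when the account is created on that machine, but
when account information comes from an external source such as NIS, the entries are missing and have to be added by hand.
These two files exist from the start, regardless of whether docker is installed.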
[root@rockylinux9 ~]# id saber
uid=1000(saber) gid=1000(saber) groups=1000(saber)
[root@rockylinux9 ~]# cat /etc/subuid
saber:100000:65536
[root@rockylinux9 ~]# cat /etc/subgid
saber:100000:65536
[root@rockylinux9 ~]#
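For accounts that come from NIS or another external source, the missing entry can be added by hand in the same "name:start:count" format shown above. The following is a sketch only: has_subid and next_subid_start are hypothetical helper names, "archer" is a made-up account, and the starting value 100000 and range size 65536 are assumptions taken from the saber entry above. The demo works on a temporary copy; on a real system the same lines would be applied as root to both /etc/subuid and /etc/subgid.

```shell
# has_subid USER FILE (hypothetical helper):
# succeed if USER has at least one "name:start:count" line in FILE.
has_subid() {
    awk -F: -v u="$1" '$1 == u { found = 1 } END { exit !found }' "$2"
}

# next_subid_start FILE (hypothetical helper):
# print the first ID after the highest existing range,
# or 100000 (assumed default) if FILE has no ranges.
next_subid_start() {
    awk -F: 'BEGIN { max = 100000 }
             $2 + $3 > max { max = $2 + $3 }
             END { print max }' "$1"
}

# Demo on a temporary file instead of the real /etc/subuid.
demo=$(mktemp)
printf 'saber:100000:65536\n' > "$demo"
user=archer   # hypothetical NIS account
if ! has_subid "$user" "$demo"; then
    echo "$user:$(next_subid_start "$demo"):65536" >> "$demo"
fi
cat "$demo"
rm -f "$demo"
```

Picking the start from the highest existing range keeps the new range from overlapping those of other accounts.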
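Next, when "dnf install docker-ce" ran, the "docker-ce-rootless-extras" package was installed along with it. It contains
"dockerd-rootless-setuptool.sh"; run this script as the account that will use docker.
Log in to the machine via ssh for this step: "dockerd-rootless-setuptool.sh" does not work correctly from a "su - <account>" session.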
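A rough way to tell whether the current shell is a full login session is to look at XDG_RUNTIME_DIR, which is normally set for ssh logins and often missing after "su -". The following is a heuristic sketch, not the check the setup tool itself performs; rootless_session_ok is a hypothetical helper name.

```shell
# rootless_session_ok (hypothetical helper):
# succeed if XDG_RUNTIME_DIR is set and points to a writable directory.
# Assumption: this is a reasonable hint that "systemctl --user" will work.
rootless_session_ok() {
    [ -n "${XDG_RUNTIME_DIR:-}" ] &&
        [ -d "$XDG_RUNTIME_DIR" ] &&
        [ -w "$XDG_RUNTIME_DIR" ]
}

if rootless_session_ok; then
    echo "session looks fine: XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR"
else
    echo "XDG_RUNTIME_DIR is not usable; log in via ssh instead of su"
fi
```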
[saber@rockylinux9 ~]$ id
uid=1000(saber) gid=1000(saber) groups=1000(saber) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 <-- no longer in the docker group
[saber@rockylinux9 ~]$
[saber@rockylinux9 ~]$ lsmod |grep ip_tables | wc -l
0
[saber@rockylinux9 ~]$
[saber@rockylinux9 ~]$ dockerd-rootless-setuptool.sh --skip-iptables install
[INFO] Creating /home/saber/.config/systemd/user/docker.service
[INFO] starting systemd service docker.service
+ systemctl --user start docker.service
+ sleep 3
+ systemctl --user --no-pager --full status docker.service
● docker.service - Docker Application Container Engine (Rootless)
Loaded: loaded (/home/saber/.config/systemd/user/docker.service; disabled; preset: disabled)
Active: active (running) since Wed 2025-08-13 03:54:12 JST; 3s ago
Docs: https://docs.docker.com/go/rootless/
Main PID: 4317 (rootlesskit)
Tasks: 38
Memory: 62.5M
CPU: 283ms
CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/docker.service
├─4317 rootlesskit --state-dir=/run/user/1000/dockerd-rootless --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /usr/bin/dockerd-rootless.sh
├─4330 /proc/self/exe --state-dir=/run/user/1000/dockerd-rootless --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /usr/bin/dockerd-rootless.sh
├─4354 slirp4netns --mtu 65520 -r 3 --disable-host-loopback --enable-sandbox --enable-seccomp 4330 tap0
├─4363 dockerd
└─4385 containerd --config /run/user/1000/docker/containerd/containerd.toml
+ DOCKER_HOST=unix:///run/user/1000/docker.sock
+ /usr/bin/docker version
Client: Docker Engine - Community
Version: 28.3.3
API version: 1.51
Go version: go1.24.5
Git commit: 980b856
Built: Fri Jul 25 11:37:02 2025
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 28.3.3
API version: 1.51 (minimum version 1.24)
Go version: go1.24.5
Git commit: bea959c
Built: Fri Jul 25 11:33:59 2025
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.7.27
GitCommit: 05044ec0a9a75232cad458027ca83437aae3f4da
runc:
Version: 1.2.5
GitCommit: v1.2.5-0-g59923ef
docker-init:
Version: 0.19.0
GitCommit: de40ad0
rootlesskit:
Version: 2.3.4
ApiVersion: 1.1.1
NetworkDriver: slirp4netns
PortDriver: builtin
StateDir: /run/user/1000/dockerd-rootless
slirp4netns:
Version: 1.3.2
GitCommit: 0f13345bcef588d2bb70d662d41e92ee8a816d85
+ systemctl --user enable docker.service
Created symlink /home/saber/.config/systemd/user/default.target.wants/docker.service → /home/saber/.config/systemd/user/docker.service.
[INFO] Installed docker.service successfully.
[INFO] To control docker.service, run: `systemctl --user (start|stop|restart) docker.service`
[INFO] To run docker.service on system startup, run: `sudo loginctl enable-linger saber`
[INFO] Creating CLI context "rootless"
Successfully created context "rootless"
[INFO] Using CLI context "rootless"
Current context is now "rootless"
[INFO] Make sure the following environment variable(s) are set (or add them to ~/.bashrc):
export PATH=/usr/bin:$PATH
[INFO] Some applications may require the following environment variable too: <-- add this setting to .bashrc as instructed here
export DOCKER_HOST=unix:///run/user/1000/docker.sock
[saber@rockylinux9 ~]$
[saber@rockylinux9 ~]$ export DOCKER_HOST=unix:///run/user/1000/docker.sock
[saber@rockylinux9 ~]$ echo "export DOCKER_HOST=unix:///run/user/1000/docker.sock" >> ~/.bashrc
Docker is now running with user privileges.
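The socket path contains the numeric uid (1000 for saber here), so a hard-coded line in .bashrc only fits that one account. A minimal sketch that derives the value from the current uid is shown below; rootless_docker_host is a hypothetical helper name, and the /run/user/<uid>/docker.sock layout is taken from the setup tool output above.

```shell
# rootless_docker_host [UID] (hypothetical helper):
# print the rootless docker socket URL for UID (default: the current user).
rootless_docker_host() {
    echo "unix:///run/user/${1:-$(id -u)}/docker.sock"
}

# Show the value that would be exported for the current account.
echo "DOCKER_HOST=$(rootless_docker_host)"
```

In ~/.bashrc this could be used as export DOCKER_HOST="$(rootless_docker_host)", which keeps the same file usable when home directories are shared between accounts or machines.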
[saber@rockylinux9 ~]$ systemctl --user status docker <-- check your own docker daemon
● docker.service - Docker Application Container Engine (Rootless)
Loaded: loaded (/home/saber/.config/systemd/user/docker.service; enabled; preset: disabled)
Active: active (running) since Wed 2025-08-13 03:54:12 JST; 2min 44s ago
Docs: https://docs.docker.com/go/rootless/
Main PID: 4317 (rootlesskit)
Tasks: 38
Memory: 62.5M
CPU: 338ms
CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/docker.service
├─4317 rootlesskit --state-dir=/run/user/1000/dockerd-rootless --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto (omitted)
├─4330 /proc/self/exe --state-dir=/run/user/1000/dockerd-rootless --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto (omitted)
├─4354 slirp4netns --mtu 65520 -r 3 --disable-host-loopback --enable-sandbox --enable-seccomp 4330 tap0
├─4363 dockerd
└─4385 containerd --config /run/user/1000/docker/containerd/containerd.toml
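[saber@rockylinux9 ~]$
Docker images and related data are stored under "~/.local/share/docker/".
With this setup, the user's own docker daemon starts when the user logs in and stops when the user logs out. (As the setup tool output above notes, "sudo loginctl enable-linger saber" makes it run from system startup instead.)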
[saber@rockylinux9 ~]$ which docker
/usr/bin/docker
[saber@rockylinux9 ~]$ docker --version
Docker version 28.3.3, build 980b856
[saber@rockylinux9 ~]$
[saber@rockylinux9 ~]$ docker run --gpus all --rm nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi -L
:
Status: Downloaded newer image for nvidia/cuda:11.8.0-runtime-ubuntu22.04
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: (omitted)
nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown
:
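[saber@rockylinux9 ~]$
Unfortunately this fails with an error.
Editing "/etc/nvidia-container-runtime/config.toml" so that the toolkit does not use cgroups makes it work. Not using cgroups, though... does that mean GPU resource management stops working? What happens under slurm/openpbs? That is a slight worry.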
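A minimal sketch of one such change, assuming the setting in question is "no-cgroups" in the [nvidia-container-cli] section of config.toml (the option NVIDIA's install guide describes for rootless mode). set_no_cgroups is a hypothetical helper name; it relies on GNU sed and on the file already containing a "no-cgroups" line, commented out or not. The demo edits a temporary copy rather than the real file.

```shell
# set_no_cgroups FILE (hypothetical helper):
# rewrite an existing "no-cgroups = ..." line (commented or not)
# to "no-cgroups = true". Assumes GNU sed (-i, -E).
set_no_cgroups() {
    sed -i -E 's/^[[:space:]]*#?[[:space:]]*no-cgroups[[:space:]]*=.*/no-cgroups = true/' "$1"
}

# Demo on a temporary copy with an assumed default line.
demo=$(mktemp)
printf '[nvidia-container-cli]\n#no-cgroups = false\n' > "$demo"
set_no_cgroups "$demo"
cat "$demo"
rm -f "$demo"

# On the real system this would be run as root:
#   set_no_cgroups /etc/nvidia-container-runtime/config.toml
# NVIDIA's guide gives an nvidia-ctk command for the same change:
#   nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
```

The file is system-wide, so the change also affects containers started through the root docker daemon, which is worth keeping in mind given the resource-management concern above.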
After that change, running it again succeeds.
[saber@rockylinux9 ~]$ docker run --gpus all --rm nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi -L
:
GPU 0: NVIDIA RTX A2000 (UUID: GPU-23cc3ee7-31d3-a068-2f61-5aa00052d084)
[saber@rockylinux9 ~]$
Having described the installation, the removal should be documented too.
[saber@rockylinux9 ~]$ systemctl --user stop docker
[saber@rockylinux9 ~]$ systemctl --user disable docker
Removed "/home/saber/.config/systemd/user/default.target.wants/docker.service".
[saber@rockylinux9 ~]$ dockerd-rootless-setuptool.sh --skip-iptables uninstall
+ systemctl --user stop docker.service
+ systemctl --user disable docker.service
[INFO] Uninstalled docker.service
[INFO] Deleted CLI context "rootless"
Current context is now "default"
[INFO] Configured CLI to use the "default" context.
[INFO]
[INFO] Make sure to unset or update the environment PATH, DOCKER_HOST, and DOCKER_CONTEXT environment variables if you have added them to `~/.bashrc`.
[INFO] This uninstallation tool does NOT remove Docker binaries and data.
[INFO] To remove data, run: `/usr/bin/rootlesskit rm -rf /home/saber/.local/share/docker`
[saber@rockylinux9 ~]$
[saber@rockylinux9 ~]$ /usr/bin/rootlesskit rm -rf /home/saber/.local/share/docker
[saber@rockylinux9 ~]$ vi ~/.bashrc
(delete the line "export DOCKER_HOST=unix:///run/user/1000/docker.sock")
[saber@rockylinux9 ~]$
This removes the rootless setup.