Each GPU-equipped compute node needs its own /etc/slurm/gres.conf file.

[root@e ~]# cat /etc/slurm/gres.conf
Name=gpu File=/dev/nvidia0
 
[root@e ~]#
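
For slurmctld to actually schedule the card, slurm.conf must also declare the GRES type and attach it to the node. A minimal sketch, assuming the node is named e as in the prompt above (other NodeName parameters such as CPUs and RealMemory omitted):

GresTypes=gpu
NodeName=e Gres=gpu:1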

Depending on the CPU's NUMA layout, however, it can be preferable to bind each GPU to the cores closest to it. In that case, build the core lists from the output of "nvidia-smi topo -m", ending up with something like

Name=gpu File=/dev/nvidia0 COREs=0,1,2,3
Name=gpu File=/dev/nvidia1 COREs=4,5,6,7
Name=gpu File=/dev/nvidia2 COREs=8,9,10,11
Name=gpu File=/dev/nvidia3 COREs=12,13,14,15
 
(or, equivalently)
Name=gpu File=/dev/nvidia[0-1] COREs=0,1,2,3,4,5,6,7
Name=gpu File=/dev/nvidia[2-3] COREs=8,9,10,11,12,13,14,15

and so on, depending on the topology.
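
As an aside, Slurm 19.05 and later can generate this device-to-core mapping automatically, provided slurmd was built against NVIDIA's NVML library (an assumption about how your packages were compiled); in that case the whole file can be reduced to a single line:

AutoDetect=nvml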

With a single card, the output looks like this:

[root@e ~]# nvidia-smi topo -m
        GPU0    CPU Affinity    NUMA Affinity
GPU0     X      0-3             N/A
 
Legend:
 
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
[root@e ~]#
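
Reading the CPU Affinity column above, this single-GPU node would get:

Name=gpu File=/dev/nvidia0 COREs=0,1,2,3

As a quick sanity check that the node registered its GRES and that a job can actually grab the card (node name e as above):

scontrol show node e | grep -i gres
srun --gres=gpu:1 nvidia-smi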
