Using vGPU Inside Containers

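# On the host, partition the physical GPU into vGPUs.
# Assumption: GPU 0 is split into 4 vGPUs with 1 GB of framebuffer each.
# The flags below are reproduced as given in the source. Check them against
# `nvidia-smi vgpu --help` for your driver: in documented vGPU releases -c lists
# the creatable vGPU types, and vGPUs are created through the hypervisor.
nvidia-smi vgpu -i 0 -c 4 -g 1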
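The arithmetic behind this partitioning step can be checked before touching the host. The sketch below is a minimal, hypothetical helper: the function name and the 16 GB framebuffer figure are assumptions for illustration, not part of any NVIDIA tool. It reports how much framebuffer a set of equally sized vGPUs would consume and how many instances of that size could fit at most. It ignores any framebuffer the driver reserves for itself.

```python
def plan_vgpu_partition(total_fb_mb: int, profile_fb_mb: int, count: int) -> dict:
    """Check whether `count` vGPUs of `profile_fb_mb` MB fit in `total_fb_mb` MB."""
    if total_fb_mb <= 0 or profile_fb_mb <= 0 or count <= 0:
        raise ValueError("sizes and count must be positive")
    max_instances = total_fb_mb // profile_fb_mb
    used_mb = profile_fb_mb * count
    return {
        "fits": count <= max_instances,
        "max_instances": max_instances,
        "used_mb": used_mb,
        "free_mb": total_fb_mb - used_mb,  # negative when the request does not fit
    }


if __name__ == "__main__":
    # Assumed example: a 16 GB GPU split into 4 vGPUs of 1 GB each.
    print(plan_vgpu_partition(total_fb_mb=16 * 1024, profile_fb_mb=1024, count=4))
```

With the assumed 16 GB card, the 4 x 1 GB layout from the text uses a quarter of the framebuffer, so the same profile could be instantiated up to 16 times by this simple estimate.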

Build a container image that supports vGPU:

dockerfile

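# Use the NVIDIA CUDA base image
FROM nvidia/cuda:11.8.0-base-ubuntu22.04

# Install Python and the deep learning libraries
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install torch torchvision

# Copy the inference script to the path the Kubernetes Pod runs it from
COPY inference.py /app/inference.py

# Tell the NVIDIA container runtime which devices and driver capabilities to expose
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility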
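To see what these two environment variables request, a small check can be run at container start. This is a sketch only: `describe_nvidia_env` is a hypothetical helper, and the capability names in `KNOWN_CAPABILITIES` are assumed from the NVIDIA Container Toolkit documentation. It reports what was explicitly set; the runtime may apply its own defaults when a variable is missing.

```python
import os

# Assumed capability names, taken from the NVIDIA Container Toolkit documentation.
KNOWN_CAPABILITIES = {"compute", "compat32", "graphics", "utility", "video", "display"}


def describe_nvidia_env(env=None) -> dict:
    """Summarise NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES as set."""
    env = os.environ if env is None else env
    devices_raw = env.get("NVIDIA_VISIBLE_DEVICES", "")
    caps_raw = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    devices = [d.strip() for d in devices_raw.split(",") if d.strip()]
    caps = {c.strip() for c in caps_raw.split(",") if c.strip()}
    return {
        "devices": devices,
        "capabilities": sorted(caps),
        "unknown_capabilities": sorted(caps - KNOWN_CAPABILITIES - {"all"}),
        # True only when compute was requested explicitly (or via "all").
        "compute_requested": "compute" in caps or "all" in caps,
    }


if __name__ == "__main__":
    print(describe_nvidia_env())
```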

Select the vGPU devices the container may use when starting it, here through Docker's --gpus flag:

bash

# Start the container and assign vGPU instances (assuming vGPU 0 and 1)
docker run --gpus '"device=0,1"' --rm -it my-vgpu-image
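The nested quoting in the --gpus value is easy to get wrong when the command is launched from a program instead of a shell. As a rough sketch, the hypothetical helper below (`build_docker_run_args` is an illustrative name, not a Docker API) builds the same argument list for use with `subprocess.run`. The shell's single quotes disappear, but the inner double quotes stay as part of the value.

```python
import shlex


def build_docker_run_args(devices, image, extra=("--rm", "-it")):
    """Return the argv list for `docker run` restricted to the given GPU devices."""
    if not devices:
        raise ValueError("at least one device index or UUID is required")
    device_list = ",".join(str(d) for d in devices)
    # The literal double quotes belong to the argument value: they keep the
    # comma-separated device list together when Docker parses --gpus.
    gpus_value = f'"device={device_list}"'
    return ["docker", "run", "--gpus", gpus_value, *extra, image]


if __name__ == "__main__":
    args = build_docker_run_args([0, 1], "my-vgpu-image")
    # shlex.join re-adds the shell quoting, reproducing the command shown above.
    print(shlex.join(args))
```

Passing the list directly to `subprocess.run(args)` avoids a shell entirely, which is why the single quotes are not needed in the list form.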

The Python script run inside the container (inference.py) checks for vGPUs and performs a computation:

python

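import torch


def check_vgpu():
    """Print the number and names of the vGPUs visible to PyTorch."""
    if torch.cuda.is_available():
        device_count = torch.cuda.device_count()
        print(f"Available vGPUs: {device_count}")
        for i in range(device_count):
            print(f"vGPU {i}: {torch.cuda.get_device_name(i)}")
    else:
        print("No vGPU detected")


def run_inference():
    """Multiply two random matrices on the first vGPU, or on the CPU as a fallback."""
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    x = torch.randn(1024, 1024).to(device)
    y = torch.randn(1024, 1024).to(device)
    z = torch.matmul(x, y)
    print(f"Matrix multiplication result (partial): {z[0][0:5]}")


if __name__ == "__main__":
    check_vgpu()
    run_inference()

Run the script inside the container:

bash

python3 inference.py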

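Expected output (device names depend on the vGPU profile, and the tensor values differ on every run because the inputs are random):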

Available vGPUs: 2
vGPU 0: Tesla T4 (GRID vGPU)
vGPU 1: Tesla T4 (GRID vGPU)
Matrix multiplication result (partial): tensor([ 12.3456, -5.4321, 3.1415, ...], device='cuda:0')
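The same visibility check can be done without PyTorch by asking nvidia-smi, which the `utility` driver capability makes available inside the container. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the sample device name in the usage note is an assumed value, not output captured from a real system.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text: str) -> list:
    """Parse `index, name, memory.total` CSV rows into dictionaries."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # skip rows that do not have exactly three fields
        index, name, memory_mib = parts
        try:
            gpus.append({"index": int(index), "name": name, "memory_mib": int(memory_mib)})
        except ValueError:
            continue  # skip rows with non-numeric fields such as "[N/A]"
    return gpus


def list_visible_gpus() -> list:
    """Return the GPUs nvidia-smi reports, or [] when it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for gpu in list_visible_gpus():
        print(f"vGPU {gpu['index']}: {gpu['name']} ({gpu['memory_mib']} MiB)")
```

For instance, `parse_gpu_csv("0, GRID T4-1Q, 1024")` yields one entry with index 0 and 1024 MiB, which makes it easy to confirm that each instance has the framebuffer size chosen on the host.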

In Kubernetes, allocate vGPUs through a device plugin and resource limits:

# Kubernetes Pod definition
apiVersion: v1
kind: Pod
metadata:
  name: vgpu-pod
spec:
  containers:
    - name: inference-container
      image: my-vgpu-image
      command: ["python3", "/app/inference.py"]
      resources:
        limits:
          nvidia.com/gpu: 2  # request 2 vGPU instances
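When the number of vGPUs varies between deployments, the same Pod definition can be generated rather than edited by hand. The sketch below assumes nothing beyond the fields shown in the manifest; `build_vgpu_pod` is a hypothetical helper, and it emits JSON because the standard library has no YAML writer.

```python
import json


def build_vgpu_pod(name: str, image: str, gpu_count: int, command=None) -> dict:
    """Build a Pod manifest that requests `gpu_count` nvidia.com/gpu resources."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    container = {
        "name": "inference-container",
        "image": image,
        # The GPU resource is set under limits, mirroring the manifest above.
        "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
    }
    if command:
        container["command"] = list(command)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"containers": [container]},
    }


if __name__ == "__main__":
    pod = build_vgpu_pod("vgpu-pod", "my-vgpu-image", 2, ["python3", "/app/inference.py"])
    print(json.dumps(pod, indent=2))
```

Since kubectl accepts JSON manifests as well as YAML, the printed output could be piped to `kubectl apply -f -`.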

With the steps above, containerized workloads such as AI inference and graphics rendering can use NVIDIA GRID vGPU resources.

Source: 小茵科技论

Article URL: http://news.43b.com.cn/a/790678.html
