
The compute (GPU core) utilization in the asset overview always shows 100% #64

@g-zhangpp

Description


1. Problem description
I created a pod with nvidia.com/gpucores=20. When I start a training task in the pod, the HAMi-WEB asset overview reports the compute (GPU core) utilization as 100%, but it should be 20%.

[Screenshot: HAMi-WEB asset overview showing 100% compute utilization]
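The actual test_gpu.py is not included here; the sketch below is only a hypothetical stand-in showing the kind of sustained load the training task generates. With gpuCorePolicy=force and nvidia.com/gpucores=20, a loop like this should be throttled to roughly 20% core utilization rather than 100%.

# load_gpu.py -- hypothetical stand-in for test_gpu.py (the real script is not shown in this issue)
# Runs a continuous matrix-multiply loop so the GPU stays busy; with
# nvidia.com/gpucores=20 enforced, utilization should settle around 20%.
import torch

assert torch.cuda.is_available(), "CUDA device not visible inside the pod"
device = torch.device("cuda:0")

# Two largish matrices kept on the GPU; each matmul is a long-running kernel.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

while True:
    c = a @ b                  # saturates the SMs if no core limit is applied
    torch.cuda.synchronize()   # wait for the kernel so the loop reflects real GPU time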

2. Environment configuration
I configured gpuCorePolicy=force.

[Screenshot: configuration showing gpuCorePolicy=force]
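To cross-check the dashboard number, utilization can also be sampled from inside the container with the NVML Python bindings. This is a minimal sketch under the assumption that nvidia-ml-py (pynvml) is installed in the container and that this pod is the only workload on the card; monitor_util.py is a hypothetical file name.

# monitor_util.py -- hypothetical helper (not part of HAMi) that polls GPU utilization
# from inside the container so it can be compared with the HAMi-WEB figure.
# Requires the NVML Python bindings: pip install nvidia-ml-py
import time
from pynvml import (
    nvmlInit,
    nvmlShutdown,
    nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetUtilizationRates,
)

nvmlInit()
try:
    handle = nvmlDeviceGetHandleByIndex(0)
    for _ in range(30):
        util = nvmlDeviceGetUtilizationRates(handle)
        # Expected to hover near 20% if the core limit is actually enforced.
        print(f"gpu utilization: {util.gpu}%")
        time.sleep(2)
finally:
    nvmlShutdown()

This can be run in the pod alongside the training task (for example via kubectl exec), and the printed percentages compared against what the asset overview shows.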

3. The YAML file used to create the pod

cat test.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-pod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      containers:
      - name: simple-container
        image: pytorch/pytorch:2.9.0-cuda13.0-cudnn9-runtime
        command: ["python", "test_gpu.py"]
        env:
        - name: LIBCUDA_LOG_LEVEL
          value: "4"
        resources:
          requests:
            cpu: "1"
            memory: "1Gi"
            nvidia.com/gpu: "1"
            nvidia.com/gpucores: 20
            #nvidia.com/gpumem: "4000"
          limits:
            cpu: "1"
            memory: "1Gi"
            nvidia.com/gpu: "1"
            nvidia.com/gpucores: 20
            #nvidia.com/gpumem: "4000"
        volumeMounts:
        - name: data-volume
          mountPath: /workspace
        - name: shm-volume
          mountPath: /dev/shm
      volumes:
      - name: data-volume
        hostPath:
          path: /root/vgpu
          type: Directory
      - name: shm-volume
        emptyDir:
          medium: Memory
