From 3cec9011a70ff32cabac27606caeb0fbf570474f Mon Sep 17 00:00:00 2001
From: james
Date: Mon, 10 Nov 2025 17:45:55 +0800
Subject: [PATCH 1/5] refactor: rename hami-scheduler-device.yaml

Signed-off-by: james
---
 hami-scheduler-device.yaml => ascend-device-configmap.yaml | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename hami-scheduler-device.yaml => ascend-device-configmap.yaml (100%)

diff --git a/hami-scheduler-device.yaml b/ascend-device-configmap.yaml
similarity index 100%
rename from hami-scheduler-device.yaml
rename to ascend-device-configmap.yaml

From ad8efcf04d691d7a50c46e665e80c013bca1d6a4 Mon Sep 17 00:00:00 2001
From: james
Date: Mon, 10 Nov 2025 18:37:24 +0800
Subject: [PATCH 2/5] docs: update readme

Signed-off-by: james
---
 README.md   | 68 +++++++++++++++++++++++----------------------
 config.yaml | 79 -----------------------------------------------------
 2 files changed, 36 insertions(+), 111 deletions(-)
 delete mode 100644 config.yaml

diff --git a/README.md b/README.md
index e1a9d86..d6c86f8 100644
--- a/README.md
+++ b/README.md
@@ -2,13 +2,13 @@
 
 ## Introduction
 
-This Ascend device plugin is implemented for [HAMi](https://github.com/Project-HAMi/HAMi) scheduling.
+This Ascend device plugin is implemented for [HAMi](https://github.com/Project-HAMi/HAMi) and [volcano](https://github.com/volcano-sh/volcano) scheduling.
 
-Memory slicing is supported based on virtualization template, lease available template is automatically used. For detailed information, check [templeate](./config.yaml)
+Memory slicing is supported based on virtualization template, lease available template is automatically used. For detailed information, check [template](./ascend-device-configmap.yaml)
 
 ## Prerequisites
 
-[ascend-docker-runtime](https://gitee.com/ascend/ascend-docker-runtime)
+[ascend-docker-runtime](https://gitcode.com/Ascend/mind-cluster/tree/master/component/ascend-docker-runtime)
 
 ## Compile
 
@@ -24,51 +24,32 @@ docker buildx build -t $IMAGE_NAME .
 
 ## Deployment
 
-Due to dependencies with HAMi, you need to set
+### Label Ascend Node
 
-```
-devices.ascend.enabled=true
-```
-
-during HAMi installation. For more details, see 'devices' section in values.yaml.
 
-```yaml
-devices:
-  ascend:
-    enabled: true
-    image: "ascend-device-plugin:master"
-    imagePullPolicy: IfNotPresent
-    extraArgs: []
-    nodeSelector:
-      ascend: "on"
-    tolerations: []
-    resources:
-      - huawei.com/Ascend910A
-      - huawei.com/Ascend910A-memory
-      - huawei.com/Ascend910B
-      - huawei.com/Ascend910B-memory
-      - huawei.com/Ascend310P
-      - huawei.com/Ascend310P-memory
 ```
+kubectl label node {ascend-node} ascend=on
+```
 
-Note that resources here(hawei.com/Ascend910A,huawei.com/Ascend910B,...) is managed in hami-scheduler-device configMap. It defines three different templates(910A,910B,310P).
-
-label your NPU nodes with 'ascend=on'
+### Deploy Configmap
 
 ```
-kubectl label node {ascend-node} ascend=on
+kubectl apply -f ascend-device-configmap.yaml
 ```
 
-Deploy ascend-device-plugin by running
+### Deploy `ascend-device-plugin`
 
 ```bash
 kubectl apply -f ascend-device-plugin.yaml
 ```
 
+If scheduling Ascend devices in HAMi, simply set `devices.ascend.enabled` to true when deploying HAMi, and the ConfigMap and Ascend Device Plugin will be automatically deployed. refer https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/README.md#huawei-ascend
 
 ## Usage
 
-You can allocate a slice of NPU by specifying both resource number and resource memory. For more examples, see [examples](./examples/)
+You can allocate a slice of NPU by specifying both resource number and resource memory. If multiple tasks need to share the same NPU, you need to set the corresponding resource request to 1 and configure the appropriate ResourceMemoryName.
+
+### Usage in HAMi
 
 ```yaml
 ...
@@ -81,3 +62,26 @@ You can allocate a slice of NPU by specifying both resource number and resource
           # if you don't specify Ascend910B-memory, it will use a whole NPU.
           huawei.com/Ascend910B-memory: "4096"
 ```
+ For more examples, see [examples](./examples/)
+
+ ### Usage in volcano
+
+ Volcano must be installed prior to usage, for more information see [here](https://github.com/volcano-sh/volcano/tree/master/docs/user-guide/how_to_use_vnpu.md)
+
+ ```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: ascend-pod
+spec:
+  schedulerName: volcano
+  containers:
+    - name: ubuntu-container
+      image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
+      command: ["sleep"]
+      args: ["100000"]
+      resources:
+        limits:
+          huawei.com/Ascend310P: "1"
+          huawei.com/Ascend310P-memory: "4096"
+ ```
\ No newline at end of file
diff --git a/config.yaml b/config.yaml
deleted file mode 100644
index 945e692..0000000
--- a/config.yaml
+++ /dev/null
@@ -1,79 +0,0 @@
-vnpus:
-- chipName: 910A
-  commonWord: Ascend910A
-  resourceName: huawei.com/Ascend910A
-  resourceMemoryName: huawei.com/Ascend910A-memory
-  memoryAllocatable: 32768
-  memoryCapacity: 32768
-  aiCore: 30
-  templates:
-  - name: vir02
-    memory: 2184
-    aiCore: 2
-  - name: vir04
-    memory: 4369
-    aiCore: 4
-  - name: vir08
-    memory: 8738
-    aiCore: 8
-  - name: vir16
-    memory: 17476
-    aiCore: 16
-- chipName: 910B3
-  commonWord: Ascend910B3
-  resourceName: huawei.com/Ascend910B3
-  resourceMemoryName: huawei.com/Ascend910B3-memory
-  memoryAllocatable: 65536
-  memoryCapacity: 65536
-  aiCore: 20
-  aiCPU: 7
-  templates:
-  - name: vir05_1c_16g
-    memory: 16384
-    aiCore: 5
-    aiCPU: 1
-  - name: vir10_3c_32g
-    memory: 32768
-    aiCore: 10
-    aiCPU: 3
-- chipName: 310P3
-  commonWord: Ascend310P
-  resourceName: huawei.com/Ascend310P
-  resourceMemoryName: huawei.com/Ascend310P-memory
-  memoryAllocatable: 21527
-  memoryCapacity: 24576
-  aiCore: 8
-  aiCPU: 7
-  templates:
-  - name: vir01
-    memory: 3072
-    aiCore: 1
-    aiCPU: 1
-  - name: vir02
-    memory: 6144
-    aiCore: 2
-    aiCPU: 2
-  - name: vir04
-    memory: 12288
-    aiCore: 4
-    aiCPU: 4
-- chipName: 910ProB
-  commonWord: Ascend910ProB
-  resourceName: huawei.com/Ascend910ProB
-  resourceMemoryName: huawei.com/Ascend910ProB-memory
-  memoryAllocatable: 32768
-  memoryCapacity: 32768
-  aiCore: 30
-  templates:
-  - name: vir02
-    memory: 2184
-    aiCore: 2
-  - name: vir04
-    memory: 4369
-    aiCore: 4
-  - name: vir08
-    memory: 8738
-    aiCore: 8
-  - name: vir16
-    memory: 17476
-    aiCore: 16
\ No newline at end of file

From f6d5a9536851b505dabdf4fbdb60e2aaf75a435e Mon Sep 17 00:00:00 2001
From: james
Date: Mon, 10 Nov 2025 18:45:29 +0800
Subject: [PATCH 3/5] docs: update

Signed-off-by: james
---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index d6c86f8..1459dd6 100644
--- a/README.md
+++ b/README.md
@@ -24,14 +24,14 @@ docker buildx build -t $IMAGE_NAME .
 
 ## Deployment
 
-### Label Ascend Node
+### Label the Node with `ascend=on`
 
 
 ```
 kubectl label node {ascend-node} ascend=on
 ```
 
-### Deploy Configmap
+### Deploy ConfigMap
 
 ```
 kubectl apply -f ascend-device-configmap.yaml
@@ -43,7 +43,7 @@ kubectl apply -f ascend-device-configmap.yaml
 kubectl apply -f ascend-device-plugin.yaml
 ```
 
-If scheduling Ascend devices in HAMi, simply set `devices.ascend.enabled` to true when deploying HAMi, and the ConfigMap and Ascend Device Plugin will be automatically deployed. refer https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/README.md#huawei-ascend
+If scheduling Ascend devices in HAMi, simply set `devices.ascend.enabled` to true when deploying HAMi, and the ConfigMap and `ascend-device-plugin` will be automatically deployed. refer https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/README.md#huawei-ascend
 
 ## Usage
 

From 79aeee5aa2c6f9868469e6583a147424ac01b169 Mon Sep 17 00:00:00 2001
From: james
Date: Mon, 10 Nov 2025 19:00:47 +0800
Subject: [PATCH 4/5] docs: update cn version

Signed-off-by: james
---
 README.md    |  2 +-
 README_cn.md | 73 +++++++++++++++++++++++++++++-----------------------
 2 files changed, 42 insertions(+), 33 deletions(-)

diff --git a/README.md b/README.md
index 1459dd6..64feba6 100644
--- a/README.md
+++ b/README.md
@@ -47,7 +47,7 @@ If scheduling Ascend devices in HAMi, simply set `devices.ascend.enabled` to tru
 
 ## Usage
 
-You can allocate a slice of NPU by specifying both resource number and resource memory. If multiple tasks need to share the same NPU, you need to set the corresponding resource request to 1 and configure the appropriate ResourceMemoryName.
+To exclusively use an entire card or request multiple cards, you only need to set the corresponding resourceName. If multiple tasks need to share the same NPU, you need to set the corresponding resource request to 1 and configure the appropriate ResourceMemoryName.
 
 ### Usage in HAMi
 
diff --git a/README_cn.md b/README_cn.md
index 1dafac1..fe96241 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -2,15 +2,15 @@
 
 ## 说明
 
-基于[HAMi](https://github.com/Project-HAMi/HAMi)调度机制的ascend device plugin。
+Ascend device plugin 是用来支持在 [HAMi](https://github.com/Project-HAMi/HAMi) 和 [volcano](https://github.com/volcano-sh/volcano) 中调度昇腾NPU设备.
 
-支持基于显存调度,显存是基于昇腾的虚拟化模板来切分的,会找到满足显存需求的最小模板来作为容器的显存。模版的具体信息参考[配置模版](./config.yaml)
+昇腾NPU虚拟化切分是通过模板来配置的,在调度时会找到满足显存需求的最小模板来作为容器的显存。各芯片的模板配置信息参考[这里](./ascend-device-configmap.yaml)
 
-启动容器依赖[ascend-docker-runtime](https://gitee.com/ascend/ascend-docker-runtime)。
+## 环境要求
 
-## 编译
+部署 [ascend-docker-runtime](https://gitcode.com/Ascend/mind-cluster/tree/master/component/ascend-docker-runtime)
 
-### 编译二进制文件
+## 编译
 
 ```bash
 make all
@@ -24,47 +24,33 @@ docker buildx build -t $IMAGE_NAME .
 
 ## 部署
 
-由于和HAMi的一些依赖关系,部署集成在HAMi的部署中,指定以下字段:
-
-```
-devices.ascend.enabled=true
-```
+### 给 Node 打 ascend 标签
 
-相关的每一种NPU设备的资源名,参考values.yaml中的以下字段,目前本组件支持3种型号的NPU切片(310p,910A,910B)若不需要修改的话可以直接使用以下的默认配置:
 
-```yaml
-devices:
-  ascend:
-    enabled: true
-    image: "ascend-device-plugin:master"
-    imagePullPolicy: IfNotPresent
-    extraArgs: []
-    nodeSelector:
-      ascend: "on"
-    tolerations: []
-    resources:
-      - huawei.com/Ascend910A
-      - huawei.com/Ascend910A-memory
-      - huawei.com/Ascend910B
-      - huawei.com/Ascend910B-memory
-      - huawei.com/Ascend310P
-      - huawei.com/Ascend310P-memory
+```
+kubectl label node {ascend-node} ascend=on
 ```
 
-将集群中的NPU节点打上如下标签:
+### 部署 ConfigMap
 
 ```
-kubectl label node {ascend-node} ascend=on
+kubectl apply -f ascend-device-configmap.yaml
 ```
 
-最后使用以下指令部署ascend-device-plugin
+### 部署 `ascend-device-plugin`
 
 ```bash
 kubectl apply -f ascend-device-plugin.yaml
 ```
 
+如果要在HAMi中使用升腾NPU, 在部署HAMi时设置 `devices.ascend.enabled` 为 true 会自动部署 ConfigMap 和 `ascend-device-plugin`。 参考 https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/README.md#huawei-ascend
+
 ## 使用
 
+如果要独占整卡或者申请多张卡只需要设置对应的 resourceName 即可。如果多个任务要共享同一张卡,需要将 resourceName 设置为1,并且设置对应的 ResourceMemoryName。
+
+### 在 HAMi 中使用
+
 ```yaml
 ...
     containers:
@@ -73,6 +59,29 @@ kubectl apply -f ascend-device-plugin.yaml
       resources:
         limits:
           huawei.com/Ascend910B: "1"
-          # 不填写显存默认使用整张卡
+          # if you don't specify Ascend910B-memory, it will use a whole NPU.
           huawei.com/Ascend910B-memory: "4096"
 ```
+ For more examples, see [examples](./examples/)
+
+ ### 在 volcano 中使用
+
+ 在 volcano 中使用时需要提前部署好 volcano, 更多信息请[参考这里](https://github.com/volcano-sh/volcano/tree/master/docs/user-guide/how_to_use_vnpu.md)
+
+ ```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: ascend-pod
+spec:
+  schedulerName: volcano
+  containers:
+    - name: ubuntu-container
+      image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
+      command: ["sleep"]
+      args: ["100000"]
+      resources:
+        limits:
+          huawei.com/Ascend310P: "1"
+          huawei.com/Ascend310P-memory: "4096"
+ ```
\ No newline at end of file

From e512bbc2764d8d2dd683587e80c8781eaaf4953d Mon Sep 17 00:00:00 2001
From: james
Date: Tue, 11 Nov 2025 10:35:06 +0800
Subject: [PATCH 5/5] docs: update readme

Signed-off-by: james
---
 README_cn.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README_cn.md b/README_cn.md
index fe96241..156ca53 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -59,7 +59,7 @@ kubectl apply -f ascend-device-plugin.yaml
       resources:
         limits:
           huawei.com/Ascend910B: "1"
-          # if you don't specify Ascend910B-memory, it will use a whole NPU.
+          # 如果不指定显存大小, 就会使用整张卡
           huawei.com/Ascend910B-memory: "4096"
 ```
  For more examples, see [examples](./examples/)
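
The Usage text introduced in PATCH 4/5 separates whole-card requests (set only the resourceName) from shared requests (resourceName set to 1 plus the matching ResourceMemoryName). The README snippets in this series only show the shared case, so the following is a minimal sketch of the whole-card case. It assumes the `huawei.com/Ascend910B` resource name used in the README snippet; the pod name, container name and image are hypothetical placeholders and do not come from the patches.

```yaml
# Sketch only: requests two whole NPUs by omitting the -memory resource.
apiVersion: v1
kind: Pod
metadata:
  name: ascend-whole-card-pod        # hypothetical name
spec:
  containers:
    - name: npu-container            # hypothetical name
      image: ubuntu:22.04            # placeholder image, not from the patches
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          # No huawei.com/Ascend910B-memory entry: per the README comment,
          # leaving it out means each requested NPU is used whole.
          huawei.com/Ascend910B: "2"
```

For scheduling through volcano, the README example in this series additionally sets `schedulerName: volcano` under `spec`; whether the same whole-card form applies there is not stated in the patches.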