diff --git a/keps/prod-readiness/sig-scheduling/5055.yaml b/keps/prod-readiness/sig-scheduling/5055.yaml
index eb9caba0126..7661ea804fc 100644
--- a/keps/prod-readiness/sig-scheduling/5055.yaml
+++ b/keps/prod-readiness/sig-scheduling/5055.yaml
@@ -4,3 +4,5 @@ kep-number: 5055
 alpha:
   approver: "@soltysh"
+beta:
+  approver: "@soltysh"
diff --git a/keps/sig-scheduling/5055-dra-device-taints-and-tolerations/README.md b/keps/sig-scheduling/5055-dra-device-taints-and-tolerations/README.md
index 77dc83fb4cd..0b48d99cc26 100644
--- a/keps/sig-scheduling/5055-dra-device-taints-and-tolerations/README.md
+++ b/keps/sig-scheduling/5055-dra-device-taints-and-tolerations/README.md
@@ -122,7 +122,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
 - [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
-  - [ ] e2e Tests for all Beta API Operations (endpoints)
+  - [x] e2e Tests for all Beta API Operations (endpoints)
   - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
   - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
 - [x] (R) Graduation criteria is in place
@@ -130,8 +130,8 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [x] (R) Production readiness review completed
 - [x] (R) Production readiness review approved
 - [x] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
@@ ... @@
 v1.33.0:
 
 - `k8s.io/dynamic-resource-allocation/resourceslice/tracker`: 65.8%
 - `k8s.io/dynamic-resource-allocation/structured`: 91.3%
 - `k8s.io/kubernetes/pkg/apis/resource/validation`: 97.8%
 - `k8s.io/kubernetes/pkg/controller/devicetainteviction`: 89.9%
+
+v1.35.0-rc.0:
+
+- `k8s.io/dynamic-resource-allocation/resourceslice/tracker`: 62.1%
+- `k8s.io/dynamic-resource-allocation/structured/internal/experimental`: 93.7%
+- `k8s.io/kubernetes/pkg/apis/resource/validation`: 96.8%
+- `k8s.io/kubernetes/pkg/controller/devicetainteviction`: 87.9%
 
 Test cases that are worth calling out:
 
 - Updating DeviceTaintRule status at a reasonable rate
@@ -738,8 +753,13 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
 https://storage.googleapis.com/k8s-triage/index.html
 -->
 
-Integration tests for the new eviction manager will be useful to ensure that
-permissions are correct.
+An integration test for the new eviction controller is useful to ensure that
+permissions are correct. It also covers evicting more pods than would be
+possible in an e2e test.
+
+- source code: https://github.com/kubernetes/kubernetes/blob/v1.35.0-rc.0/test/integration/dra/device_taints_test.go
+- job: https://testgrid.k8s.io/sig-release-master-blocking#integration-master&include-filter-by-regex=dra.dra
+- triage: https://storage.googleapis.com/k8s-triage/index.html?text=EvictCluster&job=integration&test=dra
 
 ##### e2e tests
 
@@ -758,6 +778,10 @@ scheduling.
 Adding a taint in a ResourceSlice must evict a running pod. Same for adding a
 taint through a DeviceTaintRule. A toleration for a NoExecute taint must allow
 a pod to run.
 
+- source code: https://github.com/kubernetes/kubernetes/blob/496077da56dceb7a68c4715a01670e2a5fa582e8/test/e2e/dra/dra.go#L1968-L2084
+- job: https://testgrid.k8s.io/sig-node-dynamic-resource-allocation#ci-kind-dra-all&include-filter-by-regex=DRADeviceTaints
+- triage: https://storage.googleapis.com/k8s-triage/index.html?test=DRADeviceTaints
+
 ### Graduation Criteria
 
 #### Alpha
@@ -818,6 +842,11 @@ updates.
       - kube-apiserver
       - kube-scheduler
       - kube-controller-manager
+  - Feature gate name: DRADeviceTaintRules
+  - Components depending on the feature gate:
+    - kube-apiserver
+    - kube-scheduler
+    - kube-controller-manager
 - [X] Other
   - Describe the mechanism: resource.k8s.io/v1alpha3 API group
   - Will enabling / disabling the feature require downtime of the control
@@ -844,36 +873,31 @@ This will be covered through unit tests for the apiserver and scheduler.
 
 ### Rollout, Upgrade and Rollback Planning
 
 ###### How can a rollout or rollback fail? Can it impact already running workloads?
 
+During a partial rollout (feature enabled on at least one API server before it
+is also enabled in kube-scheduler and kube-controller-manager), a user or DRA
+driver might taint devices before the rest of the control plane catches up.
+For NoExecute, the pods get evicted eventually. For NoSchedule, pods keep
+running.
 
 ###### What specific metrics should inform a rollback?
 
+The normal health metrics of the control plane components need to be monitored
+to determine whether performance deteriorates.
+
+If the `scheduler_pending_pods` metric in the kube-scheduler suddenly
+increases, scheduling may no longer be working as intended.
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
+Automated upgrade/downgrade testing verifies that:
+- A DeviceTaintRule created before a downgrade prevents pod scheduling after
+  the downgrade.
+- A pod which gets scheduled because of a toleration after the downgrade
+  keeps running after an upgrade.
 
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
@@ -881,6 +905,8 @@ are missing a bunch of machinery and tooling and can't do that now.
 Even if applying deprecation policies, they may still surprise some users.
 -->
 
+No.
+
 ### Monitoring Requirements
 
@@ ... @@
 ###### How can an operator determine if the feature is in use by workloads?
 
+Usage of DeviceTaintRules can be seen in the apiserver's
+`apiserver_resource_objects` metric with labels `group=resource.k8s.io` and
+`resource=devicetaintrules`.
+
+Usage of taints in ResourceSlices and tolerations in ResourceClaims can
+only be observed by inspecting the objects.
+
 ###### How can someone using this feature know that it is working for their instance?
 
-- [ ] Events
-  - Event Reason:
-- [ ] API .status
-  - Condition name:
-  - Other field:
-- [ ] Other (treat as last resort)
-  - Details:
+- [X] API DeviceTaintRule.Status
+  - Condition name: EvictionInProgress
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
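To make the `EvictionInProgress` condition referenced above concrete, here is an illustrative sketch of a DeviceTaintRule with that condition set. The driver name and taint key are invented for illustration, and the exact beta schema may differ from this v1alpha3-style shape:

```yaml
# Illustrative sketch only: driver name and taint key are invented,
# and the beta schema may differ from this v1alpha3-style shape.
apiVersion: resource.k8s.io/v1alpha3
kind: DeviceTaintRule
metadata:
  name: gpu-driver-unhealthy
spec:
  deviceSelector:
    driver: gpu.example.com   # taint all devices published by this driver
  taint:
    key: example.com/unhealthy
    effect: NoExecute         # evict running pods without a matching toleration
status:
  conditions:
  - type: EvictionInProgress  # reported while the device taint eviction
    status: "True"            # controller is still deleting affected pods
```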
@@ -934,25 +962,23 @@ These goals will help you determine what you need to measure (SLIs) in the next
 question.
 -->
 
-###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+As with normal scheduling of pods using ResourceClaims, there is no SLO for
+scheduling with taints.
 
+Pod eviction is a best-effort deletion of pods. The goal is that it deletes all
+affected pods eventually, with no performance guarantees.
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
-- [ ] Metrics
-  - Metric name:
-  - [Optional] Aggregation method:
-  - Components exposing the metric:
-- [ ] Other (treat as last resort)
-  - Details:
+- [X] Metrics
+  - Metric names: `device_taint_eviction_controller_pod_deletions_total`,
+    `device_taint_eviction_controller_pod_deletion_duration_seconds`,
+    `workqueue_*` with name="device-taint-eviction-controller"
+  - Components exposing the metric: kube-controller-manager
 
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
+No.
 
 ### Dependencies
 
@@ -962,26 +988,11 @@ This section must be completed when targeting beta to a release.
 -->
 
 ###### Does this feature depend on any specific services running in the cluster?
 
+No.
 
 ### Scalability
 
-Applying taints to devices scales with `number of DeviceTaintRules` *
-`number of devices` when CEL selectors need to be evaluated. Without them,
-filtering scales with `number of DeviceTaintRules` * `number of
+Applying taints to devices scales with `number of DeviceTaintRules` * `number of
 ResourceSlices` but then may still need to compare device names and of course
 modify selected devices.
 
@@ -1031,19 +1042,10 @@ No, because the feature is not used on nodes.
 
 ### Troubleshooting
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
 
+Work is halted while the API server or etcd is unavailable and resumes when
+they come back.
+
 ###### What are other known failure modes?
 
+None known at this point.
+
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
 ## Implementation History
 
 - 1.33: first KEP revision and implementation
+- 1.35: revised alpha with `effect: None` and DeviceTaintRule status
 
 ## Drawbacks
 
diff --git a/keps/sig-scheduling/5055-dra-device-taints-and-tolerations/kep.yaml b/keps/sig-scheduling/5055-dra-device-taints-and-tolerations/kep.yaml
index c597ec3ca89..c7c9b53ccb0 100644
--- a/keps/sig-scheduling/5055-dra-device-taints-and-tolerations/kep.yaml
+++ b/keps/sig-scheduling/5055-dra-device-taints-and-tolerations/kep.yaml
@@ -21,11 +21,13 @@ stage: alpha
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.35"
+latest-milestone: "v1.36"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.33"
+  beta: "v1.36"
+  stable: "v1.37"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
@@ -35,7 +37,15 @@ feature-gates:
     - kube-apiserver
     - kube-scheduler
     - kube-controller-manager
+  - name: DRADeviceTaintRules
+    components:
+      - kube-apiserver
+      - kube-scheduler
+      - kube-controller-manager
 disable-supported: true
 
 # The following PRR answers are required at beta release
 metrics:
+  - device_taint_eviction_controller_pod_deletions_total
+  - device_taint_eviction_controller_pod_deletion_duration_seconds
+  - workqueue_* with name="device-taint-eviction-controller" in kube-controller-manager
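The kep.yaml changes above name two feature gates. For local testing, a kind cluster configuration along these lines would enable both in all control plane components; the node layout is an illustrative assumption, not part of the KEP:

```yaml
# Sketch for local testing: enable both feature gates named in kep.yaml
# on a kind cluster. The node layout is an arbitrary choice for illustration.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  DRADeviceTaints: true
  DRADeviceTaintRules: true
nodes:
- role: control-plane
- role: worker
```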