From 4bc6e8f4ee65b067a62de8a0c1995d2cdd9147c4 Mon Sep 17 00:00:00 2001 From: Harrison Billings Date: Mon, 1 Dec 2025 08:11:32 -0700 Subject: [PATCH 01/11] feat: checkpoint on proposal --- WORKSPACE_PROPOSAL.md | 123 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 123 insertions(+) create mode 100644 WORKSPACE_PROPOSAL.md diff --git a/WORKSPACE_PROPOSAL.md b/WORKSPACE_PROPOSAL.md new file mode 100644 index 0000000..7554af0 --- /dev/null +++ b/WORKSPACE_PROPOSAL.md @@ -0,0 +1,123 @@ +## **Formal Design Proposal: Workspace CRD** + +This document proposes the creation of a `Workspace` Custom Resource Definition (CRD) and a corresponding Kubernetes Controller. A Workspace represents a grouping of one or more virtual machines (VMs) and/or containers that communicate with each other in an isolated network within the cluster. + +The proposed system will provide a unified, declarative API for deploying complex, multi-component environments. This simplifies the management of applications that require both traditional VMs and modern containerized services. The Workspace controller will orchestrate the creation, networking, and lifecycle of all associated resources, ensuring they are deployed cohesively. + +### 2. Background and Problem Statement +As Open Terrain (OT) environments grow in complexity, there is an increasing need to deploy applications composed of multiple, interconnected components. These "workspaces" often consist of several VMs and containers that need to function as a single logical unit. The current approach to deploying these workspaces requires more knowledge of the guts of workspaces than required for users of the platform. + +### 2.1 Main pain points +- **Resource removal**: Currently it is difficult to know when old resources required to back `Workspaces` are ready to be removed. Removal of these resources could be tied to Workspace Lifecycles. +- **No self healing**: The current approach of deploying Workspaces all at once via a helm chart results in the inability for the platform to attempt to resolve problems without outside interaction. +- **No Unified Status**: It is difficult to determine the overall health and status of a workspace. An admin must manually inspect each individual component to diagnose issues. + +### **3. Proposed Architecture** +The proposed solution is a new `Workspace` CRD and controller that builds upon the existing `VMDiskImage` controller's patterns. It introduces a high-level abstraction for an entire environment. + +The `Workspace` CRD will serve as a blueprint for a complete, isolated environment. It will define all the necessary VMs, containers, and networking rules. The workspace CRD will be a cluster scoped resource. + +### **3.1 Workspace Lifecycle** +The `Workspace` can be in the following phases during its lifecycle. + +- `Provisioning`: The Workspace has being stood up +- `Failed`: Something happened and we cannot recover the workspace. +- `Ready`: The workspace has been successfully provisioned and is ready for use. + +The `Workspace` can have the following conditions. + +- `CreatingWorks` + +**Example `Workspace` Manifest:** +```yaml +apiVersion: "pelotech.ot/v1alpha1" +kind: "Workspace" +metadata: + name: "data-analysis-environment-1" +spec: + # Defines the VMs to be included in the workspace. + # The controller will create a VMDiskImage resource for each entry. 
+ virtualMachines: + - name: "database-vm" + spec: + VMDiskImageName: "database-vmdi" + - name: "analytics-vm" + spec: + VMDiskImageName: "analytics-vmdi" + + # Defines the containers to be included in the workspace. + # The controller will create a Deployment for each entry. + containers: + - name: "api-server" + image: "my-registry/api-server:3.1.0" + ports: + - containerPort: 8080 + protocol: TCP + + # Defines networking policy for the workspace. + network: + # If true, the controller creates a NetworkPolicy to isolate all + # components in this workspace from other workloads in the namespace. + +# The operator manages this section to provide real-time status. +status: + phase: "Provisioning" + message: "Creating resources for workspace." + conditions: + - type: "Ready" + status: "False" + lastTransitionTime: "2025-07-12T10:30:00Z" + # Status of individual components for easy diagnosis. + resourceStatus: + virtualMachines: + - name: "database-vm" + phase: "Succeeded" + - name: "analytics-vm" + phase: "Queued" + containers: + - name: "api-server" + status: "Pending" +``` + +#### **3.3: The Workspace Controller** +The Workspace Controller will orchestrate the creation and management of all resources defined in a `Workspace` manifest. + +**Operator Logic:** +- **Watch for Workspaces**: The operator watches for new `Workspace` resources. +- **Enforce Concurrency**: It adheres to the `workspaceConcurrency` limit defined in the `ConfigMap`. +- **Resource Orchestration**: For each `Workspace` resource, the controller will: + 1. Create a `VMDiskImage` resource for each entry in the `spec.virtualMachines` list. + 2. Create a `Deployment` and `Service` for each entry in the `spec.containers` list. + 3. If `spec.network.isolate` is `true`, create a `NetworkPolicy` that allows traffic only between the pods and VMs belonging to this workspace. +- **Update Status**: The controller provides real-time feedback by updating the `status` field of the `Workspace` resource, aggregating the status of all child resources. + +### **4. End-to-End Controller Workflow** +```mermaid +flowchart TD + subgraph "Setup (Admin)" + A["Admin configures the 'sync-operator-policy' ConfigMap"] + end + subgraph "Event (User/CI)" + B["User or CI system creates a Workspace resource"] + end + subgraph "Workspace Controller Logic" + C{"Controller detects new Workspace"}; + A --> C; + B --> C; + C --> D{"Process Workspace, respecting concurrency limit"}; + D --> E["Create VMDiskImage CRs"]; + D --> F["Create Deployments & Services"]; + D --> G["Create isolating NetworkPolicy"]; + subgraph "VMDiskImage Controller" + E --> H["VMDiskImage controller syncs disks"] + end + G --> I["Update Workspace status with progress"]; + F --> I; + H --> I; + end +``` + +### **5. Future Considerations (Non-MVP)** +- **Inter-Workspace Communication**: Develop a mechanism to define explicit rules for allowing traffic between specific workspaces. +- **Advanced Networking**: Support for more complex network topologies, such as defining specific `Egress` or `Ingress` rules to external services. +- **Templating**: Introduce a templating mechanism to allow for reusable workspace definitions. 
From e38c9bdcb8f4853cae73f366bf6f13d310fad926 Mon Sep 17 00:00:00 2001 From: Harrison Billings Date: Mon, 1 Dec 2025 10:11:47 -0700 Subject: [PATCH 02/11] feat: more updates --- WORKSPACE_PROPOSAL.md | 77 ++++++++++++++++++------------------------- 1 file changed, 32 insertions(+), 45 deletions(-) diff --git a/WORKSPACE_PROPOSAL.md b/WORKSPACE_PROPOSAL.md index 7554af0..f6e67ea 100644 --- a/WORKSPACE_PROPOSAL.md +++ b/WORKSPACE_PROPOSAL.md @@ -15,19 +15,15 @@ As Open Terrain (OT) environments grow in complexity, there is an increasing nee ### **3. Proposed Architecture** The proposed solution is a new `Workspace` CRD and controller that builds upon the existing `VMDiskImage` controller's patterns. It introduces a high-level abstraction for an entire environment. -The `Workspace` CRD will serve as a blueprint for a complete, isolated environment. It will define all the necessary VMs, containers, and networking rules. The workspace CRD will be a cluster scoped resource. +The `Workspace` CRD will serve as a blueprint for a complete, isolated environment. It will define all the necessary VMs, containers, and networking rules. The introduction of this cluster scoped resource will allow the platform team to have a single interface for OT customers to deploy workspaces to the system. ### **3.1 Workspace Lifecycle** The `Workspace` can be in the following phases during its lifecycle. -- `Provisioning`: The Workspace has being stood up +- `Provisioning`: The Workspace is being stood up. - `Failed`: Something happened and we cannot recover the workspace. - `Ready`: The workspace has been successfully provisioned and is ready for use. -The `Workspace` can have the following conditions. - -- `CreatingWorks` - **Example `Workspace` Manifest:** ```yaml apiVersion: "pelotech.ot/v1alpha1" @@ -39,11 +35,9 @@ spec: # The controller will create a VMDiskImage resource for each entry. virtualMachines: - name: "database-vm" - spec: - VMDiskImageName: "database-vmdi" + vmdiskImageName: "database-vmdi" - name: "analytics-vm" - spec: - VMDiskImageName: "analytics-vmdi" + vmdiskImageName: "analytics-vmdi" # Defines the containers to be included in the workspace. # The controller will create a Deployment for each entry. @@ -82,42 +76,35 @@ status: #### **3.3: The Workspace Controller** The Workspace Controller will orchestrate the creation and management of all resources defined in a `Workspace` manifest. -**Operator Logic:** -- **Watch for Workspaces**: The operator watches for new `Workspace` resources. -- **Enforce Concurrency**: It adheres to the `workspaceConcurrency` limit defined in the `ConfigMap`. -- **Resource Orchestration**: For each `Workspace` resource, the controller will: - 1. Create a `VMDiskImage` resource for each entry in the `spec.virtualMachines` list. - 2. Create a `Deployment` and `Service` for each entry in the `spec.containers` list. - 3. If `spec.network.isolate` is `true`, create a `NetworkPolicy` that allows traffic only between the pods and VMs belonging to this workspace. -- **Update Status**: The controller provides real-time feedback by updating the `status` field of the `Workspace` resource, aggregating the status of all child resources. ### **4. 
End-to-End Controller Workflow** ```mermaid -flowchart TD - subgraph "Setup (Admin)" - A["Admin configures the 'sync-operator-policy' ConfigMap"] - end - subgraph "Event (User/CI)" - B["User or CI system creates a Workspace resource"] - end - subgraph "Workspace Controller Logic" - C{"Controller detects new Workspace"}; - A --> C; - B --> C; - C --> D{"Process Workspace, respecting concurrency limit"}; - D --> E["Create VMDiskImage CRs"]; - D --> F["Create Deployments & Services"]; - D --> G["Create isolating NetworkPolicy"]; - subgraph "VMDiskImage Controller" - E --> H["VMDiskImage controller syncs disks"] - end - G --> I["Update Workspace status with progress"]; - F --> I; - H --> I; - end -``` + stateDiagram-v2 + direction TB + + [*] --> NewWorkspaceDetected + state "Check VMDiskImages" as CheckVMDI + NewWorkspaceDetected --> CheckVMDI + + %% Creation Path + CheckVMDI --> StandUpVMDIs : VMDIs Missing + CheckVMDI --> Provisioning : VMDIs Exist + + StandUpVMDIs --> Provisioning : VMDIs Ready + + Provisioning --> WorkspaceReady : Success -### **5. Future Considerations (Non-MVP)** -- **Inter-Workspace Communication**: Develop a mechanism to define explicit rules for allowing traffic between specific workspaces. -- **Advanced Networking**: Support for more complex network topologies, such as defining specific `Egress` or `Ingress` rules to external services. -- **Templating**: Introduce a templating mechanism to allow for reusable workspace definitions. + %% Deletion Path + WorkspaceReady --> WorkspaceDeleted : Delete Triggered + + state "Check VMDI References" as CheckRef + + WorkspaceDeleted --> CheckRef + + CheckRef --> RemoveVMDIs : Last Reference + CheckRef --> FinalizeCleanup : Reference Exists + + RemoveVMDIs --> FinalizeCleanup + + FinalizeCleanup --> [*] +``` From 8221d31bf4262609b90f07f581071704208d3365 Mon Sep 17 00:00:00 2001 From: Harrison Billings Date: Tue, 2 Dec 2025 14:32:54 -0700 Subject: [PATCH 03/11] feat: some cleanup --- WORKSPACE_PROPOSAL.md | 176 +++++++++++++++++++++++++++++++----------- 1 file changed, 129 insertions(+), 47 deletions(-) diff --git a/WORKSPACE_PROPOSAL.md b/WORKSPACE_PROPOSAL.md index f6e67ea..354a5c4 100644 --- a/WORKSPACE_PROPOSAL.md +++ b/WORKSPACE_PROPOSAL.md @@ -13,7 +13,7 @@ As Open Terrain (OT) environments grow in complexity, there is an increasing nee - **No Unified Status**: It is difficult to determine the overall health and status of a workspace. An admin must manually inspect each individual component to diagnose issues. ### **3. Proposed Architecture** -The proposed solution is a new `Workspace` CRD and controller that builds upon the existing `VMDiskImage` controller's patterns. It introduces a high-level abstraction for an entire environment. +The proposed solution is a new `Workspace` CRD and controller that builds upon the existing `VMDiskImage` controller's patterns. The `Workspace` CRD will serve as a blueprint for a complete, isolated environment. It will define all the necessary VMs, containers, and networking rules. The introduction of this cluster scoped resource will allow the platform team to have a single interface for OT customers to deploy workspaces to the system. @@ -29,55 +29,129 @@ The `Workspace` can be in the following phases during its lifecycle. apiVersion: "pelotech.ot/v1alpha1" kind: "Workspace" metadata: - name: "data-analysis-environment-1" + name: "demo-workspace-1" spec: - # Defines the VMs to be included in the workspace. - # The controller will create a VMDiskImage resource for each entry. 
- virtualMachines: - - name: "database-vm" - vmdiskImageName: "database-vmdi" - - name: "analytics-vm" - vmdiskImageName: "analytics-vmdi" - - # Defines the containers to be included in the workspace. - # The controller will create a Deployment for each entry. - containers: - - name: "api-server" - image: "my-registry/api-server:3.1.0" - ports: - - containerPort: 8080 - protocol: TCP - - # Defines networking policy for the workspace. - network: - # If true, the controller creates a NetworkPolicy to isolate all - # components in this workspace from other workloads in the namespace. - -# The operator manages this section to provide real-time status. -status: - phase: "Provisioning" - message: "Creating resources for workspace." - conditions: - - type: "Ready" - status: "False" - lastTransitionTime: "2025-07-12T10:30:00Z" - # Status of individual components for easy diagnosis. - resourceStatus: - virtualMachines: - - name: "database-vm" - phase: "Succeeded" - - name: "analytics-vm" - phase: "Queued" - containers: - - name: "api-server" - status: "Pending" + vms: + - baseVm: ubuntu_2004_lts_en-us_x64 + baseVmVersion: 2.1.2 + ignoreOnDeploy: true + name: demo-vm + version: 2.1.0 + backingVMDiskImage: demo-vmdi + resources: + cpu: '2' + memory: 2Gi + diskSize: 18Gi + interfaces: + - network: control-net + ipAddress: 10.10.0.161/24 + - network: bridge-inet + ipAddress: 4.29.163.6/28 + - network: bridge-edge + ipAddress: 172.27.11.11/28 + ansible: + roles: + build: + - role: specialize + - role: linuxRouter + variables: + enable_nat: true + nat_out_iface_idx: 1 + deploy: + - role: runtimeChecks + containerClusters: + - ignoreOnDeploy: false + name: demo-app + interfaces: + - network: bridge-inet + ipAddress: 4.29.163.7/28 + containers: + - name: chef + image: 'ghcr.io/demo/demo-containers/demo-app:1.0.2' + resources: + cpu: 128m + memory: 256Mi + capabilities: + drop: + - ALL + add: + - NET_BIND_SERVICE + volumeMounts: + - name: dnsconfig + containerPath: /config + readOnly: true + portMappings: + - containerPort: 53 + protocol: udp + hostPort: 53 + - ignoreOnDeploy: false + name: ansible + interfaces: + - network: control-net + ipAddress: 10.10.0.141/24 + containers: + - name: ansible + image: 'ghcr.io/demo/demo-containers/ansible:2.5.14' + resources: + cpu: 384m + memory: 2048Mi + env: + - name: DEBUG + value: 'true' + volumes: + - name: monitorconfigs + size: 1Gi + localPath: /config + - name: dnsconfig + size: 1Gi + localPath: /demo-app-config + networks: + - name: control-net + cidr: 10.10.0.0/24 + - name: capture-net + cidr: 10.10.1.0/24 + - name: bridge-inet + cidr: 4.29.163.0/28 + nameservers: + addresses: + - 4.29.163.7 + rangeGateway: 4.29.163.14 + routes: + - destinationNetworkCidr: 0.0.0.0/0 + nextHopIpAddress: 4.29.163.14 + - name: bridge-edge + cidr: 172.27.11.0/28 + routes: + - destinationNetworkCidr: 0.0.0.0/0 + nextHopIpAddress: 172.27.11.11 + - destinationNetworkCidr: 172.26.0.0/15 + nextHopIpAddress: 172.27.11.12 + nameservers: + addresses: + - 4.29.163.7 + - name: edge-fw + cidr: 172.26.5.0/28 + routes: + - destinationNetworkCidr: 0.0.0.0/0 + nextHopIpAddress: 172.26.5.11 + - destinationNetworkCidr: 172.26.0.0/16 + nextHopIpAddress: 172.26.5.12 + nameservers: + addresses: + - 4.29.163.7 + - name: range-services + cidr: 172.26.1.0/24 + routes: + - destinationNetworkCidr: 0.0.0.0/0 + nextHopIpAddress: 172.26.1.254 + nameservers: + addresses: + - 172.26.1.101 ``` +#### **3.2: The Workspace Controller** +The Workspace Controller will orchestrate the creation and management of all 
resources defined in a `Workspace` manifest. Below is +the high flow of a Workspace through the controller -#### **3.3: The Workspace Controller** -The Workspace Controller will orchestrate the creation and management of all resources defined in a `Workspace` manifest. - - -### **4. End-to-End Controller Workflow** ```mermaid stateDiagram-v2 direction TB @@ -108,3 +182,11 @@ The Workspace Controller will orchestrate the creation and management of all res FinalizeCleanup --> [*] ``` +### **4: Considered alternatives** + +The following alternatives have been considered + +#### Deploying VMDiskImages prior to Workspace creation + +This option seems viaiable on the face however it does not resolve the issue where a OT user attempts to stand up a workspace while a VMDiskImage is syncing. This would still result in a workspace getting stuck and requiring outside intervention. + From 4ae699980b19ca394c65955b38640dac510d981a Mon Sep 17 00:00:00 2001 From: Harrison Billings Date: Tue, 2 Dec 2025 14:40:39 -0700 Subject: [PATCH 04/11] feat: add more info --- WORKSPACE_PROPOSAL.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/WORKSPACE_PROPOSAL.md b/WORKSPACE_PROPOSAL.md index 354a5c4..758061e 100644 --- a/WORKSPACE_PROPOSAL.md +++ b/WORKSPACE_PROPOSAL.md @@ -24,6 +24,10 @@ The `Workspace` can be in the following phases during its lifecycle. - `Failed`: Something happened and we cannot recover the workspace. - `Ready`: The workspace has been successfully provisioned and is ready for use. +When a workspace is first recognized by the Cluster it will be placed in a `Provisioning` state. While provisioning the cluster will check to see if any VMDIs required do not yet exist. Should the VMDIs not exist the controller will create them and wait on standing up the rest of the workspace until the VMDIs are ready. + +Once the VMDIs are ready the controller will preform the operations to stand up workspaces by creating any underlying resources. Should this step fail the workspace will be moved into a `Failed` state. When the workspace is successfully created and all sub components are reading green it will be moved into a `Ready` state. + **Example `Workspace` Manifest:** ```yaml apiVersion: "pelotech.ot/v1alpha1" @@ -190,3 +194,6 @@ The following alternatives have been considered This option seems viaiable on the face however it does not resolve the issue where a OT user attempts to stand up a workspace while a VMDiskImage is syncing. This would still result in a workspace getting stuck and requiring outside intervention. +#### Usage of outside service to Record VMDI usage in workspaces + +This option would circumvent the need for the new CRD and Controller however would result in the duplication of state. OT should be the ultimate sources of truth when it comes to resource ownership and this introduction of an outside service may result in more indirection and issues with state incosistency. 
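To make the lifecycle described in this patch concrete, a `Workspace` that is still waiting on its VMDIs might report a status along the lines of the sketch below. The field names are illustrative assumptions rather than a finalized schema.

```yaml
# Hypothetical status for a Workspace whose backing VMDIs are still syncing.
# Field names are assumptions for illustration only.
status:
  phase: "Provisioning"
  message: "Waiting for VMDiskImage demo-vmdi to become ready."
  conditions:
    - type: "Ready"
      status: "False"
      reason: "VMDiskImagesNotReady"
      lastTransitionTime: "2025-12-02T14:40:39Z"
  resourceStatus:
    virtualMachines:
      - name: "demo-vm"
        phase: "Queued"
```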
From 71665b67435fe77cb0d3661240e11762d489cb93 Mon Sep 17 00:00:00 2001 From: Harrison Billings <46241098+hmbill694@users.noreply.github.com> Date: Wed, 3 Dec 2025 08:42:35 -0700 Subject: [PATCH 05/11] Update WORKSPACE_PROPOSAL.md Co-authored-by: Caleb Tallquist <55416214+tallquist10@users.noreply.github.com> --- WORKSPACE_PROPOSAL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/WORKSPACE_PROPOSAL.md b/WORKSPACE_PROPOSAL.md index 758061e..3d9e2a5 100644 --- a/WORKSPACE_PROPOSAL.md +++ b/WORKSPACE_PROPOSAL.md @@ -192,7 +192,7 @@ The following alternatives have been considered #### Deploying VMDiskImages prior to Workspace creation -This option seems viaiable on the face however it does not resolve the issue where a OT user attempts to stand up a workspace while a VMDiskImage is syncing. This would still result in a workspace getting stuck and requiring outside intervention. +This option seems viable on the face however it does not resolve the issue where a OT user attempts to stand up a workspace while a VMDiskImage is syncing. This would still result in a workspace getting stuck and requiring outside intervention. #### Usage of outside service to Record VMDI usage in workspaces From 411b5fb8b9afaf754a71fc3dff3f2e754fc52952 Mon Sep 17 00:00:00 2001 From: Harrison Billings <46241098+hmbill694@users.noreply.github.com> Date: Wed, 3 Dec 2025 08:42:42 -0700 Subject: [PATCH 06/11] Update WORKSPACE_PROPOSAL.md Co-authored-by: Caleb Tallquist <55416214+tallquist10@users.noreply.github.com> --- WORKSPACE_PROPOSAL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/WORKSPACE_PROPOSAL.md b/WORKSPACE_PROPOSAL.md index 3d9e2a5..365d57b 100644 --- a/WORKSPACE_PROPOSAL.md +++ b/WORKSPACE_PROPOSAL.md @@ -196,4 +196,4 @@ This option seems viable on the face however it does not resolve the issue where #### Usage of outside service to Record VMDI usage in workspaces -This option would circumvent the need for the new CRD and Controller however would result in the duplication of state. OT should be the ultimate sources of truth when it comes to resource ownership and this introduction of an outside service may result in more indirection and issues with state incosistency. +This option would circumvent the need for the new CRD and Controller. However, it would result in the duplication of state. OT should be the ultimate source of truth when it comes to resource ownership and this introduction of an outside service may result in more indirection and issues with state inconsistency. From 94df04b1d5a9990ef978859558c66c730fcba5ed Mon Sep 17 00:00:00 2001 From: Harrison Billings Date: Wed, 3 Dec 2025 11:39:28 -0700 Subject: [PATCH 07/11] feat: added new alternative for moving abstraction down a level --- WORKSPACE_PROPOSAL.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/WORKSPACE_PROPOSAL.md b/WORKSPACE_PROPOSAL.md index 365d57b..f5c3987 100644 --- a/WORKSPACE_PROPOSAL.md +++ b/WORKSPACE_PROPOSAL.md @@ -10,7 +10,7 @@ As Open Terrain (OT) environments grow in complexity, there is an increasing nee ### 2.1 Main pain points - **Resource removal**: Currently it is difficult to know when old resources required to back `Workspaces` are ready to be removed. Removal of these resources could be tied to Workspace Lifecycles. - **No self healing**: The current approach of deploying Workspaces all at once via a helm chart results in the inability for the platform to attempt to resolve problems without outside interaction. 
-- **No Unified Status**: It is difficult to determine the overall health and status of a workspace. An admin must manually inspect each individual component to diagnose issues. +- **No Unified Management**: It is difficult to determine the overall health and status of a workspace. An admin must manually inspect each individual component to diagnose issues when things go wrong. ### **3. Proposed Architecture** The proposed solution is a new `Workspace` CRD and controller that builds upon the existing `VMDiskImage` controller's patterns. @@ -197,3 +197,12 @@ This option seems viable on the face however it does not resolve the issue where #### Usage of outside service to Record VMDI usage in workspaces This option would circumvent the need for the new CRD and Controller. However, it would result in the duplication of state. OT should be the ultimate source of truth when it comes to resource ownership and this introduction of an outside service may result in more indirection and issues with state inconsistency. + +#### Moving the abstraction level down to the VM + +The self healing and resource removal pain points revolve primary around the lifecycle of VMs. When VMs are launched without backing data they enter a unrecoverable state and when we remove them we have no easy way to tell if the backing resources can also be removed. The team could move the level of abstraction down to just the VM level. A CR could be made to wrap our existing VM solution. A controller could then watch for this CR and check for the existence of the required VMDI, if it does not exist this controller could make them. Once the required backing data is created we can stand up our VMs as normal. When this custom resource is removed we could check whether or not we can the VMDI and if so delete it. This approach could also be used to compose into a workspace at some point. + +This approach is not a complete solution. The below are unhandled issues and potential other considerations: +- In the case that this proposed lower level CR encounters an unrecoverable error (VMDI failed to create) this would still result in a dangling Workspace and require manual intervention to clean up the Workspace. +- Deployment of a workspace still requires deployment all component parts. +- More layers and CRs in a potential later workspace solution From 15626f732c134fa7dd800ebaa8a1ab322295aa5d Mon Sep 17 00:00:00 2001 From: Harrison Billings Date: Wed, 3 Dec 2025 20:38:55 -0700 Subject: [PATCH 08/11] feat: alter the propsal entirely --- VIRTUAL_MACHINE_PROPOSAL.md | 111 ++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 VIRTUAL_MACHINE_PROPOSAL.md diff --git a/VIRTUAL_MACHINE_PROPOSAL.md b/VIRTUAL_MACHINE_PROPOSAL.md new file mode 100644 index 0000000..2675c73 --- /dev/null +++ b/VIRTUAL_MACHINE_PROPOSAL.md @@ -0,0 +1,111 @@ +## **Formal Design Proposal: VirtualMachine CRD** + +This document proposes the creation of a `VirtualMachine` Custom Resource (CR) and a corresponding Kubernetes Controller. A `VirtualMachine` represents single VM within OT and is responsible for all of the underlying child resources required to allow the VM to be used within Open Terrain (OT) workspaces. A workspace in OT can consist of one or more VMs and containers all networked together. + +This proposed solution is intended to move the complexity of managing VMs in OT from customers to the platform. 
This solution will allow users to worry less about the implementation details of deploying a VM to OT and will reduce admins/engineer toil as addressing things like missing VM resources and resource pruning which are handled manually at this time.
+
+### 2. Background and Problem Statement
+As Open Terrain (OT) environments grow in use, the team has noticed many sharp edges when dealing with VMs that make up workspaces within OT. These sharp edges manifest themselves mainly in the form of a few problems. The first is that Workspaces with VMs can be launched and end up in a state where the VM cannot boot as it has attempted startup prior to having the backing data to allow it to spin up.
+
+The second issue revolves around resource pruning. Currently it is very difficult for admins to know when they can safely delete the backing data for a VM. Determining if VM backing data is prunable is currently very manual and involves cross-referencing cluster state with various outside state stores.
+
+### 2.1 Main pain points
+- **Resource removal**: Currently it is difficult to know when old resources required to back `VirtualMachines` are ready to be removed.
+- **No self healing**: The current approach of deploying `VirtualMachines` results in the inability for the platform to attempt to resolve problems without outside interaction.
+
+### **3. Proposed Solution**
+The proposed solution is a new `VirtualMachine` CR and controller along with the expansion of the capabilities of the `VMDiskImage` controller.
+
+The `VirtualMachine` CR will act as a thin wrapper around the team's existing VM solution. This will allow OT to have its own interface to represent a virtual machine, decoupling us from direct references to the underlying resources which actually spin up virtual machines in the cluster. This CR paired with the controller will allow the platform to interact with the creation lifecycle of underlying resources as well. We can use this to ensure that we always have the required backing resources for virtual machines, allowing the platform to self heal.
+
+To address the second pain point of resource pruning, the team can expand the `VMDiskImage` controller to also record the number of referencing `VirtualMachines` on `VMDiskImages`. We can prevent deletion of `VMDiskImages` while VMs still reference them and delete these resources if no referencing VMs are created within a given time period.
+
+### **3.1 VirtualMachine CR**
+A `VirtualMachine` can be in the following phases during its lifecycle.
+
+- `Provisioning`: The VirtualMachine is being stood up and any backing data is being created if needed.
+- `Failed`: Something happened and we cannot recover.
+- `Ready`: The virtual machine has been successfully provisioned and is ready for use.
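As a sketch of how these phases might surface on the resource itself, a `VirtualMachine` that is still waiting on its backing data could report something like the following. The status field names are assumptions for illustration, not a settled API.

```yaml
# Hypothetical status for a VirtualMachine whose backing VMDiskImage is still
# being created. Field names are assumptions for illustration only.
status:
  phase: "Provisioning"
  message: "Waiting for VMDiskImage demo-vmdi to become ready."
  conditions:
    - type: "Ready"
      status: "False"
      reason: "BackingDataNotReady"
      lastTransitionTime: "2025-12-03T20:38:55Z"
```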
+ +**Example `VirtualMachine` Manifest:** +``` +apiVersion: "pelotech.ot/v1alpha1" +kind: "VirtualMachine" +metadata: + name: "demo-workspace-1" +spec: + baseVm: ubuntu_2004_lts_en-us_x64 + baseVmVersion: 2.1.2 + ignoreOnDeploy: true + name: demo-vm + version: 2.1.0 + users: + - name: demo user + plain_text_passwd: demo-pwd + lock_passwd: false + sudo: ALL=(ALL) NOPASSWD:ALL + groups: sudo + resources: + cpu: '2' + memory: 2Gi + diskSize: 18Gi + interfaces: + - network: control-net + ipAddress: 10.10.0.161/24 + - network: bridge-inet + ipAddress: 4.29.163.6/28 + - network: bridge-edge + ipAddress: 172.27.11.11/28 + vmDiskImageRef: + name: demo-vmdi + namespace: vmdi-farm + vmDiskImageTemplate: + storageClass: "gp3" + snapshotClass: "ebs-snapshot" + secretRef: "foo-bar" + name: "harrison-vm" + url: "https://s3.us-gov-west-1.amazonaws.com/vm-images/images/harrison-vm/1.0.0/vm.qcow2" + sourceType: "s3" + diskSize: "24Gi" +``` + +#### **3.2: The VirtualMachine Controller** +When a instance of an OT `VirtualMachine` is created the CR's controller will pick up the resource. + +The controller will first check if the `VMDiskImage` referenced exists. If it does not exist and a template has been provided for the `VMDiskImage` the controller will create it with the name provided as a reference. + +The controller will then create the VM using our standard approach within OT. + +```mermaid + stateDiagram-v2 + direction TB + + [*] --> NewVirtualMachineDetected + state "Check VMDiskImages" as CheckVMDI + NewVirtualMachineDetected --> CheckVMDI + + %% Creation Path + CheckVMDI --> StandUpVMDIs : VMDIs Missing + CheckVMDI --> Provisioning : VMDIs Exist + + StandUpVMDIs --> Provisioning : VMDIs Ready + + Provisioning --> VirtualMachineFailed : Unrecoverable error occured. + Provisioning --> VirtualMachineReady : Success. +``` +### **4: Considered alternatives** + +The following alternatives have been considered + +#### Track underlying VM implementation resources + +We could directly track the underlying resource which represents a VM which is a kubevirt VM CR. This would allow us to not introduce a new custom resource and a layer of indirection. The determining factor in not taking this approach is the added value in decoupling the OT representation of VM from what is used to implement them. The chosen approach allows the team to change underlying implementation details without having to alter the interface. + +#### Usage of outside service to Record VMDI usage in workspaces + +This option would circumvent the need for the new CRD and Controller. However, it would result in the duplication of state. OT should be the ultimate source of truth when it comes to resource ownership and this introduction of an outside service may result in more indirection and issues with state inconsistency. + + +### **5: What this doesn't fix** + +There remains the open issue of handling workspace cleanup on the platform when encountering an error. A failed VM Provisioning is one such case. The team is currently working on a solution for this issue but it is out of scope of this proposal. 
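To illustrate the reference counting proposed for the `VMDiskImage` controller, a `VMDiskImage` that is still in use might look roughly like the sketch below once the expansion is in place. The API group, finalizer name, and status fields are assumptions; the existing `VMDiskImage` schema may already use different conventions.

```yaml
# Hypothetical view of a VMDiskImage once referencing VirtualMachines are
# tracked. The finalizer and status fields below are assumptions.
apiVersion: "pelotech.ot/v1alpha1"
kind: "VMDiskImage"
metadata:
  name: "demo-vmdi"
  namespace: "vmdi-farm"
  finalizers:
    # Blocks deletion while VirtualMachines still reference this image.
    - "pelotech.ot/virtualmachine-references"
status:
  # Names of VirtualMachines currently backed by this image; pruning could be
  # allowed once the count has stayed at zero for a configured grace period.
  referencingVirtualMachines:
    - "demo-vm"
  referenceCount: 1
```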
From 7ef34cae613d48b100701c8084427ffafb78011c Mon Sep 17 00:00:00 2001 From: Harrison Billings Date: Wed, 3 Dec 2025 20:39:16 -0700 Subject: [PATCH 09/11] feat: bye bye workspace --- WORKSPACE_PROPOSAL.md | 208 ------------------------------------------ 1 file changed, 208 deletions(-) delete mode 100644 WORKSPACE_PROPOSAL.md diff --git a/WORKSPACE_PROPOSAL.md b/WORKSPACE_PROPOSAL.md deleted file mode 100644 index f5c3987..0000000 --- a/WORKSPACE_PROPOSAL.md +++ /dev/null @@ -1,208 +0,0 @@ -## **Formal Design Proposal: Workspace CRD** - -This document proposes the creation of a `Workspace` Custom Resource Definition (CRD) and a corresponding Kubernetes Controller. A Workspace represents a grouping of one or more virtual machines (VMs) and/or containers that communicate with each other in an isolated network within the cluster. - -The proposed system will provide a unified, declarative API for deploying complex, multi-component environments. This simplifies the management of applications that require both traditional VMs and modern containerized services. The Workspace controller will orchestrate the creation, networking, and lifecycle of all associated resources, ensuring they are deployed cohesively. - -### 2. Background and Problem Statement -As Open Terrain (OT) environments grow in complexity, there is an increasing need to deploy applications composed of multiple, interconnected components. These "workspaces" often consist of several VMs and containers that need to function as a single logical unit. The current approach to deploying these workspaces requires more knowledge of the guts of workspaces than required for users of the platform. - -### 2.1 Main pain points -- **Resource removal**: Currently it is difficult to know when old resources required to back `Workspaces` are ready to be removed. Removal of these resources could be tied to Workspace Lifecycles. -- **No self healing**: The current approach of deploying Workspaces all at once via a helm chart results in the inability for the platform to attempt to resolve problems without outside interaction. -- **No Unified Management**: It is difficult to determine the overall health and status of a workspace. An admin must manually inspect each individual component to diagnose issues when things go wrong. - -### **3. Proposed Architecture** -The proposed solution is a new `Workspace` CRD and controller that builds upon the existing `VMDiskImage` controller's patterns. - -The `Workspace` CRD will serve as a blueprint for a complete, isolated environment. It will define all the necessary VMs, containers, and networking rules. The introduction of this cluster scoped resource will allow the platform team to have a single interface for OT customers to deploy workspaces to the system. - -### **3.1 Workspace Lifecycle** -The `Workspace` can be in the following phases during its lifecycle. - -- `Provisioning`: The Workspace is being stood up. -- `Failed`: Something happened and we cannot recover the workspace. -- `Ready`: The workspace has been successfully provisioned and is ready for use. - -When a workspace is first recognized by the Cluster it will be placed in a `Provisioning` state. While provisioning the cluster will check to see if any VMDIs required do not yet exist. Should the VMDIs not exist the controller will create them and wait on standing up the rest of the workspace until the VMDIs are ready. - -Once the VMDIs are ready the controller will preform the operations to stand up workspaces by creating any underlying resources. 
Should this step fail the workspace will be moved into a `Failed` state. When the workspace is successfully created and all sub components are reading green it will be moved into a `Ready` state. - -**Example `Workspace` Manifest:** -```yaml -apiVersion: "pelotech.ot/v1alpha1" -kind: "Workspace" -metadata: - name: "demo-workspace-1" -spec: - vms: - - baseVm: ubuntu_2004_lts_en-us_x64 - baseVmVersion: 2.1.2 - ignoreOnDeploy: true - name: demo-vm - version: 2.1.0 - backingVMDiskImage: demo-vmdi - resources: - cpu: '2' - memory: 2Gi - diskSize: 18Gi - interfaces: - - network: control-net - ipAddress: 10.10.0.161/24 - - network: bridge-inet - ipAddress: 4.29.163.6/28 - - network: bridge-edge - ipAddress: 172.27.11.11/28 - ansible: - roles: - build: - - role: specialize - - role: linuxRouter - variables: - enable_nat: true - nat_out_iface_idx: 1 - deploy: - - role: runtimeChecks - containerClusters: - - ignoreOnDeploy: false - name: demo-app - interfaces: - - network: bridge-inet - ipAddress: 4.29.163.7/28 - containers: - - name: chef - image: 'ghcr.io/demo/demo-containers/demo-app:1.0.2' - resources: - cpu: 128m - memory: 256Mi - capabilities: - drop: - - ALL - add: - - NET_BIND_SERVICE - volumeMounts: - - name: dnsconfig - containerPath: /config - readOnly: true - portMappings: - - containerPort: 53 - protocol: udp - hostPort: 53 - - ignoreOnDeploy: false - name: ansible - interfaces: - - network: control-net - ipAddress: 10.10.0.141/24 - containers: - - name: ansible - image: 'ghcr.io/demo/demo-containers/ansible:2.5.14' - resources: - cpu: 384m - memory: 2048Mi - env: - - name: DEBUG - value: 'true' - volumes: - - name: monitorconfigs - size: 1Gi - localPath: /config - - name: dnsconfig - size: 1Gi - localPath: /demo-app-config - networks: - - name: control-net - cidr: 10.10.0.0/24 - - name: capture-net - cidr: 10.10.1.0/24 - - name: bridge-inet - cidr: 4.29.163.0/28 - nameservers: - addresses: - - 4.29.163.7 - rangeGateway: 4.29.163.14 - routes: - - destinationNetworkCidr: 0.0.0.0/0 - nextHopIpAddress: 4.29.163.14 - - name: bridge-edge - cidr: 172.27.11.0/28 - routes: - - destinationNetworkCidr: 0.0.0.0/0 - nextHopIpAddress: 172.27.11.11 - - destinationNetworkCidr: 172.26.0.0/15 - nextHopIpAddress: 172.27.11.12 - nameservers: - addresses: - - 4.29.163.7 - - name: edge-fw - cidr: 172.26.5.0/28 - routes: - - destinationNetworkCidr: 0.0.0.0/0 - nextHopIpAddress: 172.26.5.11 - - destinationNetworkCidr: 172.26.0.0/16 - nextHopIpAddress: 172.26.5.12 - nameservers: - addresses: - - 4.29.163.7 - - name: range-services - cidr: 172.26.1.0/24 - routes: - - destinationNetworkCidr: 0.0.0.0/0 - nextHopIpAddress: 172.26.1.254 - nameservers: - addresses: - - 172.26.1.101 -``` -#### **3.2: The Workspace Controller** -The Workspace Controller will orchestrate the creation and management of all resources defined in a `Workspace` manifest. 
Below is -the high flow of a Workspace through the controller - -```mermaid - stateDiagram-v2 - direction TB - - [*] --> NewWorkspaceDetected - state "Check VMDiskImages" as CheckVMDI - NewWorkspaceDetected --> CheckVMDI - - %% Creation Path - CheckVMDI --> StandUpVMDIs : VMDIs Missing - CheckVMDI --> Provisioning : VMDIs Exist - - StandUpVMDIs --> Provisioning : VMDIs Ready - - Provisioning --> WorkspaceReady : Success - - %% Deletion Path - WorkspaceReady --> WorkspaceDeleted : Delete Triggered - - state "Check VMDI References" as CheckRef - - WorkspaceDeleted --> CheckRef - - CheckRef --> RemoveVMDIs : Last Reference - CheckRef --> FinalizeCleanup : Reference Exists - - RemoveVMDIs --> FinalizeCleanup - - FinalizeCleanup --> [*] -``` -### **4: Considered alternatives** - -The following alternatives have been considered - -#### Deploying VMDiskImages prior to Workspace creation - -This option seems viable on the face however it does not resolve the issue where a OT user attempts to stand up a workspace while a VMDiskImage is syncing. This would still result in a workspace getting stuck and requiring outside intervention. - -#### Usage of outside service to Record VMDI usage in workspaces - -This option would circumvent the need for the new CRD and Controller. However, it would result in the duplication of state. OT should be the ultimate source of truth when it comes to resource ownership and this introduction of an outside service may result in more indirection and issues with state inconsistency. - -#### Moving the abstraction level down to the VM - -The self healing and resource removal pain points revolve primary around the lifecycle of VMs. When VMs are launched without backing data they enter a unrecoverable state and when we remove them we have no easy way to tell if the backing resources can also be removed. The team could move the level of abstraction down to just the VM level. A CR could be made to wrap our existing VM solution. A controller could then watch for this CR and check for the existence of the required VMDI, if it does not exist this controller could make them. Once the required backing data is created we can stand up our VMs as normal. When this custom resource is removed we could check whether or not we can the VMDI and if so delete it. This approach could also be used to compose into a workspace at some point. - -This approach is not a complete solution. The below are unhandled issues and potential other considerations: -- In the case that this proposed lower level CR encounters an unrecoverable error (VMDI failed to create) this would still result in a dangling Workspace and require manual intervention to clean up the Workspace. -- Deployment of a workspace still requires deployment all component parts. -- More layers and CRs in a potential later workspace solution From bf266a0c34aeeca54517d2520b506ab3deab0a7b Mon Sep 17 00:00:00 2001 From: Harrison Billings Date: Thu, 11 Dec 2025 08:40:20 -0700 Subject: [PATCH 10/11] feat: add new alternative solution --- VIRTUAL_MACHINE_PROPOSAL.md | 92 ++++++++++++++++++++++++++++++------- 1 file changed, 76 insertions(+), 16 deletions(-) diff --git a/VIRTUAL_MACHINE_PROPOSAL.md b/VIRTUAL_MACHINE_PROPOSAL.md index 2675c73..7d2fe7a 100644 --- a/VIRTUAL_MACHINE_PROPOSAL.md +++ b/VIRTUAL_MACHINE_PROPOSAL.md @@ -1,26 +1,26 @@ -## **Formal Design Proposal: VirtualMachine CRD** +## **Formal Design Proposal** -This document proposes the creation of a `VirtualMachine` Custom Resource (CR) and a corresponding Kubernetes Controller. 
A `VirtualMachine` represents single VM within OT and is responsible for all of the underlying child resources required to allow the VM to be used within Open Terrain (OT) workspaces. A workspace in OT can consist of one or more VMs and containers all networked together.
-
-This proposed solution is intended to move the complexity of managing VMs in OT from customers to the platform. This solution will allow users to worry less about the implementation details of deploying a VM to OT and will reduce admins/engineer toil as addressing things like missing VM resources and resource pruning which are handled manually at this time.
-
-### 2. Background and Problem Statement
+### **1. Background and Problem Statement**
 As Open Terrain (OT) environments grow in use, the team has noticed many sharp edges when dealing with VMs that make up workspaces within OT. These sharp edges manifest themselves mainly in the form of a few problems. The first is that Workspaces with VMs can be launched and end up in a state where the VM cannot boot as it has attempted startup prior to having the backing data to allow it to spin up.
 
 The second issue revolves around resource pruning. Currently it is very difficult for admins to know when they can safely delete the backing data for a VM. Determining if VM backing data is prunable is currently very manual and involves cross-referencing cluster state with various outside state stores.
 
-### 2.1 Main pain points
+### **1.1 Main pain points**
 - **Resource removal**: Currently it is difficult to know when old resources required to back `VirtualMachines` are ready to be removed.
 - **No self healing**: The current approach of deploying `VirtualMachines` results in the inability for the platform to attempt to resolve problems without outside interaction.
 
-### **3. Proposed Solution**
+### **1.2 Current Solution**
+
+Currently manual intervention is required to resolve the issue of a virtual machine starting without backing data. The team must first deploy the backing data required by the VM. This is often enough as the underlying implementation libraries will move the VM to a "Ready" state once the backing data is in place. The team has noticed on occasion that simply putting the required data in place is not enough and the team must "kick" underlying vm resources to "unstick" them.
+
+### **2. Solution 1**
 The proposed solution is a new `VirtualMachine` CR and controller along with the expansion of the capabilities of the `VMDiskImage` controller.
 
 The `VirtualMachine` CR will act as a thin wrapper around the team's existing VM solution. This will allow OT to have its own interface to represent a virtual machine, decoupling us from direct references to the underlying resources which actually spin up virtual machines in the cluster. This CR paired with the controller will allow the platform to interact with the creation lifecycle of underlying resources as well. We can use this to ensure that we always have the required backing resources for virtual machines, allowing the platform to self heal.
 
 To address the second pain point of resource pruning, the team can expand the `VMDiskImage` controller to also record the number of referencing `VirtualMachines` on `VMDiskImages`. We can prevent deletion of `VMDiskImages` while VMs still reference them and delete these resources if no referencing VMs are created within a given time period.
 
-### **3.1 VirtualMachine CR**
+### **2.1 VirtualMachine CR**
 A `VirtualMachine` can be in the following phases during its lifecycle.
 - `Provisioning`: The VirtualMachine is being stood up and any backing data is being created if needed.
@@ -69,7 +69,7 @@ spec:
   diskSize: "24Gi"
 ```
 
-#### **3.2: The VirtualMachine Controller**
+#### **2.2: The VirtualMachine Controller**
 When a instance of an OT `VirtualMachine` is created the CR's controller will pick up the resource.
 
@@ -93,19 +93,79 @@ The controller will then create the VM using our standard approach within OT.
     Provisioning --> VirtualMachineFailed : Unrecoverable error occured.
     Provisioning --> VirtualMachineReady : Success.
 ```
-### **4: Considered alternatives**
-The following alternatives have been considered
+### **2.3 Pros and Cons**
+
+#### Pros
+- Decouples the OT representation of a VM from what is actually deployed to make the VM happen. Allows the team to have a mechanism for handling unexpected behavior that is not yet addressed, or won't be addressed, by the underlying implementation libraries.
+- Allows for more robust error handling. Since this CR would own all the implementation resources, the operator would have more freedom as far as retry strategies it could attempt.
+- Slots into existing deployment flows.
+
+#### Cons
+- A new CR is more to manage and adds more complexity.
+- Could be overkill for the above pain points.
+- May encourage the team not to contribute back to our underlying tooling, since the whole purpose is to shim missing behavior.
-#### Track underlying VM implementation resources
+### **3. Solution 2**
+The proposed solution is a new controller along with the expansion of the capabilities of the `VMDiskImage` controller.
-We could directly track the underlying resource which represents a VM which is a kubevirt VM CR. This would allow us to not introduce a new custom resource and a layer of indirection. The determining factor in not taking this approach is the added value in decoupling the OT representation of VM from what is used to implement them. The chosen approach allows the team to change underlying implementation details without having to alter the interface.
+The team can set up a new controller within the operator to watch "runtime" `datavolumes`. These `datavolumes` are used to clone buildtime `volumesnapshots` that are managed by `VMDIs`. When this controller notices that a runtime `Datavolume` references a `Volumesnapshot` managed by a non-existent `VMDiskImage`, it can issue a creation request and derive what the new `VMDiskImage` should look like based on what the runtime `datavolume` is expecting.
 
+To address the second pain point of resource pruning, the team can expand the `VMDiskImage` controller to also record the number of referencing runtime `DataVolumes` on `Volumesnapshots` controlled by a given `VMDI`. We can prevent deletion of `VMDiskImages` while they are still referenced and delete these resources if no referencing runtime `DataVolumes` are created within a given time period.
-#### Usage of outside service to Record VMDI usage in workspaces
+### **3.1 Runtime DV Controller**
+When an instance of a runtime `datavolume` is created, the new controller will pick up the resource. The team can use a label to easily identify and filter on whether a `Datavolume` is indeed a runtime `datavolume`.
+
+The controller will first check if the `VMDiskImage` referenced exists. If it does not exist and a template has been provided for the `VMDiskImage`, the controller will create it with the name provided as a reference.
-This option would circumvent the need for the new CRD and Controller. However, it would result in the duplication of state. OT should be the ultimate source of truth when it comes to resource ownership and this introduction of an outside service may result in more indirection and issues with state inconsistency.
+
+The controller will then create the VM using our standard approach within OT.
+
+```mermaid
+    stateDiagram-v2
+    direction TB
+
+    [*] --> NewRuntimeVMDetected
+    state "Check VMDiskImages" as CheckVMDI
+    NewRuntimeVMDetected --> CheckVMDI
+
+    %% Creation Path
+    CheckVMDI --> CreateVmdi : VMDIs Missing
+    CheckVMDI --> Done : VMDIs Exist
+
+    CreateVmdi --> Done
+```
+
+### **3.2 Pros and Cons**
+
+#### Pros
+- Does not require any alteration of existing customer deployment flows.
+- Does not require a new CR and adds no new abstractions.
+
+#### Cons
+- Potentially less flexibility for error handling.
+
+### **4: Considered alternatives**
+
+The following alternatives have been considered:
+
+#### Declare these pain points out of scope of the platform itself
+
+Leave it to the customer to handle these issues. It may not necessarily be the responsibility of OT to handle these things.
 
 ### **5: What this doesn't fix**
 
 There remains the open issue of handling workspace cleanup on the platform when encountering an error. A failed VM Provisioning is one such case. The team is currently working on a solution for this issue but it is out of scope of this proposal.
+
+### **6: Discussion**
+
+The team would like to explore whether the introduction of a custom "VM" CR is worthwhile. The team does acknowledge the need for us to check if something is requesting a build resource that has not been cached.
+
+The team would like to explore if we can get enough control by checking for runtime Datavolumes which reference a volumesnapshot controlled by buildtime VMDIs. If creation of the referenced volumesnapshot is enough to get the VM out of its loop, then we should be good to go. If it is not enough, the team will need to re-evaluate.
+
+Watching runtime Datavolumes does give us the information to do some kind of reference counting/caching on data volumes. The team could devise a cleanup strategy for VMDIs this way.
+Both of these solutions, when put together, would allow the team to implement robust self healing for OT VMs within a workspace and aid visibility into VMDI usage.
+
+
+### **7: Decision**
+
+TODO

From 2c4d01840cdd8fb600b67c598f2a061e7844af93 Mon Sep 17 00:00:00 2001
From: Harrison Billings
Date: Thu, 11 Dec 2025 08:42:49 -0700
Subject: [PATCH 11/11] feat: more info

---
 VIRTUAL_MACHINE_PROPOSAL.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/VIRTUAL_MACHINE_PROPOSAL.md b/VIRTUAL_MACHINE_PROPOSAL.md
index 7d2fe7a..59e04c2 100644
--- a/VIRTUAL_MACHINE_PROPOSAL.md
+++ b/VIRTUAL_MACHINE_PROPOSAL.md
@@ -13,6 +13,8 @@ The second issue revolves around resource pruning. Currently it is very difficul
 Currently manual intervention is required to resolve the issue of a virtual machine starting without backing data. The team must first deploy the backing data required by the VM. This is often enough as the underlying implementation libraries will move the VM to a "Ready" state once the backing data is in place.
The team has noticed on occasion that simply putting the required data in place is not enough and the team must "kick" underlying vm resources to "unstick" them.
+With regard to resource removal, the team must currently cross-reference existing backing data with customer expectations about which workspaces will be launched. If a resource has no more planned usage for the customer, it can safely be removed from the cluster.
+
 ### **2. Solution 1**
 The proposed solution is a new `VirtualMachine` CR and controller along with the expansion of the capabilities of the `VMDiskImage` controller.
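To ground the Solution 2 flow, a runtime `DataVolume` that the new controller would react to might look roughly like the sketch below, assuming runtime volumes are CDI `DataVolume`s cloned from a buildtime `VolumeSnapshot`. The label key, snapshot name, and source layout are assumptions for illustration.

```yaml
# Hypothetical runtime DataVolume watched by the proposed controller.
# The label key and snapshot name are assumed conventions for this sketch.
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: demo-vm-rootdisk
  labels:
    # Assumed label used to filter runtime DataVolumes in the controller's watch.
    pelotech.ot/volume-type: "runtime"
spec:
  source:
    snapshot:
      # Buildtime VolumeSnapshot expected to be owned by the demo-vmdi VMDiskImage.
      namespace: vmdi-farm
      name: demo-vmdi-snapshot
  storage:
    resources:
      requests:
        storage: 18Gi
```

If the referenced snapshot's owning `VMDiskImage` does not exist, the controller described in section 3.1 would derive and create it from what this `DataVolume` expects.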