Polyp-Segmentation-System-MLOps is an end-to-end medical image segmentation platform designed to detect and segment colorectal polyps from endoscopic images.
This project goes beyond traditional model training — it demonstrates a complete MLOps system that automates the entire machine learning lifecycle: from data ingestion and training pipelines to deployment, monitoring, and continuous delivery.
The platform is orchestrated with Kubeflow Pipelines and deployed on Google Kubernetes Engine (GKE), using KServe (Triton Inference Server) for scalable model serving.
| Category | Tools / Frameworks |
|---|---|
| Orchestration | Kubeflow Pipelines, Jenkins |
| Training | PyTorch, Ray Train, Ray Tune |
| Tracking | MLflow (PostgreSQL + MinIO backend) |
| Deployment | KServe, Triton Inference Server |
| Storage | MinIO, GCS |
| Monitoring | Prometheus, Grafana |
| Infrastructure | GKE, Docker, Terraform |
| UI | Gradio |
- 💺 Polyp Segmentation System MLOps
- 📚 Overview
- 🧩 Architecture
- 🧱 Stack Overview
- ⚙️ Environment Setup
- 🧪 Local Development
- ☸️ Cluster Setup (Kind / GKE)
- 📦 Kubeflow Deployment
- 🧾 MLflow Deployment
- ⚡ KubeRay Installation
- 🧽 RayCluster Setup
- 🚀 Pipeline Integration
- Development & Deployment Playbook (English)
```bash
git clone https://github.com/Harly-1506/polyp-segmentation-mlops.git
cd polyp-segmentation-mlops

conda create -n polyp_mlops python==3.12.9
conda activate polyp_mlops

pip install uv
uv sync --all-groups
```

You can start with a local environment to verify training and MLflow tracking before deploying to the cluster.
Follow the official Docker installation guide: 👉 Install Docker Engine
```bash
docker compose -f docker-compose-mlflow.yaml up -d --build
```

Run local training:

```bash
uv run --active -m training.ray_main --config training/configs/configs.yaml
```

This runs Ray-based distributed training while logging metrics and artifacts to MLflow (using PostgreSQL + MinIO backends).
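If the trainer does not pick up the tracking server automatically, you can point it at the local stack via environment variables. The endpoints and credentials below are assumptions; match them to `docker-compose-mlflow.yaml`:

```bash
# Assumed local endpoints and credentials; adjust to docker-compose-mlflow.yaml
export MLFLOW_TRACKING_URI="http://localhost:5000"
export MLFLOW_S3_ENDPOINT_URL="http://localhost:9000"  # MinIO artifact store
export AWS_ACCESS_KEY_ID="minioadmin"                  # MinIO access key
export AWS_SECRET_ACCESS_KEY="minioadmin"              # MinIO secret key
```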
```bash
# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Install Kustomize
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
sudo mv kustomize /usr/local/bin/
```

Create a kind cluster:

```bash
cat <<EOF | kind create cluster --name=kubeflow --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
image: kindest/node:v1.32.0@sha256:c48c62eac5da28cdadcf560d1d8616cfa6783b58f0d94cf63ad1bf49600cb027
kubeadmConfigPatches:
- |
kind: ClusterConfiguration
apiServer:
extraArgs:
"service-account-issuer": "https://kubernetes.default.svc"
"service-account-signing-key-file": "/etc/kubernetes/pki/sa.key"
extraMounts:
- hostPath: /home/harly/data
containerPath: /mnt/data
EOF
```
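A quick sanity check that the cluster came up (generic kind/kubectl commands, not from the repo):

```bash
# kind sets the current context to kind-<cluster-name>
kubectl cluster-info --context kind-kubeflow
kubectl get nodes
```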
Save the kubeconfig:

```bash
kind get kubeconfig --name kubeflow > /tmp/kubeflow-config
export KUBECONFIG=/tmp/kubeflow-config
```

Create a Docker registry secret:

```bash
docker login
kubectl create secret generic regcred \
--from-file=.dockerconfigjson=$HOME/.docker/config.json \
--type=kubernetes.io/dockerconfigjson
```

Download and deploy the official Kubeflow manifests:

```bash
RELEASE=v1.10.1
git clone -b $RELEASE --depth 1 --single-branch https://github.com/kubeflow/manifests.git
cd manifests
while ! kustomize build example | kubectl apply --server-side --force-conflicts -f -; do
echo "Retrying to apply resources"
sleep 20
done
```
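The manifests can take a while to reconcile; a generic way to watch progress until every component is Running:

```bash
# All Kubeflow pods can take 10+ minutes to become Ready on first install
kubectl get pods -A
kubectl -n kubeflow wait --for=condition=Ready pods --all --timeout=600s
```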
Create a namespace and deploy MLflow via Helm:

```bash
kubectl create namespace mlflow
kubens mlflow
docker build -t harly1506/mlflow-custom:v1.0 .
kind load docker-image harly1506/mlflow-custom:v1.0 --name kubeflow
helm install mlflow ./mlflow -f ./mlflow/values.yaml -n mlflow
helm upgrade mlflow mlflow -f mlflow/values.yaml
```

Edit the Kubeflow Central Dashboard ConfigMap:

```bash
kubectl -n kubeflow get configmap centraldashboard-config -o yaml
```

Then add this menu item under the dashboard JSON configuration:

```json
{
"type": "item",
"link": "/mlflow/",
"text": "MLflow",
"icon": "check"
}
```
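One way to apply the change is to edit the ConfigMap in place and restart the dashboard so it reloads the menu. The `centraldashboard` deployment name is the Kubeflow default; verify it matches your install:

```bash
# Add the menu item to the links JSON, then restart the dashboard
kubectl -n kubeflow edit configmap centraldashboard-config
kubectl -n kubeflow rollout restart deployment centraldashboard
```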
Deploy the KubeRay operator:

```bash
cd ray
kustomize build kuberay-operator/overlays/kubeflow | kubectl apply --server-side -f -
kubectl get pod -l app.kubernetes.io/component=kuberay-operator -n kubeflow
```

Expected output:

```
NAME READY STATUS RESTARTS AGE
kuberay-operator-5b8cd69758-rkpvh 1/1 Running 0 6m
```

Create a new namespace and service account:

```bash
kubectl create ns development
kubectl create sa default-editor -n development
```

Modify the RayCluster YAML:
- Update the `AuthorizationPolicy` principals:

  ```yaml
  principals:
    - "cluster.local/ns/development/sa/default-editor"
  ```

- Update the node address for `headGroupSpec` and `workerGroupSpec`:

  ```yaml
  node-ip-address: $(hostname -I | tr -d ' ' | sed 's/\./-/g').raycluster-istio-headless-svc.development.svc.cluster.local
  ```
Deploy RayCluster:

```bash
cd ray
helm install raycluster ray-cluster -n development
helm upgrade raycluster ray-cluster -n development -f values.yaml
```

Check the pods:

```bash
kubectl get po -A
```

Build and push the image for the Kubeflow pipeline:

```bash
docker build -t harly1506/polyp-mlops:kfpv2 .
kind load docker-image harly1506/polyp-mlops:kfpv2 --name kubeflow
```

Generate the pipeline YAML:

```bash
uv run training/orchestration/kube_pipeline.py
```

Upload the generated ray_segmentation_pipeline_v9.yaml through the Kubeflow portal at localhost:8080:

```bash
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
```

MLflow tracking / MinIO:

```bash
kubectl port-forward svc/minio-service -n mlflow 9000:9000 9001:9001
```

End-to-end guide to deploy a Triton Inference Stack on GKE with KServe, with Prometheus/Grafana for observability and optional GPU + Canary + CI/CD.
Validate Triton and the FastAPI gateway before deploying to the cluster.
```bash
CHECKPOINT="training/checkpoints/UNet/Unet81PolypPVT-best.pth"
ONNX_EXPORT="artifacts/polyp-segmentation/1/model.onnx"
uv run python -m training.scripts.export_to_onnx "${CHECKPOINT}" "${ONNX_EXPORT}" --image-size 256 --dynamic
cat <<'EOF' > "artifacts/polyp-segmentation/config.pbtxt"
name: "polyp-segmentation"
platform: "onnxruntime_onnx"
max_batch_size: 1
input [
{ name: "input", data_type: TYPE_FP32, dims: [3, -1, -1] }
]
output [
{ name: "output", data_type: TYPE_FP32, dims: [1, -1, -1] }
]
instance_group [{ kind: KIND_CPU }]
EOF
```

Launch Triton locally:

```bash
export MODEL_REPO="$(pwd)/artifacts/polyp-segmentation"
docker run --rm --name triton \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v "${MODEL_REPO}:/models/polyp-segmentation" \
nvcr.io/nvidia/tritonserver:24.10-py3 \
tritonserver --model-repository=/models \
--strict-model-config=false \
  --log-verbose=1
```

Health check:

```bash
curl http://localhost:8000/v2/health/ready
```
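Beyond the server-level probe, you can confirm that the model itself loaded; these are standard endpoints from Triton's HTTP/v2 API:

```bash
# Model-level readiness and metadata
curl http://localhost:8000/v2/models/polyp-segmentation/ready
curl http://localhost:8000/v2/models/polyp-segmentation
# Prometheus-format metrics on the metrics port
curl http://localhost:8002/metrics
```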
Run the FastAPI gateway locally:

```bash
docker compose -f docker-compose-app.yaml up --build
```

Set up the GCP project:

```bash
export PROJECT_ID="polyp-mlops-1506"
export REGION="asia-southeast1"
export ARTIFACT_REPO="polyp-inference"
gcloud auth login
gcloud config set project "${PROJECT_ID}"
gcloud config set compute/region "${REGION}"
gcloud services enable container.googleapis.com artifactregistry.googleapis.com compute.googleapis.com
```

Create an Artifact Registry repository:

```bash
gcloud artifacts repositories create "${ARTIFACT_REPO}" \
--repository-format=docker \
--location="${REGION}" \
  --description="Containers for Triton gateway"
```

Build and push the gateway image:

```bash
GATEWAY_IMAGE="${REGION}-docker.pkg.dev/${PROJECT_ID}/${ARTIFACT_REPO}/polyp-gateway:latest"
docker build -t "${GATEWAY_IMAGE}" -f app/Dockerfile .
gcloud auth configure-docker "${REGION}-docker.pkg.dev"
docker push "${GATEWAY_IMAGE}"
```

Upload the model repository to GCS:

```bash
MODEL_BUCKET="gs://my-polyp-models"
# (Create bucket if not existing)
gsutil mb -l "${REGION}" "${MODEL_BUCKET}" || echo "Bucket may already exist"
# Upload ONNX + config
gsutil cp "${ONNX_EXPORT}" "${MODEL_BUCKET}/models/polyp-segmentation/1/model.onnx"
gsutil cp artifacts/polyp-segmentation/config.pbtxt "${MODEL_BUCKET}/models/polyp-segmentation/config.pbtxt"
```
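A quick sanity check that the bucket layout matches what Triton expects (`<repo>/<model>/<version>/model.onnx` plus `config.pbtxt`):

```bash
gsutil ls -r "${MODEL_BUCKET}/models/polyp-segmentation/"
```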
If you use Terraform, run:

```bash
terraform init
terraform plan
terraform apply
```

Then connect kubectl:

```bash
export CLUSTER="ml-inference-cluster"
# Use --region for regional clusters; use --zone for zonal clusters.
gcloud container clusters get-credentials "${CLUSTER}" --region "${REGION}" --project "${PROJECT_ID}"
```

Install the serving stack (cert-manager, Istio, Knative, KServe):

```bash
# 0) cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.19.1/cert-manager.yaml
kubectl -n cert-manager wait --for=condition=Available deploy --all --timeout=300s
# 1) Istio (control plane + ingressgateway)
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.22.3 sh -
./istio-1.22.3/bin/istioctl install -y --set profile=default
kubectl -n istio-system wait --for=condition=Available deploy --all --timeout=300s
kubectl -n istio-system get svc istio-ingressgateway
# 2) Knative Serving (CRDs + core)
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.19.5/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.19.5/serving-core.yaml
kubectl -n knative-serving wait --for=condition=Available deploy --all --timeout=300s
# 3) Knative net-istio (use Istio as ingress)
kubectl apply -f https://github.com/knative-extensions/net-istio/releases/download/knative-v1.19.5/net-istio.yaml
kubectl -n knative-serving patch configmap/config-network \
--type merge -p='{"data":{"ingress.class":"istio.ingress.networking.knative.dev"}}'
# 4) KServe
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.15.2/kserve.yaml --server-side --force-conflicts
kubectl -n kserve wait --for=condition=Available deploy/kserve-controller-manager --timeout=300s
kubectl apply -f "https://github.com/kserve/kserve/releases/download/v0.15.2/kserve-cluster-resources.yaml"
# 5) Verify webhook is ready
kubectl -n kserve get endpoints kserve-webhook-server-service
```

Goal: allow KServe Pods to read the model from your GCS bucket without any key files, using Workload Identity.

```bash
export NAMESPACE="kserve-inference"
export BUCKET="my-polyp-models"
export GSA="kserve-infer-sa"
export KSA="kserve-model-sa"
kubectl create namespace "${NAMESPACE}" || true
# Enable Workload Identity on the cluster
gcloud container clusters update "${CLUSTER}" --region "${REGION}" \
--workload-pool="${PROJECT_ID}.svc.id.goog"
# Create a Google Service Account (GSA)
gcloud iam service-accounts create "${GSA}" --project "${PROJECT_ID}"
# Grant minimal read access to the model bucket (object viewer)
gsutil iam ch serviceAccount:${GSA}@${PROJECT_ID}.iam.gserviceaccount.com:roles/storage.objectViewer gs://${BUCKET}
# Create a Kubernetes Service Account (KSA)
kubectl -n "${NAMESPACE}" create serviceaccount "${KSA}" || true
# Bind GSA ↔ KSA (Workload Identity)
gcloud iam service-accounts add-iam-policy-binding \
"${GSA}@${PROJECT_ID}.iam.gserviceaccount.com" \
--role "roles/iam.workloadIdentityUser" \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA}]"
# Annotate KSA with the GSA identity
kubectl -n "${NAMESPACE}" annotate serviceaccount "${KSA}" \
  iam.gke.io/gcp-service-account="${GSA}@${PROJECT_ID}.iam.gserviceaccount.com" --overwrite
```

Short explanation:
- GSA (Google Service Account): holds IAM permissions on GCP resources (GCS bucket).
- KSA (Kubernetes Service Account): attached to your Pods.
- Workload Identity links KSA → GSA, so Pods transparently use the GSA’s permissions—no JSON key files needed.
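A quick way to verify the binding is to run a throwaway pod under the KSA and list the bucket. This is a generic check; the image and overrides are illustrative:

```bash
# Should print the bucket contents if Workload Identity is wired up correctly
kubectl -n "${NAMESPACE}" run wi-check --rm -i --restart=Never \
  --image=google/cloud-sdk:slim \
  --overrides='{"spec":{"serviceAccountName":"'"${KSA}"'"}}' \
  -- gsutil ls "gs://${BUCKET}"
```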
Tip: ensure your InferenceService pods run with `serviceAccountName: kserve-model-sa` (or the `${KSA}` you created).
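The actual manifest lives under `deployment/kserve`; for orientation, here is a minimal sketch of such an InferenceService. The resource name and the Triton model format/protocol fields are assumptions, so check them against the repo's YAML:

```bash
cat <<EOF | kubectl apply -n "${NAMESPACE}" -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: polyp-segmentation          # hypothetical name
spec:
  predictor:
    serviceAccountName: ${KSA}      # the Workload Identity KSA created above
    model:
      modelFormat:
        name: triton
      protocolVersion: v2
      storageUri: gs://${BUCKET}/models   # parent dir containing polyp-segmentation/
EOF
```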
Deploy the InferenceService and UI:

```bash
kubectl apply -k deployment/kserve
kubectl apply -k deployment/ui
kubectl port-forward svc/polyp-ui 7860:80 -n kserve-inference
```

Get the external IP via Nginx:

```bash
kubectl -n kserve-inference get svc polyp-ui-nginx
```

Output should be:

```
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
polyp-ui-nginx LoadBalancer 10.30.1.4 34.124.246.185 80:31912/TCP 9d
```

Install Prometheus + Grafana via Helm:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace observability --create-namespace \
-f deployment/monitoring/prometheus/values.yaml
helm upgrade --install grafana grafana/grafana \
--namespace observability \
-f deployment/monitoring/grafana/values.yaml
kubectl apply -k deployment/monitoring
```

Access Grafana and Prometheus:

```bash
kubectl port-forward svc/prometheus-operated 9090:9090 -n observability
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n observability
```

Deploy the canary InferenceService:

```bash
kubectl apply -f deployment/kserve/inferenceservice-canary.yaml -n "${NAMESPACE}"
```

Adjust `canaryTrafficPercent` to shift traffic between the old and new model revisions.
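To shift traffic without editing files, the field can also be patched directly; the `isvc` name here is hypothetical:

```bash
# Send 10% of traffic to the latest (canary) revision
kubectl -n "${NAMESPACE}" patch isvc polyp-segmentation --type merge \
  -p '{"spec":{"predictor":{"canaryTrafficPercent":10}}}'
```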
- Use Jenkins (chart under `Jenkins/`) to build, push, and update KServe manifests.
- In Kubeflow Pipelines (`training/orchestration/kube_pipeline.py`), update `storageUri` automatically when a model passes evaluation; a sketch of this promotion step follows the list.
- Optionally trigger Jenkins to upload the best checkpoint to GCS upon promotion.
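As a sketch of that promotion step (resource and bucket names are assumptions; the real automation lives in the pipeline and Jenkins jobs), repointing `storageUri` makes KServe roll out a new revision, which the canary settings above can then gate:

```bash
# Hypothetical promotion: point the InferenceService at the newly uploaded model
kubectl -n kserve-inference patch isvc polyp-segmentation --type merge \
  -p '{"spec":{"predictor":{"model":{"storageUri":"gs://my-polyp-models/models"}}}}'
```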











