OpenShift Guide

Day 8 — AI/ML Workloads

OpenShift AI, model serving, GPU scheduling, MLflow, pipelines, and LLMOps

Red Hat OpenShift AI

Red Hat OpenShift AI (formerly Red Hat OpenShift Data Science, RHODS) is Red Hat's enterprise ML platform built on OpenShift. It gives data scientists Jupyter notebooks, model serving infrastructure, and pipeline automation, all within your existing RBAC and network security boundary.

Data Science Projects

Isolated namespaces with GPU quotas, S3-connected workbenches, and shared model registries — one per team or initiative.

Workbenches

Jupyter and code-server environments with pre-installed data science toolchains. Spawned on demand and stopped when idle to reclaim GPUs.

Model Registry

MLflow-compatible registry integrated with OpenShift AI pipelines. Tracks model versions, metrics, and deployment lineage.

KServe (Model Serving)

Serverless inference with scale-to-zero autoscaling. Supports serving runtimes such as TorchServe, Triton, and vLLM, and model formats such as ONNX.
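As a rough illustration of scale-to-zero serving, a KServe InferenceService might look like the sketch below. The resource name, namespace, runtime name, and storage URI are all hypothetical placeholders, and the model format must match whatever the ServingRuntime in your project actually declares.

```yaml
# Minimal sketch of a KServe InferenceService (all names are hypothetical).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-llm                   # hypothetical model name
  namespace: example-project          # hypothetical Data Science Project
spec:
  predictor:
    minReplicas: 0                    # allow scale to zero when idle (serverless mode)
    maxReplicas: 2
    model:
      modelFormat:
        name: vLLM                    # assumption: must match a format the runtime supports
      runtime: example-vllm-runtime   # hypothetical ServingRuntime in the same namespace
      storageUri: s3://example-bucket/models/example-llm   # hypothetical model location
      resources:
        limits:
          nvidia.com/gpu: "1"         # one GPU per replica
```

With minReplicas set to 0, the first request after an idle period pays a cold-start cost while the model loads, so this setting suits bursty or low-traffic endpoints better than latency-sensitive ones.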

Pipelines (Tekton-Elyra)

Elyra's drag-and-drop pipeline editor, backed by Tekton in Data Science Pipelines 1.x (2.x releases run on Argo Workflows instead). Export as YAML for GitOps-driven retraining jobs.

Distributed Training

PyTorchJob and TFJob via the Kubeflow Training Operator. Coordinates GPU training across multiple nodes.
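A multi-node job could be sketched roughly as follows. The job name, namespace, image, and training command are hypothetical; the operator injects the rendezvous environment variables (such as MASTER_ADDR and WORLD_SIZE) into each pod.

```yaml
# Minimal sketch of a PyTorchJob with one master and two workers (names are hypothetical).
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: example-train                 # hypothetical job name
  namespace: example-project          # hypothetical Data Science Project
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: pytorch             # the operator expects the container to be named "pytorch"
            image: quay.io/example/train:latest   # hypothetical training image
            command: ["python", "train.py"]       # hypothetical entry point
            resources:
              limits:
                nvidia.com/gpu: "1"
    Worker:
      replicas: 2                     # two extra nodes' worth of GPU workers
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: pytorch
            image: quay.io/example/train:latest   # same hypothetical image as the master
            command: ["python", "train.py"]
            resources:
              limits:
                nvidia.com/gpu: "1"
```

This example requests three GPUs in total, so it will stay pending in a namespace whose GPU quota is lower than that.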

Install OpenShift AI via OperatorHub

# 1. Install the operator (the redhat-ods-operator namespace and an OperatorGroup in it must already exist)
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  channel: stable
  name: rhods-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
---
# 2. Create the DSCInitialization (first-time setup)
apiVersion: dscinitialization.opendatahub.io/v1
kind: DSCInitialization
metadata:
  name: default-dsci
spec:
  applicationsNamespace: redhat-ods-applications
  monitoring:
    managementState: Managed
    namespace: redhat-ods-monitoring
  serviceMesh:
    managementState: Managed
    auth:
      audiences:
      - https://kubernetes.default.svc
---
# 3. Enable components (kserve in serverless mode also needs the OpenShift Serverless and Service Mesh operators)
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    dashboard:       { managementState: Managed }
    workbenches:     { managementState: Managed }
    datasciencepipelines: { managementState: Managed }
    kserve:
      managementState: Managed
      serving:
        ingressGateway:
          certificate: { type: SelfSigned }
        managementState: Managed
        name: knative-serving
    modelmeshserving: { managementState: Managed }
    trainingoperator:  { managementState: Managed }
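
The Subscription in step 1 only resolves if its namespace and an OperatorGroup already exist. A minimal sketch of those prerequisites, assuming the default namespace used above (the OperatorGroup name is an arbitrary choice):

```yaml
# Prerequisites for step 1: apply these before the Subscription.
apiVersion: v1
kind: Namespace
metadata:
  name: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: rhods-operator                # arbitrary name, assumed here for illustration
  namespace: redhat-ods-operator
# No spec.targetNamespaces is set, so the group selects all namespaces.
```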

GPU Quota

Set GPU quotas at the Data Science Project (namespace) level with a ResourceQuota that sets requests.nvidia.com/gpu: "2" under spec.hard. Without a quota, a single job can claim every GPU in the cluster and starve other teams.
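
A minimal sketch of such a quota, assuming a hypothetical project namespace named example-project:

```yaml
# Caps the total GPUs that pods in this namespace may request.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota                     # hypothetical quota name
  namespace: example-project          # hypothetical Data Science Project
spec:
  hard:
    requests.nvidia.com/gpu: "2"      # at most two GPUs requested across all pods
```

Extended resources such as nvidia.com/gpu can only be limited through the requests. prefix, which is why there is no matching limits. entry.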