Cake Overview
Cake is an end-to-end environment for managing the entire AI lifecycle, from data engineering and model training, all the way to inference and monitoring. This article will guide you through the high-level Cake platform architecture, providing an overview of how its design choices streamline AI operations while maintaining flexibility, security, and control.
Holistic Lifecycle Management
The Cake platform integrates the entire range of capabilities needed for managing the AI lifecycle, including:
Advanced large language and embedding models
Multi-agent systems
3D parallel training and fine-tuning capabilities
Model monitoring and observability tools
GPU auto-scaling (from zero to thousands of nodes) for both training and inference
Exploratory data analysis (EDA) and AutoML frameworks
Cloud cost monitoring and optimization
PII/PHI anonymization utilities
Built to handle both traditional ML and generative AI workloads, Cake provides centralized management—a “single pane of glass”—to oversee every AI project.
Deployment Flexibility
Cake deploys directly into your own virtual private cloud (VPC) or on-premises infrastructure. This ensures no sensitive data ever leaves your environment. With encryption both in transit and at rest, along with robust Kubernetes role-based access controls (RBAC), Cake prioritizes security at every layer.
Every component is authenticated, and platform access is scoped based on user roles, ensuring a least-privilege model. Even the deployment itself adheres to infrastructure-as-code (IaC) principles, where all changes are version-controlled through Git repositories. This gives teams full transparency and control over their infrastructure.
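To make the least-privilege model concrete, here is a minimal sketch of a namespaced, read-only Kubernetes Role created with the official kubernetes Python client. The role name, namespace, and resource list are hypothetical; in Cake, objects like this are typically managed declaratively through Git-backed IaC rather than imperative API calls.

```python
# Minimal sketch of a least-privilege Kubernetes Role (hypothetical names).
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

role = client.V1Role(
    metadata=client.V1ObjectMeta(name="notebook-reader", namespace="ml-team"),
    rules=[
        client.V1PolicyRule(
            api_groups=[""],                  # core API group
            resources=["pods", "pods/log"],
            verbs=["get", "list", "watch"],   # read-only access
        )
    ],
)

rbac = client.RbacAuthorizationV1Api()
rbac.create_namespaced_role(namespace="ml-team", body=role)
```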
Cake and Open Source
Understanding how to use Cake really means understanding how to effectively leverage a carefully curated stack of open source tools and frameworks. The Cake platform is built on a modular architecture that stitches together best-in-class open technologies—augmented with Cake’s custom tooling—to support every phase of the generative AI lifecycle. To make the most of the platform, users should become familiar with the foundational components across four key domains: Core, ML Ops, AI Ops, and Data Engineering.
Core encompasses the base platform components, such as Kubernetes, Istio, and Prometheus.
ML Ops focuses on model development workflows, including training, fine-tuning, experiment tracking (e.g., with MLflow), and inference (via Ray and KubeRay).
AI Ops deals with monitoring, tracing, alerting, and serving infrastructure for generative and agentic components, leveraging tools like vLLM, Langfuse, and LiteLLM to ensure operational reliability and observability.
Data Engineering covers data orchestration, transformation, and movement, with tools such as Airflow, DBT, Airbyte, and Prefect.
Key Core Platform Documentation
Key Components
Kubernetes - Orchestration Platform
Calling the Kubernetes API - Kubeflow In-Cluster Config and API (see the sketch after this list)
Lens - https://docs.k8slens.dev/
Istio - Service Mesh (East-West Traffic)
Accessing Cake Resources Externally - Accessing Cake Platform Resources Externally
Envoy - North-South Gateway
Dex - Integrates Kubernetes with Enterprise IAM
Prometheus - Metrics Engine
Main - https://prometheus.io/docs/prometheus/latest/getting_started/
Monitoring your Ray Deployed Models with Prometheus and Grafana - Monitoring your Ray Deployed Models with Prometheus and Grafana
Grafana - Dashboarding and Alerting
Terraform - Infrastructure as Code
ArgoCD - GitOps IaC for Kubernetes; what Terraform does for cloud resources, ArgoCD does for Helm and Kustomize deployments
Crossplane - Extends Kubernetes into a universal control plane for external (e.g., cloud) resources
Karpenter - Autoscaler used in place of the default AWS/Azure autoscaler (not used on GCP, where the default autoscaler is just as powerful)
Main - https://karpenter.sh/v1.4/
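The "Calling the Kubernetes API" doc above covers in-cluster configuration in detail; as a quick illustration, here is a minimal sketch using the official kubernetes Python client. The namespace is a hypothetical placeholder, and this is not the documented Cake flow, just the standard client pattern.

```python
# Minimal sketch of calling the Kubernetes API from inside the cluster
# (e.g., from a Kubeflow notebook pod). The namespace is hypothetical.
from kubernetes import client, config

try:
    # Use the pod's service account when running inside the cluster.
    config.load_incluster_config()
except config.ConfigException:
    # Fall back to the local kubeconfig for development outside the cluster.
    config.load_kube_config()

v1 = client.CoreV1Api()
for pod in v1.list_namespaced_pod(namespace="kubeflow-user").items:
    print(pod.metadata.name, pod.status.phase)
```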
Key Security and Access Control Documentation
Security Overview - Cake Security Doc.pdf
Managing Users and Namespaces - Managing users and namespaces
Adding Custom Overlays to Cake - Cake Overlays
Programmatic OAuth Logins for custom apps - Programmatic OAuth Logins
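The linked doc describes the flows Cake actually supports; as a generic illustration only, the sketch below performs an OAuth2 resource-owner password grant against an OIDC token endpoint using the requests library. The issuer URL, client credentials, and even the grant type are assumptions — which grants are enabled depends on your Cake/Dex configuration.

```python
# Generic sketch of a programmatic OAuth2 login (password grant).
# All values below are hypothetical; consult the Programmatic OAuth
# Logins doc for the flow your deployment actually enables.
import requests

TOKEN_URL = "https://dex.example-cake-domain.com/token"  # hypothetical issuer

resp = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "password",
        "username": "svc-account@example.com",
        "password": "REDACTED",        # pull from a secret store, never hardcode
        "client_id": "custom-app",
        "client_secret": "REDACTED",
        "scope": "openid profile email",
    },
    timeout=10,
)
resp.raise_for_status()
access_token = resp.json()["access_token"]

# Use the token as a bearer credential against platform services.
headers = {"Authorization": f"Bearer {access_token}"}
```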
Key ML Ops Documentation
Key Components
Kubeflow Pipelines - Open Source Orchestration Engine focused on ML Ops
Kubeflow Notebooks - Notebook Authoring and Editing Environment
K9S in Notebooks - How to set up k9s in a notebook
Jupyter Magics tips for AI-assisted notebooks - Jupyter Magics
TensorBoard - Visualization Toolbox
Creating a TensorBoard in Kubeflow - Creating a Tensorboard
Katib - Hyperparameter Tuning
Main - https://www.kubeflow.org/docs/components/katib/getting-started/
Extracting Katib Data - Extracting Katib Trial Data
KServe - Open Source Inference Engine
MLflow - Experiment Tracking and Model Registry. Early evaluation tooling
Adding fine-tuning experiments to MLflow - Adding Experiments to MLflow (a minimal sketch follows after this list)
Ray - Parallel Compute Engine
Ray Serve - Inference Engine
Ray Tune - Hyperparameter Tuning
Feast - Feature Store
Main - https://docs.feast.dev
Feast in Cake - Feast Intro
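As a quick complement to the MLflow docs above, here is a minimal sketch of logging a fine-tuning run. The tracking URI, experiment name, parameters, and artifact path are hypothetical placeholders, not Cake's actual configuration; inside Cake, the in-cluster MLflow endpoint is typically provided by the platform.

```python
# Minimal sketch of logging a fine-tuning run to MLflow (hypothetical values).
import mlflow

mlflow.set_tracking_uri("http://mlflow.mlflow.svc.cluster.local:5000")  # hypothetical
mlflow.set_experiment("llm-fine-tuning")

with mlflow.start_run(run_name="lora-sft-demo"):
    mlflow.log_params({"base_model": "my-base-llm", "lora_rank": 8, "lr": 2e-4})
    for step, loss in enumerate([1.9, 1.4, 1.1]):   # stand-in training loop
        mlflow.log_metric("train_loss", loss, step=step)
    mlflow.log_artifact("adapter_config.json")       # assumes this file exists locally
```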
Key AI Ops Documentation
Overview Flow
AI Ops for Local and Fine-Tuned Models - Cake AI Ops for Fine-Tuned Models
Key Components
vLLM - Performance Focused Model Server for LLMs
Ollama - Experiment Focused Model Server for LLMs
Ray Serve - Inference Engine
KubeRay - https://ray-project.github.io/kuberay/
Ray vLLM for multi-node models - Deploying a fine-tuned model across multiple nodes with KubeRay and vLLM
Langflow - Open Source Agentic Orchestration
Main - https://docs.langflow.org
LiteLLM - Model Proxy
Add deployed model to LiteLLM - Add a Deployed Model to LiteLLM (calling a proxied model is sketched after this list)
Langfuse - Open Source Tracing
Main - https://langfuse.com/docs
Tracing Calls to LiteLLM with Langfuse - Tracing Calls to LiteLLM with Langfuse
Langfuse Security Considerations -
Open WebUI - Rich Chat UIs
Main - https://docs.openwebui.com/
Adding LiteLLM models to Open WebUI - Connecting to LiteLLM proxied models via OpenWeb UI
LangGraph - Agentic Framework
Promptfoo - Prompt and Model Evaluation
DSPy - Prompt Generation
Main - https://dspy.ai/learn/
Fine-Tuning - SFT and RLHF
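Because the LiteLLM proxy exposes an OpenAI-compatible API, models registered with it can be called through the standard openai Python client. The sketch below is illustrative only; the base URL, API key, and model alias are hypothetical placeholders that depend on your deployment.

```python
# Minimal sketch of calling a model behind the LiteLLM proxy
# (hypothetical URL, key, and model alias).
from openai import OpenAI

client = OpenAI(
    base_url="http://litellm.litellm.svc.cluster.local:4000",  # hypothetical proxy URL
    api_key="sk-your-litellm-key",                             # hypothetical key
)

resp = client.chat.completions.create(
    model="my-finetuned-llm",  # alias registered with LiteLLM
    messages=[{"role": "user", "content": "Summarize the Cake platform in one sentence."}],
)
print(resp.choices[0].message.content)
```

Routing all model calls through LiteLLM this way also means tools like Langfuse can trace them centrally, as covered in the tracing doc above.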
Key Data Engineering Documentation
Airflow - Open Source Orchestration Engine (see the DAG sketch at the end of this section)
DBT - Open Source Data Transformation
Airbyte - Low Code Data Movement
Main - https://docs.airbyte.com/
Prefect - Open Source Orchestration Engine
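As a minimal illustration of Airflow orchestration, here is a sketch of a two-task DAG. It assumes Airflow 2.x; the DAG id, schedule, and task bodies are hypothetical placeholders, not a Cake-provided pipeline.

```python
# Minimal sketch of an Airflow DAG for a daily data job (hypothetical names).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw data from a source system")


def transform():
    print("run transformations (e.g., trigger a dbt job)")


with DAG(
    dag_id="daily_example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task   # extract runs before transform
```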