Cake Overview
Cake is an end-to-end environment for managing the entire AI lifecycle, from data engineering and model training, all the way to inference and monitoring. This article will guide you through the high-level Cake platform architecture, providing an overview of how its design choices streamline AI operations while maintaining flexibility, security, and control.
Holistic Lifecycle Management
The Cake platform integrates the entire range of capabilities needed for managing the AI lifecycle, including:
Advanced large language and embedding models
Multi-agent systems
3D parallel training and fine-tuning capabilities
Model monitoring and observability tools
GPU auto-scaling (from zero to thousands of nodes) for both training and inference
Exploratory data analysis (EDA) and AutoML frameworks
Cloud cost monitoring and optimization
PII/PHI anonymization utilities
Built to handle both traditional ML and generative AI workloads, Cake provides centralized management—a “single pane of glass”—to oversee every AI project.
Deployment Flexibility
Cake deploys directly into your own virtual private cloud (VPC) or on-premises infrastructure. This ensures no sensitive data ever leaves your environment. With encryption both in transit and at rest, along with robust Kubernetes role-based access controls (RBAC), Cake prioritizes security at every layer.
Every component is authenticated, and platform access is scoped based on user roles, ensuring a least-privilege model. Even the deployment itself adheres to infrastructure-as-code (IaC) principles, where all changes are version-controlled through Git repositories. This gives teams full transparency and control over their infrastructure.
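To make the least-privilege model concrete, here is a minimal sketch of a namespaced, read-only Kubernetes Role created with the official kubernetes Python client. The role name, namespace, and resource list are hypothetical; in Cake, objects like this are typically managed declaratively through Git-backed IaC rather than imperative API calls.

```python
# Minimal sketch of a least-privilege Kubernetes Role (hypothetical names).
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

role = client.V1Role(
    metadata=client.V1ObjectMeta(name="notebook-reader", namespace="ml-team"),
    rules=[
        client.V1PolicyRule(
            api_groups=[""],                  # core API group
            resources=["pods", "pods/log"],
            verbs=["get", "list", "watch"],   # read-only access
        )
    ],
)

rbac = client.RbacAuthorizationV1Api()
rbac.create_namespaced_role(namespace="ml-team", body=role)
```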
Cake and Open Source
Understanding how to use Cake really means understanding how to effectively leverage a carefully curated stack of open source tools and frameworks. The Cake platform is built on a modular architecture that stitches together best-in-class open technologies—augmented with Cake’s custom tooling—to support every phase of the generative AI lifecycle. To make the most of the platform, users should become familiar with the foundational components across four key domains: Core, ML Ops, AI Ops, and Data Engineering.
Core encompasses the base platform components, such as Kubernetes, Istio, and Prometheus.
ML Ops focuses on model development workflows, including training, fine-tuning, experiment tracking (e.g., with MLflow), and inference (via Ray and KubeRay).
AI Ops deals with monitoring, tracing, alerting, and serving infrastructure for generative and agentic components, leveraging tools like vLLM, Langfuse, and LiteLLM to ensure operational reliability and observability.
Data Engineering covers data orchestration, transformation, and movement, with tools such as Airflow, DBT, Airbyte, and Prefect.
Key Core Platform Documentation
Key Components
Kubernetes - Orchestration Platform
Calling the Kubernetes API - Kubeflow In-Cluster Config and API (see the sketch after this list)
Lens - https://docs.k8slens.dev/
Istio - Service Mesh (East-West Traffic)
Accessing Cake Resources Externally - Accessing Cake Platform Resources Externally
Envoy - North-South Gateway
Dex - Integrates Kubernetes with Enterprise IAM
Prometheus - Metrics Engine
Main - https://prometheus.io/docs/prometheus/latest/getting_started/
Monitoring your Ray Deployed Models with Prometheus and Grafana - Monitoring your Ray Deployed Models with Prometheus and Grafana
Grafana - Dashboarding and Alerting
Terraform - Infrastructure as Code
ArgoCD - GitOps IaC for Kubernetes; what Terraform does for cloud resources, ArgoCD does for Helm and Kustomize deployments
Crossplane - Extends Kubernetes into a universal control plane for external (e.g., cloud) resources
Karpenter - Autoscaler used in place of the default AWS/Azure autoscaler (not used on GCP, where the default autoscaler is just as powerful)
Main - https://karpenter.sh/v1.4/
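The "Calling the Kubernetes API" doc above covers in-cluster configuration in detail; as a quick illustration, here is a minimal sketch using the official kubernetes Python client. The namespace is a hypothetical placeholder, and this is not the documented Cake flow, just the standard client pattern.

```python
# Minimal sketch of calling the Kubernetes API from inside the cluster
# (e.g., from a Kubeflow notebook pod). The namespace is hypothetical.
from kubernetes import client, config

try:
    # Use the pod's service account when running inside the cluster.
    config.load_incluster_config()
except config.ConfigException:
    # Fall back to the local kubeconfig for development outside the cluster.
    config.load_kube_config()

v1 = client.CoreV1Api()
for pod in v1.list_namespaced_pod(namespace="kubeflow-user").items:
    print(pod.metadata.name, pod.status.phase)
```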
Key Security and Access Control Documentation
Security Overview - Cake Security Doc.pdf
Managing Users and Namespaces - Managing users and namespaces
Adding Custom Overlays to Cake - Cake Overlays
Programmatic OAuth Logins for custom apps - Programmatic OAuth Logins
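The linked doc describes the flows Cake actually supports; as a generic illustration only, the sketch below performs an OAuth2 resource-owner password grant against an OIDC token endpoint using the requests library. The issuer URL, client credentials, and even the grant type are assumptions — which grants are enabled depends on your Cake/Dex configuration.

```python
# Generic sketch of a programmatic OAuth2 login (password grant).
# All values below are hypothetical; consult the Programmatic OAuth
# Logins doc for the flow your deployment actually enables.
import requests

TOKEN_URL = "https://dex.example-cake-domain.com/token"  # hypothetical issuer

resp = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "password",
        "username": "svc-account@example.com",
        "password": "REDACTED",        # pull from a secret store, never hardcode
        "client_id": "custom-app",
        "client_secret": "REDACTED",
        "scope": "openid profile email",
    },
    timeout=10,
)
resp.raise_for_status()
access_token = resp.json()["access_token"]

# Use the token as a bearer credential against platform services.
headers = {"Authorization": f"Bearer {access_token}"}
```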
Key ML Ops Documentation
Key Components
Kubeflow Pipelines - Open Source Orchestration Engine focused on ML Ops
Kubeflow Notebooks - Notebook Authoring and Editing Environment
K9S in Notebooks - How to set up k9s in a notebook
Jupyter Magics tips for AI-assisted notebooks - Jupyter Magics
TensorBoard - Visualization Toolbox
Creating a TensorBoard in Kubeflow - Creating a Tensorboard
Katib - Hyperparameter Tuning
Main - https://www.kubeflow.org/docs/components/katib/getting-started/
Extracting Katib Data - Extracting Katib Trial Data
KServe - Open Source Inference Engine
MLflow - Experiment Tracking and Model Registry. Early evaluation tooling
Adding fine-tuning experiments to MLflow - Adding Experiments to MLflow (a minimal sketch follows after this list)
Ray - Parallel Compute Engine
Ray Serve - Inference Engine
Ray Tune - Hyperparameter Tuning
Feast - Feature Store
Main - https://docs.feast.dev
Feast in Cake - Feast Intro
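As a quick complement to the MLflow docs above, here is a minimal sketch of logging a fine-tuning run. The tracking URI, experiment name, parameters, and artifact path are hypothetical placeholders, not Cake's actual configuration; inside Cake, the in-cluster MLflow endpoint is typically provided by the platform.

```python
# Minimal sketch of logging a fine-tuning run to MLflow (hypothetical values).
import mlflow

mlflow.set_tracking_uri("http://mlflow.mlflow.svc.cluster.local:5000")  # hypothetical
mlflow.set_experiment("llm-fine-tuning")

with mlflow.start_run(run_name="lora-sft-demo"):
    mlflow.log_params({"base_model": "my-base-llm", "lora_rank": 8, "lr": 2e-4})
    for step, loss in enumerate([1.9, 1.4, 1.1]):   # stand-in training loop
        mlflow.log_metric("train_loss", loss, step=step)
    mlflow.log_artifact("adapter_config.json")       # assumes this file exists locally
```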
Key AI Ops Documentation
Overview Flow
AI Ops for Local and Fine-Tuned Models - Cake AI Ops for Fine-Tuned Models
Key Components
vLLM - Performance Focused Model Server for LLMs
Ollama - Experiment Focused Model Server for LLMs
Ray Serve - Inference Engine
KubeRay - https://ray-project.github.io/kuberay/
Ray vLLM for multi-node models - Deploying a fine-tuned model across multiple nodes with KubeRay and vLLM
Langflow - Open Source Agentic Orchestration
Main - https://docs.langflow.org
LiteLLM - Model Proxy
Add deployed model to LiteLLM - Add a Deployed Model to LiteLLM (calling a proxied model is sketched after this list)
Langfuse - Open Source Tracing
Main - https://langfuse.com/docs
Tracing Calls to LiteLLM with Langfuse - Tracing Calls to LiteLLM with Langfuse
Langfuse Security Considerations -
Open WebUI - Rich Chat UIs
Main - https://docs.openwebui.com/
Adding LiteLLM models to Open WebUI - Connecting to LiteLLM proxied models via OpenWeb UI
LangGraph - Agentic Framework
Promptfoo - Prompt and Model Evaluation
DSPy - Prompt Generation
Main - https://dspy.ai/learn/
Fine-Tuning - SFT and RLHF
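Because the LiteLLM proxy exposes an OpenAI-compatible API, models registered with it can be called through the standard openai Python client. The sketch below is illustrative only; the base URL, API key, and model alias are hypothetical placeholders that depend on your deployment.

```python
# Minimal sketch of calling a model behind the LiteLLM proxy
# (hypothetical URL, key, and model alias).
from openai import OpenAI

client = OpenAI(
    base_url="http://litellm.litellm.svc.cluster.local:4000",  # hypothetical proxy URL
    api_key="sk-your-litellm-key",                             # hypothetical key
)

resp = client.chat.completions.create(
    model="my-finetuned-llm",  # alias registered with LiteLLM
    messages=[{"role": "user", "content": "Summarize the Cake platform in one sentence."}],
)
print(resp.choices[0].message.content)
```

Routing all model calls through LiteLLM this way also means tools like Langfuse can trace them centrally, as covered in the tracing doc above.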
Key Data Engineering Documentation
Airflow - Open Source Orchestration Engine (see the DAG sketch at the end of this section)
DBT - Open Source Data Transformation
Airbyte - Low Code Data Movement
Main - https://docs.airbyte.com/
Prefect - Open Source Orchestration Engine
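As a minimal illustration of Airflow orchestration, here is a sketch of a two-task DAG. It assumes Airflow 2.x; the DAG id, schedule, and task bodies are hypothetical placeholders, not a Cake-provided pipeline.

```python
# Minimal sketch of an Airflow DAG for a daily data job (hypothetical names).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw data from a source system")


def transform():
    print("run transformations (e.g., trigger a dbt job)")


with DAG(
    dag_id="daily_example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task   # extract runs before transform
```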