1.3.0 - 2025-07-09
Added
(platform)
deployed_models can now specify a namespace
Adds a job to Platform UI that calculates project spend on a nightly basis
Changed
(platform)
user-namespaces: reworked AuthorizationPolicies
langfuse: increase resource requests and limits for greater throughput
(infrastructure)
aws-ebs 2.44.0
aws-efs 3.1.9
external-dns 8.8.3
external-secrets 0.17.0
(platform) crossplane 1.20.0 + providers/functions
aws 1.22.0
gcp 1.13.0
environment-configs 0.4.0
terraform 0.21.0
sql 0.12.0
(platform)
aws-efa-k8s-device-plugin 0.5.8
aws-load-balancer-controller 1.13.2
(platform) ArgoCD 3.0.5 (chart 8.0.13)
(infrastructure) karpenter 1.5.0
(platform)
kube-prometheus-stack 74.0.0
kubeflow 1.10.1
evidently 0.7.7
cert-manager 1.17.2 using helm chart
(platform) Platform UI is now a helm chart
(platform) grafana authenticates via JWT token and grants Admin to cake-admin
(platform) Cake API is now a helm chart
(platform)
label-studio 1.19.0 via chart 1.9.15
airflow v2.11.0
airbyte v1.5.5
mlflow v2.22.1
Arize Phoenix 10.11.0
langflow v1.5.0
feast 0.49.0
opencost 1.115.0 (chart 2.1.5)
promptfoo 0.115.1
weaviate 1.30.0
milvus 2.5.13
prefect 3.4.6
litellm 1.72.6-stable
open-webui v0.6.15 (chart 6.21.0)
langfuse 3.72.1
orthanc 1.12.8 (image 25.6.4)
jaeger v2.7.0
opentelemetry-operator 0.90.0
ohif 3.10.2
attu 2.5.11
(platform) Make litellm accessible from open-webui via open-webui.<base host>/litellm
(platform)
opencost-istio-resources merged into opencost
platform-ui can now reach opencost
(infrastructure) k8s 1.33
(platform) ray-service has access to mlflow's s3 bucket
(infrastructure) AWS: Configuration for the shared RDS instance
prefect uses the shared database
(infrastructure) AWS: Allow model registry to be parameterized
(platform) cake-api v1.6.0
Fixed
(platform)
LiteLLM "Test Key" page can talk to VLLM-hosted models without error
app_config.platform_ui.model_registry_bucket now works
(bootstrap) Fix build role docker buildx push to private ECR
1.2.6 - 2025-05-29
Added
(platform) LiteLLM helm chart (app version v1.69.0)
Changed
external-secrets resources are v1
crossplane functions are v1
(platform) ray-service 2.46.0
(platform) ray-cluster 2.46.0
(platform) only cake-admin users can access ArgoCD
Fixed
(platform) Default milvus cluster
(platform) Shared database passwords are now recreated when the database CRDs are recreated
(platform) Passwords for langfuse redis/valkey are escaped for use in connection string
1.2.5 - 2025-05-07
Changed
(platform) opencost 2.1.1
(platform) ray-service 2.45.0
(platform) ray-cluster 2.45.0
(platform) Allow Cake API to access argocd applicationsets
(infrastructure) tailscale requires a primary tag
(infrastructure) Karpenter default node pool does not allow metal
(platform) milvus 2.5.11
(platform) milvus operator 1.2.6
(platform) milvus attu 2.5.8
Fixed
(platform) langfuse oauth
(platform) open-webui routing
1.2.4 - 2025-05-07
Changed
(platform) langfuse does not version its buckets
Removed
(platform) opencost: Removed spot instance cost support
1.2.3 - 2025-05-06
Added
karpenter
requirements for the default nodepool can be configured
requirements for the default GPU nodepool can be configured
(AWS) Create a tailscale exit node if an auth key is provided
(platform) Jupyter images can be configured
(platform) Add Elasticache Crossplane Composition
(AWS) bootstrap deploy role can add additional policies and statements
(AWS) Allow tagging for all resources or crossplane resources
(GCP) Allow labels for all resources or crossplane resources
(platform) Langfuse Helm Chart (app version 3.54.0)
Changed
(AWS) karpenter 1.4.0
AMI alias 20250410
(platform)
argocd 7.8.27
crossplane: providers switch to crossplane-contrib
evidently 0.7.3
langflow 1.3.4
promptfoo available as a Project App
(infrastructure)
aws-ebs 2.42.0
external-dns 8.7.11
external-secrets 0.16.1
(AWS) s3_prefix defaults to the account id.
(GCP) gcs_prefix defaults to the project number.
Fixed
Removed
(platform) Add kubernetes-reflector to allow sharing secrets and configmaps between namespaces
(platform) dex: No more default static user/passwords (inference-client)
1.2.2 - 2025-04-11
Fixed
(platform) Kubeflow misconfiguration preventing new jupyter notebooks from being created
1.2.1 - 2025-04-10
Added
(platform) mlflow - Access to mlflow from staging/production namespaces
(platform) add st1 and sc1 EBS volume types as Kubernetes storage classes
(platform) Sort project list for consistent iteration
(platform) fix Label Studio app launcher link
(platform) Allow Langflow to access platform-ui resources (e.g. deployed models)
(platform) add policy to allow cake-api access to aws secrets manager
(platform) Subdomains can be configured for Ingress via Istio
(GCP) variable cloud_dns_managed_zone to limit external-dns to a specific DNS zone
(platform) langflow-ide - Add database for components accessible via CAKE_SHARED_DB_URL
Changed
(AWS) karpenter 1.3.3
default AMI is pinned to an al2023 image
nodes never expire but can be configured otherwise
(platform) update Ray Grafana dashboards
(platform) ray-service 2.44.1
(platform) ray-cluster 1.3.2
(platform) kuberay-operator 1.3.2
(platform) open-webui 0.5.20
(platform) weaviate 1.29.1
(platform) weaviate will default to a 2 shard cluster
(platform) Arize Phoenix 8.14.1
(platform) Arize Phoenix migrated from SQLite to PostgreSQL
(platform) Langflow 1.3.0
(platform) argocd 7.8.23
(platform) promptfoo 0.107.6 (chart 0.2.3)
(platform) Kubeflow 1.10.0
(platform) open-webui 0.6.0 (chart 6.0.0)
(platform) open-webui ollama 0.6.3
(platform) label-studio 1.16.0
(platform) evidently 0.6.7
(platform) kube-prometheus-stack 70.4.2
(platform) feast 0.48.0
(infrastructure)
external-secrets 0.15.1
external-dns 8.7.10
uses a TXT prefix; previously created TXT records will need to be removed manually
aws-ebs 2.41.0
aws-efs 3.1.8
use m6a ec2 instances for the critical node pool
use 6th generation or higher M series for the default node pool
Fixed
(platform) platform-ui image tags are properly configured
Removed
(platform) Kubecost is removed
1.2.0 - 2025-03-05
Added
(platform) allow langflow to access the weaviate instance for vector store
(infrastructure, gcp) Allow disabling filestore backup via filestore_backup_enabled
(infrastructure, gcp) New flags for optional features
filestore_backup_enabled
gcfs_enabled
cluster_autoscaling_profile (defaults to BALANCED)
(platform) improved overlay support
kustomize works for mixed helm + kustomize apps like kube-prometheus-stack
prefect Helm values can be overlaid
(platform) Allow platform-ui access to Weaviate instances
(platform) Add github actions oidc to allow modifying platform-ui/cake-api from github actions
(platform) Langflow-ide updated to v0.0.6 and supports persistent volume for /cakefs and auto-login
(platform) Langflow-ide has its own ollama deployment that pre-loads models; updated karpenter resources to support more gpu types
Changed
(infrastructure, gcp) Artifact Registry resources are now project-wide (managed in bootstrap)
(platform) ray-service 2.42.0
(platform) ray-cluster 2.42.0
(platform) prefect 3.1.15
(platform) jaeger 1.65.0
(platform) Ray Serve autoscaler: the removed parameter target_num_ongoing_requests_per_replica is renamed to target_ongoing_requests
(platform) Custom Ray images for GPU nodes are no longer needed, nor are ray-ml based images
(platform) Langflow-ide 1.1.4
Enable Istio and proxy directly to backend for API routes to skip 60s timeout in nginx
(platform) Ray cluster now supports p4d, p5, g5, g6 instance types
(platform) Langflow-ide now supports persistent volume for /cakefs
(AWS) karpenter 1.2.1
(platform) argocd 7.8.7
(infrastructure)
external-secrets 0.14.2
external-dns 8.7.4
aws-efs 3.1.7
aws-ebs 2.40.0
(infrastructure) EKS and GKE default to k8s 1.32
(infrastructure) Karpenter uses AL2023 based images
crossplane 0.19.0
aws 1.20.1
gcp 1.11.4
kcl 0.11.2
(platform) Arize Phoenix 8.8.0
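The Ray Serve autoscaler rename in this release affects any autoscaling configuration that still sets the old key. The following is a minimal sketch of migrating such a configuration, assuming it is held as a plain Python dict. The helper name migrate_autoscaling_config and the sample values are hypothetical and are not part of the platform or of Ray itself.

```python
from typing import Any

OLD_KEY = "target_num_ongoing_requests_per_replica"
NEW_KEY = "target_ongoing_requests"


def migrate_autoscaling_config(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config with the removed key renamed to its replacement.

    Hypothetical helper for illustration; the input dict is not modified.
    """
    migrated = dict(config)
    if OLD_KEY in migrated:
        old_value = migrated.pop(OLD_KEY)
        # If the new key was already set explicitly, keep that value.
        migrated.setdefault(NEW_KEY, old_value)
    return migrated


# Sample values are assumptions chosen only to show the rename.
legacy = {"min_replicas": 1, "max_replicas": 4, OLD_KEY: 2}
print(migrate_autoscaling_config(legacy))
```

Configurations that already use target_ongoing_requests pass through unchanged, so the helper can be applied to every deployment config without checking first.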
Fixed
(platform) critical error in config for KServe InferenceGraphs
(platform) registry-rewriter handles initContainers
1.1.3 - 2025-01-30
Fixed
(platform) Support long cluster names for the shared postgres database
1.1.2 - 2025-01-29
Added
(platform) Crossplane support for Postgres databases
(platform) dex: Configurable static passwords
(platform) preview version of Langflow-ide v1.1.3
(platform) ohif: support for Orthanc backend as well as ingestion from S3
Changed
(platform) Add Ray Serve Fault Tolerance using cloud based Valkey DB (AWS only)
(platform) Airbyte minio disk size increased to 10 GiB
(platform) Kubeflow Notebook images updated to Cake latest 0.0.1
(AWS) Tighten crossplane IAM permissions (crossplane-resources must be pruned)
Only allowed to create/modify Cake IAM resources
Only allowed to create/modify S3 buckets based on bucket naming
(platform) milvus operator 1.1.9
(platform) milvus 2.4.21
(platform) aws provider v1.19.0
(platform) gcp provider v1.11.2
(platform) aws-load-balancer-controller 1.11.0
(platform) k8s-device-plugin 0.17.0
(platform) karpenter: Replace 20% of the EFA nodes (max 5) at a time
(platform) istio: PDBs are included
(AWS) network-aws: must specify region to create the VPC within
(platform) user-namespaces
ml_pipeline poddefault added only when kubeflow_pipeline is enabled
modelmesh authorization policy added when modelmesh is enabled
(infrastructure) terraform-modules has been relocated to infrastructure/modules
(AWS) aws-ebs-csi-driver 2.39.3
Fixed
(AWS) ecr_helm_registries resources are now actually applied
(AWS) Elastic Fabric Adapter support
Prevent GPU-only workloads from using the EFA nodepool by resolving to the same set of instance-types.
(platform-ui) Explicitly allow all traffic between workloads in the namespace.
Removed
(Orthanc) static-wado data source
(platform) user-namespaces poddefaults are removed for snowflake, and weights and biases
1.1.1 - 2024-11-21
Added
(AWS) Optional VPC endpoints for S3 and ECR
(platform) AuthorizationPolicy to allow ingress to Platform-UI
(platform) Added weaviate 17.3.2
(AWS) Elastic Fabric Adapter support
Managed nodegroups can enable EFA support (must have label vpc.amazonaws.com/efa="true")
aws-efa-device-plugin always deployed on AWS
Workloads must request at least 1 vpc.amazonaws.com/efa resource, and it must have either a node selector or affinity with vpc.amazonaws.com/efa="true"
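The EFA workload requirements above can be sketched as a Pod manifest built in Python. This is an illustration under stated assumptions, not the platform's own tooling: the function name efa_pod_manifest, the pod name, and the image reference are hypothetical. Only the vpc.amazonaws.com/efa resource name and the "true" label value come from the notes above. The sketch uses a node selector; an equivalent node affinity would also satisfy the requirement.

```python
import json

EFA_KEY = "vpc.amazonaws.com/efa"


def efa_pod_manifest(name: str, image: str, efa_count: int = 1) -> dict:
    """Build a Pod manifest that requests EFA devices and targets EFA nodes.

    Hypothetical helper for illustration only.
    """
    if efa_count < 1:
        raise ValueError("an EFA workload must request at least 1 EFA resource")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Schedule only onto nodes labelled as EFA-enabled.
            "nodeSelector": {EFA_KEY: "true"},
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources are requested via limits.
                    "resources": {"limits": {EFA_KEY: efa_count}},
                }
            ],
        },
    }


# Placeholder name and image, chosen only for the example.
print(json.dumps(efa_pod_manifest("efa-demo", "example.invalid/app:latest"), indent=2))
```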
Changed
(infrastructure) terraform-modules 1.5.1
(platform) Removed CPU limit from Dex
(platform) Istio provides the global deny-all (allow-nothing) policy.
2024-11-20
(platform) Updated MLFlow from v2.9.2 to v2.17.2 to fix version mismatch issues
Added graceful handling when database isn't initialized for db upgrade
(infrastructure) kflow-platform-aws new variables for dex providers:
google_auth_enabled
google_auth_groups_acl_enabled
microsoft_auth_enabled
(infrastructure) kflow-platform-aws new variables for customer-installed apps:
kserve_inference_logging_enabled
mlflow_enabled
ohif_enabled
opencost_enabled
(platform) KServe inference logging can be toggled with app_config.kserve.inference_logging.enabled
crossplane 0.18.1
kcl 0.10.10
gcp 1.11.0
(platform) Added AuthorizationPolicy changes for ClearML app api files
(platform) Updated dependencies:
aws-sdk to 5.79.0
kubernetes to 2.34.0
google-cloud-* to latest versions
promptfoo to 0.2.2
(platform) Granted access to ray-cluster-admin to platform-ui
(platform) Added GCS mount logging configuration options
Fixed
(infrastructure) Fixed build role IAM policy attachment
(platform) Fixed critical authorization issue preventing access to Milvus, Prefect, and ClearML
(platform) Fixed issue preventing the "Link ArgoCD" GitHub Actions workflow from working the first time
(platform) Improved reliability of ArgoCD sync status
1.1.0 - 2024-11-20
Added
(platform) modelmesh is disabled by default
set kserve.modelmesh.enabled to true to enable
(platform-ui) Platform UI has permissions to set the 🤗 secret
(platform) support oidc auth providers
(AWS) Additional configuration values for EKS add-ons can be provided
(AWS) Network ACLs can be specified for public, private and database subnets
(platform) Include platform version number in configmap
(cake-api) Reads authorization policy from configmap instead of vars file directly
(platform-ui) Adds startup and liveness probes so that we can restart pod if cake-api version changes
(platform-ui) Adds support for specifying node selectors to deployed models
(infrastructure) optional feast_enabled variable for AWS & GCP - creates a redshift cluster or bigquery respectively
(platform) crossplane
function-environment-configs 0.1.0
(GCP) Optional CloudDNS logging, cloud_dns_logging, disabled by default
(GCP) Buckets have uniform bucket level access enabled
(platform-ui) Adds support for self serve karpenter node pools
(platform) ArgoCD can authenticate to cross-account private ECR registries for helm charts
(platform) Support per-project installation of evidently
Changed
(infrastructure) terraform-modules 1.4.0
(platform) cake-api v1.2.0
(infrastructure) BREAKING: bootstrap/github-actions-aws now uses a var for OIDC providers
github_repo var is now replaced with oidc_providers
vars for bootstrap/github-actions-aws moved from CI workflows to the module itself
(platform) BREAKING: crossplane 1.18.0
migration: update the ArgoCD application before terraform
aws provider v1.17.0
gcp provider v1.9.0
kcl function v0.10.8
(platform) jaeger-operator v2.57
(infrastructure, platform) modelmesh User/ServiceAccount is managed by Crossplane
destroy.sh
(AWS) removes user access keys
(GCP) deletes the gke cluster
(platform) Kubeflow edit and admin roles include the kuberay operator role
(platform) Kubeflow 1.9.1
(infrastructure) upgrade to Kubernetes 1.31
GKE switching to REGULAR release channel
(platform) Prometheus chart 65.5.0
(platform) ArgoCD 7.7.3
(platform) aws-load-balancer-controller 1.10.0
(platform) metrics-server 3.12.2
(platform) opentelemetry-operator 0.72.1
(platform) open-webui 3.4.3
(platform) MLflow 2.17.2
MLflow db migrations now run as a Job on ArgoCD sync
istio-injection enabled by default
(platform) feast 0.41.3
(platform) Prefect 3.1.2 (chart 2024.11.12215108)
Fixed
user-namespace: no modelmesh secret when disabled
1.0.3 - 2024-11-01
Added
(cake-api) Support for supplying keys for commit signing
(platform) cake-api v1.1.0
1.0.2 - 2024-10-21
Changed
(platform) cake-api v1.0.1
Fixed
(AWS) Fix critical issue with Kubeflow Pipelines
1.0.1 - 2024-10-16
Added
(platform) new component: open-webui
(platform) Crossplane grants its own permissions
(GCP) kflow-platform-gcp will provide details for the gcp configuration block for gen_deploy
Changed
(platform) user-namespace is a helm chart
gen_deploy simplified, no more magic to generate user-namespaces or prefect applications
(infrastructure) terraform-modules 1.4.0
(infrastructure, platform) WorkloadIdentities and IRSA roles are managed by Crossplane
ray-cluster, kubeflow users, airbyte, airflow, feast, prefect, mlflow
(infrastructure, platform) Buckets are managed by Crossplane
kserve, kfp (kubeflow pipelines), cakefs, mlflow
(GCP) feast uses defined roles
(GCP) New buckets are regional, no longer multi-regional. Existing buckets are not changed.
(AWS) aws-load-balancer-controller chart v1.9.1 (v2.9.1)
enable cert-manager
supports Listener Attributes
(platform) Kubeflow profile-controller assigns k8s ServiceAccount to AWS Role/GCP ServiceAccount
(cake-api) now has permissions to sync argocd apps
(platform) most app-specific Terraform resources are now managed by Crossplane
(platform) Kubeflow 1.9.1-rc2 improves Istio security
(platform) KubeRay Operator 1.2.2, Ray 2.37.0-py311
(GCP) enable app engine for feast in bootstrap (one time per project)
import required if it was already enabled since it cannot be recreated.
Fixed
(platform) Removed unused kubeflow-jwt RequestAuthentication that was breaking KFP auth flows
(platform) Fixed auth flows for ClearML
Removed
(AWS) dvc bucket is removed.
1.0.0 - 2024-09-23
Added
(AWS) Karpenter 1.0.1
Default node pools for non-GPU and GPU have minimal disruption budget
AWS subnets are marked for use with Karpenter
(platform) OpenCost now has support for cloud costs
(platform-ui) VLLM runtime supports custom driver options
(gen_deploy) disable ArgoCD auto-sync by default (re-enable with app_config.defaults.sync_policy.automated)
(gen_deploy) allow global change of ArgoCD targetRevision
(gen_deploy) Ray: add minimal_resources flag to toggle Ray FT
(GCP, infrastructure) module kflow-platform-gcp variable added: gke_service_account (nodepool service account)
(platform) Airflow: log in with Microsoft
(gen_deploy) added app_config.airflow.auth_providers which defaults to false
(AWS, scripts) destroy script for AWS
Changed
(infrastructure) terraform-modules 1.3.2
Kubernetes 1.30
Istio 1.22.1
ArgoCD is now installed via the "Link ArgoCD" GitHub Action workflow or similar script instead of being managed by Terraform
First class support for Crossplane
(AWS) IRSA roles are now managed by Crossplane
Airbyte 0.64.3
(AWS) now with IRSA role
Crossplane 1.17.1
Kubeflow 1.9.0
Kuberay Operator 1.2.1
Milvus 2.4.10
Milvus Attu 2.4.8
Milvus Operator 1.0.6
MLflow 2.15.1
OHIF 3.8.3
Ray Service 2.35.0-py311
(AWS) aws-load-balancer-controller has 1 replica
(GCP) GKE Node Pools install nvidia driver
(GCP) Services for a project are enabled via bootstrap/github-actions-gcp rather than kflow-platform-gcp
Deprecated
(AWS, infrastructure) AWS managed node groups are deprecated in favor of Crossplane node pools
(platform-ui) Ollama is now a deprecated runtime
Removed
GCP module kflow-platform-gcp variables removed: cluster_oidc_provider_arn, gke_ca_certificate, gke_endpoint