Release Notes

1.3.0 - 2025-07-09

Added

  • (platform)

    • deployed_models can now specify a namespace

    • Adds a nightly Platform UI job that calculates project spend

Changed

  • (platform)

    • user-namespaces: reworked AuthorizationPolicies

    • langfuse: increase resource requests and limits for greater throughput

  • (infrastructure)

    • aws-ebs 2.44.0

    • aws-efs 3.1.9

    • external-dns 8.8.3

    • external-secrets 0.17.0

  • (platform) crossplane 1.20.0 + providers/functions

    • aws 1.22.0

    • gcp 1.13.0

    • environment-configs 0.4.0

    • terraform 0.21.0

    • sql 0.12.0

  • (platform)

    • aws-efa-k8s-device-plugin 0.5.8

    • aws-load-balancer-controller 1.13.2

  • (platform) ArgoCD 3.0.5 (chart 8.0.13)

  • (infrastructure) karpenter 1.5.0

  • (platform)

    • kube-prometheus-stack 74.0.0

    • kubeflow 1.10.1

    • evidently 0.7.7

    • cert-manager 1.17.2, installed via Helm chart

  • (platform) Platform UI is now a helm chart

  • (platform) Grafana authenticates via JWT and grants the Admin role to cake-admin

  • (platform) Cake API is now a helm chart

  • (platform)

    • label-studio 1.19.0 via chart 1.9.15

    • airflow v2.11.0

    • airbyte v1.5.5

    • mlflow v2.22.1

    • Arize Phoenix 10.11.0

    • langflow v1.5.0

    • feast 0.49.0

    • opencost 1.115.0 (chart 2.1.5)

    • promptfoo 0.115.1

    • weaviate 1.30.0

    • milvus 2.5.13

    • prefect 3.4.6

    • litellm 1.72.6-stable

    • open-webui v0.6.15 (chart 6.21.0)

    • langfuse 3.72.1

    • orthanc 1.12.8 (image 25.6.4)

    • jaeger v2.7.0

    • opentelemetry-operator 0.90.0

    • ohif 3.10.2

    • attu 2.5.11

  • (platform) LiteLLM is accessible from open-webui at open-webui.<base host>/litellm

  • (platform)

    • opencost-istio-resources merged into opencost

    • platform-ui can now reach opencost

  • (infrastructure) k8s 1.33

  • (platform) ray-service has access to mlflow's s3 bucket

  • (infrastructure) AWS: Configuration for the shared RDS instance

  • prefect uses the shared database

  • (infrastructure) AWS: Allow model registry to be parameterized

  • (platform) cake-api v1.6.0

Fixed

  • (platform)

    • LiteLLM "Test Key" page can talk to VLLM-hosted models without error

    • app_config.platform_ui.model_registry_bucket now works

  • (bootstrap) Fixed the build role so docker buildx can push to private ECR

1.2.6 - 2025-05-29

Added

  • (platform) LiteLLM helm chart (app version v1.69.0)

Changed

  • external-secrets resources use the v1 API

  • crossplane functions use the v1 API

  • (platform) ray-service 2.46.0

  • (platform) ray-cluster 2.46.0

  • (platform) only cake-admin users can access ArgoCD

Fixed

  • (platform) Default milvus cluster

  • (platform) Shared database passwords are now recreated when the database CRDs are recreated

  • (platform) Passwords for langfuse redis/valkey are escaped for use in connection string

1.2.5 - 2025-05-07

Changed

  • (platform) opencost 2.1.1

  • (platform) ray-service 2.45.0

  • (platform) ray-cluster 2.45.0

  • (platform) Allow Cake API to access argocd applicationsets

  • (infrastructure) tailscale requires a primary tag

  • (infrastructure) Karpenter default node pool does not allow metal

  • (platform) milvus 2.5.11

  • (platform) milvus operator 1.2.6

  • (platform) milvus attu 2.5.8

Fixed

  • (platform) langfuse oauth

  • (platform) open-webui routing

1.2.4 - 2025-05-07

Changed

  • (platform) langfuse does not version its buckets

Removed

  • (platform) opencost: removed spot instance cost support

1.2.3 - 2025-05-06

Added

  • karpenter

    • requirements for the default nodepool can be configured

    • requirements for the default GPU nodepool can be configured

  • (AWS) Create a tailscale exit node if an auth key is provided

  • (platform) Jupyter images can be configured

  • (platform) Add Elasticache Crossplane Composition

  • (AWS) bootstrap deploy role can add additional policies and statements

  • (AWS) Allow tagging for all resources or crossplane resources

  • (GCP) Allow labels for all resources or crossplane resources

  • (platform) Langfuse Helm Chart (app version 3.54.0)

Changed

  • (AWS) karpenter 1.4.0

    • AMI alias 20250410

  • (platform)

    • argocd 7.8.27

    • crossplane: providers switch to crossplane-contrib

    • evidently 0.7.3

    • langflow 1.3.4

    • promptfoo available as a Project App

  • (infrastructure)

    • aws-ebs 2.42.0

    • external-dns 8.7.11

    • external-secrets 0.16.1

  • (AWS) s3_prefix defaults to the account id.

  • (GCP) gcs_prefix defaults to the project number.

Removed

  • (platform) Add kubernetes-reflector to allow sharing secrets and configmaps between namespaces

  • (platform) dex: No more default static user/passwords (inference-client)

1.2.2 - 2025-04-11

Fixed

  • (platform) Kubeflow misconfiguration preventing new jupyter notebooks from being created

1.2.1 - 2025-04-10

Added

  • (platform) mlflow: access to mlflow from staging/production namespaces

  • (platform) add st1 and sc1 EBS volume types as Kubernetes storage classes

  • (platform) Sort project list for consistent iteration

  • (platform) fix Label Studio app launcher link

  • (platform) Allow Langflow to access platform-ui resources (e.g. deployed models)

  • (platform) add policy to allow cake-api access to aws secrets manager

  • (platform) Subdomains can be configured for Ingress via Istio

  • (GCP) variable cloud_dns_managed_zone to limit external-dns to a specific DNS zone

  • (platform) langflow-ide: add a database for components, accessible via CAKE_SHARED_DB_URL

Changed

  • (AWS) karpenter 1.3.3

    • default AMI is pinned to an al2023 image

    • nodes never expire by default (configurable)

  • (platform) update Ray Grafana dashboards

  • (platform) ray-service 2.44.1

  • (platform) ray-cluster 1.3.2

  • (platform) kuberay-operator 1.3.2

  • (platform) open-webui 0.5.20

  • (platform) weaviate 1.29.1

  • (platform) weaviate will default to a 2 shard cluster

  • (platform) Arize Phoenix 8.14.1

  • (platform) Arize Phoenix migrated from SQLite to PostgreSQL

  • (platform) Langflow 1.3.0

  • (platform) argocd 7.8.23

  • (platform) promptfoo 0.107.6 (chart 0.2.3)

  • (platform) Kubeflow 1.10.0

  • (platform) open-webui 0.6.0 (chart 6.0.0)

  • (platform) open-webui ollama 0.6.3

  • (platform) label-studio 1.16.0

  • (platform) evidently 0.6.7

  • (platform) kube-prometheus-stack 70.4.2

  • (platform) feast 0.48.0

  • (infrastructure)

    • external-secrets 0.15.1

    • external-dns 8.7.10

      • uses a TXT record prefix; TXT records created without the prefix must be removed manually

    • aws-ebs 2.41.0

    • aws-efs 3.1.8

    • use m6a ec2 instances for the critical node pool

    • use 6th generation or higher M series for the default node pool

Fixed

  • (platform) platform-ui image tags are properly configured

Removed

  • (platform) Kubecost is removed

1.2.0 - 2025-03-05

Added

  • (platform) allow langflow to access the weaviate instance for vector store

  • (infrastructure, gcp) Allow disabling Filestore backups via filestore_backup_enabled

  • (infrastructure, gcp) New flags for optional features

    • filestore_backup_enabled

    • gcfs_enabled

    • cluster_autoscaling_profile (defaults to BALANCED)

  • (platform) improved overlay support

    • kustomize works for mixed helm + kustomize apps like kube-prometheus-stack

    • prefect Helm values can be overlaid

  • (platform) Allow platform-ui access to Weaviate instances

  • (platform) Add GitHub Actions OIDC so that GitHub Actions workflows can modify platform-ui/cake-api

  • (platform) Langflow-ide updated to v0.0.6 and supports persistent volume for /cakefs and auto-login

  • (platform) Langflow-ide has its own ollama deployment that pre-loads models; updated karpenter resources to support more gpu types

Changed

  • (infrastructure, gcp) Artifact Registry resources are now project-wide (managed in bootstrap)

  • (platform) ray-service 2.42.0

  • (platform) ray-cluster 2.42.0

  • (platform) prefect 3.1.15

  • (platform) jaeger 1.65.0

  • (platform) Ray Serve autoscaler: the removed parameter target_num_ongoing_requests_per_replica is replaced by target_ongoing_requests

  • (platform) A custom Ray image is no longer needed for GPU nodes, nor are ray-ml based images

  • (platform) Langflow-ide 1.1.4

    • Istio is enabled and API routes are proxied directly to the backend, bypassing the 60s nginx timeout

  • (platform) Ray cluster now supports p4d, p5, g5, g6 instance types

  • (platform) Langflow-ide now supports persistent volume for /cakefs

  • (AWS) karpenter 1.2.1

  • (platform) argocd 7.8.7

  • (infrastructure)

    • external-secrets 0.14.2

    • external-dns 8.7.4

    • aws-efs 3.1.7

    • aws-ebs 2.40.0

  • (infrastructure) EKS and GKE default to k8s 1.32

  • (infrastructure) Karpenter uses AL2023 based images

  • crossplane 1.19.0

    • aws 1.20.1

    • gcp 1.11.4

    • kcl 0.11.2

  • (platform) Arize Phoenix 8.8.0
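For Ray Serve deployment configs that still use the removed autoscaler parameter listed above, the rename can be applied mechanically. A minimal Python sketch, assuming the configs are available as plain dicts (for example, loaded from a Serve config file); `migrate_autoscaling_config` is a hypothetical helper, not part of Ray or this platform.

```python
from typing import Any, Dict

# Old and new Ray Serve autoscaling_config keys, as named in this release entry.
OLD_KEY = "target_num_ongoing_requests_per_replica"
NEW_KEY = "target_ongoing_requests"


def migrate_autoscaling_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an autoscaling_config dict with the removed key renamed.

    Hypothetical helper for illustration only.
    """
    migrated = dict(config)  # shallow copy so the caller's dict is untouched
    if OLD_KEY in migrated:
        value = migrated.pop(OLD_KEY)
        # Refuse to guess if both keys are present with conflicting values.
        if NEW_KEY in migrated and migrated[NEW_KEY] != value:
            raise ValueError(
                f"both {OLD_KEY!r} and {NEW_KEY!r} are set with different values"
            )
        migrated[NEW_KEY] = value
    return migrated


if __name__ == "__main__":
    legacy = {"min_replicas": 1, "max_replicas": 4, OLD_KEY: 2}
    print(migrate_autoscaling_config(legacy))
```

Raising on conflicting values, rather than silently preferring one key, keeps an ambiguous config from being deployed with an unintended autoscaling target.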

Fixed

  • (platform) critical error in config for KServe InferenceGraphs

  • (platform) registry-rewriter handles initContainers

1.1.3 - 2025-01-30

Fixed

  • (platform) Support long cluster names for the shared postgres database

1.1.2 - 2025-01-29

Added

  • (platform) Crossplane support for Postgres databases

  • (platform) dex: Configurable static passwords

  • (platform) preview version of Langflow-ide v1.1.3

  • (platform) ohif: support for Orthanc backend as well as ingestion from S3

Changed

  • (platform) Add Ray Serve Fault Tolerance using cloud based Valkey DB (AWS only)

  • (platform) Airbyte minio disk size increased to 10 GiB

  • (platform) Kubeflow Notebook images updated to Cake latest 0.0.1

  • (AWS) Tighten crossplane IAM permissions (crossplane-resources must be pruned)

    • Only allowed to create/modify Cake IAM resources

    • Only allowed to create/modify S3 buckets whose names match the expected naming pattern

  • (platform) milvus operator 1.1.9

  • (platform) milvus 2.4.21

  • (platform) aws provider v1.19.0

  • (platform) gcp provider v1.11.2

  • (platform) aws-load-balancer-controller 1.11.0

  • (platform) k8s-device-plugin 0.17.0

  • (platform) karpenter: Replace 20% of the EFA nodes (max 5) at a time

  • (platform) istio: PDBs are included

  • (AWS) network-aws: must specify region to create the VPC within

  • (platform) user-namespaces

    • ml_pipeline poddefault added only when kubeflow_pipeline is enabled

    • modelmesh authorization policy added when modelmesh is enabled

  • (infrastructure) terraform-modules has been relocated to infrastructure/modules

  • (AWS) aws-ebs-csi-driver 2.39.3

Fixed

  • (AWS) ecr_helm_registries resources are now actually applied

  • (AWS) Elastic Fabric Adapter support

    • Prevent GPU only workloads from using the EFA nodepool by resolving to the same set of instance-types.

  • (platform-ui) Explicitly allow all traffic between workloads in the namespace.

Removed

  • (Orthanc) static-wado data source

  • (platform) user-namespaces PodDefaults for Snowflake and Weights & Biases are removed

1.1.1 - 2024-11-21

Added

  • (AWS) Optional VPC endpoints for S3 and ECR

  • (platform) AuthorizationPolicy to allow ingress to Platform-UI

  • (platform) Added weaviate 17.3.2

  • (AWS) Elastic Fabric Adapter support

    • Managed nodegroups can enable EFA support (must have label vpc.amazonaws.com/efa="true")

    • aws-efa-device-plugin always deployed on AWS

    • Workloads must request at least 1 vpc.amazonaws.com/efa resource, and it must have either a node selector or affinity with vpc.amazonaws.com/efa="true"
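The EFA scheduling requirements above can be sketched as a Pod manifest. A minimal Python sketch, assuming manifests are built as plain dicts before being serialized to YAML/JSON; `efa_pod_manifest` and `requests_efa` are hypothetical helpers, not part of the platform, and the pod name and image are placeholders. It uses a nodeSelector; node affinity on the same label would also satisfy the requirement but is not checked here.

```python
from typing import Any, Dict

# Resource name and node label taken from the release note above.
EFA_RESOURCE = "vpc.amazonaws.com/efa"
EFA_LABEL = "vpc.amazonaws.com/efa"


def efa_pod_manifest(name: str, image: str, efa_devices: int = 1) -> Dict[str, Any]:
    """Build a minimal Pod manifest (hypothetical helper) for an EFA workload."""
    if efa_devices < 1:
        raise ValueError("EFA workloads must request at least 1 EFA device")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pin the pod to nodes labelled as EFA-capable.
            "nodeSelector": {EFA_LABEL: "true"},
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources go in limits; when requests are
                    # omitted, Kubernetes defaults them to the limits.
                    "resources": {"limits": {EFA_RESOURCE: efa_devices}},
                }
            ],
        },
    }


def requests_efa(manifest: Dict[str, Any]) -> bool:
    """Return True if the Pod requests EFA and selects EFA nodes.

    Only nodeSelector is inspected; node affinity is not checked here.
    """
    spec = manifest.get("spec", {})
    wants_efa = False
    for container in spec.get("containers", []):
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            if int(resources.get(section, {}).get(EFA_RESOURCE, 0)) >= 1:
                wants_efa = True
    pinned = spec.get("nodeSelector", {}).get(EFA_LABEL) == "true"
    return wants_efa and pinned


if __name__ == "__main__":
    import json

    pod = efa_pod_manifest("efa-test", "<your-image>")
    print(json.dumps(pod, indent=2))
    print(requests_efa(pod))
```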

Changed

  • (infrastructure) terraform-modules 1.5.1

  • (platform) Removed CPU limit from Dex

  • (platform) Istio provides the global deny-all (allow-nothing) policy

  • (platform) Updated MLflow from v2.9.2 to v2.17.2 to fix version mismatch issues

    • Added graceful handling for the db upgrade when the database isn't initialized

  • (infrastructure) kflow-platform-aws new variables for dex providers:

    • google_auth_enabled

    • google_auth_groups_acl_enabled

    • microsoft_auth_enabled

  • (infrastructure) kflow-platform-aws new variables for customer-installed apps:

    • kserve_inference_logging_enabled

    • mlflow_enabled

    • ohif_enabled

    • opencost_enabled

  • (platform) KServe inference logging can be toggled with app_config.kserve.inference_logging.enabled

  • crossplane 1.18.1

    • kcl 0.10.10

    • gcp 1.11.0

  • (platform) Added AuthorizationPolicy changes for ClearML app api files

  • (platform) Updated dependencies:

    • aws-sdk to 5.79.0

    • kubernetes to 2.34.0

    • google-cloud-* to latest versions

    • promptfoo to 0.2.2

  • (platform) Granted platform-ui access to ray-cluster-admin

  • (platform) Added GCS mount logging configuration options

Fixed

  • (infrastructure) Fixed build role IAM policy attachment

  • (platform) Fixed critical authorization issue preventing access to Milvus, Prefect, and ClearML

  • (platform) Fixed issue preventing the "Link ArgoCD" GitHub Actions workflow from working the first time

  • (platform) Improved reliability of ArgoCD sync status

1.1.0 - 2024-11-20

Added

  • (platform) modelmesh is disabled by default

    • set kserve.modelmesh.enabled to true to enable

  • (platform-ui) Platform UI has permission to set the 🤗 (Hugging Face) secret

  • (platform) support oidc auth providers

  • (AWS) Additional configuration values for EKS add-ons can be provided

  • (AWS) Network ACLs can be specified for public, private and database subnets

  • (platform) Include platform version number in configmap

  • (cake-api) Reads authorization policy from configmap instead of vars file directly

  • (platform-ui) Adds startup and liveness probes so that the pod restarts if the cake-api version changes

  • (platform-ui) Adds support for specifying node selectors to deployed models

  • (infrastructure) optional feast_enabled variable for AWS & GCP; creates a Redshift cluster (AWS) or BigQuery resources (GCP)

  • (platform) crossplane

    • function-environment-configs 0.1.0

  • (GCP) Optional CloudDNS logging, cloud_dns_logging, disabled by default

  • (GCP) Buckets have uniform bucket level access enabled

  • (platform-ui) Adds support for self serve karpenter node pools

  • (platform) ArgoCD can authenticate to cross-account private ECR registries for helm charts

  • (platform) Support per-project installation of evidently

Changed

  • (infrastructure) terraform-modules 1.4.0

  • (platform) cake-api v1.2.0

  • (infrastructure) BREAKING: bootstrap/github-actions-aws now uses a var for OIDC providers

    • github_repo var is now replaced with oidc_providers

    • vars for bootstrap/github-actions-aws moved from CI workflows to the module itself

  • (platform) BREAKING: crossplane 1.18.0

    • migration: update the ArgoCD application before terraform

    • aws provider v1.17.0

    • gcp provider v1.9.0

    • kcl function v0.10.8

  • (platform) jaeger-operator v2.57

  • (infrastructure, platform) modelmesh User/ServiceAccount is managed by Crossplane

  • destroy.sh

    • (AWS) removes user access keys

    • (GCP) deletes the gke cluster

  • (platform) Kubeflow edit and admin roles include the kuberay operator role

  • (platform) Kubeflow 1.9.1

  • (infrastructure) upgrade to Kubernetes 1.31

    • GKE switching to REGULAR release channel

  • (platform) Prometheus chart 65.5.0

  • (platform) ArgoCD 7.7.3

  • (platform) aws-load-balancer-controller 1.10.0

  • (platform) metrics-server 3.12.2

  • (platform) opentelemetry-operator 0.72.1

  • (platform) open-webui 3.4.3

  • (platform) MLflow 2.17.2

    • MLflow db migrations now run as a Job on ArgoCD sync

    • istio-injection enabled by default

  • (platform) feast 0.41.3

  • (platform) Prefect 3.1.2 (chart 2024.11.12215108)

Fixed

  • user-namespace: no modelmesh secret is created when modelmesh is disabled

1.0.3 - 2024-11-01

Added

  • (cake-api) Support for supplying keys for commit signing

  • (platform) cake-api v1.1.0

1.0.2 - 2024-10-21

Changed

  • (platform) cake-api v1.0.1

1.0.1 - 2024-10-16

Added

  • (platform) new component: open-webui

  • (platform) Crossplane grants its own permissions

  • (GCP) kflow-platform-gcp provides details for the gcp configuration block used by gen_deploy

Changed

  • (platform) user-namespace is a helm chart

    • gen_deploy simplified, no more magic to generate user-namespaces or prefect applications

  • (infrastructure) terraform-modules 1.4.0

    • (infrastructure, platform) WorkloadIdentities and IRSA roles are managed by Crossplane

      • ray-cluster, kubeflow users, airbyte, airflow, feast, prefect, mlflow

    • (infrastructure, platform) Buckets are managed by Crossplane

      • kserve, kfp (kubeflow pipelines), cakefs, mlflow

  • (GCP) feast uses defined roles

  • (GCP) New buckets are regional, no longer multi-regional. Existing buckets are not changed.

  • (AWS) aws-load-balancer-controller v2.9.1 (chart v1.9.1)

    • enable cert-manager

    • supports Listener Attributes

  • (platform) Kubeflow profile-controller assigns k8s ServiceAccount to AWS Role/GCP ServiceAccount

  • (cake-api) now has permissions to sync argocd apps

  • (platform) most app-specific Terraform resources are now managed by Crossplane

  • (platform) Kubeflow 1.9.1-rc2 improves Istio security

  • (platform) KubeRay Operator 1.2.2, Ray 2.37.0-py311

  • (GCP) enable App Engine for feast in bootstrap (one time per project)

    • if App Engine was already enabled, it must be imported, since it cannot be recreated

Fixed

  • (platform) Removed unused kubeflow-jwt RequestAuthentication that was breaking KFP auth flows

  • (platform) Fixed auth flows for ClearML

Removed

  • (AWS) dvc bucket is removed.

1.0.0 - 2024-09-23

Added

  • (AWS) Karpenter 1.0.1

    • Default node pools for non-GPU and GPU have minimal disruption budget

    • AWS subnets are marked for use with Karpenter

  • (platform) OpenCost now has support for cloud costs

  • (platform-ui) VLLM runtime supports custom driver options

  • (gen_deploy) disable ArgoCD auto-sync by default (re-enable with app_config.defaults.sync_policy.automated)

  • (gen_deploy) allow global change of ArgoCD targetRevision

  • (gen_deploy) Ray: add minimal_resources flag to toggle Ray FT (fault tolerance)

  • (GCP, infrastructure) module kflow-platform-gcp variable added: gke_service_account (nodepool service account)

  • (platform) Airflow: log in with Microsoft

    • (gen_deploy) added app_config.airflow.auth_providers which defaults to false

  • (AWS, scripts) destroy script for AWS

Changed

  • (infrastructure) terraform-modules 1.3.2

  • Kubernetes 1.30

  • Istio 1.22.1

  • ArgoCD is now installed via the "Link ArgoCD" GitHub Action workflow or similar script instead of being managed by Terraform

  • First class support for Crossplane

    • (AWS) IRSA roles are now managed by Crossplane

  • Airbyte 0.64.3

    • (AWS) now with IRSA role

  • Crossplane 1.17.1

  • Kubeflow 1.9.0

  • Kuberay Operator 1.2.1

  • Milvus 2.4.10

  • Milvus Attu 2.4.8

  • Milvus Operator 1.0.6

  • MLflow 2.15.1

  • OHIF 3.8.3

  • Ray Service 2.35.0-py311

  • (AWS) aws-load-balancer-controller has 1 replica

  • (GCP) GKE Node Pools install nvidia driver

  • (GCP) Services for a project are enabled via bootstrap/github-actions-gcp rather than kflow-platform-gcp

Deprecated

  • (AWS, infrastructure) AWS managed node groups are deprecated in favor of Crossplane node pools

  • (platform-ui) Ollama is now a deprecated runtime

Removed

  • GCP module kflow-platform-gcp variables removed: cluster_oidc_provider_arn, gke_ca_certificate, gke_endpoint