This path is for teams that are comfortable with Terraform and want a more complete AWS environment around Cake Agents. The reference Terraform approach provisions:
- A dedicated DNS zone and ACM certificate
- An EKS cluster, VPC, storage classes, and Karpenter
- An RDS database
- The Cake Agents Helm release on top of that cluster
What to Model in Terraform
The reference approach separates concerns into a few layers:
- DNS zone and certificate management
- cluster platform resources such as VPC, EKS, ingress, storage, and autoscaling
- application dependencies such as PostgreSQL
- the final Cake Agents Helm release
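The layers depend on one another, and the release layer comes last because it needs everything below it. The following is a minimal sketch that models the layers as a dependency graph and derives one valid apply order with Python's standard graphlib module. The layer names and the edges (for example, that PostgreSQL depends on the cluster platform's VPC) are assumptions for illustration, not the reference module's internals.

```python
from graphlib import TopologicalSorter

# Hypothetical layer names; each key maps to the set of layers it depends on.
LAYERS: dict[str, set[str]] = {
    "dns_and_certificate": set(),
    "cluster_platform": set(),
    # Assumption: the database lives in the VPC created by the platform layer.
    "application_dependencies": {"cluster_platform"},
    "cake_agents_release": {
        "dns_and_certificate",
        "cluster_platform",
        "application_dependencies",
    },
}


def layer_order(layers: dict[str, set[str]]) -> list[str]:
    """Return one valid apply order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(layers).static_order())


if __name__ == "__main__":
    for step, layer in enumerate(layer_order(LAYERS), start=1):
        print(step, layer)
```

Modeling the layers this way makes an accidental circular dependency (for example, DNS depending on the load balancer that the release creates) fail loudly instead of surfacing as a confusing apply error.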
Prerequisites
- Terraform >= 1.6
- An AWS account where Cake Agents will run
- A parent DNS zone where you can delegate a subdomain
- Access to the OCI registry that serves the Cake Agents chart
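To confirm the Terraform prerequisite before starting, you can parse the output of `terraform version -json`, which reports the CLI version under the `terraform_version` key. This is a small sketch; comparing version tuples rather than strings avoids treating "1.10" as older than "1.6". It deliberately ignores pre-release suffixes, which is an assumption you may want to tighten.

```python
import json

MIN_TERRAFORM = (1, 6)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.6.6' or '1.7.0-beta1' into (1, 6, 6) or (1, 7, 0).

    Pre-release suffixes are dropped, so '1.6.0-beta1' counts as 1.6.0.
    """
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def terraform_version_ok(version_json: str, minimum: tuple[int, ...] = MIN_TERRAFORM) -> bool:
    """Check the JSON printed by `terraform version -json` against a minimum."""
    version = json.loads(version_json)["terraform_version"]
    return parse_version(version) >= minimum
```

Feed it the stdout of `terraform version -json`, for example captured with `subprocess.run(..., capture_output=True, text=True)`.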
End State
At the end of this guide you will have:
- an AWS environment (DNS, EKS, and database) suitable for Cake Agents
- a deployed Cake Agents Helm release
- a public hostname for the control plane
Step 1: Set Up DNS and TLS
Cake Agents needs a public hostname for the control plane. You can manage DNS and certificates however you normally do, or use a dedicated Terraform module that creates:
- a Route53 hosted zone for the Cake Agents environment
- an ACM certificate for the apex hostname and wildcard subdomains
- DNS validation records inside that zone
Whichever approach you use, this step should give you:
- a hostname for Cake Agents
- a validated ACM certificate ARN
- a hosted zone ID for the public record
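A certificate for the apex plus a wildcard covers the apex itself and any hostname exactly one label below it; an ACM wildcard such as `*.agents.example.com` does not match `a.b.agents.example.com`. The following sketch checks whether a hostname you plan to use is covered, assuming the certificate carries exactly the two names described above. The zone `agents.example.com` is a placeholder.

```python
def _matches(hostname: str, name: str) -> bool:
    """Match one certificate name; a leading '*.' covers exactly one label."""
    hostname = hostname.lower().rstrip(".")
    name = name.lower().rstrip(".")
    if name.startswith("*."):
        suffix = name[1:]  # keeps the leading dot, e.g. ".agents.example.com"
        label = hostname[: -len(suffix)] if hostname.endswith(suffix) else ""
        return bool(label) and "." not in label
    return hostname == name


def certificate_names(apex: str) -> list[str]:
    """Names on a certificate for the apex hostname and wildcard subdomains."""
    return [apex, f"*.{apex}"]


def is_covered(hostname: str, apex: str) -> bool:
    """True if any name on the apex/wildcard certificate matches hostname."""
    return any(_matches(hostname, name) for name in certificate_names(apex))
```

For example, `is_covered("api.agents.example.com", "agents.example.com")` is true, while a two-level subdomain would need its own certificate name.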
Step 2: Provision the Cluster Stack
The main cluster module provisions:
- a dedicated VPC
- an EKS cluster
- Karpenter for node provisioning
- AWS Load Balancer Controller
- an RDS PostgreSQL instance
- a Kubernetes namespace and database secret
- the cake-agents Helm release
- a Route53 alias record pointing the hostname at the load balancer
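The cluster module creates the alias record for you, but it can help when debugging to know what such a record contains. This sketch builds the `ChangeBatch` structure accepted by the Route53 `ChangeResourceRecordSets` API. Note that an alias target uses the load balancer's own canonical hosted zone ID, which is different from the `route53_zone_id` of your zone; the values passed in are placeholders.

```python
def alias_change_batch(hostname: str, lb_dns_name: str, lb_zone_id: str) -> dict:
    """Build a Route53 ChangeBatch that points hostname at a load balancer."""
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": hostname,
                    "Type": "A",
                    "AliasTarget": {
                        # Canonical hosted zone ID of the load balancer,
                        # not the ID of the zone that holds the record.
                        "HostedZoneId": lb_zone_id,
                        "DNSName": lb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ]
    }
```

With boto3, this would be passed as `route53.change_resource_record_sets(HostedZoneId=route53_zone_id, ChangeBatch=...)`, where `HostedZoneId` is your zone and the alias target's zone ID belongs to the load balancer.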
Step 3: Decide How the Cluster Pulls Images and Charts
In most customer environments, the cluster should not pull directly from a private upstream ECR registry across accounts on every request. The recommended pattern is to configure a pull-through cache rule in the customer account and point Cake Agents at that cached registry path. Benefits:
- simpler runtime access from the cluster
- fewer cross-account registry dependencies during normal operation
- better control over what the environment pulls and caches
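With an ECR pull-through cache rule, images are pulled from your own account's registry using the rule's repository prefix: `<account_id>.dkr.ecr.<region>.amazonaws.com/<ecr-repository-prefix>/<upstream-repository>:<tag>`. This sketch builds that cached path. The prefix and repository names you pass are hypothetical; use whatever your cache rule and the upstream registry actually define.

```python
import re


def cached_image_uri(
    account_id: str, region: str, prefix: str, upstream_repository: str, tag: str
) -> str:
    """Build the pull-through cache URI for an upstream image."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    # The cache rule's prefix namespaces every upstream repository.
    return f"{registry}/{prefix.strip('/')}/{upstream_repository.strip('/')}:{tag}"
```

The first pull of a given image through this path triggers the import from upstream; later pulls are served from the cached copy in the customer account.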
Step 4: Configure the Module Inputs
The most important inputs for the cluster module are:
- name: cluster name and discovery identifier
- vpc_cidr: network boundary for the environment
- hostname: public control plane hostname
- certificate_arn: validated ACM certificate for the hostname
- route53_zone_id: hosted zone where the public alias record is created
- registry: OCI registry serving the Cake Agents chart
- cake_agents_chart_version: exact chart version to deploy
- enable_ecr_pull_through: whether nodes can trigger pull-through cache imports on first pull
- oidc: optional bootstrap for the built-in OIDC configuration
- slack: optional bootstrap for the Slack secret used by the application
- nat_gateway_per_az: resilience versus cost tradeoff for private subnet egress
- database_multi_az: higher availability for the database
- database_deletion_protection: protection for longer-lived environments
- database_final_snapshot: snapshot behavior during database teardown
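Several of these inputs can be sanity-checked before running a plan. This is a sketch, not the module's own validation: it checks that required inputs are present, that vpc_cidr is an IPv4 network within the /16 to /28 range AWS allows for a VPC, that the chart version is an exact version rather than a range, and that certificate_arn looks like an ACM ARN. Which inputs count as required is an assumption based on the list above.

```python
import ipaddress
import re

# Exact versions only ("1.2.3" or "1.2.3-rc.1"); ranges like "^1.2" are rejected.
EXACT_VERSION = re.compile(r"\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?")
# Assumption: these are the non-optional inputs.
REQUIRED = ("name", "vpc_cidr", "hostname", "certificate_arn",
            "route53_zone_id", "registry", "cake_agents_chart_version")


def validate_inputs(inputs: dict) -> list[str]:
    """Return a list of human-readable problems; empty means no issues found."""
    problems = [f"missing input: {key}" for key in REQUIRED if not inputs.get(key)]

    cidr = inputs.get("vpc_cidr")
    if cidr:
        try:
            net = ipaddress.IPv4Network(cidr)  # strict: host bits must be zero
        except ValueError:
            problems.append(f"vpc_cidr is not an IPv4 network: {cidr}")
        else:
            if not 16 <= net.prefixlen <= 28:
                problems.append("vpc_cidr prefix must be between /16 and /28")

    version = inputs.get("cake_agents_chart_version")
    if version and not EXACT_VERSION.fullmatch(version):
        problems.append(f"cake_agents_chart_version must be an exact version, got {version!r}")

    arn = inputs.get("certificate_arn")
    # Accepts any AWS partition, e.g. arn:aws:acm: or arn:aws-us-gov:acm:.
    if arn and not re.match(r"arn:aws[a-z-]*:acm:", arn):
        problems.append("certificate_arn does not look like an ACM certificate ARN")

    return problems
```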
Example Inputs
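One way to keep environment-specific values out of the root module is a `*.auto.tfvars.json` file, which Terraform loads automatically. The sketch below renders such a file from a dictionary. Every value shown is a placeholder, the file name is arbitrary, and the types (booleans for the on/off inputs) are assumptions; the optional oidc and slack inputs are omitted because their shape is not described here.

```python
import json
from pathlib import Path

# Placeholder values only: replace each with your environment's real values.
EXAMPLE_INPUTS = {
    "name": "cake-agents-example",
    "vpc_cidr": "10.0.0.0/16",
    "hostname": "agents.example.com",
    "certificate_arn": "arn:aws:acm:us-east-1:123456789012:certificate/EXAMPLE",
    "route53_zone_id": "ZEXAMPLE",
    "registry": "123456789012.dkr.ecr.us-east-1.amazonaws.com/EXAMPLE",
    "cake_agents_chart_version": "0.0.0",
    "enable_ecr_pull_through": True,
    "nat_gateway_per_az": False,
    "database_multi_az": False,
    "database_deletion_protection": True,
    "database_final_snapshot": True,
}


def render_tfvars(inputs: dict) -> str:
    """Serialize inputs as Terraform-compatible JSON variable definitions."""
    return json.dumps(inputs, indent=2, sort_keys=True) + "\n"


def write_tfvars(inputs: dict, directory: Path) -> Path:
    """Write an auto-loaded variables file next to the root module."""
    path = directory / "cake-agents.auto.tfvars.json"
    path.write_text(render_tfvars(inputs))
    return path
```

A more production-oriented setup might instead keep one such file per environment and select it explicitly with `-var-file`.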
Registry Reference
The example root module sets registry to an OCI registry path. Before running terraform apply, make sure that:
- the chart is published upstream
- the pull-through cache has been warmed
- the Helm provider can authenticate to ECR
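A quick way to confirm the first and last conditions is to pull the chart with the Helm CLI before applying. This sketch builds that command. The chart name you pass (for example `cake-agents`) is an assumption; use the name under which the chart is actually published.

```python
def oci_chart_ref(registry: str, chart: str) -> str:
    """Join a registry path and chart name into an oci:// reference."""
    registry = registry.removeprefix("oci://").rstrip("/")
    return f"oci://{registry}/{chart}"


def helm_pull_command(registry: str, chart: str, version: str) -> list[str]:
    """argv for `helm pull`, pinned to the exact chart version."""
    return ["helm", "pull", oci_chart_ref(registry, chart), "--version", version]
```

Run the command with `subprocess.run(..., check=True)` after logging Helm in to ECR, for example with `aws ecr get-login-password --region <region> | helm registry login --username AWS --password-stdin <registry-host>`. A failure at this point usually means the chart is missing or authentication is not set up, which is cheaper to discover here than midway through an apply.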
Private Endpoints for Upstream Services
Some environments also need private access from the cluster to upstream services outside the VPC boundary. One pattern is to add interface VPC endpoints for private gateway services that session workloads or supporting components must reach. If your model provider, auth system, or internal gateway is exposed privately, document that dependency alongside the cluster module and provision the required endpoint and security group rules as part of the same environment.
Provider Authentication
Your Terraform providers should support:
- AWS provider locked to the expected account and region
- Helm provider authenticated to both EKS and ECR
- Kubectl and Kubernetes providers authenticated with aws eks get-token
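In Terraform itself, the AWS provider's `allowed_account_ids` argument is one way to lock the account. The following sketch shows the same two ideas outside Terraform, for scripts that run alongside it: a guard that compares the `Account` field from `aws sts get-caller-identity` with the expected account, and the exec settings a Kubernetes-style provider uses to fetch a token through `aws eks get-token`. The function names are illustrative.

```python
import json

# Client authentication API version emitted by current `aws eks get-token`.
EXEC_API_VERSION = "client.authentication.k8s.io/v1beta1"


def assert_expected_account(caller_identity_json: str, expected_account: str) -> None:
    """Fail fast if credentials belong to a different AWS account."""
    account = json.loads(caller_identity_json)["Account"]
    if account != expected_account:
        raise RuntimeError(
            f"credentials are for account {account}, expected {expected_account}"
        )


def eks_exec_config(cluster_name: str, region: str) -> dict:
    """Exec-based credentials: the provider runs this command to get a token."""
    return {
        "api_version": EXEC_API_VERSION,
        "command": "aws",
        "args": ["eks", "get-token", "--cluster-name", cluster_name, "--region", region],
    }
```

Exec-based tokens are fetched fresh on each run, which avoids the expiry problems of a token captured once at plan time.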
Apply Order
For a new environment, the safe sequence is:
- Apply the DNS module and complete parent-zone delegation if needed.
- Re-run apply until ACM validation succeeds.
- Ensure the Helm chart is available in the OCI registry path.
- Apply the Cake Agents cluster root module.
After the cluster root module applies, verify that:
- the EKS cluster is reachable
- the RDS instance is available
- the Route53 record resolves to the load balancer
- the Helm release is healthy in the target namespace
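Waiting for ACM validation (and for the post-apply checks) is a bounded polling loop. This sketch shows a generic helper that retries a check a fixed number of times. The check is supplied by you; the sleep function is injectable so the helper can be tested without real delays. The default attempt count and delay are arbitrary.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it returns True or attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:  # no pointless sleep after the final try
            sleep(delay_seconds)
    return False
```

For ACM, a check could wrap boto3's `describe_certificate` and compare `["Certificate"]["Status"]` with `"ISSUED"`. If it never succeeds, the parent-zone delegation from the first step is the usual suspect.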
IAM Reference
Some teams apply the cluster module using a dedicated Terraform role instead of broad administrator credentials. For those teams, a least-privilege policy should cover at least:
- EKS cluster access and token generation
- reading or writing the specific Route53 records used by Cake Agents
- reading ACM certificate metadata
- ECR access for the chart registry and optional pull-through imports
- the AWS APIs needed by the cluster module to provision VPC, EKS, load balancing, KMS, and RDS resources
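As a starting point for the first four items, the sketch below builds an IAM policy document scoped to one hosted zone, one certificate, one cluster, and one ECR repository prefix. It is an illustrative assumption rather than a vetted policy. It does not cover the broad VPC, EKS, load balancing, KMS, and RDS permissions the module needs to create resources, and the repository prefix parameter is hypothetical.

```python
import json


def scoped_policy(account_id: str, region: str, zone_id: str, certificate_arn: str,
                  cluster_name: str, repository_prefix: str, pull_through: bool) -> dict:
    """Least-privilege statements for DNS, ACM, EKS access, and ECR pulls."""
    ecr_actions = ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"]
    if pull_through:
        # Needed when a first pull creates the cached repository and imports the image.
        ecr_actions += ["ecr:CreateRepository", "ecr:BatchImportUpstreamImage"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Route53Records",
                "Effect": "Allow",
                "Action": ["route53:GetHostedZone", "route53:ListResourceRecordSets",
                           "route53:ChangeResourceRecordSets"],
                "Resource": f"arn:aws:route53:::hostedzone/{zone_id}",
            },
            {
                "Sid": "ReadCertificate",
                "Effect": "Allow",
                "Action": ["acm:DescribeCertificate"],
                "Resource": certificate_arn,
            },
            {
                "Sid": "EksClusterAccess",
                "Effect": "Allow",
                "Action": ["eks:DescribeCluster"],
                "Resource": f"arn:aws:eks:{region}:{account_id}:cluster/{cluster_name}",
            },
            {
                # GetAuthorizationToken cannot be scoped to a repository.
                "Sid": "EcrAuth",
                "Effect": "Allow",
                "Action": ["ecr:GetAuthorizationToken"],
                "Resource": "*",
            },
            {
                "Sid": "EcrRepositories",
                "Effect": "Allow",
                "Action": ecr_actions,
                "Resource": f"arn:aws:ecr:{region}:{account_id}:repository/{repository_prefix}/*",
            },
        ],
    }


def policy_json(policy: dict) -> str:
    """Render the policy as the JSON string IAM expects."""
    return json.dumps(policy, indent=2)
```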