

This path is for teams that are comfortable with Terraform and want a more complete AWS environment around Cake Agents. The reference Terraform approach provisions:
  • A dedicated DNS zone and ACM certificate
  • An EKS cluster, VPC, storage classes, and Karpenter
  • An RDS database
  • The Cake Agents Helm release on top of that cluster

What to Model in Terraform

The reference approach separates concerns into a few layers:
  • DNS zone and certificate management
  • cluster platform resources such as VPC, EKS, ingress, storage, and autoscaling
  • application dependencies such as PostgreSQL
  • the final Cake Agents Helm release

Prerequisites

  • Terraform >= 1.6
  • An AWS account where Cake Agents will run
  • A parent DNS zone where you can delegate a subdomain
  • Access to the OCI registry that serves the Cake Agents chart
  • The Helm, Kubernetes, and kubectl Terraform providers

End State

At the end of this guide you will have:
  • an AWS environment (DNS, EKS, and database) suitable for Cake Agents
  • a deployed Cake Agents Helm release
  • a public hostname for the control plane

Step 1: Set Up DNS and TLS

Cake Agents needs a public hostname for the control plane. You can manage DNS and certificates however you normally do, or use a dedicated Terraform module that creates:
  • a Route53 hosted zone for the Cake Agents environment
  • an ACM certificate for the apex hostname and wildcard subdomains
  • DNS validation records inside that zone
Important: certificate validation will not complete until the parent zone delegates the new hosted zone’s name servers.

At the end of this step you should have:
  • a hostname for Cake Agents
  • a validated ACM certificate ARN
  • a hosted zone ID for the public record
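
As a sketch, this layer can be modeled directly with the AWS provider. The zone name below is a placeholder, but the resource addresses match the ones referenced in the example inputs later in this guide:

resource "aws_route53_zone" "cake_agents" {
  name = "demo.example.com"
}

# Certificate for the apex hostname and wildcard subdomains.
resource "aws_acm_certificate" "cake_agents" {
  domain_name               = "demo.example.com"
  subject_alternative_names = ["*.demo.example.com"]
  validation_method         = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

# DNS validation records inside the new zone.
resource "aws_route53_record" "validation" {
  for_each = {
    for dvo in aws_acm_certificate.cake_agents.domain_validation_options :
    dvo.domain_name => {
      name   = dvo.resource_record_name
      type   = dvo.resource_record_type
      record = dvo.resource_record_value
    }
  }

  zone_id = aws_route53_zone.cake_agents.zone_id
  name    = each.value.name
  type    = each.value.type
  ttl     = 60
  records = [each.value.record]
}

# Completes only once the parent zone delegates to this zone's name servers.
resource "aws_acm_certificate_validation" "cake_agents" {
  certificate_arn         = aws_acm_certificate.cake_agents.arn
  validation_record_fqdns = [for r in aws_route53_record.validation : r.fqdn]
}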

Step 2: Provision the Cluster Stack

The main cluster module provisions:
  • a dedicated VPC
  • an EKS cluster
  • Karpenter for node provisioning
  • AWS Load Balancer Controller
  • an RDS PostgreSQL instance
  • a Kubernetes namespace and database secret
  • the cake-agents Helm release
  • a Route53 alias record pointing the hostname at the load balancer

Step 3: Decide How the Cluster Pulls Images and Charts

In most customer environments, the cluster should not pull directly from a private upstream ECR registry across accounts on every request. The recommended pattern is to configure a pull-through cache rule in the customer account and point Cake Agents at that cached registry path. Benefits:
  • simpler runtime access from the cluster
  • fewer cross-account registry dependencies during normal operation
  • better control over what the environment pulls and caches
If you use pull-through caching, enable the cluster's support for first-pull imports and point the chart registry input at the cached OCI path.
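
As a rough sketch, a pull-through cache rule in the customer account might look like this. The repository prefix and upstream host are placeholders, and the credential ARN is only needed when the upstream registry requires authentication:

# Images requested under .../cake/* are imported from the upstream registry
# on first pull, then served from the local cache on subsequent pulls.
resource "aws_ecr_pull_through_cache_rule" "cake" {
  ecr_repository_prefix = "cake"
  upstream_registry_url = "<upstream-registry-host>" # placeholder
  # credential_arn      = aws_secretsmanager_secret.upstream.arn
}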

Step 4: Configure the Module Inputs

The most important inputs for the cluster module are:
  • name: cluster name and discovery identifier
  • vpc_cidr: network boundary for the environment
  • hostname: public control plane hostname
  • certificate_arn: validated ACM certificate for the hostname
  • route53_zone_id: hosted zone where the public alias record is created
  • registry: OCI registry serving the Cake Agents chart
  • cake_agents_chart_version: exact chart version to deploy
  • enable_ecr_pull_through: whether nodes can trigger pull-through cache imports on first pull
  • oidc: optional bootstrap for the built-in OIDC configuration
  • slack: optional bootstrap for the Slack secret used by the application
  • nat_gateway_per_az: resilience versus cost tradeoff for private subnet egress
  • database_multi_az: higher availability for the database
  • database_deletion_protection: protection for longer-lived environments
  • database_final_snapshot: snapshot behavior during database teardown

Example Inputs

module "cake_agents" {
  source = ".../cake-agents-cluster"

  name            = "demo"
  vpc_cidr        = "10.128.0.0/16"
  hostname        = "demo.example.com"
  certificate_arn = aws_acm_certificate_validation.cake_agents.certificate_arn
  route53_zone_id = aws_route53_zone.cake_agents.zone_id

  registry                  = "oci://123456789012.dkr.ecr.us-east-2.amazonaws.com/cake/charts"
  cake_agents_chart_version = "0.1.3"

  enable_ecr_pull_through = true
  oidc = {
    provider_id   = "company"
    domain        = "example.com"
    issuer        = "https://auth.example.com"
    client_id     = "cake-agents"
    public_client = false
    client_secret = var.oidc_client_secret
  }
  slack = {
    signing_secret = var.slack_signing_secret
    bot_token      = var.slack_bot_token
  }
  nat_gateway_per_az           = false
  database_multi_az            = true
  database_deletion_protection = true
}
If you do not want Terraform to bootstrap OIDC or Slack secrets, omit those inputs and manage the corresponding application configuration separately.

Registry Reference

The example root module uses an OCI registry path such as:
oci://<account-id>.dkr.ecr.<region>.amazonaws.com/cake/charts
If you adopt the same pattern, make sure:
  • the chart is published upstream
  • the pull-through cache has been warmed up before terraform apply
  • the Helm provider can authenticate to ECR
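
One way to satisfy the last point, assuming Helm provider 2.9 or newer, is to feed short-lived ECR credentials into the provider's registry block:

# The authorization token is valid for 12 hours and scoped to the caller's account.
data "aws_ecr_authorization_token" "chart_registry" {}

provider "helm" {
  # kubernetes { ... }  # cluster connection settings omitted here

  registry {
    url      = "oci://123456789012.dkr.ecr.us-east-2.amazonaws.com"
    username = data.aws_ecr_authorization_token.chart_registry.user_name
    password = data.aws_ecr_authorization_token.chart_registry.password
  }
}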

Private Endpoints for Upstream Services

Some environments also need private access from the cluster to upstream services outside the VPC boundary. One pattern is to add interface VPC endpoints for private gateway services that session workloads or supporting components must reach. If your model provider, auth system, or internal gateway is exposed privately, document that dependency alongside the cluster module and provision the required endpoint and security group rules as part of the same environment.
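
A minimal sketch of such an endpoint, assuming the cluster module exposes VPC and subnet outputs (the output names and the PrivateLink service name below are illustrative):

resource "aws_vpc_endpoint" "model_gateway" {
  vpc_id              = module.cake_agents.vpc_id             # assumed output
  service_name        = "com.amazonaws.vpce.us-east-2.vpce-svc-0123456789abcdef0"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = module.cake_agents.private_subnet_ids # assumed output
  security_group_ids  = [aws_security_group.model_gateway.id] # defined elsewhere
  private_dns_enabled = true
}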

Provider Authentication

Your provider configuration should include:
  • an AWS provider locked to the expected account and region
  • a Helm provider authenticated to both EKS and ECR
  • Kubectl and Kubernetes providers that authenticate using aws eks get-token
That arrangement lets a single Terraform apply provision the cloud resources and then install the Helm chart into the freshly created cluster.
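
A sketch of that arrangement for the Kubernetes provider, assuming the cluster module exposes endpoint and CA outputs (the output names are illustrative); the Helm and kubectl providers take the same connection and exec settings:

provider "kubernetes" {
  host                   = module.cake_agents.cluster_endpoint         # assumed output
  cluster_ca_certificate = base64decode(module.cake_agents.cluster_ca) # assumed output

  # Mints a short-lived token at plan/apply time; no static kubeconfig needed.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", "demo"]
  }
}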

Apply Order

For a new environment, the safe sequence is:
  1. Apply the DNS module and complete parent-zone delegation if needed.
  2. Re-run apply until ACM validation succeeds.
  3. Ensure the Helm chart is available in the OCI registry path.
  4. Apply the Cake Agents cluster root module.
After apply finishes, verify:
  • the EKS cluster is reachable
  • the RDS instance is available
  • the Route53 record resolves to the load balancer
  • the Helm release is healthy in the target namespace

IAM Reference

Some teams apply the cluster module using a dedicated Terraform role instead of broad administrator credentials. For those teams, a least-privilege policy should cover at least:
  • EKS cluster access and token generation
  • reading or writing the specific Route53 records used by Cake Agents
  • reading ACM certificate metadata
  • ECR access for the chart registry and optional pull-through imports
  • the AWS APIs needed by the cluster module to provision VPC, EKS, load balancing, KMS, and RDS resources
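
As an illustrative starting point only (the exact actions and resource ARNs depend on the modules you apply and your account layout), the statement shapes might look like:

resource "aws_iam_policy" "cake_agents_terraform" {
  name = "cake-agents-terraform"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid      = "EksAccess"
        Effect   = "Allow"
        Action   = ["eks:DescribeCluster", "eks:AccessKubernetesApi"]
        Resource = "arn:aws:eks:us-east-2:123456789012:cluster/demo"
      },
      {
        Sid      = "DnsRecords"
        Effect   = "Allow"
        Action   = ["route53:ChangeResourceRecordSets", "route53:ListResourceRecordSets"]
        Resource = "arn:aws:route53:::hostedzone/<zone-id>" # placeholder
      },
      {
        Sid      = "AcmRead"
        Effect   = "Allow"
        Action   = ["acm:DescribeCertificate", "acm:ListTagsForCertificate"]
        Resource = "*"
      },
      {
        Sid    = "EcrPull"
        Effect = "Allow"
        Action = [
          "ecr:GetAuthorizationToken",
          "ecr:BatchGetImage",
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchImportUpstreamImage",
        ]
        Resource = "*"
      }
    ]
  })
}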

How This Differs from the Helm-Only Path

The Kubernetes guide assumes you already have a cluster and only need the application install. The Terraform path defines the full environment boundary around Cake Agents, including cloud networking, compute, storage, DNS, and the final Helm release. If you already operate an EKS platform, you may only need part of this model rather than adopting a full environment bootstrap flow. For the runtime architecture inside that environment, see Core Concepts.