Custom Models


Custom Models are an easy way to deploy GitOps-backed, production-ready Ray Serve Applications. In this tutorial, we demonstrate how to deploy Qwen3-4B onto a g5.xlarge instance.

Prerequisites

Before starting, ensure your project has at least one namespace. You can create new namespaces under “Project > Configure”.

Example project with a valid namespace

Deploying a model

First, navigate to the Custom Models form. Under Project > Models, click “Deploy New Model” and switch to the “Custom Model” tab.

Then, start filling out the details for your model:

  1. Namespace: Select the target namespace for the model from the dropdown.

  2. Name: Choose a unique name for your model. Names must be unique within a namespace.

  3. Machine type: Select the desired AWS instance type for your model. Note: only a limited subset of AWS instance types is supported.

Finally, upload the contents of your Ray Serve configuration. For example:

applications:
- name: test-qwen3-4b
  route_prefix: "/"
  import_path: "ray.serve.llm:build_openai_app"
  runtime_env:
    env_vars:
      VLLM_USE_V1: "0"
      ENGINE_START_TIMEOUT_S: "1600"
      SERVE_LOG_LEVEL: "INFO"
      VLLM_LOGGING_LEVEL: "INFO"
      VLLM_ALLOW_RUNTIME_LORA_UPDATING: "True"
  args:
    llm_configs:
      - model_loading_config:
          model_id: "Qwen/Qwen3-4B"
        deployment_config:
          health_check_timeout_s: 600
          autoscaling_config:
            min_replicas: 1
            max_replicas: 1
        engine_kwargs:
          tensor_parallel_size: 1
          pipeline_parallel_size: 1
          tokenizer_pool_size: 2
          tokenizer_pool_extra_config: '{"runtime_env": {}}'
          gpu_memory_utilization: 0.9
          trust_remote_code: true
          dtype: half
          max_model_len: 20000
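Before uploading, you can sanity-check the configuration locally. The sketch below (assuming Python with PyYAML installed; the embedded config is an abridged copy of the example above) parses the YAML and confirms the parallelism settings fit a single-GPU g5.xlarge:

```python
import yaml  # PyYAML: pip install pyyaml

# Abridged copy of the Serve config above, embedded for a quick local check.
SERVE_CONFIG = """
applications:
- name: test-qwen3-4b
  route_prefix: "/"
  import_path: "ray.serve.llm:build_openai_app"
  args:
    llm_configs:
      - model_loading_config:
          model_id: "Qwen/Qwen3-4B"
        engine_kwargs:
          tensor_parallel_size: 1
          pipeline_parallel_size: 1
"""

config = yaml.safe_load(SERVE_CONFIG)
for app in config["applications"]:
    # Every application needs a name and an import path.
    assert app["name"] and app["import_path"]
    for llm in app.get("args", {}).get("llm_configs", []):
        kwargs = llm.get("engine_kwargs", {})
        tp = kwargs.get("tensor_parallel_size", 1)
        pp = kwargs.get("pipeline_parallel_size", 1)
        # A g5.xlarge exposes a single A10G GPU, so both must stay at 1.
        assert tp == 1 and pp == 1
print("config parses and fits a single-GPU machine")
```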

Before deploying, you can generate a read-only preview of the full RayService Kubernetes resource right in the Cake UI:
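For orientation, the generated resource looks roughly like the sketch below. This is illustrative only: the field values and cluster layout are assumptions, and the Cake preview shows the authoritative resource.

```yaml
# Illustrative RayService sketch; the Cake UI preview is authoritative.
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: test-qwen3-4b
  namespace: <your-namespace>
spec:
  serveConfigV2: |
    applications:
    - name: test-qwen3-4b
      route_prefix: "/"
      import_path: "ray.serve.llm:build_openai_app"
      # ...remaining Serve config as uploaded...
  rayClusterConfig:
    headGroupSpec: {}      # populated by Cake
    workerGroupSpecs: []   # your machine type selection maps to a worker group
```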

Once you are satisfied with your configuration, click “Deploy Model”. You should see the Cake platform immediately push a new commit to your GitOps repo:
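Once the model reports healthy, the “ray.serve.llm:build_openai_app” entry point serves an OpenAI-compatible HTTP API. A minimal request sketch follows; the base URL is an assumption, so substitute the endpoint shown for your model in the Cake UI:

```python
import json
import urllib.request

# Illustrative base URL; use the endpoint displayed for your model.
BASE_URL = "http://test-qwen3-4b.example.internal/v1"

payload = {
    "model": "Qwen/Qwen3-4B",  # matches model_id in the Serve config
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the endpoint is reachable from your environment:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```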

Troubleshooting

Q. My model is stuck in “Pending”. What can I do?

A cold boot of a new custom model can take 5-10 minutes or more, including launching the underlying EC2 instance and pulling the Ray Serve container image. If your model has progressed to “Starting”, that is a good sign that it will become healthy soon.

If the model is still “Pending” well past that window, we recommend checking ArgoCD. Project Models, like most Cake resources, are deployed through ArgoCD. Users assigned the “Admin” role in Cake can navigate directly to the ArgoCD Application associated with the project and check its sync state. If the Application is out of sync, simply click “Sync” and the model deployment should progress; an out-of-sync Application may indicate a permission issue preventing the Cake UI from syncing Argo programmatically. For further help, reach out to Cake support via Slack or email support@cake.ai.