Overview
In any production-grade machine learning system—especially one serving large, fine-tuned models via distributed infrastructure like Ray—monitoring is critical for ensuring system health, performance, and reliability. To address this, the Cake AI platform integrates Prometheus and Grafana for real-time monitoring and visualization of Ray-based model deployments, including those served via vLLM and orchestrated with KubeRay.
Prometheus is an open-source metrics collection and alerting toolkit that scrapes time-series data from configured endpoints, while Grafana is a flexible dashboard and visualization tool that sits on top of Prometheus (or other data sources) to create real-time, interactive views into system behavior.
This monitoring stack provides deep visibility into the performance and resource usage of Ray clusters, the health of inference services (e.g., vLLM actors), and the flow of user requests across nodes and GPUs.
Instructions
Prometheus
Navigate to Prometheus
Select ‘Explore Metrics’
Review a metrics selector for Ray
Explore Metrics allows you to investigate all the selectors for the various metrics available for your Cake cluster. It provides a detailed view of each selector, helping you uncover associated metadata—including tags, dimensions, and fields—for the various apps running in Cake.
Type ray into the selector search and you will see all selectors starting with ray_, such as ray_actors.
NOTE: You can also have Ray capture custom metrics for your application with the Ray API https://docs.ray.io/en/latest/ray-observability/user-guides/add-app-metrics.html#application-level-metrics
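As a minimal, illustrative sketch of what an application-level metric can look like with ray.util.metrics (the metric names, tag keys, and actor below are examples, not part of the Cake platform):

# Illustrative application-level Ray metrics; exported metrics typically
# appear in Prometheus with a ray_ prefix alongside the built-in Ray metrics.
import ray
from ray.util.metrics import Counter, Gauge

ray.init()

@ray.remote
class RequestTracker:
    def __init__(self):
        # tag_keys become Prometheus labels you can filter and group by
        self.requests = Counter(
            "app_requests",
            description="Requests handled by this actor.",
            tag_keys=("model",),
        )
        self.queue_depth = Gauge(
            "app_queue_depth",
            description="Current number of queued requests.",
            tag_keys=("model",),
        )

    def handle(self, model: str):
        self.requests.inc(tags={"model": model})
        self.queue_depth.set(0, tags={"model": model})

tracker = RequestTracker.remote()
ray.get(tracker.handle.remote("my-model"))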
Look at the metrics associated with a particular selector
Each selector will show the labels and possible values for each metric associated with it. Let's select ray_actors to see all the metrics associated with that selector.
You can use these metrics to create a PromQL query and investigate how Ray is running in your Cake cluster.
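You can also run such a query programmatically against the Prometheus HTTP API. The sketch below is an assumption-laden example: the Prometheus service URL and the State label should be adjusted to what Explore Metrics shows for your cluster.

# Query the Prometheus HTTP API for ray_actors grouped by state.
# PROMETHEUS_URL is a placeholder; point it at your cluster's Prometheus endpoint.
import requests

PROMETHEUS_URL = "http://prometheus-kube-prometheus-stack-prometheus.monitoring:9090"
query = "sum by (State) (ray_actors)"  # assumes a State label on ray_actors

resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query})
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    state = series["metric"].get("State", "unknown")
    timestamp, value = series["value"]  # [unix timestamp, value-as-string]
    print(f"{state}: {value}")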
Grafana
Navigate to Grafana
Go to Ray Serve Dashboard
Look at individual Ray pod dashboards
The Kubernetes / Compute Resources / Namespace (Pods) dashboard and the Kubernetes / Compute Resources / Pods dashboard contain useful CPU, memory, and network usage information to review during performance monitoring.
You can use the panels in new or existing Ray dashboards. Let's create a new dashboard: select New Dashboard.
You can create new Grafana panels from individual Prometheus metrics, complex PromQL queries you create in Prometheus, or various other supported Grafana sources.
See this guide to building new dashboards in Grafana:
https://grafana.com/docs/grafana/latest/dashboards/build-dashboards/create-dashboard/
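The linked guide covers the UI workflow. If you prefer to script it, the rough sketch below creates a one-panel dashboard through Grafana's HTTP API; the Grafana URL and token are placeholders, and the panel simply charts the ray_actors query from earlier against the default data source.

# Create a one-panel dashboard via Grafana's HTTP API (sketch only).
# GRAFANA_URL and GRAFANA_TOKEN are placeholders for your cluster's Grafana
# endpoint and a service-account token with dashboard write permissions.
import requests

GRAFANA_URL = "http://kube-prometheus-stack-grafana.monitoring"
GRAFANA_TOKEN = "replace-with-a-grafana-api-token"

dashboard = {
    "dashboard": {
        "id": None,  # None tells Grafana to create a new dashboard
        "title": "Ray Actors (example)",
        "panels": [
            {
                "title": "Actors by state",
                "type": "timeseries",
                "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
                "targets": [{"expr": "sum by (State) (ray_actors)"}],
            }
        ],
    },
    "overwrite": False,
}

resp = requests.post(
    f"{GRAFANA_URL}/api/dashboards/db",
    headers={"Authorization": f"Bearer {GRAFANA_TOKEN}"},
    json=dashboard,
)
resp.raise_for_status()
print(resp.json())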
Ray Dashboard
Go to Ray Dashboard
The Ray Dashboard provides a UI that makes it easier to find information about KubeRay clusters and RayServices.
The Metrics tab collects several Ray dashboards for your cluster.
These dashboards are all also available in Grafana.
The Serve tab collects information and logs related to your running models.
You can also use the Ray Distributed Debugger to gather information about your running models.
NOTE: You must ensure that sshd is running in the Ray head node container you specified in the YAML for your cluster.
Notebooks running in the same namespace as the Ray cluster will be able to debug cluster jobs and models.
See documentation on starting the distributed debugger for KubeRay here: https://docs.ray.io/en/latest/ray-observability/ray-distributed-debugger.html
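As a rough illustration (the task below is a stand-in, not a Cake API), calling breakpoint() inside a remote function pauses that task so the distributed debugger can attach:

# Sketch: pausing a Ray task for the distributed debugger.
# Run this from a notebook in the same namespace as the Ray cluster; see the
# Ray docs linked above for the extension and setup your Ray version requires.
import ray

ray.init()  # or ray.init(address="ray://<head-service>:10001") from a notebook

@ray.remote
def infer(prompt: str) -> str:
    result = prompt.upper()  # stand-in for real model inference
    breakpoint()             # task pauses here until a debugger attaches
    return result

print(ray.get(infer.remote("hello")))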
Kubernetes Troubleshooting
To review Kubernetes logs for the various pods, you can use Lens or kubectl:
# Return snapshot logs from pod aiapp with only one container
kubectl logs aiapp
The pods most relevant to Ray troubleshooting are described below. They are certainly not the only pods you may want to investigate, but they are usually the most informative. A sketch that gathers the same logs with the Kubernetes Python client appears at the end of this section.
Ray
Ray pods launch in the namespace you request in your RayService YAML. A default RayService cluster is available in the ray-service namespace. The head pod is usually the most interesting for debugging.
KubeRay Operator
RayService and RayCluster YAMLs launch clusters via the KubeRay operator controller. This controller's logs are very helpful for tracking down Ray issues. It runs in the kuberay-operator namespace.
Prometheus
The Prometheus pods are among the most useful to investigate. Go to the monitoring namespace to look at the various Grafana and Prometheus pods. Start with the main Prometheus pod, which should be named prometheus-kube-prometheus-stack-prometheus-X, where X is the replica index.
Prometheus Operator
The Prometheus Operator is the component that launches and manages the Prometheus instances. Its logs are sometimes worth investigating.
Grafana
kube-prometheus-stack-grafana-xxxxx is the Grafana pod.
otel-collector-xxxxx is the pod responsible for collecting OpenTelemetry info.
Documentation on OTEL collector troubleshooting is here:
https://opentelemetry.io/docs/collector/troubleshooting/
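For reference, here is a hedged sketch that pulls recent logs from the pods discussed above using the official Kubernetes Python client; the namespaces and name prefixes mirror the defaults described in this section and should be adjusted to your cluster.

# Tail recent logs from the pods discussed above with the Kubernetes Python
# client. Namespaces and prefixes are the defaults described in this section.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when run in-cluster
v1 = client.CoreV1Api()

# namespace -> pod name prefix worth checking during Ray troubleshooting
interesting = {
    "ray-service": "",  # all Ray pods; the head pod is usually the most useful
    "kuberay-operator": "kuberay-operator",
    "monitoring": "prometheus-kube-prometheus-stack-prometheus",
}

for namespace, prefix in interesting.items():
    for pod in v1.list_namespaced_pod(namespace).items:
        if pod.metadata.name.startswith(prefix):
            logs = v1.read_namespaced_pod_log(
                name=pod.metadata.name,
                namespace=namespace,
                container=pod.spec.containers[0].name,  # first container in the pod
                tail_lines=50,
            )
            print(f"--- {namespace}/{pod.metadata.name} ---")
            print(logs)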