Monitoring your Ray Deployed Models with Prometheus and Grafana


Overview

In any production-grade machine learning system, especially one serving large, fine-tuned models via distributed infrastructure like Ray, monitoring is critical for ensuring system health, performance, and reliability. To address this, the Cake AI platform integrates Prometheus and Grafana for real-time monitoring and visualization of Ray-based model deployments, including those served via vLLM and orchestrated with KubeRay.

Prometheus is an open-source metrics collection and alerting toolkit that scrapes time-series data from configured endpoints, while Grafana is a flexible dashboard and visualization tool that sits on top of Prometheus (or other data sources) to create real-time, interactive views into system behavior.

This monitoring stack provides deep visibility into the performance and resource usage of Ray clusters, the health of inference services (e.g., vLLM actors), and the flow of user requests across nodes and GPUs.

Instructions

Prometheus

Navigate to Prometheus

Select ‘Explore Metrics’

Review a metrics selector for Ray

Explore Metrics allows you to investigate all the selectors for the various metrics available for your Cake cluster. It provides a detailed view of each selector, helping you uncover associated metadata—including tags, dimensions, and fields—for the various apps running in Cake.

You can search for ray and you will see all selectors starting with ray_.

NOTE: You can also have Ray capture custom metrics for your application with the Ray API https://docs.ray.io/en/latest/ray-observability/user-guides/add-app-metrics.html#application-level-metrics
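For instance, here is a minimal sketch of an application-level metric exposed from a Ray actor; the actor, metric name, and tags are purely illustrative:

 # Minimal sketch of a custom application-level Ray metric (names are illustrative).
 import ray
 from ray.util.metrics import Counter

 ray.init()

 @ray.remote
 class InferenceActor:
     def __init__(self):
         # This counter is exported alongside the built-in Ray metrics
         # and scraped by Prometheus like any other Ray selector.
         self.requests = Counter(
             "app_requests",
             description="Number of requests handled by this actor.",
             tag_keys=("route",),
         )

     def handle(self, prompt: str) -> str:
         self.requests.inc(1.0, tags={"route": "/generate"})
         return prompt.upper()  # stand-in for real model logic

 actor = InferenceActor.remote()
 print(ray.get(actor.handle.remote("hello")))

Once the actor has processed requests, the new metric shows up in Explore Metrics along with the ray_ selectors.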

Look at the metrics associated with a particular selector

Each selector will show the labels and possible values for each type of metric associated with that selector. Let's select ray_actors to see all the metrics associated with that selector.

You can use these metrics to create a PromQL query and investigate how Ray is running in your Cake cluster.
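As an example, the sketch below runs a PromQL query against the Prometheus HTTP API to count Ray actors by state. The Prometheus URL is a placeholder, and the State label should be checked against what Explore Metrics shows for your cluster:

 # Sketch: run a PromQL query against the Prometheus HTTP API.
 # PROMETHEUS_URL is a placeholder; point it at your cluster's Prometheus service.
 import requests

 PROMETHEUS_URL = "http://kube-prometheus-stack-prometheus.monitoring:9090"

 # Count Ray actors grouped by their State label (e.g. ALIVE, DEAD).
 query = "sum(ray_actors) by (State)"
 resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query})
 resp.raise_for_status()

 for result in resp.json()["data"]["result"]:
     print(result["metric"], result["value"])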

Grafana

Navigate to Grafana

Go to Ray Serve Dashboard

Look at individual Ray pod dashboards

The Kubernetes / Compute Resources / Namespace (Pods) dashboard and the Kubernetes / Compute Resources / Pods dashboard have interesting info to investigate during performance monitoring.

You can use the panels in new or existing Ray dashboards. Let's create a new dashboard. Select New Dashboard.

You can create new Grafana panels from individual Prometheus metrics, complex PromQL queries you create in Prometheus, or various other supported Grafana sources.

See this guide to building new dashboards in Grafana:

https://grafana.com/docs/grafana/latest/dashboards/build-dashboards/create-dashboard/
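Dashboards can also be provisioned programmatically. Below is a minimal sketch that posts a single-panel dashboard to Grafana's HTTP API; the Grafana URL, API token, and panel contents are placeholders for your environment:

 # Sketch: create a minimal dashboard through the Grafana HTTP API.
 # GRAFANA_URL and API_TOKEN are placeholders for your environment.
 import requests

 GRAFANA_URL = "http://kube-prometheus-stack-grafana.monitoring"
 API_TOKEN = "<your-grafana-api-token>"

 dashboard = {
     "dashboard": {
         "id": None,
         "title": "Ray Actors (example)",
         "panels": [
             {
                 "type": "timeseries",
                 "title": "Ray actors by state",
                 "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
                 "targets": [{"expr": "sum(ray_actors) by (State)"}],
             }
         ],
     },
     "overwrite": True,
 }

 resp = requests.post(
     f"{GRAFANA_URL}/api/dashboards/db",
     headers={"Authorization": f"Bearer {API_TOKEN}"},
     json=dashboard,
 )
 resp.raise_for_status()
 print(resp.json()["url"])

The same payload works for updating an existing dashboard when overwrite is set to True.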

Ray Dashboard

Go to Ray Dashboard

The Ray Dashboard provides a UI that makes it easier to find information about KubeRay clusters and RayServices.

The Metrics tab collects several Ray dashboards for your cluster.

All of these dashboards are also available in Grafana.

The Serve tab collects information and logs related to your running models.

You can also use the Ray Distributed Debugger to gather information about your running models.

NOTE: You must ensure that sshd is running in the Ray head node container you specified in the YAML for your cluster.

Notebooks running in the same namespace as the Ray Cluster will have access to debug cluster jobs and models.

See documentation on starting the distributed debugger for KubeRay here: https://docs.ray.io/en/latest/ray-observability/ray-distributed-debugger.html
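As a rough sketch, the pattern described in the Ray docs is to call breakpoint() inside the remote task or actor you want to inspect and then attach from the debugger; the task below is purely illustrative and assumes the debugger setup from the link above is in place:

 # Sketch: pause a Ray task so the Ray Distributed Debugger can attach.
 # The task body is illustrative; see the Ray docs linked above for setup details.
 import ray

 ray.init()

 @ray.remote
 def generate(prompt: str) -> str:
     result = prompt.upper()  # stand-in for real model logic
     breakpoint()             # the distributed debugger attaches here
     return result

 print(ray.get(generate.remote("hello")))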


Kubernetes Troubleshooting

You can use Lens, or kubectl from the command line, to review Kubernetes logs for various pods:

 # Return snapshot logs from pod aiapp with only one container
 kubectl logs aiapp
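
The same logs can also be pulled programmatically; here is a minimal sketch using the official kubernetes Python client (the namespace and tail length are placeholders):

 # Sketch: fetch pod logs with the kubernetes Python client.
 # Namespace and tail_lines are placeholders; adjust for your cluster.
 from kubernetes import client, config

 config.load_kube_config()  # or config.load_incluster_config() inside the cluster
 v1 = client.CoreV1Api()

 # List pods in the kuberay-operator namespace and print their recent log lines.
 for pod in v1.list_namespaced_pod("kuberay-operator").items:
     logs = v1.read_namespaced_pod_log(
         name=pod.metadata.name, namespace="kuberay-operator", tail_lines=50
     )
     print(f"--- {pod.metadata.name} ---\n{logs}")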

The pods most relevant to Ray troubleshooting are listed below. They are certainly not the only pods you may want to investigate, but they are usually the most interesting.

Ray

Ray pods launch in the namespace you request in your RayService YAML. A default RayService cluster is available in the ray-service namespace. The head pod is usually the most interesting one for debugging.

KubeRay Operator

RayService and RayCluster YAMLs launch clusters via the KubeRay Operator controller. This controller is very helpful in tracking down Ray issues. It is located in the kuberay-operator namespace.

Prometheus

The Prometheus pods themselves are also well worth investigating. Go to the monitoring namespace to look at the various Grafana and Prometheus pods. Let's start with the main Prometheus pod; it should be prefixed with prometheus-kube-prometheus-stack-prometheusX.

Prometheus Operator

The Prometheus Operator is the component that launches and manages the Prometheus instances. Its logs are sometimes worth investigating.

Grafana

kube-prometheus-stack-grafana-xxxxx is the Grafana pod.

OpenTelemetry Collector

otel-collector-xxxxx is the pod responsible for collecting OpenTelemetry info.

Documentation on OTEL collector troubleshooting is here:

https://opentelemetry.io/docs/collector/troubleshooting/