Overview
Tracing is a key component of observability in AI systems, particularly in complex, production-grade deployments involving large language models (LLMs). In the Cake AI platform, Langfuse can be used to trace calls made to models that are proxied through LiteLLM—a lightweight, pluggable API proxy that abstracts and standardizes access to LLM endpoints (e.g., OpenAI, Gemini, Anthropic, or custom-deployed models via vLLM).
What Is Traced with Langfuse?
When a model call is routed through LiteLLM, Langfuse can automatically intercept and log detailed metadata about the interaction, including:
Prompt and completion contents (tokenized or redacted if necessary)
Timestamps (start time, end time, latency)
Response success/failure status
Token counts and cost metrics
User or session identifiers
Custom metadata such as application context, experiment version, or agent state
These logs are collected in Langfuse’s dashboard, where developers and ops engineers can search, filter, compare, and debug across different traces and versions of their applications.
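Under the hood, the LiteLLM proxy forwards this data to Langfuse through its callback integration. A minimal sketch of that proxy-side configuration (on the Cake platform this is typically pre-configured for you) looks roughly like:

# litellm config.yaml (sketch); the proxy also needs LANGFUSE_PUBLIC_KEY,
# LANGFUSE_SECRET_KEY, and LANGFUSE_HOST set in its environment
litellm_settings:
  success_callback: ["langfuse"]   # log successful calls to Langfuse
  failure_callback: ["langfuse"]   # log failed calls as well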
Why Use Langfuse for Tracing LiteLLM Calls?
Here’s why tracing LiteLLM calls with Langfuse is valuable:
Debugging Model Behavior
When users report strange or incorrect responses, having trace-level visibility lets engineers replay and inspect the exact prompt, context, and configuration used in that call. This is crucial for understanding why a model behaved a certain way under specific conditions.
Performance Monitoring and Cost Control
Langfuse provides visibility into token usage per call and overall latency. This enables real-time cost tracking and latency optimization, helping teams manage expensive model usage across high-traffic applications.
Experiment Comparison
With experiment tags and version tracking, Langfuse can be used to compare the performance and reliability of different prompt templates, parameter settings, or model versions—especially helpful during A/B testing or prompt engineering cycles.
Compliance and Auditability
For regulated industries or user-sensitive applications, Langfuse’s detailed logs provide an audit trail that helps with compliance reporting, reproducibility, and user accountability.
Root Cause Analysis (RCA)
When alerts are triggered (e.g., from Prometheus/Grafana), tracing can link problematic metrics directly to individual calls. This shortens the time to resolution during incidents or performance regressions.
By integrating Langfuse with LiteLLM, the Cake platform gives developers a high-fidelity, low-overhead method to gain transparency into their LLM usage—empowering better decisions, faster debugging, and safer deployments.
Key References
The key reference documents for LiteLLM are located at:
LiteLLM Getting Started - Main location for LiteLLM info
LiteLLM UI Quick Start - Discusses the LiteLLM Administration interface https://docs.litellm.ai/docs/proxy/ui
Instructions
First, we need to bootstrap Langfuse. This only needs to happen once for each Cake Langfuse instance.
Open the Cake Platform UI and then launch Apps › Langfuse (https://[my_platform_root_url]/platform-ui/)
NOTE: In a near-term release, project apps will be available and there may be multiple Langfuse deployments. A shared Langfuse will still be available as above.
Select “New Organization” and enter the tenant name.
The organization wizard starts up.

Add members (if applicable)
Inside the org wizard select “Create Project”, name it (e.g., claims‑ingest‑prod).
Under Setup › API Keys hit “Create API Key”; copy:
LANGFUSE_PUBLIC_KEY (username)
LANGFUSE_SECRET_KEY (password)
LANGFUSE_HOST (https URL)
Second, install the required SDKs wherever you plan to use them. NOTE: In Cake 1.3, you will not need to do this step for model calls; you will only need the SDK for non-model tracing.
For example: in your notebook or service image you could run:
pip install 'langfuse>=2.60' 'litellm[proxy]' openai
Next, wire up the environment variables
Export the keys plus LiteLLM proxy endpoints:
export LANGFUSE_PUBLIC_KEY=<copy>
export LANGFUSE_SECRET_KEY=<copy>
export LANGFUSE_HOST=https://langfuse.<cluster>.svc
export OPENAI_BASE_URL=http://litellm.litellm.svc.cluster.local:4000
export OPENAI_API_KEY=cake-internal # any non-blank value
Tip – Add LANGFUSE_TRACING_ENVIRONMENT=dev|staging|prod so traces auto-bucket by environment.
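For instance, a development environment might set:

export LANGFUSE_TRACING_ENVIRONMENT=dev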
Add the Langfuse decorator to your code & call a model
Minimal traced function (Python):
from langfuse.decorators import observe
from langfuse.openai import openai  # OpenAI-compatible wrapper that auto-traces calls

@observe()
def hello():
    rsp = openai.chat.completions.create(
        model="meta-llama/Llama-3.2-1B-Instruct",
        messages=[{"role": "user", "content": "hi"}],
    )
    return rsp.choices[0].message.content

print(hello())
Another example:
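The sketch below builds on the same setup (the user ID, session ID, and metadata values are illustrative): it nests a traced helper inside a traced pipeline, attaches identifiers and custom metadata to the trace via langfuse_context, and flushes before the script exits.

from langfuse.decorators import observe, langfuse_context
from langfuse.openai import openai  # OpenAI-compatible wrapper that auto-traces calls

@observe()
def summarize(text):
    # Nested call: appears as a child observation under the parent trace
    rsp = openai.chat.completions.create(
        model="meta-llama/Llama-3.2-1B-Instruct",
        messages=[{"role": "user", "content": f"Summarize in one sentence: {text}"}],
    )
    return rsp.choices[0].message.content

@observe()
def pipeline(doc):
    # Attach identifiers and custom metadata to the current trace
    langfuse_context.update_current_trace(
        user_id="demo-user",          # illustrative value
        session_id="demo-session-1",  # illustrative value
        metadata={"experiment": "prompt-v1"},
    )
    return summarize(doc)

print(pipeline("Langfuse traces LLM calls routed through the LiteLLM proxy."))
langfuse_context.flush()  # ensure traces are sent before a short-lived script exits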
Now, verify any traces
Go back to Langfuse › Tracing › Traces
Confirm a new row appears with latency and token stats
Click the row to inspect prompt, completion, temperature, top‑p and cost.
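If you prefer to verify programmatically, a minimal sketch using the Langfuse Python client (assuming langfuse>=2.x and the same LANGFUSE_* environment variables) could look like:

from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST
recent = langfuse.fetch_traces(limit=5)  # most recent traces in this project
for trace in recent.data:
    print(trace.id, trace.name, trace.timestamp)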

Finally, harden and monitor your Langfuse traces
Set Roles
Use Owner / Admin / Member / Viewer as per the RBAC matrix; assign least privilege.
Set Budgets / Rate Limits
LiteLLM can cap requests per API key before the model runs; see the sketch after this list.
Single Instance
A single Langfuse instance is recommended; deploy multiple instances only for air-gap or data-residency mandates.
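As a sketch of the budget and rate-limit controls, the LiteLLM proxy's key-management API accepts per-key limits when a virtual key is generated. The master key variable and the limit values below are illustrative for this environment:

curl -X POST "$OPENAI_BASE_URL/key/generate" \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"max_budget": 25.0, "budget_duration": "30d", "rpm_limit": 60, "tpm_limit": 100000}'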
Security
A deeper discussion of Langfuse security, including RBAC and multi-tenant architecture, is located here:
Langfuse Security Considerations
Troubleshooting
A troubleshooting guide to Langfuse is here:
https://langfuse.com/self-hosting/troubleshooting
Logs
System logs for the Langfuse app can be accessed via Lens or kubectl. The pods are in the langfuse namespace. Generally, the langfuse-web log is the most useful.
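For example, assuming the namespace and deployment are named langfuse and langfuse-web as above, you could tail the web logs with:

kubectl logs -n langfuse deployment/langfuse-web --tail=100 -f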
Metrics
Metrics for Langfuse can be found in Grafana in the Kubernetes Pods and Namespace (pods) dashboards.
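For ad-hoc queries, a sketch of a PromQL expression for CPU usage across the Langfuse pods (assuming standard cAdvisor container metrics and the langfuse namespace) could be:

sum(rate(container_cpu_usage_seconds_total{namespace="langfuse"}[5m])) by (pod)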