Overview
Tracing is a key component of observability in AI systems, particularly in complex, production-grade deployments involving large language models (LLMs). In the Cake AI platform, Langfuse can be used to trace calls made to models that are proxied through LiteLLM—a lightweight, pluggable API proxy that abstracts and standardizes access to LLM endpoints (e.g., OpenAI, Gemini, Anthropic, or custom-deployed models via vLLM).
What Is Traced with Langfuse?
When a model call is routed through LiteLLM, Langfuse can automatically intercept and log detailed metadata about the interaction, including:
Prompt and completion contents (tokenized or redacted if necessary)
Timestamps (start time, end time, latency)
Response success/failure status
Token counts and cost metrics
User or session identifiers
Custom metadata such as application context, experiment version, or agent state
These logs are collected in Langfuse’s dashboard, where developers and ops engineers can search, filter, compare, and debug across different traces and versions of their applications.
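Under the hood, the LiteLLM proxy forwards this data to Langfuse through its callback integration. A minimal sketch of that proxy-side configuration (on the Cake platform this is typically pre-configured for you) looks roughly like:

# litellm config.yaml (sketch); the proxy also needs LANGFUSE_PUBLIC_KEY,
# LANGFUSE_SECRET_KEY, and LANGFUSE_HOST set in its environment
litellm_settings:
  success_callback: ["langfuse"]   # log successful calls to Langfuse
  failure_callback: ["langfuse"]   # log failed calls as well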
Why Use Langfuse for Tracing LiteLLM Calls?
Here’s why tracing LiteLLM calls with Langfuse is valuable:
Debugging Model Behavior
When users report strange or incorrect responses, having trace-level visibility lets engineers replay and inspect the exact prompt, context, and configuration used in that call. This is crucial for understanding why a model behaved a certain way under specific conditions.
Performance Monitoring and Cost Control
Langfuse provides visibility into token usage per call and overall latency. This enables real-time cost tracking and latency optimization, helping teams manage expensive model usage across high-traffic applications.
Experiment Comparison
With experiment tags and version tracking, Langfuse can be used to compare the performance and reliability of different prompt templates, parameter settings, or model versions—especially helpful during A/B testing or prompt engineering cycles.
Compliance and Auditability
For regulated industries or user-sensitive applications, Langfuse’s detailed logs provide an audit trail that helps with compliance reporting, reproducibility, and user accountability.
Root Cause Analysis (RCA)
When alerts are triggered (e.g., from Prometheus/Grafana), tracing can link problematic metrics directly to individual calls. This shortens the time to resolution during incidents or performance regressions.
By integrating Langfuse with LiteLLM, the Cake platform gives developers a high-fidelity, low-overhead method to gain transparency into their LLM usage—empowering better decisions, faster debugging, and safer deployments.
Key References
The key reference documents for LiteLLM are located at:
LiteLLM Getting Started - Main location for LiteLLM info
LiteLLM UI Quick Start - Discusses the LiteLLM Administration interface https://docs.litellm.ai/docs/proxy/ui
Instructions
First, we need to bootstrap Langfuse. This only needs to happen once for each Cake Langfuse instance.
Open the Cake Platform UI and then launch Apps › Langfuse (https://[my_platform_root_url]/platform-ui/)
NOTE: In a near-term release, project apps will be available and there may be multiple Langfuse deployments. A shared Langfuse will still be available as above.
Select “New Organization” and enter the tenant name.
The organization wizard starts up.

Add members (if applicable)
Inside the org wizard select “Create Project”, name it (e.g., claims‑ingest‑prod).
Under Setup › API Keys hit “Create API Key”; copy:
LANGFUSE_PUBLIC_KEY (username)
LANGFUSE_SECRET_KEY (password)
LANGFUSE_HOST (https URL)
Second, install the required SDKs wherever you plan to use them. NOTE: In Cake 1.3, you will not need to do this step for model calls; you will only need the SDK for non-model tracing.
For example: in your notebook or service image you could run:
pip install 'langfuse>=2.60' 'litellm[proxy]' openai
Next, wire up the environment variables
Export the keys plus LiteLLM proxy endpoints:
export LANGFUSE_PUBLIC_KEY=<copy>
export LANGFUSE_SECRET_KEY=<copy>
export LANGFUSE_HOST=https://langfuse.<cluster>.svc
export OPENAI_BASE_URL=http://litellm.litellm.svc.cluster.local:4000
export OPENAI_API_KEY=cake-internal # any non-blank value
Tip – Add LANGFUSE_TRACING_ENVIRONMENT=dev|staging|prod so traces auto-bucket by environment.
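For instance, a development environment might set:

export LANGFUSE_TRACING_ENVIRONMENT=dev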
Add the Langfuse decorator to your code & call a model
Minimal traced function (Python):
from langfuse.decorators import observe
from langfuse.openai import openai  # OpenAI-compatible wrapper that auto-traces calls

@observe()
def hello():
    rsp = openai.chat.completions.create(
        model="meta-llama/Llama-3.2-1B-Instruct",
        messages=[{"role": "user", "content": "hi"}],
    )
    return rsp.choices[0].message.content

print(hello())
Another example:
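The sketch below builds on the same setup (the user ID, session ID, and metadata values are illustrative): it nests a traced helper inside a traced pipeline, attaches identifiers and custom metadata to the trace via langfuse_context, and flushes before the script exits.

from langfuse.decorators import observe, langfuse_context
from langfuse.openai import openai  # OpenAI-compatible wrapper that auto-traces calls

@observe()
def summarize(text):
    # Nested call: appears as a child observation under the parent trace
    rsp = openai.chat.completions.create(
        model="meta-llama/Llama-3.2-1B-Instruct",
        messages=[{"role": "user", "content": f"Summarize in one sentence: {text}"}],
    )
    return rsp.choices[0].message.content

@observe()
def pipeline(doc):
    # Attach identifiers and custom metadata to the current trace
    langfuse_context.update_current_trace(
        user_id="demo-user",          # illustrative value
        session_id="demo-session-1",  # illustrative value
        metadata={"experiment": "prompt-v1"},
    )
    return summarize(doc)

print(pipeline("Langfuse traces LLM calls routed through the LiteLLM proxy."))
langfuse_context.flush()  # ensure traces are sent before a short-lived script exits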
Now, verify any traces
Go back to Langfuse › Tracing › Traces
Confirm a new row appears with latency and token stats
Click the row to inspect prompt, completion, temperature, top‑p and cost.
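If you prefer to verify programmatically, a minimal sketch using the Langfuse Python client (assuming langfuse>=2.x and the same LANGFUSE_* environment variables) could look like:

from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST
recent = langfuse.fetch_traces(limit=5)  # most recent traces in this project
for trace in recent.data:
    print(trace.id, trace.name, trace.timestamp)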

Finally, harden and monitor your Langfuse traces
Set Roles
Use Owner / Admin / Member / Viewer as per the RBAC matrix; assign least privilege.
Set Budgets / Rate Limits
LiteLLM can cap requests per API key before the model runs; see the sketch after this list.
Single Instance
A single Langfuse instance is recommended; deploy multiple instances only for air-gap or data-residency mandates.
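As a sketch of the budget and rate-limit controls, the LiteLLM proxy's key-management API accepts per-key limits when a virtual key is generated. The master key variable and the limit values below are illustrative for this environment:

curl -X POST "$OPENAI_BASE_URL/key/generate" \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"max_budget": 25.0, "budget_duration": "30d", "rpm_limit": 60, "tpm_limit": 100000}'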
Security
A deeper discussion of Langfuse security, including RBAC and multi-tenant architecture, is located here:
Langfuse Security Considerations
Troubleshooting
A troubleshooting guide to Langfuse is here:
https://langfuse.com/self-hosting/troubleshooting
Logs
System logs for the Langfuse app can be accessed via Lens or kubectl. The pods are in the langfuse namespace. Generally, the langfuse-web log is the most useful.
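For example, assuming the namespace and deployment are named langfuse and langfuse-web as above, you could tail the web logs with:

kubectl logs -n langfuse deployment/langfuse-web --tail=100 -f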
Metrics
Metrics for Langfuse can be found in Grafana in the Kubernetes Pods and Namespace (pods) dashboards.
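For ad-hoc queries, a sketch of a PromQL expression for CPU usage across the Langfuse pods (assuming standard cAdvisor container metrics and the langfuse namespace) could be:

sum(rate(container_cpu_usage_seconds_total{namespace="langfuse"}[5m])) by (pod)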