Adding Experiments to MLflow

Overview

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, including experiment tracking, model versioning, and deployment metadata. Within the Cake AI platform, fine-tuned models—especially those using LoRA (Low-Rank Adaptation) adapters—are integrated into MLflow to ensure traceability, reproducibility, and scalable experiment management.

When fine-tuning a base LLM, practitioners often generate multiple model variants through different LoRA configurations, hyperparameters, and datasets. By logging each of these variations to MLflow, teams can associate structured metadata (e.g., training metrics, dataset versions, prompts, hyperparameters) with a unique model artifact, making it easier to compare performance and manage lifecycle transitions.

Why Add Fine-Tuned and LoRA Models to MLflow?

  1. Experiment Traceability
    Logging to MLflow ensures every fine-tuned model version is recorded alongside its training configuration, metrics (loss, accuracy, perplexity, etc.), and artifacts. This makes it easy to reproduce results or diagnose regressions months later.

  2. Comparative Evaluation
    MLflow supports dashboard views and search/filter capabilities that allow practitioners to compare different LoRA runs side-by-side (see the query sketch after this list). This accelerates iterative improvement by making it clear which experiment settings yield the best results.

  3. Model Governance and Auditability
    For organizations operating under compliance requirements or with shared infrastructure, MLflow provides an audit trail for model training and deployment decisions. This supports responsible AI practices and regulatory preparedness.

  4. Seamless Integration with Deployment Pipelines
    Once logged, MLflow models (including LoRAs) can be versioned and registered for promotion into staging or production environments, integrating cleanly with CI/CD and monitoring systems.
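
For example, logged runs can be compared programmatically as well as in the UI. Below is a minimal sketch using mlflow.search_runs; the experiment name and the parameter/metric keys (lora_r, loss) are illustrative and assume they were logged as shown later in this guide.

import mlflow

mlflow.set_tracking_uri("http://mlflow-server.mlflow.svc.cluster.local")

# Query runs from a named experiment and rank them by loss (parameter values are stored as strings)
runs = mlflow.search_runs(
    experiment_names=["llama3.3-70b-lora-finetuning"],
    filter_string="params.lora_r = '64'",
    order_by=["metrics.loss ASC"],
)
print(runs[["run_id", "params.lora_r", "metrics.loss"]].head())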

Generally, when using MLflow for LLMs, we recommend storing only the fine-tuning artifacts (for example, LoRA adapter weights and configuration) rather than the base model (see: https://MLflow.org/docs/3.0.0rc2/llms/transformers/large-models). We recommend this for several reasons:

  1. Storage Efficiency
    Base models, especially large LLMs (e.g., 7B to 65B parameters), can be hundreds of gigabytes. Storing these repeatedly across many experiments is highly redundant and inefficient. LoRA adapters are usually tiny (often <1% of the base model size), making them ideal for lightweight logging.

  2. Modular Reusability
    By storing only the LoRA deltas, the same base model can be reused across environments, with LoRA adapters loaded dynamically at inference time (as supported by vLLM and HuggingFace; a loading sketch follows this list). This enables flexible model composition while keeping version control tidy.

  3. Faster Iteration Cycles
    Training and deploying only the LoRA layers means teams can iterate on task-specific behaviors without retraining or re-logging the full model. This significantly accelerates experimentation and deployment workflows.

  4. Security and IP Considerations
    In some enterprise settings, base models may be governed by licensing restrictions or proprietary protections. Logging only LoRA adapters minimizes exposure and allows teams to separate internal IP (the adaptation) from externally sourced models.
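
To illustrate the reuse pattern from point 2, here is a minimal sketch of attaching a saved LoRA adapter to an already-loaded base model with peft; the adapter directory is an illustrative path, and loading a 70B base model this way requires substantial GPU memory or quantization.

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the shared base model once (in practice, quantize or shard it across GPUs)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct", device_map="auto"
)

# Attach a previously saved LoRA adapter at inference time
model = PeftModel.from_pretrained(base_model, "./lora_adapter")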

In summary, adding fine-tuned models and LoRA experiments to MLflow empowers structured model development, operational efficiency, and long-term reproducibility—while logging only LoRAs optimizes storage, flexibility, and deployment agility across the Cake AI stack.

Instructions

Below are step-by-step instructions and corresponding Python code for tracking a LLaMA 3.3 70B fine-tuning experiment using MLflow hosted on Kubernetes, following best practices such as logging only the LoRA adapters (not the base model). This setup assumes:

  • You have access to the MLflow Tracking URI (The default shared Cake MLflow instance is located at http://mlflow-server.mlflow.svc.cluster.local).

  • You are using HuggingFace Transformers + peft for LoRA.

  • You have pre-downloaded LLaMA 3.3 70B (e.g., via HuggingFace with token access).

  • The training is done via PEFT-style low-rank adaptation.

Step 1: Install Required Packages

Ensure the following Python packages are installed:

pip install mlflow transformers peft accelerate bitsandbytes

Step 2: Set MLflow Tracking URI and Experiment

In your Python script or notebook:

import mlflow

# Point to the Kubernetes-hosted MLflow instance
mlflow.set_tracking_uri("http://mlflow-server.mlflow.svc.cluster.local")

# Optional: create and use a named experiment
mlflow.set_experiment("llama3.3-70b-lora-finetuning")
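
If you prefer not to hard-code the URI, MLflow also reads it from the MLFLOW_TRACKING_URI environment variable; a brief sketch of the equivalent setup:

import os
import mlflow

# Equivalent to mlflow.set_tracking_uri(...), but configurable per environment
os.environ["MLFLOW_TRACKING_URI"] = "http://mlflow-server.mlflow.svc.cluster.local"
mlflow.set_experiment("llama3.3-70b-lora-finetuning")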

Step 3: Load Base Model and Apply LoRA

Use transformers + peft to load LLaMA 3.3 70B and apply LoRA.

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import get_peft_model, LoraConfig, TaskType

model_id = "meta-llama/Llama-3.3-70B-Instruct"  # gated repo: requires an accepted license and a HuggingFace token
tokenizer = AutoTokenizer.from_pretrained(model_id, token=True)
base_model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map="auto")

# LoRA config for causal LM
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # adjust to target appropriate modules
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

# Inject LoRA into base model
lora_model = get_peft_model(base_model, lora_config)

Step 4: Start MLflow Run and Track Hyperparams

with mlflow.start_run(run_name="llama3.3-70b-lora-run-1"):
    # Log parameters manually (or pull from argparse/trainer)
    mlflow.log_params({
        "base_model": model_id,
        "lora_r": lora_config.r,
        "lora_alpha": lora_config.lora_alpha,
        "lora_dropout": lora_config.lora_dropout,
        "target_modules": ",".join(lora_config.target_modules),
        "training_steps": 1000,
        "batch_size": 8
    })
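
    # Keep the training loop and artifact logging (Steps 5 and 6) inside this with block
    # so the metrics and adapter files attach to the same run as these parameters.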

Step 5: Train and Log Metrics

Assume you're using a Trainer or a custom training loop. Here's how you could log metrics from inside the run started in Step 4 (the loss values below are placeholders):

for step in range(1000):
    # your training code here...

    # Log metrics at intervals
    if step % 100 == 0:
        mlflow.log_metric("loss", 2.5 - 0.01 * step, step=step)
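
If you train with the HuggingFace Trainer rather than a manual loop, its built-in MLflow integration can report metrics automatically. A minimal sketch, assuming a tokenized train_dataset has already been prepared (not shown here):

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama3.3-70b-lora",
    per_device_train_batch_size=8,
    max_steps=1000,
    logging_steps=100,
    report_to=["mlflow"],  # Trainer logs are sent to the active MLflow run
)

trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=train_dataset,  # assumed: a pre-tokenized dataset
)
trainer.train()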

Step 6: Save and Log ONLY the LoRA Adapters

MLflow recommends saving only the LoRA adapters, not the base model:

save_dir = "./lora_adapter"
lora_model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

# Log the adapter folder as an MLflow artifact
mlflow.log_artifacts(save_dir, artifact_path="lora_model")
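
Later, the logged adapter can be pulled back from MLflow and re-attached to the base model. A sketch, assuming you know the training run's ID (run_id below is a placeholder):

import mlflow
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Download the logged adapter files from the tracking server
local_adapter_dir = mlflow.artifacts.download_artifacts(
    run_id=run_id, artifact_path="lora_model"  # run_id is a placeholder
)

# Re-attach the adapter to a freshly loaded base model
base_model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map="auto")
model = PeftModel.from_pretrained(base_model, local_adapter_dir)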


Step 7: (Optional) Register Model in MLflow

If you'd like to package the LoRA adapter as a pyfunc model and register it for reuse:

# pyfunc requires either a python_model or a loader_module; a minimal custom loader:
class LoraAdapterModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        self.adapter_path = context.artifacts["model_path"]  # adapter files logged below
    def predict(self, context, model_input):
        raise NotImplementedError("Implement inference for your serving stack (e.g., vLLM)")

mlflow.pyfunc.log_model(
    artifact_path="lora_adapter_model",
    artifacts={"model_path": save_dir},
    python_model=LoraAdapterModel(),
)
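
To place the adapter in the MLflow Model Registry, you can either pass registered_model_name to log_model or register the logged artifact afterwards. A sketch, where the registry name is an illustrative assumption and run_id is a placeholder:

import mlflow

# Register the logged pyfunc model under a (hypothetical) registry name
result = mlflow.register_model(
    model_uri=f"runs:/{run_id}/lora_adapter_model",  # for the active run: mlflow.active_run().info.run_id
    name="llama3-70b-lora-adapter",
)

# Later: load a specific registered version by name for inspection or serving
loaded = mlflow.pyfunc.load_model(f"models:/llama3-70b-lora-adapter/{result.version}")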

Recap Summary

Step 1: Install and prepare environment
Step 2: Connect to MLflow server
Step 3: Load model and apply LoRA
Step 4: Start tracking experiment
Step 5: Log metrics during training
Step 6: Save and log only LoRA adapters
Step 7: (Optional) Register adapter for deployment

Security

By default, MLflow does not segment users or apply permissions to individual resources. However, MLflow has experimental support for authentication and RBAC. Documentation on this experimental feature is here:

https://www.mlflow.org/docs/latest/ml/auth

Cake turns this support off by default since it is experimental. Cake is also working on support for Project Apps, which would allow multiple MLflow servers to be deployed. Generally, we recommend using the access control support in MLflow, since it will eventually become the MLflow default and it saves resources. If you would like this configured, please ask Cake support.

Examples of using MLflow to create a new experiment with and without access control are below.

Create a MLflow Experiment without Access Control

Here, you create a new experiment using the default API. The creator gets automatic MANAGE rights; no explicit RBAC is applied:

from mlflow import MlflowClient

client = MlflowClient(tracking_uri="http://mlflow-server.mlflow.svc.cluster.local")

# Create a new experiment — creator gets MANAGE permission by default
exp_id = client.create_experiment("no-rbac-experiment")
print(f"Created experiment {exp_id} with default MANAGE permission for the creator.")

  • Result: Only the creator (and admins) can view or modify this experiment. No one else has access unless explicitly granted.

Create a MLflow Experiment with Access Control

This example demonstrates enabling authentication and then setting specific user permissions via AuthServiceClient:

# Authentication must be enabled on the MLflow server first (see the experimental auth docs linked above)
import os

from mlflow.server.auth.client import AuthServiceClient
from mlflow import MlflowClient

# Your JWT token obtained via OAuth 2.0 login flow (e.g., from Auth0, Okta, etc.)
jwt_token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."

# Set MLflow server URI and token-based auth header
tracking_uri = "http://mlflow-server.mlflow.svc.cluster.local"
os.environ["MLFLOW_TRACKING_TOKEN"] = jwt_token  # used by MlflowClient and AuthServiceClient

# Initialize clients with the JWT-authenticated session
ml_client = MlflowClient(tracking_uri=tracking_uri)
auth_client = AuthServiceClient(tracking_uri=tracking_uri)

# Create the experiment (reusing ml_client from above)
exp_id = ml_client.create_experiment("rbac-enabled-experiment")
print(f"Experiment {exp_id} created.")
print(f"Experiment {exp_id} created.")

# Grant READ access to another user
exp_perm = auth_client.create_experiment_permission(
    experiment_id=exp_id,
    username="alice",
    permission="READ"
)

print(f"Granted READ access to alice: {exp_perm}")

# Upgrade alice's permission to EDIT
exp_perm = auth_client.update_experiment_permission(
    experiment_id=exp_id,
    username="alice",
    permission="EDIT"
)

print(f"Alice's permission now EDIT: {exp_perm}")

  • Outcome:

    • Admins get unrestricted access.

    • The creator has MANAGE permission by default.

    • User “alice” is first granted READ access and can later be elevated to EDIT, enabling her to modify runs and metadata—but not manage permissions.
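
If access later needs to be revoked, the same AuthServiceClient can remove the grant. A brief sketch reusing the objects from the example above:

# Remove alice's permission on the experiment entirely
auth_client.delete_experiment_permission(
    experiment_id=exp_id,
    username="alice",
)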

Summary of experiment creation APIs

Action | Access Control Applied? | Who Has Access
Default experiment creation | No | Creator (MANAGE), admins
Explicit RBAC example | Yes | Creator & admins (MANAGE), “alice” (READ → EDIT)

Troubleshooting

Logs

System logs for MLflow can be accessed via Lens. The important pods for MLflow are located in the mlflow namespace. Generally, the mlflow container log in the MLflow server pods is the most informative about component health.

Metrics

Metrics for MLflow can be found in Grafana in the Kubernetes > Pods and Kubernetes > Namespace (pods) dashboards.