Run LLMs in a Notebook with Ollama


To install Ollama and run it in a GPU notebook:

  1. Upgrade the notebook to run as root. The command below extracts the notebook name from $NB_PREFIX and patches the pod's securityContext:

    kubectl patch notebook $(echo $NB_PREFIX | cut -d'/' -f4) --type=json --patch '[{"op": "add", "path": "/spec/template/spec/securityContext", "value": {"runAsNonRoot": false, "runAsUser": 0}}]'

This will restart the notebook. Once it comes back up, you can proceed with installing Ollama.
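After the restart, a quick sanity check from a terminal inside the notebook confirms you are running as root:

    # Should print 0 now that the pod runs with runAsUser: 0
    id -u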

  2. Install pciutils (so that Ollama can use lspci to find the GPU):

    apt update -y && apt install pciutils -y
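With pciutils in place, you can optionally confirm the GPU is visible before going further (this check assumes an NVIDIA card):

    # Lists PCI devices; an entry mentioning NVIDIA confirms the GPU is attached
    lspci | grep -i nvidia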
  3. Install Ollama:

    curl https://ollama.ai/install.sh | sh
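If the install succeeds, the ollama binary should be on your PATH; you can verify with:

    # Prints the installed Ollama version
    ollama --version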
  4. Start the Ollama server:

    ollama serve
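Note that ollama serve occupies the terminal it runs in. If you'd rather keep the terminal free, one option (a sketch, not required by this setup) is to background it and capture the logs:

    # Run the server in the background, logging to ollama.log
    nohup ollama serve > ollama.log 2>&1 &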
  5. In a new window, pull the model:

    ollama pull mistral:instruct
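You can confirm the download completed by listing the local models; mistral:instruct should appear:

    # Lists models available locally
    ollama list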
  6. In a new window, run run-ollama-curl.sh, which sends the request shown below (a sketch of the script follows the example).

Example:

    curl -X POST http://localhost:11434/api/generate -d '{
        "model": "mistral:instruct",
        "prompt": "score the sentiment of: \"I love this coffee!\"",
        "stream": false,
        "options": {
            "seed": 42,
            "temperature": 0.0
        }
    }'
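The contents of run-ollama-curl.sh are not included in this guide; a minimal version that wraps the request above might look like the following (the -s flag, which silences curl's progress meter so the output pipes cleanly into jq, is an addition, not part of the original script):

    #!/usr/bin/env bash
    # Hypothetical run-ollama-curl.sh: sends the sentiment prompt to the
    # local Ollama server and prints the JSON response.
    curl -s -X POST http://localhost:11434/api/generate -d '{
        "model": "mistral:instruct",
        "prompt": "score the sentiment of: \"I love this coffee!\"",
        "stream": false,
        "options": {
            "seed": 42,
            "temperature": 0.0
        }
    }'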
Running the script and piping the result through jq produces:

    (base) root@torch-gpu-001-0:~/shared/users/brad-kflow-ai/ollama# ./run-ollama-curl.sh | jq .
    {
      "model": "mistral:instruct",
      "created_at": "2023-12-06T16:56:54.006461209Z",
      "response": "The sentiment of the input text \"I love this coffee!\" is positive.",
      "done": true,
      "context": [733, 16289, 28793, 28705, 7420, 272, 21790, 302, 28747, 345,
                  28737, 2016, 456, 7045, 2781, 733, 28748, 16289, 28793, 13,
                  1014, 21790, 302, 272, 2787, 2245, 345, 28737, 2016, 456,
                  7045, 2781, 349, 5278, 28723],
      "total_duration": 604755876,
      "load_duration": 613095,
      "prompt_eval_count": 21,
      "prompt_eval_duration": 288824000,
      "eval_count": 15,
      "eval_duration": 309943000
    }
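To extract just the model's answer rather than the full JSON envelope, jq's -r flag prints the raw response string:

    # Prints only the "response" field as plain text
    ./run-ollama-curl.sh | jq -r .response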