To install Ollama and run it in a GPU notebook:
Upgrade the notebook to root
kubectl patch notebook $(echo $NB_PREFIX | cut -d'/' -f4) --type=json --patch '[{"op": "add", "path": "/spec/template/spec/securityContext", "value": {"runAsNonRoot": false, "runAsUser": 0}}]'
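The `$(echo $NB_PREFIX | cut -d'/' -f4)` expression pulls the notebook name out of the Kubeflow-provided NB_PREFIX variable, which typically has the form /notebook/&lt;namespace&gt;/&lt;notebook-name&gt;. A sketch of the extraction (the example value is illustrative):

```shell
# NB_PREFIX is set by Kubeflow and typically looks like
#   /notebook/<namespace>/<notebook-name>
# Splitting on '/' gives: field 1 = "" (before the leading slash),
# field 2 = "notebook", field 3 = namespace, field 4 = notebook name.
NB_PREFIX="/notebook/brad-kflow-ai/torch-gpu-001"   # example value

NB_NAME=$(echo "$NB_PREFIX" | cut -d'/' -f4)
echo "$NB_NAME"    # torch-gpu-001
```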
This will restart the notebook. Once it comes back up, you can proceed with installing Ollama.
install pciutils (so that ollama can use 'lspci' to find the GPU)
apt update -y && apt install pciutils -y
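Once pciutils is installed, you can confirm the GPU is visible before installing Ollama. A minimal check (the exact device string varies by GPU model):

```shell
# List PCI devices and look for an NVIDIA entry; lspci comes from pciutils.
if lspci | grep -qi nvidia; then
    echo "GPU visible to lspci"
else
    echo "no NVIDIA device found -- Ollama would fall back to CPU" >&2
fi
```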
install ollama
curl https://ollama.ai/install.sh | sh
start ollama serve
ollama serve
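`ollama serve` runs in the foreground, which is why the next steps use a second window. Alternatively, you can background it and poll the API until it answers; a sketch (the log path is an arbitrary choice):

```shell
# Start the server in the background and keep its log.
nohup ollama serve > /tmp/ollama.log 2>&1 &

# Poll the API (default port 11434) until it responds, up to ~10s.
wait_for_ollama() {
    for _ in $(seq 1 10); do
        if curl -sf http://localhost:11434/api/tags > /dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}

wait_for_ollama && echo "ollama is up" || echo "ollama did not start; see /tmp/ollama.log" >&2
```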
In a new window, run:
ollama pull mistral:instruct
In a new window, run:
./run-ollama-curl.sh
Example:
curl -X POST http://localhost:11434/api/generate -d '{
"model": "mistral:instruct",
"prompt":"score the sentiment of: \"I love this coffee!\"",
"stream": false,
"options": {
"seed": 42,
"temperature": 0.0
}
}'
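run-ollama-curl.sh can be a small wrapper around the curl call above. A sketch that takes the prompt as an argument, defaulting to the sentiment example, and escapes it for JSON (the script name and escaping approach are this sketch's choices):

```shell
#!/bin/sh
# Usage: ./run-ollama-curl.sh ["prompt text"]
PROMPT=${1:-'score the sentiment of: "I love this coffee!"'}

# Escape backslashes and double quotes so the prompt is valid inside JSON.
ESCAPED=$(printf '%s' "$PROMPT" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')

curl -s -X POST http://localhost:11434/api/generate -d '{
  "model": "mistral:instruct",
  "prompt": "'"$ESCAPED"'",
  "stream": false,
  "options": { "seed": 42, "temperature": 0.0 }
}'
```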
(base) root@torch-gpu-001-0:~/shared/users/brad-kflow-ai/ollama# ./run-ollama-curl.sh | jq .
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 669 100 502 100 167 828 275 --:--:-- --:--:-- --:--:-- 1103
{
"model": "mistral:instruct",
"created_at": "2023-12-06T16:56:54.006461209Z",
"response": "The sentiment of the input text \"I love this coffee!\" is positive.",
"done": true,
"context": [
733,
16289,
28793,
28705,
7420,
272,
21790,
302,
28747,
345,
28737,
2016,
456,
7045,
2781,
733,
28748,
16289,
28793,
13,
1014,
21790,
302,
272,
2787,
2245,
345,
28737,
2016,
456,
7045,
2781,
349,
5278,
28723
],
"total_duration": 604755876,
"load_duration": 613095,
"prompt_eval_count": 21,
"prompt_eval_duration": 288824000,
"eval_count": 15,
"eval_duration": 309943000
}
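The duration fields in the response are in nanoseconds, so generation speed can be derived from eval_count and eval_duration. A sketch using the values above:

```shell
# eval_count tokens were generated in eval_duration nanoseconds,
# so tokens/sec = eval_count / (eval_duration / 1e9).
EVAL_COUNT=15
EVAL_DURATION=309943000   # ns, taken from the response above

awk -v n="$EVAL_COUNT" -v d="$EVAL_DURATION" \
    'BEGIN { printf "%.1f tokens/sec\n", n / (d / 1e9) }'
# prints "48.4 tokens/sec"
```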