I have a Dell XPS with 64GB of RAM and an NVIDIA GPU, and I use the NVIDIA Container Toolkit with Docker.
I want to try an AI code generation stack built from commercial open-source tools: Cursor, CodeGPT, and Ollama hosting DeepSeek R1.
Using this CodeGPT founder's X post as inspiration.
Download AppImage
cursor-0.44.11-build-250103fqxdt5u9z-x86_64.AppImage
Install Cursor
From the Cursor website
Work around an issue in Electron, upstream of Cursor, by relaxing AppArmor's unprivileged user namespace restriction:
sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0
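Note that sysctl -w does not survive a reboot. A minimal sketch for persisting the setting via an /etc/sysctl.d drop-in (the file name here is arbitrary):

# Persist the AppArmor user namespace setting across reboots
echo "kernel.apparmor_restrict_unprivileged_userns = 0" | sudo tee /etc/sysctl.d/60-apparmor-userns.conf
sudo sysctl --system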
Install prerequisites
sudo apt install libfuse-dev
Make AppImage executable
chmod +x cursor-0.44.11-build-250103fqxdt5u9z-x86_64.AppImage
Run Cursor AppImage
./cursor-0.44.11-build-250103fqxdt5u9z-x86_64.AppImage
Sign up or sign in using Google, GitHub, or email.
Open Project
Install CodeGPT in Cursor
From the official CodeGPT installation docs.
Search for CodeGPT: Chat & AI Agents in the Extensions Marketplace search bar.
Cursor is essentially a fork of VS Code built on Electron, so VS Code extensions like CodeGPT install the same way.
Install Ollama
With Docker Compose I spin up a GPU-enabled Ollama container, using the following compose.yml:
networks:
  internet: {}
  data: {}

services:
  # throwaway service that just runs nvidia-smi to prove GPU passthrough works
  cuda:
    image: nvidia/cuda:12.3.1-base-ubuntu20.04
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: ["utility"]
              count: all

  store_ollama:
    volumes:
      # persist models and manifests on the host
      - ./ollama/ollama:/root/.ollama
    container_name: store-ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: ollama/ollama:latest
    ports:
      - 11434:11434
    environment:
      # keep loaded models in memory for 24 hours
      - OLLAMA_KEEP_ALIVE=24h
    networks:
      - data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
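The deploy device reservations only work when the NVIDIA Container Toolkit is installed and registered as a Docker runtime. A quick check, plus the registration step if it is missing (this assumes the toolkit package itself is already installed):

# Confirm Docker knows about the NVIDIA runtime
docker info | grep -i nvidia

# If nothing shows up, register the runtime and restart Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker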
Validate CUDA
We want to make sure CUDA and the NVIDIA GPU are visible inside the container.
01:54:08 niccolox@devekko devekko.store ±|master ✗|→ docker compose up cuda
WARN[0000] Found orphan containers ([ollama]) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[+] Running 1/1
✔ Container devekkostore-cuda-1 Created 0.0s
Attaching to cuda-1
cuda-1 | Fri Jan 24 21:54:25 2025
cuda-1 | +-----------------------------------------------------------------------------------------+
cuda-1 | | NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: N/A |
cuda-1 | |-----------------------------------------+------------------------+----------------------+
cuda-1 | | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
cuda-1 | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
cuda-1 | | | | MIG M. |
cuda-1 | |=========================================+========================+======================|
cuda-1 | | 0 NVIDIA GeForce RTX 3050 ... Off | 00000000:01:00.0 Off | N/A |
cuda-1 | | N/A 48C P0 11W / 40W | 8MiB / 4096MiB | 0% Default |
cuda-1 | | | | N/A |
cuda-1 | +-----------------------------------------+------------------------+----------------------+
cuda-1 |
cuda-1 | +-----------------------------------------------------------------------------------------+
cuda-1 | | Processes: |
cuda-1 | | GPU GI CI PID Type Process name GPU Memory |
cuda-1 | | ID ID Usage |
cuda-1 | |=========================================================================================|
cuda-1 | +-----------------------------------------------------------------------------------------+
cuda-1 exited with code 0
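The same check can be done without Compose; a one-off sketch with the plain Docker CLI (the --gpus flag also relies on the NVIDIA Container Toolkit):

docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi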
Ollama GPU
docker compose up store_ollama
We run without detached mode so we can watch the container's logs.
01:54:45 niccolox@devekko devekko.store ±|master ✗|→ docker compose up store_ollama
[+] Running 1/1
✔ store_ollama Pulled 1.0s
WARN[0001] Found orphan containers ([ollama]) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
Attaching to store-ollama
store-ollama | 2025/01/24 21:56:29 routes.go:1187: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:24h0m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
store-ollama | time=2025-01-24T21:56:29.246Z level=INFO source=images.go:432 msg="total blobs: 0"
store-ollama | time=2025-01-24T21:56:29.246Z level=INFO source=images.go:439 msg="total unused blobs removed: 0"
store-ollama | [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
store-ollama |
store-ollama | [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
store-ollama | - using env: export GIN_MODE=release
store-ollama | - using code: gin.SetMode(gin.ReleaseMode)
store-ollama |
store-ollama | [GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
store-ollama | [GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
store-ollama | [GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
store-ollama | [GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
store-ollama | [GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
store-ollama | [GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
store-ollama | [GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
store-ollama | [GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
store-ollama | [GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
store-ollama | [GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
store-ollama | [GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
store-ollama | [GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
store-ollama | [GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
store-ollama | [GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
store-ollama | [GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
store-ollama | [GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
store-ollama | [GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
store-ollama | [GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
store-ollama | [GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
store-ollama | [GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
store-ollama | [GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
store-ollama | [GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
store-ollama | [GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
store-ollama | [GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
store-ollama | time=2025-01-24T21:56:29.248Z level=INFO source=routes.go:1238 msg="Listening on [::]:11434 (version 0.5.7-0-ga420a45-dirty)"
store-ollama | time=2025-01-24T21:56:29.249Z level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11_avx cuda_v12_avx]"
store-ollama | time=2025-01-24T21:56:29.250Z level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
store-ollama | time=2025-01-24T21:56:29.408Z level=INFO source=types.go:131 msg="inference compute" id=GPU-1cbfe056-fcc4-4ab2-c563-6fc44e92565a library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3050 Ti Laptop GPU" total="3.8 GiB" available="3.7 GiB"
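With the server listening on port 11434, the API can be checked from the host; the /api/version and /api/tags routes are visible in the GIN route table above:

curl http://localhost:11434/api/version
curl http://localhost:11434/api/tags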
Detached mode would be
docker compose up store_ollama -d
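To follow the container logs while running detached:

docker compose logs -f store_ollama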
Configure CodeGPT with DeepSeek R1
In Cursor's top-left pane, open the pull-down icon, pin CodeGPT, and choose Freemium.
Sign up, authorize, and log in.
Pull the DeepSeek R1 model into Ollama
Choose LLM Local Models
Pull the model down via Cursor and the CodeGPT local LLM UI.
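If you prefer the terminal, the same pull can be done against the container directly. A sketch, assuming the deepseek-r1:8b tag (pick a distilled size that fits this GPU's 4 GB of VRAM):

# Pull inside the running container (the model tag is an assumption)
docker exec -it store-ollama ollama pull deepseek-r1:8b

# Or via the REST API; the /api/pull route is listed in the GIN route table above
curl http://localhost:11434/api/pull -d '{"model": "deepseek-r1:8b"}'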
I can now see DeepSeek R1 as my model for Cursor
In the Ollama container logs I can see the successful download and the subsequent API calls:
store-ollama | time=2025-01-24T22:17:33.271Z level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
store-ollama | time=2025-01-24T22:17:33.429Z level=INFO source=types.go:131 msg="inference compute" id=GPU-1cbfe056-fcc4-4ab2-c563-6fc44e92565a library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3050 Ti Laptop GPU" total="3.8 GiB" available="3.7 GiB"
store-ollama | [GIN] 2025/01/24 - 22:18:52 | 200 | 6.029998ms | 172.28.0.1 | GET "/api/tags"
store-ollama | time=2025-01-24T22:18:53.213Z level=INFO source=download.go:175 msg="downloading 9801e7fce27d in 405 1 GB part(s)"
store-ollama | [GIN] 2025/01/24 - 22:22:32 | 200 | 463.954µs | 172.28.0.1 | GET "/api/tags"
store-ollama | [GIN] 2025/01/24 - 22:22:37 | 200 | 184.288µs | 172.28.0.1 | GET "/api/tags"
store-ollama | [GIN] 2025/01/24 - 22:30:00 | 200 | 166.766µs | 172.28.0.1 | GET "/api/tags"
store-ollama | [GIN] 2025/01/24 - 22:30:02 | 200 | 193.451µs | 172.28.0.1 | GET "/api/tags"
store-ollama | time=2025-01-24T23:17:28.831Z level=INFO source=download.go:175 msg="downloading 369ca498f347 in 1 387 B part(s)"
store-ollama | time=2025-01-24T23:17:30.130Z level=INFO source=download.go:175 msg="downloading 6e4c38e1172f in 1 1.1 KB part(s)"
store-ollama | time=2025-01-24T23:17:31.397Z level=INFO source=download.go:175 msg="downloading f4d24e9138dd in 1 148 B part(s)"
store-ollama | time=2025-01-24T23:17:32.868Z level=INFO source=download.go:175 msg="downloading fdf3d6cb73c7 in 1 497 B part(s)"
store-ollama | [GIN] 2025/01/24 - 23:32:13 | 200 | 1h13m21s | 172.28.0.1 | POST "/api/pull"
store-ollama | [GIN] 2025/01/24 - 23:32:13 | 200 | 1h9m36s | 172.28.0.1 | POST "/api/pull"
store-ollama | [GIN] 2025/01/24 - 23:32:13 | 200 | 1h2m10s | 172.28.0.1 | POST "/api/pull"
store-ollama | [GIN] 2025/01/24 - 23:37:49 | 200 | 2.051761ms | 172.28.0.1 | GET "/api/tags"
store-ollama | [GIN] 2025/01/24 - 23:37:54 | 200 | 618.334µs | 172.28.0.1 | GET "/api/tags"
store-ollama | [GIN] 2025/01/24 - 23:37:54 | 200 | 548.637µs | 172.28.0.1 | GET "/api/tags"
store-ollama | [GIN] 2025/01/24 - 23:37:57 | 200 | 465.119µs | 172.28.0.1 | GET "/api/tags"
store-ollama | [GIN] 2025/01/24 - 23:37:57 | 200 | 589.321µs | 172.28.0.1 | GET "/api/tags"
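As a final smoke test, the model can be exercised through the OpenAI-compatible endpoint the route table exposes; the model name must match whatever /api/tags reports (deepseek-r1:8b is an assumption here):

# List the models Ollama now serves
curl http://localhost:11434/api/tags

# One-shot chat completion against the local model
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-r1:8b", "messages": [{"role": "user", "content": "Say hello"}]}'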