Ollama is the easiest way to run large language models locally. It handles model downloads, quantization selection, GPU detection, and serving — all with a single command. Here’s how to get running in under five minutes.
## What you need
- A machine with at least 8 GB RAM (16 GB recommended)
- A GPU with 6+ GB VRAM, OR a fast CPU with 32+ GB RAM
- Linux, macOS, or Windows (WSL2)
## Step 1: Install Ollama

On Linux, run the official install script:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```
On macOS, download the app from ollama.com. On Windows, use the installer or run under WSL2.
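To confirm the install worked, check the CLI version and that the local server is answering (the version number will differ on your machine):

```bash
ollama --version
curl http://localhost:11434/api/version
```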
## Step 2: Pull a model

```bash
ollama pull llama3.2
```
This downloads Llama 3.2 (3B parameters, ~2 GB). For a more capable model:
```bash
ollama pull llama3.1:8b    # 8B, ~5 GB, good balance
ollama pull llama3.1:70b   # 70B, ~40 GB, requires 48+ GB VRAM
ollama pull phi4           # Microsoft Phi-4, excellent reasoning
ollama pull mistral        # Mistral 7B, fast and capable
```
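Once a model is downloaded, `ollama show` prints its details, including parameter count, context length, and the quantization that was pulled:

```bash
ollama show llama3.2
```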
## Step 3: Chat

```bash
ollama run llama3.2
```
That’s it. You’re now running a private AI assistant on your own hardware. No API key. No subscription. No data leaving your machine.
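Interactive chat isn't the only mode. You can pass a prompt as an argument for a one-shot answer, and piping a file in should work too (notes.txt below is just a placeholder). Inside the interactive session, type /bye to exit:

```bash
# One-shot: print the answer and exit
ollama run llama3.2 "Explain the difference between RAM and VRAM in two sentences."

# Use a file's contents as context (notes.txt is a placeholder)
cat notes.txt | ollama run llama3.2 "Summarize this file:"
```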
## Step 4: Use the API

Ollama exposes an OpenAI-compatible API on localhost:11434 (the compatible endpoints live under the /v1 path):
```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
This means any app that supports the OpenAI API can point at Ollama instead — Open WebUI, Obsidian, VS Code Continue, and dozens more.
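Alongside the OpenAI-compatible endpoints, Ollama has a native API under the /api path. A non-streaming chat request looks like this (setting "stream": false returns a single JSON object instead of a stream of tokens):

```bash
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
```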
## Step 5: Add Open WebUI (optional)
For a ChatGPT-style interface:
```bash
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```
Open http://localhost:3000 in your browser.
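On first visit, Open WebUI prompts you to create a local admin account. If the page doesn't load immediately, the container may still be starting; watch its logs to check:

```bash
docker logs -f open-webui
```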
## Model selection cheat sheet
| Model | Size | Best for |
|---|---|---|
| Phi-4 | 14B | Reasoning, coding |
| Llama 3.2 3B | 3B | Fast, low VRAM |
| Llama 3.1 8B | 8B | General use |
| Mistral 7B | 7B | Speed, instruction following |
| Llama 3.1 70B | 70B | Best quality, high VRAM |
| Gemma 3 27B | 27B | Great mid-range option |
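One convenience worth knowing: `ollama run` pulls a model automatically if it isn't on disk yet, so you can jump straight from the table to a chat. For example, using the Gemma 3 tag (gemma3:27b):

```bash
# Downloads the model first if needed, then starts a chat
ollama run gemma3:27b
```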
## Next steps
- Check out the hardware picks to upgrade your rig
- Explore Open WebUI for multi-model conversations and RAG
- Try `ollama list` and `ollama ps` to manage models, as shown below
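A minimal management session, using the model names pulled earlier (llama3.1:70b is just an example; substitute whatever you have installed, and note that ollama stop requires a reasonably recent Ollama release):

```bash
ollama list             # models on disk, with tag and size
ollama ps               # models currently loaded in memory
ollama stop llama3.2    # unload a running model from memory
ollama rm llama3.1:70b  # delete a model from disk
```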