Run a Local LLM in 5 Minutes with Ollama

The fastest way to get a private, offline AI assistant running on your own hardware.

Ollama is the easiest way to run large language models locally. It handles model downloads, quantization selection, GPU detection, and serving — all with a single command. Here’s how to get up and running in under five minutes.

What you need

  • A machine with at least 8 GB RAM (16 GB recommended)
  • A GPU with 6+ GB VRAM, OR a fast CPU with 32+ GB RAM
  • Linux, macOS, or Windows (native installer or WSL2)

Step 1: Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

The curl one-liner above is the Linux install. On macOS, download the app from ollama.com. On Windows, use the installer or run under WSL2.
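
To confirm the install worked, check the version. On Linux the script also registers a background service (typically via systemd), so checking its status is one way to see that the server is up:

ollama --version
systemctl status ollama    # Linux only; the service is created by the install script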

Step 2: Pull a model

ollama pull llama3.2

This downloads Llama 3.2 (3B parameters, ~2 GB). For a more capable model:

ollama pull llama3.1:8b     # 8B, ~5 GB — good balance
ollama pull llama3.1:70b    # 70B, ~40 GB — requires 48+ GB VRAM
ollama pull phi4             # Microsoft Phi-4, excellent reasoning
ollama pull mistral          # Mistral 7B, fast and capable
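
To see exactly what a pull gave you (parameter count, quantization, context window), ollama show prints a model’s details:

ollama show llama3.2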

Step 3: Chat

ollama run llama3.2

That’s it. You’re now running a private AI assistant on your own hardware. No API key. No subscription. No data leaving your machine.
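
Two variations worth knowing: pass a prompt on the command line for a one-shot answer, and inside the interactive session type /bye to exit (or /? to list the other commands):

ollama run llama3.2 "Explain quantization in one paragraph."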

Step 4: Use the API

Ollama exposes an OpenAI-compatible API on localhost:11434:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

This means any app that supports the OpenAI API can point at Ollama instead — Open WebUI, Obsidian, VS Code Continue, and dozens more.
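
The same endpoint also streams. Setting "stream": true returns the reply as server-sent-event chunks instead of a single JSON object, which is how chat UIs show tokens as they are generated:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "stream": true
  }'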

Step 5: Add Open WebUI (optional)

For a ChatGPT-style interface:

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

Open http://localhost:3000 in your browser.
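
The --add-host flag is what lets the container reach the Ollama server running on your host. If the page doesn’t come up, the container logs are the first place to look:

docker logs open-webui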

Model selection cheat sheet

Model            Size   Best for
Phi-4            14B    Reasoning, coding
Llama 3.2 3B     3B     Fast, low VRAM
Llama 3.1 8B     8B     General use
Mistral 7B       7B     Speed, instruction following
Llama 3.1 70B    70B    Best quality, high VRAM
Gemma 3 27B      27B    Great mid-range option
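
Every entry above is available from the Ollama library; tags generally follow a family:size pattern, so pulling the mid-range option looks like:

ollama pull gemma3:27b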

Next steps

  • Check out the hardware picks to upgrade your rig
  • Explore Open WebUI for multi-model conversations and RAG
  • Try ollama list and ollama ps to manage models
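
A quick sketch of the day-to-day management commands (ollama rm isn’t mentioned above, but it’s how you reclaim disk space):

ollama list          # every model on disk, with its size and when it was last modified
ollama ps            # models currently loaded in memory
ollama rm mistral    # delete a model you no longer need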