Run a Local LLM in 5 Minutes with Ollama

The fastest way to get a private, offline AI assistant running on your own hardware.

Ollama is the easiest way to run large language models locally. It handles model downloads, quantization selection, GPU detection, and serving — all with a single command. Here’s how to get up and running in under five minutes.

What you need

  • A machine with at least 8 GB RAM (16 GB recommended)
  • A GPU with 6+ GB VRAM, OR a fast CPU with 32+ GB RAM
  • Linux, macOS, or Windows (native installer or WSL2)

Step 1: Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

The curl one-liner above is the Linux install. On macOS, download the app from ollama.com. On Windows, use the installer or run under WSL2.
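
To confirm the install worked, check the version. On Linux the script also registers a background service (typically via systemd), so checking its status is one way to see that the server is up:

ollama --version
systemctl status ollama    # Linux only; the service is created by the install script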

Step 2: Pull a model

ollama pull llama3.2

This downloads Llama 3.2 (3B parameters, ~2 GB). For a more capable model:

ollama pull llama3.1:8b     # 8B, ~5 GB — good balance
ollama pull llama3.1:70b    # 70B, ~40 GB — requires 48+ GB VRAM
ollama pull phi4             # Microsoft Phi-4, excellent reasoning
ollama pull mistral          # Mistral 7B, fast and capable
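
To see exactly what a pull gave you (parameter count, quantization, context window), ollama show prints a model’s details:

ollama show llama3.2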

Step 3: Chat

ollama run llama3.2

That’s it. You’re now running a private AI assistant on your own hardware. No API key. No subscription. No data leaving your machine.
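
Two variations worth knowing: pass a prompt on the command line for a one-shot answer, and inside the interactive session type /bye to exit (or /? to list the other commands):

ollama run llama3.2 "Explain quantization in one paragraph."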

Step 4: Use the API

Ollama exposes an OpenAI-compatible API on localhost:11434:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

This means any app that supports the OpenAI API can point at Ollama instead — Open WebUI, Obsidian, VS Code Continue, and dozens more.
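
The same endpoint also streams. Setting "stream": true returns the reply as server-sent-event chunks instead of a single JSON object, which is how chat UIs show tokens as they are generated:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "stream": true
  }'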

Step 5: Add Open WebUI (optional)

For a ChatGPT-style interface:

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

Open http://localhost:3000 in your browser.
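
The --add-host flag is what lets the container reach the Ollama server running on your host. If the page doesn’t come up, the container logs are the first place to look:

docker logs open-webui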

Model selection cheat sheet

Model            Size   Best for
Phi-4            14B    Reasoning, coding
Llama 3.2 3B     3B     Fast, low VRAM
Llama 3.1 8B     8B     General use
Mistral 7B       7B     Speed, instruction following
Llama 3.1 70B    70B    Best quality, high VRAM
Gemma 3 27B      27B    Great mid-range option
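
Every entry above is available from the Ollama library; tags generally follow a family:size pattern, so pulling the mid-range option looks like:

ollama pull gemma3:27b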

Next steps

  • Check out the hardware picks to upgrade your rig
  • Explore Open WebUI for multi-model conversations and RAG
  • Try ollama list and ollama ps to manage models
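
A quick sketch of the day-to-day management commands (ollama rm isn’t mentioned above, but it’s how you reclaim disk space):

ollama list          # every model on disk, with its size and when it was last modified
ollama ps            # models currently loaded in memory
ollama rm mistral    # delete a model you no longer need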