9 min read

How to run AI agents locally with Ollama — a practical 30-minute setup

From a clean machine to a multi-agent setup that actually does useful work. Five steps, two operating systems, real commands. By the end you'll have local agents that can read files, search the web, and write code without sending a single token to OpenAI or Anthropic.

What you'll need

  • OS: macOS 11+ or Windows 10/11.
  • RAM: 16 GB recommended (8 GB minimum, expect a smaller model).
  • Disk: ~10 GB free for the model.
  • Time: ~30 minutes (most of it is the model download).

Step 1 — Install Ollama

Ollama is the easiest way to run quantised LLMs on a laptop. It exposes an OpenAI-compatible HTTP API on localhost:11434 that any agent runtime can hit.

macOS

brew install ollama
brew services start ollama   # or: ollama serve

Windows

winget install Ollama.Ollama
# The installer starts Ollama automatically and keeps it running in the background.

Confirm it's running:

curl http://localhost:11434/api/tags
# You should get JSON with an empty 'models' list.

Step 2 — Pull a tool-capable model

Not every model handles function-calling well. For agentic workloads in 2026, the best small / fast option is Qwen2.5 7B Instruct — about 4.7 GB on disk, runs comfortably on a Mac M2 or a 16 GB RAM Windows box.

ollama pull qwen2.5:7b-instruct
ollama list   # confirm it's installed
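A successful pull shows up in the list with its on-disk size. The output below is abridged and the exact columns vary by Ollama version:

NAME                    SIZE      MODIFIED
qwen2.5:7b-instruct     4.7 GB    a few seconds ago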

Quick sanity check that tool-calling works (the get_weather schema below is a throwaway, only there to see whether the model emits a structured call):

curl http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:7b-instruct",
  "messages": [{"role":"user","content":"What is the weather in Paris right now?"}],
  "tools": [{"type":"function","function":{"name":"get_weather","description":"Get the current weather for a city","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}],
  "stream": false
}'
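A tool-capable model should answer with a structured tool_calls entry rather than prose. The exact shape varies a little between Ollama versions, but expect something roughly like this (abridged):

{
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      { "function": { "name": "get_weather", "arguments": { "city": "Paris" } } }
    ]
  }
}

If you get plain text instead, the model (or your Ollama build) doesn't support tools; see the gotchas at the end.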

If you have the hardware, step up to qwen2.5-coder:32b (comfortable on a 24 GB GPU) or llama3.3:70b (realistically a 48 GB-class card or a high-memory Mac) for noticeably better code generation. Same Ollama API; just heavier.
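If you do go bigger, it's the same pull-and-run flow; the sizes below are approximate for the default 4-bit quantisations on the Ollama library:

ollama pull qwen2.5-coder:32b   # roughly 20 GB on disk
ollama pull llama3.3:70b        # roughly 43 GB on disk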

Step 3 — Install MultiAgentOS

MultiAgentOS is a desktop app that orchestrates multiple agents (Planner, Coder, Reviewer, Operator) against any OpenAI-compatible LLM endpoint — including the one Ollama just exposed.

Download the MSI (Windows) or DMG (Mac) from the pricing page after starting your 14-day trial. Install, launch.

Step 4 — Connect Ollama as a provider

Inside MultiAgentOS:

  1. Settings → Providers → Add provider.
  2. Type: OpenAI-compatible.
  3. Base URL: http://localhost:11434/v1
  4. API key: any placeholder string (Ollama ignores it, but most OpenAI-compatible clients won't accept an empty field).
  5. Save.

Then go to Local Models → it should auto-discover qwen2.5:7b-instruct. Click Set as default.
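If auto-discovery comes up empty, you can query the OpenAI-compatible endpoint directly; presumably this is the same list the app reads:

curl http://localhost:11434/v1/models
# Returns a JSON object whose "data" array should include qwen2.5:7b-instruct.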

Finally — and this is the move that makes the whole thing local — go to Privacy & Offline and toggle Enable Offline Mode. The runtime will now refuse to call any cloud provider. Inference is local-only.

Step 5 — Run your first multi-agent task

Open a new chat. Switch the mode selector to Code. Try this prompt:

Read the README.md in ~/Documents/sample-project, summarise it in 5 bullet points, then write a one-paragraph cover note as cover.md in the same folder.
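If nothing exists at that path yet, a throwaway project takes ten seconds (macOS/Linux shell shown; adjust the path and commands for Windows):

mkdir -p ~/Documents/sample-project
printf '# Sample Project\n\nA tiny demo repo for testing local agents.\n' > ~/Documents/sample-project/README.md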

What you should see:

  • The Planner decomposes into 3 steps (read, summarise, write).
  • The Coder picks up step 1, calls file.read.
  • The summary appears inline in the chat.
  • The Coder calls file.write for cover.md.
  • The Timeline panel on the right shows each tool call as it happens.

Total round-trip on a Mac M2 with 16 GB: about 8–15 seconds. Total tokens used: somewhere north of 5,000. Total cost: zero. (Repeat that 50 times a day for a year and re-read our cost piece.)

Where it gets interesting

Once you've got the basic loop working, the four built-in agents unlock more interesting workflows:

  • Operator — switch the mode selector to Operator, ask it to “search HackerNews for posts about local LLMs this week and save the top 5 to hn.md.” Real browser, headless or visible.
  • Auto Agent — the autonomous mode runs without per-step approval, gated by your Guardrails preset. Best for longer multi-step tasks (“refactor every .ts file in this folder to use the new logging API”).
  • Broadcast — load a second model (ollama pull mistral:7b) and ask the same question of both at once. The orchestrator stitches a consensus answer with attribution.

Common gotchas

  • “Connection refused” on the provider config: Ollama isn't running. brew services restart ollama on macOS, or relaunch the Ollama app from the system tray on Windows.
  • Tool calls fail with weird JSON: model isn't tool-capable. Use a Qwen2.5 or Llama 3.3 variant; older models hallucinate the schema.
  • Out of RAM: step down to a smaller model (qwen2.5:3b-instruct instead of :7b-instruct) or close other apps; see the snippet below.
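If you do need to drop down, swapping models is quick, and ollama ps shows how much memory the loaded model is actually holding. (qwen2.5:3b-instruct is the tag as published on the Ollama library; double-check it's still current.)

ollama pull qwen2.5:3b-instruct
ollama ps   # lists loaded models and their memory footprint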

The bottom line

Local agents in 2026 are a 30-minute install, not a research project. The hardware is on your desk; the models are open; the tooling is mature. The main thing slowing people down is inertia — “I'll just use Cursor for now.” If your workload tolerates a slightly smaller model, the trade is usually worth it.

Try MultiAgentOS with your local Ollama.

$79 founder pricing · 14-day risk-free trial · no telemetry, ever.

Start trial