What you'll need
- OS: macOS 11+ or Windows 10/11.
- RAM: 16 GB recommended (8 GB minimum, expect a smaller model).
- Disk: ~10 GB free for the model.
- Time: ~30 minutes (most of it is the model download).
Step 1 — Install Ollama
Ollama is the easiest way to run quantised LLMs on a laptop. It exposes an HTTP API on localhost:11434 (with OpenAI-compatible routes under /v1) that any agent runtime can hit.
macOS
brew install ollama
brew services start ollama # or: ollama serve
Windows
winget install Ollama.Ollama
# The installer sets Ollama to start automatically in the background.
Confirm it's running:
curl http://localhost:11434/api/tags
# You should get JSON with an empty 'models' list.
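The same server also answers on the OpenAI-compatible routes under /v1, which is what most agent runtimes (including the one in Step 3) actually call. Assuming a reasonably current Ollama build:
curl http://localhost:11434/v1/models
# OpenAI-style model list; empty until you pull a model in Step 2.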
Step 2 — Pull a tool-capable model
Not every model handles function-calling well. For agentic workloads in 2026, the best small, fast option is Qwen2.5 7B Instruct — about 4.7 GB on disk, and it runs comfortably on a Mac M2 or a 16 GB Windows box.
ollama pull qwen2.5:7b-instruct
ollama list # confirm it's installed
First, a quick sanity check that the model responds:
curl http://localhost:11434/api/chat -d '{
"model": "qwen2.5:7b-instruct",
"messages": [{"role":"user","content":"Hello — answer in one sentence."}],
"stream": false
}'
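To test tool-calling itself, pass a function schema and check that the model emits a structured call rather than prose. The get_weather tool below is a made-up example for illustration; nothing executes it, you're only checking the response shape:
curl http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:7b-instruct",
  "messages": [{"role":"user","content":"What is the weather in Paris right now?"}],
  "stream": false,
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}'
# A tool-capable model answers with a "tool_calls" entry inside "message",
# e.g. a call to get_weather with {"city": "Paris"}, not plain text.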
If you have the hardware (24 GB+ VRAM), step up to qwen2.5-coder:32b or llama3.3:70b for noticeably better code generation. Same Ollama API; just heavier.
Step 3 — Install MultiAgentOS
MultiAgentOS is a desktop app that orchestrates multiple agents (Planner, Coder, Reviewer, Operator) against any OpenAI-compatible LLM endpoint — including the one Ollama just exposed.
Download the MSI (Windows) or DMG (Mac) from the pricing page after starting your 14-day trial. Install, launch.
Step 4 — Connect Ollama as a provider
Inside MultiAgentOS:
- Settings → Providers → Add provider.
- Type: OpenAI-compatible.
- Base URL: http://localhost:11434/v1
- API key: anything (Ollama ignores it; pass none).
- Save.
Then go to Local Models → it should auto-discover qwen2.5:7b-instruct. Click Set as default.
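Before leaving the terminal behind, you can hit the same endpoint MultiAgentOS will use and confirm it answers. This goes through Ollama's OpenAI-compatible route; the Authorization value is arbitrary, as noted above:
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer none" \
  -d '{
    "model": "qwen2.5:7b-instruct",
    "messages": [{"role":"user","content":"Reply with one word: ready."}]
  }'
# An OpenAI-style response with a "choices" array means the provider config is good.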
Finally — and this is the move that makes the whole thing local — go to Privacy & Offline and toggle Enable Offline Mode. The runtime will now refuse to call any cloud provider. Inference is local-only.
Step 5 — Run your first multi-agent task
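The prompt below assumes ~/Documents/sample-project exists. If it doesn't, seed it with any small README first; this placeholder works (macOS/Linux syntax):
mkdir -p ~/Documents/sample-project
printf '# Sample Project\n\nA tiny repo for smoke-testing local agents.\n' > ~/Documents/sample-project/README.md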
Open a new chat. Switch the mode selector to Code. Try this prompt:
Read the README.md in ~/Documents/sample-project, summarise it in 5 bullet points, then write a one-paragraph cover note as cover.md in the same folder.
What you should see:
- The Planner decomposes into 3 steps (read, summarise, write).
- The Coder picks up step 1, calls file.read.
- The summary appears inline in the chat.
- The Coder calls file.write for cover.md.
- The Timeline panel on the right shows each tool call as it happens.
Total round-trip on a Mac M2 with 16 GB: about 8–15 seconds. Total tokens used: somewhere north of 5,000. Total cost: zero. (Repeat that 50 times a day for a year and re-read our cost piece.)
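(For the record: 5,000 tokens × 50 runs × 365 days is roughly 91 million tokens a year, every one of them inferenced at a marginal cost of zero.)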
Where it gets interesting
Once you've got the basic loop working, the four built-in agents unlock more interesting workflows:
- Operator — switch the mode selector to Operator, ask it to “search HackerNews for posts about local LLMs this week and save the top 5 to hn.md.” Real browser, headless or visible.
- Auto Agent — the autonomous mode runs without per-step approval, gated by your Guardrails preset. Best for longer multi-step tasks (“refactor every .ts file in this folder to use the new logging API”).
- Broadcast — load a second model (ollama pull mistral:7b) and ask the same question of both at once. The orchestrator stitches a consensus answer with attribution.
Common gotchas
- “Connection refused” on the provider config: Ollama isn't running. brew services restart ollama on macOS, or relaunch the Ollama tray app on Windows.
- Tool calls fail with weird JSON: the model isn't tool-capable. Use a Qwen2.5 or Llama 3.3 variant; older models hallucinate the schema.
- Out of RAM: step down to a smaller model (qwen2.5:3b-instruct instead of :7b-instruct) or close other apps.
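If you're not sure what's loaded, ollama ps (a stock Ollama command) lists the currently loaded models and their memory footprint:
ollama ps
# Shows each loaded model, its size, and whether it's running on CPU or GPU.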
The bottom line
Local agents in 2026 are a 30-minute install, not a research project. The hardware is on your desk; the models are open; the tooling is mature. The main thing slowing people down is inertia — “I'll just use Cursor for now.” If your workload tolerates a slightly smaller model, the trade is usually worth it.
$79 founder pricing · 14-day risk-free trial · no telemetry, ever.
Start trial