What you'll need
- OS: macOS 11+ or Windows 10/11.
- RAM: 16 GB recommended (8 GB minimum, expect a smaller model).
- Disk: ~10 GB free for the model.
- Time: ~30 minutes (most of it is the model download).
Step 1 — Install Ollama
Ollama is the easiest way to run quantised LLMs on a laptop. It exposes an HTTP API on localhost:11434 (with OpenAI-compatible routes under /v1) that any agent runtime can hit.
macOS
brew install ollama
brew services start ollama # or: ollama serve
Windows
winget install Ollama.Ollama
# The installer sets Ollama to start automatically in the background.
Confirm it's running:
curl http://localhost:11434/api/tags
# You should get JSON with an empty 'models' list.
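The same server also answers on the OpenAI-compatible routes under /v1, which is what most agent runtimes (including the one in Step 3) actually call. Assuming a reasonably current Ollama build:
curl http://localhost:11434/v1/models
# OpenAI-style model list; empty until you pull a model in Step 2.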
Step 2 — Pull a tool-capable model
Not every model handles function-calling well. For agentic workloads in 2026, the best small, fast option is Qwen2.5 7B Instruct — about 4.7 GB on disk, and it runs comfortably on a Mac M2 or a 16 GB Windows box.
ollama pull qwen2.5:7b-instruct
ollama list # confirm it's installed
First, a quick sanity check that the model responds:
curl http://localhost:11434/api/chat -d '{
"model": "qwen2.5:7b-instruct",
"messages": [{"role":"user","content":"Hello — answer in one sentence."}],
"stream": false
}'
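To test tool-calling itself, pass a function schema and check that the model emits a structured call rather than prose. The get_weather tool below is a made-up example for illustration; nothing executes it, you're only checking the response shape:
curl http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:7b-instruct",
  "messages": [{"role":"user","content":"What is the weather in Paris right now?"}],
  "stream": false,
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}'
# A tool-capable model answers with a "tool_calls" entry inside "message",
# e.g. a call to get_weather with {"city": "Paris"}, not plain text.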
If you have the hardware (24 GB+ VRAM), step up to qwen2.5-coder:32b or llama3.3:70b for noticeably better code generation. Same Ollama API; just heavier.
Step 3 — Install MultiAgentOS
MultiAgentOS is a desktop app that orchestrates multiple agents (Planner, Coder, Reviewer, Operator) against any OpenAI-compatible LLM endpoint — including the one Ollama just exposed.
Download the MSI (Windows) or DMG (Mac) from the pricing page after starting your 14-day trial. Install, launch.
Step 4 — Connect Ollama as a provider
Inside MultiAgentOS:
- Settings → Providers → Add provider.
- Type: OpenAI-compatible.
- Base URL: http://localhost:11434/v1
- API key: anything (Ollama ignores it; pass none).
- Save.
Then go to Local Models → it should auto-discover qwen2.5:7b-instruct. Click Set as default.
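Before leaving the terminal behind, you can hit the same endpoint MultiAgentOS will use and confirm it answers. This goes through Ollama's OpenAI-compatible route; the Authorization value is arbitrary, as noted above:
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer none" \
  -d '{
    "model": "qwen2.5:7b-instruct",
    "messages": [{"role":"user","content":"Reply with one word: ready."}]
  }'
# An OpenAI-style response with a "choices" array means the provider config is good.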
Finally — and this is the move that makes the whole thing local — go to Privacy & Offline and toggle Enable Offline Mode. The runtime will now refuse to call any cloud provider. Inference is local-only.
Step 5 — Run your first multi-agent task
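The prompt below assumes ~/Documents/sample-project exists. If it doesn't, seed it with any small README first; this placeholder works (macOS/Linux syntax):
mkdir -p ~/Documents/sample-project
printf '# Sample Project\n\nA tiny repo for smoke-testing local agents.\n' > ~/Documents/sample-project/README.md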
Open a new chat. Switch the mode selector to Code. Try this prompt:
Read the README.md in ~/Documents/sample-project, summarise it in 5 bullet points, then write a one-paragraph cover note as cover.md in the same folder.
What you should see:
- The Planner decomposes into 3 steps (read, summarise, write).
- The Coder picks up step 1, calls file.read.
- The summary appears inline in the chat.
- The Coder calls file.write for cover.md.
- The Timeline panel on the right shows each tool call as it happens.
Total round-trip on a Mac M2 with 16 GB: about 8–15 seconds. Total tokens used: somewhere north of 5,000. Total cost: zero. (Repeat that 50 times a day for a year and re-read our cost piece.)
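(For the record: 5,000 tokens × 50 runs × 365 days is roughly 91 million tokens a year, every one of them inferenced at a marginal cost of zero.)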
Where it gets interesting
Once you've got the basic loop working, the four built-in agents unlock more interesting workflows:
- Operator — switch the mode selector to Operator, ask it to “search HackerNews for posts about local LLMs this week and save the top 5 to hn.md.” Real browser, headless or visible.
- Auto Agent — the autonomous mode runs without per-step approval, gated by your Guardrails preset. Best for longer multi-step tasks (“refactor every .ts file in this folder to use the new logging API”).
- Broadcast — load a second model (ollama pull mistral:7b) and ask the same question of both at once. The orchestrator stitches a consensus answer with attribution.
Common gotchas
- “Connection refused” on the provider config: Ollama isn't running. brew services restart ollama on macOS, or relaunch the Ollama tray app on Windows.
- Tool calls fail with weird JSON: the model isn't tool-capable. Use a Qwen2.5 or Llama 3.3 variant; older models hallucinate the schema.
- Out of RAM: step down to a smaller model (qwen2.5:3b-instruct instead of :7b-instruct) or close other apps.
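If you're not sure what's loaded, ollama ps (a stock Ollama command) lists the currently loaded models and their memory footprint:
ollama ps
# Shows each loaded model, its size, and whether it's running on CPU or GPU.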
The bottom line
Local agents in 2026 are a 30-minute install, not a research project. The hardware is on your desk; the models are open; the tooling is mature. The main thing slowing people down is inertia — “I'll just use Cursor for now.” If your workload tolerates a slightly smaller model, the trade is usually worth it.
$79 founder pricing · 14-day risk-free trial · no telemetry, ever.
Start trial