Guide · 30 min · Updated May 25 2026

Set up Ollama for local AI agents on Mac and Windows.

A complete walk-through for using Ollama as the local model backend for a desktop AI agent in MultiAgentOS. Install Ollama, pick the right model for your hardware, verify the API endpoint, and connect it as a private, offline-capable provider in under 30 minutes.

Connection modes

Route each task through the right model or tool surface.

MultiAgentOS supports API keys, local servers, CLI pipes, OAuth, terminal templates, and local AI/GGUF workflows.

  1. 1 Choose provider
  2. 2 Store secret
  3. 3 Test model
  4. 4 Enable tools
Full-frame MultiAgentOS settings sidecar showing connection cards inside the complete app shell.
Full-frame screenshot from the current MultiAgentOS app.
Connection tabs screenshot in MultiAgentOS.
Connection tabs Switch between API Key, Local Server, CLI Pipe, OAuth, Terminal, and Local AI routing.
Settings sidecar screenshot in MultiAgentOS.
Settings sidecar Configure provider and local server details without leaving the app frame.
Terminal route screenshot in MultiAgentOS.
Terminal route Use terminal-backed workflows alongside the main prompt and model selector.

Why use Ollama for AI agents?

Ollama is the easiest way to run open-weight large language models locally on Mac and Windows. For agent workflows that handle private code, customer data, financial records, or proprietary documents, sending every prompt to a cloud API is often unnecessary and sometimes prohibited. Ollama keeps everything on the machine while still giving you a clean OpenAI-compatible API surface.

MultiAgentOS treats Ollama as a first-class connection type. You plug in the local endpoint once, choose a model, and the agent can chat, attach files, run tools, call MCP servers, and execute desktop actions using a model that never leaves your computer.

1. Install Ollama

Download Ollama from the official site for macOS or Windows. The installer registers a background service that exposes a local HTTP API on port 11434.

  • macOS: drag the app to Applications and launch it once to start the service.
  • Windows: run the installer; the service starts automatically and persists across reboots.
  • Linux: use the install script curl -fsSL https://ollama.com/install.sh | sh.

Confirm Ollama is running by opening a terminal and entering ollama --version. If the command works, you are ready for the next step.

2. Pull and test a model

Choose a model that fits both your hardware and the kind of agent work you plan to do. For most users on 16 GB RAM, a 7B or 8B instruction-tuned model is the sweet spot for agent reasoning.

ollama pull llama3.1:8b
ollama run llama3.1:8b "Reply with the single word ready."

Other reliable starting points:

  • ollama pull qwen2.5:7b-instruct — strong tool use and code reasoning
  • ollama pull mistral:7b-instruct — fast, well-known for instruction following
  • ollama pull llama3.1:70b — only if you have 64 GB+ RAM or a strong GPU

If the model replies, the runtime is working. If not, see the troubleshooting section below.

3. Verify the local API server

Ollama serves a local API on http://localhost:11434. The fastest sanity check is the tags endpoint, which returns every model you have pulled.

curl http://localhost:11434/api/tags

You should see JSON listing the model you just pulled. If curl times out or refuses the connection, Ollama is not running. Restart the app on macOS or the service on Windows.

4. Connect MultiAgentOS to your local Ollama

  1. Open MultiAgentOS, then click Settings in the bottom-left.
  2. Choose the Local Server connection type (it covers Ollama, LM Studio, llama.cpp, and other OpenAI-compatible local APIs).
  3. Paste http://localhost:11434 into the endpoint field.
  4. Select the model you pulled in step 2.
  5. Send a short prompt to confirm the agent can reach the local model.

Once that round-trip works, you can start enabling tools, file attachments, MCP servers, and supervised subagents. See MCP tools for desktop AI for the next step.

5. Add agent context gradually

The fastest path to a flaky agent is to enable every tool, every MCP server, and the largest possible context window on day one. Resist that.

  1. Start with plain chat against the local model.
  2. Add a single file attachment and confirm the model references it correctly.
  3. Enable one runtime tool category at a time (files, web, shell, screenshot).
  4. Add MCP servers individually, with least-privilege scopes.
  5. Only then introduce supervised subagents and desktop actions.

This layered approach makes failures easy to attribute, which matters more for local models than for cloud frontier models.

Troubleshooting Ollama with MultiAgentOS

  • Connection refused. Ollama is not running. Open the app on macOS or restart the Windows service.
  • Model not found. The selected model name does not match a pulled model. Run ollama list and use the exact name shown.
  • Very slow responses. Pick a smaller quantized model such as llama3.1:8b-instruct-q4_K_M, close other RAM-heavy apps, or move to a machine with more memory or a discrete GPU.
  • Agent ignores tools. Some smaller models follow tool schemas poorly. Try Qwen 2.5 7B Instruct or step up to a 14B/32B model for tool-heavy agent work.
  • Different port or host. Set OLLAMA_HOST before launching Ollama if you need to bind to a different interface.

What to do after Ollama works

Once Ollama is reachable from MultiAgentOS, the next high-leverage steps are picking the right local model and wiring tools.

Frequently asked questions

Which Ollama model is best for agent workflows?

For most 16 GB machines, Llama 3.1 8B Instruct, Qwen 2.5 7B Instruct, or Mistral 7B Instruct are the right starting points. Tool-use quality matters more than raw benchmark scores — test the model with your own MCP tools before committing.

Can I run multiple Ollama models in parallel?

Yes, but RAM is the bottleneck. Loading a second model evicts the first if memory is tight. For agent workflows that fan out work to subagents, keep one strong tool-using model loaded and use it for everything.

Does using Ollama with MultiAgentOS send any data to the cloud?

No. Ollama runs entirely on your machine. MultiAgentOS only sends data to a cloud provider if you explicitly route a request to an API-key connection (OpenAI, Anthropic, etc).

How much RAM do I need for Ollama agents?

Minimum 16 GB for a 7B/8B model in Q4 quantization. 32 GB lets you run 13B comfortably or keep multiple smaller models warm. 64 GB+ opens 30B-class models.