Blog · Published May 25 2026

Local AI agent on Windows in 2026 — the full setup.

From a clean Windows 10 or 11 install to a working local AI agent with MCP tools, PowerShell scripts, and zero cloud dependency. Under an hour, $0 in ongoing cloud bills, prompts that never leave the machine.

Why local on Windows right now

Local AI on Windows finally crossed the "actually useful" line in 2025. Quantized 7B-14B models follow instructions reliably, Windows-native runtimes are fast enough for real work on common laptops and GPUs, and MCP gives agents portable tools. A modern Windows laptop with 16-32 GB RAM is enough to run a real agent privately.

What you'll have at the end

Ollama serving a tool-capable local model on localhost:11434.
MultiAgentOS using that local model with files, screenshots, PowerShell, and MCP tools.
Approval gates and supervised subagents so the agent can work without going rogue.
Zero recurring cloud cost for everyday work.

Step 1 — Install Ollama

Download Ollama from the official site and run the Windows installer. It exposes a local API on localhost:11434. Confirm it's alive:

ollama --version
Invoke-RestMethod http://localhost:11434/api/tags

The first command should print a version string. The second should return an object with a models property (Invoke-RestMethod parses the JSON reply for you). An empty models list is fine, since you haven't pulled anything yet.
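
If you want the check to tell you what went wrong, here is a minimal sketch that separates "server not running" from "server up, no models yet". The function name Get-OllamaModel is made up for this post; the endpoint and port are the ones from this step.

```powershell
# Hypothetical helper: lists the models the local Ollama server has pulled.
function Get-OllamaModel {
    param([string]$BaseUrl = 'http://localhost:11434')

    try {
        # Invoke-RestMethod parses the JSON reply into an object with a 'models' property.
        $tags = Invoke-RestMethod -Uri "$BaseUrl/api/tags" -TimeoutSec 5
    }
    catch {
        Write-Warning "No answer from $BaseUrl. Is Ollama running? $($_.Exception.Message)"
        return
    }

    if (-not $tags.models) {
        Write-Host 'Server is up, but no models are pulled yet.'
        return
    }

    # One row per pulled model: tag name and size on disk in bytes.
    $tags.models | Select-Object name, size
}

Get-OllamaModel
```

Run it again after Step 2 and the model you pulled should appear in the list.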

Step 2 — Pull a tool-capable model

For 16 GB Windows machines, a solid starting point is llama3.1:8b or qwen2.5:7b. Both are small enough for everyday local work and capable enough for basic agent tasks.

ollama pull llama3.1:8b
ollama run llama3.1:8b "Reply with the single word ready."

If the model replies with one word and stops, you have a working local brain. If it rambles or hallucinates a tool call, swap to a different model — the difference between models for agent work is real.
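
A one-word reply shows the model follows instructions, but not that it can call tools. The sketch below sends one request to Ollama's /api/chat endpoint with a single tool definition and prints whatever tool call comes back. The tool get_folder_size and the path C:\Projects\demo are invented for illustration. Nothing is executed; the script only shows whether the model asks for the tool.

```powershell
# Sketch: ask a question that should trigger a tool call.
# 'get_folder_size' is a hypothetical tool; this script never runs it.
$body = @{
    model    = 'llama3.1:8b'
    stream   = $false
    messages = @(
        @{ role = 'user'; content = 'How large is the folder C:\Projects\demo?' }
    )
    tools    = @(
        @{
            type       = 'function'
            'function' = @{
                name        = 'get_folder_size'
                description = 'Return the total size of a folder in GB.'
                parameters  = @{
                    type       = 'object'
                    properties = @{
                        path = @{ type = 'string'; description = 'Folder path' }
                    }
                    required   = @('path')
                }
            }
        }
    )
} | ConvertTo-Json -Depth 10   # the default depth would truncate the nested schema

$request = @{
    Uri         = 'http://localhost:11434/api/chat'
    Method      = 'Post'
    ContentType = 'application/json'
    Body        = $body
}
$reply = Invoke-RestMethod @request

if ($reply.message.tool_calls) {
    # Each entry carries the tool name and the arguments the model chose.
    foreach ($call in $reply.message.tool_calls) {
        '{0} {1}' -f $call.function.name, ($call.function.arguments | ConvertTo-Json -Compress)
    }
}
else {
    "No tool call. The model answered in plain text: $($reply.message.content)"
}
```

A model that is a good fit for agent work should name the tool and pass a path argument. If it answers in prose or invents a different tool name, treat that as a reason to try another model.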

Step 3 — Install MultiAgentOS

Download the Windows installer from the MultiAgentOS product page. After install, open Settings, choose the Local Server connection type, paste http://localhost:11434, select the model you pulled, and save.

Send a one-line prompt to confirm the round trip works before you enable any tools. This is the moment where most setup failures show up — and they're easy to fix when nothing complicated is in the way yet.
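
If the round trip fails, it helps to know which side is at fault. As a rough sketch, you can send the same one-line prompt straight to Ollama's /api/generate endpoint and skip the app entirely. The model tag below assumes you pulled llama3.1:8b in Step 2.

```powershell
# Sketch: talk to Ollama directly, bypassing the desktop app.
$body = @{
    model  = 'llama3.1:8b'                        # the tag you pulled in Step 2
    prompt = 'Reply with the single word ready.'
    stream = $false                               # one JSON reply instead of a token stream
} | ConvertTo-Json

$request = @{
    Uri         = 'http://localhost:11434/api/generate'
    Method      = 'Post'
    ContentType = 'application/json'
    Body        = $body
}
$reply = Invoke-RestMethod @request

# The generated text.
$reply.response

# eval_duration is reported in nanoseconds; convert to tokens per second.
if ($reply.eval_duration -gt 0) {
    $tokensPerSecond = $reply.eval_count / ($reply.eval_duration / 1e9)
    'About {0:N1} tokens/s' -f $tokensPerSecond
}
```

If this works and the app still fails, the problem is most likely the connection type, URL, or model name in Settings rather than Ollama itself.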

Step 4 — Wire MCP tools

Pick one MCP server to start. The filesystem server is the most useful first choice — point it at one project folder, not your whole home directory. In MultiAgentOS settings, add the server, confirm it starts cleanly, and inspect the exposed tools.
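
Here is one way to enforce the "one project folder" rule before anything starts, as a sketch. It assumes Node.js is installed and that you use the reference filesystem server published as @modelcontextprotocol/server-filesystem. Neither assumption comes from the steps above, and how MultiAgentOS expects the launch command to be entered is not covered here. The path is a placeholder.

```powershell
# Sketch: sanity-check the folder before handing it to an MCP server.
$projectRoot = 'C:\Projects\demo'   # hypothetical path; use your own project folder

if (-not (Test-Path -LiteralPath $projectRoot -PathType Container)) {
    throw "Folder not found: $projectRoot"
}
$resolved = (Resolve-Path -LiteralPath $projectRoot).Path.TrimEnd('\')

# Refuse the home directory and bare drive roots; both expose far too much.
if ($resolved -eq $HOME.TrimEnd('\') -or $resolved -match '^[A-Za-z]:$') {
    throw "Refusing to expose '$resolved'. Pick a single project folder."
}

# Assumption: Node.js is installed. The package name is quoted because
# PowerShell would otherwise read the leading @ as the splatting operator.
# The server speaks MCP over stdio, so it waits for a client; Ctrl+C stops it.
npx -y '@modelcontextprotocol/server-filesystem' $resolved
```

Running the server by hand like this is only a smoke test that it starts cleanly. In normal use the agent app launches it for you.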

Test with a read-only prompt: "list the files in this folder." If the model calls the tool, reads the result, and summarises it correctly, you have a working agent with real tool access. Work up from there, one capability at a time.

See Connect MCP tools to desktop AI agents for the full ladder.

Step 5 — Wire PowerShell as a tool

The Windows-native superpower is PowerShell. MultiAgentOS lets you expose PowerShell scripts as Terminal-template tools the agent can call under approval gates.

A simple example — a tool that reads the disk usage of a folder:

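# tool-disk-usage.ps1
# Reports the total size of all files under $Path, in GB.
param([Parameter(Mandatory)][string]$Path)
# -File skips directories (they have no Length); -Force includes hidden files.
# Unreadable items are skipped silently, so the total can undercount.
Get-ChildItem -LiteralPath $Path -Recurse -File -Force -ErrorAction SilentlyContinue |
  Measure-Object -Property Length -Sum |
  Select-Object @{Name='GB';Expression={[math]::Round($_.Sum / 1GB, 2)}}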

Add the script as a Terminal tool with one parameter ($Path), enable approval-required, and the agent can now ask for disk usage of a folder you specify. Start small, expand once it works.
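
Before registering the script, run it by hand so you know what the agent will see. This is a short sketch: the folder is a placeholder, and the first line is only needed because a stock Windows install blocks local scripts by default. It relaxes that for the current PowerShell session only.

```powershell
# Allow local scripts in this session only; nothing is changed system-wide.
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process

# Run the tool from the folder it lives in; C:\Projects\demo is a placeholder.
$result = & .\tool-disk-usage.ps1 -Path 'C:\Projects\demo'
$result.GB

# The same result as compact JSON, one plausible shape to hand back to a model.
$result | ConvertTo-Json -Compress
```

If the number looks right for a folder you know, the tool is ready to be wired in.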

Step 6 — Lock down the workspace

Before you let an agent run unattended:

Set turn budgets on subagents (8 turns is a safe default).
Require approval for write/send/delete tool categories.
Scope MCP server tokens to one folder, one repo, one database role.
Use a tighter tool subset for subagents than the main agent.
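
To make the first two rules concrete, here is a toy sketch of an approval gate and a turn budget in plain PowerShell. None of the names come from MultiAgentOS, and this is not how the product implements them. It only shows the shape of the checks: risky categories need a human yes, and the loop stops after a fixed number of turns.

```powershell
# Illustrative sketch only; every name here is hypothetical.
$RiskyCategories = @('write', 'send', 'delete')

function Test-ToolApproval {
    param(
        [Parameter(Mandatory)][string]$ToolName,
        [Parameter(Mandatory)][string]$Category
    )
    # Categories outside the risky list pass straight through.
    if ($RiskyCategories -notcontains $Category) { return $true }

    # Anything else needs an explicit yes; the default answer is no.
    $answer = Read-Host "Allow '$ToolName' ($Category)? [y/N]"
    return ($answer -match '^(y|yes)$')
}

# Turn budget: stop after a fixed number of tool calls.
$maxTurns = 8
$plannedCalls = @(
    @{ Tool = 'list_files';  Category = 'read' }
    @{ Tool = 'delete_file'; Category = 'delete' }
)

$turn = 0
foreach ($call in $plannedCalls) {
    $turn++
    if ($turn -gt $maxTurns) {
        Write-Warning 'Turn budget exhausted.'
        break
    }
    if (Test-ToolApproval -ToolName $call.Tool -Category $call.Category) {
        "turn ${turn}: would run $($call.Tool)"
    }
    else {
        "turn ${turn}: $($call.Tool) denied"
    }
}
```

The read call goes through without a prompt, while the delete call waits for you to type y. That split is the behaviour to look for in whatever gate your agent app provides.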

What this setup gets you

Private. Prompts, files, credentials, and run history stay on your Windows machine.
Free at runtime. No per-token cost. Local inference is paid for once with hardware.
Offline-capable. Once Ollama is installed and the model is pulled, no internet is needed.
Extendable. Add cloud routing later for the few prompts that need a frontier model.

What this setup does not do

Frontier-model reasoning. A 7B local model is good, not GPT-4 class.
Heavy multimodal work on small machines. 16 GB is tight for image-heavy agent work.
Replace your team's shared chat infrastructure — this is a single-user desktop setup.