Blog · Published May 25 2026

Local AI agent on Windows in 2026 — the full setup.

From a clean Windows 10 or 11 install to a working local AI agent with MCP tools, PowerShell scripts, and zero cloud dependency. Under an hour, $0 in ongoing cloud bills, prompts that never leave the machine.

Why local on Windows right now

Local AI on Windows finally crossed the "actually useful" line in 2025. Quantized 7B-14B models now follow instructions reliably enough for agent work, Windows-native runtimes are fast enough for real tasks on common laptops and GPUs, and MCP (Model Context Protocol) gives agents a standard, portable way to use tools. A modern Windows laptop with 16-32 GB of RAM is enough to run a real agent privately.

What you'll have at the end

  • Ollama serving a tool-capable local model on localhost:11434.
  • MultiAgentOS using that local model with files, screenshots, PowerShell, and MCP tools.
  • Approval gates and supervised subagents so the agent can work without going rogue.
  • Zero recurring cloud cost for everyday work.

Step 1 — Install Ollama

Download Ollama from the official site and run the Windows installer. It exposes a local API on localhost:11434. Confirm it's alive:

ollama --version
Invoke-RestMethod http://localhost:11434/api/tags

The first command should print a version number. The second should print an object with a models property (an empty list is fine, since you haven't pulled anything yet).
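
If you'd rather have a reusable health check than eyeball the output, the two commands above can be wrapped in a small script. This is a minimal sketch, assuming the default http://localhost:11434 address and that /api/tags lists pulled models under models, each with a name field. Get-ModelNamesFromTags and Get-OllamaModelNames are made-up helper names for this example.

```powershell
# Extracts model names from a parsed /api/tags response object.
function Get-ModelNamesFromTags {
    param([Parameter(Mandatory = $true)] $Tags)
    # 'foreach' runs zero times over $null or an empty array, so a fresh
    # install with no pulled models yields an empty array.
    # The leading comma stops PowerShell from unrolling the array on return.
    return ,@(foreach ($m in $Tags.models) { $m.name })
}

# Returns the pulled model names, or $null if the server is unreachable.
function Get-OllamaModelNames {
    param([string]$BaseUrl = 'http://localhost:11434')
    try {
        $tags = Invoke-RestMethod -Uri "$BaseUrl/api/tags" -TimeoutSec 5 -ErrorAction Stop
    }
    catch {
        Write-Warning "Ollama not reachable at ${BaseUrl}: $($_.Exception.Message)"
        return $null
    }
    return ,(Get-ModelNamesFromTags -Tags $tags)
}

$names = Get-OllamaModelNames
if ($null -eq $names) {
    'Ollama API is down'
} elseif ($names.Count -eq 0) {
    'Ollama API is up; no models pulled yet'
} else {
    "Ollama API is up; models: $($names -join ', ')"
}
```

Rerun it after Step 2 and the model you pulled should appear in the list.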

Step 2 — Pull a tool-capable model

For 16 GB Windows machines, a solid starting point is llama3.1:8b or qwen2.5:7b. Both are small enough for everyday local work and capable enough for basic agent tasks.

ollama pull llama3.1:8b
ollama run llama3.1:8b "Reply with the single word ready."

If the model replies with that one word and stops, you have a working local brain. If it rambles or invents a tool call, swap to a different model: models differ noticeably in how well they follow instructions, and agent work depends on exactly that.
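
The same smoke test can be scripted against the REST API, which is closer to how an agent app talks to Ollama. A minimal sketch, assuming the non-streaming /api/generate endpoint, which returns the generated text in a response field. Invoke-OllamaPrompt and Test-SingleWordReply are hypothetical helper names, and the one-word check is deliberately crude.

```powershell
# Returns $true when a reply is a single word. Surrounding whitespace and
# trailing periods are ignored; anything with inner spaces fails.
function Test-SingleWordReply {
    param([string]$Reply)
    $clean = $Reply.Trim().TrimEnd('.')
    return ($clean -match '^\S+$')
}

# Sends one non-streaming prompt to Ollama's /api/generate endpoint and
# returns the generated text from the 'response' field.
function Invoke-OllamaPrompt {
    param(
        [string]$Model = 'llama3.1:8b',
        [string]$Prompt = 'Reply with the single word ready.',
        [string]$BaseUrl = 'http://localhost:11434'
    )
    $body = @{ model = $Model; prompt = $Prompt; stream = $false } | ConvertTo-Json
    $request = @{
        Uri         = "$BaseUrl/api/generate"
        Method      = 'Post'
        ContentType = 'application/json'
        Body        = $body
        ErrorAction = 'Stop'
    }
    return (Invoke-RestMethod @request).response
}

# Usage (needs a running Ollama server with the model pulled):
# $reply = Invoke-OllamaPrompt -Model 'qwen2.5:7b'
# if (Test-SingleWordReply $reply) { 'Looks usable' } else { "Try another model: $reply" }
```

Running the same script against each candidate model makes the comparison quick and repeatable.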

Step 3 — Install MultiAgentOS

Download the Windows installer from the MultiAgentOS product page. After install, open Settings, choose the Local Server connection type, paste http://localhost:11434, select the model you pulled, and save.

Send a one-line prompt to confirm the round trip works before you enable any tools. This is the moment where most setup failures show up — and they're easy to fix when nothing complicated is in the way yet.
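
One possible cause of a failed round trip is a model name in Settings that doesn't exactly match what Ollama has pulled, for example llama3.1 instead of llama3.1:8b. The check below is a sketch, assuming Ollama's convention that an untagged name means the :latest tag. Test-ModelPulled is a hypothetical helper, not part of Ollama or MultiAgentOS.

```powershell
# Hypothetical pre-flight check: is the model name you plan to select in
# MultiAgentOS among the names Ollama reports as pulled?
function Test-ModelPulled {
    param(
        [Parameter(Mandatory = $true)][string]$Model,
        [string[]]$PulledNames = @()
    )
    # Ollama treats an untagged name such as 'mistral' as 'mistral:latest'.
    $wanted = if ($Model -like '*:*') { $Model } else { "${Model}:latest" }
    # -contains compares case-insensitively.
    return ($PulledNames -contains $wanted)
}

# Usage against the local server:
# $pulled = (Invoke-RestMethod http://localhost:11434/api/tags).models.name
# Test-ModelPulled -Model 'llama3.1:8b' -PulledNames $pulled
```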

Step 4 — Wire MCP tools

Pick one MCP server to start. The filesystem server is the most useful first choice — point it at one project folder, not your whole home directory. In MultiAgentOS settings, add the server, confirm it starts cleanly, and inspect the exposed tools.

Test with a read-only prompt: "list the files in this folder." If the model calls the tool, reads the result, and summarises it correctly, you have a working agent with real tool access. Work your way up the ladder from there.

See "Connect MCP tools to desktop AI agents" for the full ladder.
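
If the filesystem test fails, it helps to know whether the model can emit a tool call at all, independent of MCP. Ollama's /api/chat endpoint accepts a tools list of function definitions and, when the model decides to use one, returns it under message.tool_calls. This sketch assumes that request and response shape; list_files is a hypothetical tool name (not the filesystem server's real tool), and nothing here actually executes the call.

```powershell
# Builds an /api/chat request body that offers the model one tool.
# 'list_files' is a hypothetical tool used only to probe tool calling.
function New-ToolProbeBody {
    param([string]$Model = 'llama3.1:8b')
    $tool = @{
        type     = 'function'
        function = @{
            name        = 'list_files'
            description = 'List the files in a folder'
            parameters  = @{
                type       = 'object'
                properties = @{
                    path = @{ type = 'string'; description = 'Folder to list' }
                }
                required   = @('path')
            }
        }
    }
    $request = @{
        model    = $Model
        stream   = $false
        messages = @(@{ role = 'user'; content = 'List the files in C:\Projects\demo.' })
        tools    = @($tool)
    }
    # ConvertTo-Json defaults to -Depth 2, which would truncate the nested schema.
    return ($request | ConvertTo-Json -Depth 10)
}

# Returns the names of the tools the model asked to call (empty if none).
function Get-ToolCallNames {
    param($ChatResponse)
    return ,@(foreach ($call in $ChatResponse.message.tool_calls) { $call.function.name })
}

# Usage against a running server:
# $resp = Invoke-RestMethod -Uri 'http://localhost:11434/api/chat' -Method Post `
#     -ContentType 'application/json' -Body (New-ToolProbeBody)
# Get-ToolCallNames $resp
```

A model that handles tools well should return list_files here; one that answers in prose instead is a hint to try another model before debugging the MCP side.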

Step 5 — Wire PowerShell as a tool

The Windows-native superpower is PowerShell. MultiAgentOS lets you expose PowerShell scripts as Terminal-template tools the agent can call under approval gates.

A simple example is a tool that reports how much disk space the files in a folder use:

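# tool-disk-usage.ps1
# Reports the total size, in GB, of all files under $Path.
param([Parameter(Mandatory = $true)][string]$Path)
# -File skips folders, -Force includes hidden files,
# and -ErrorAction SilentlyContinue skips items this account cannot read.
Get-ChildItem -LiteralPath $Path -Recurse -File -Force -ErrorAction SilentlyContinue |
  Measure-Object -Property Length -Sum |
  Select-Object @{Name='GB';Expression={[math]::Round($_.Sum / 1GB, 2)}}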

Add the script as a Terminal tool with one parameter ($Path), enable approval-required, and the agent can now ask for disk usage of a folder you specify. Start small, expand once it works.
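
Because the agent chooses the $Path argument, it can be worth rejecting paths outside the project folder before the script runs, in the same spirit as scoping the filesystem server to one folder. A minimal sketch: Test-PathInRoot and the C:\Projects\demo root are hypothetical, and the check collapses .. segments but does not resolve junctions or symbolic links.

```powershell
# Hypothetical guard: returns $true only if $Path is $Root or sits inside it.
# Pass absolute paths: .NET resolves relative paths against the process
# working directory, which can differ from PowerShell's current location.
function Test-PathInRoot {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [Parameter(Mandatory = $true)][string]$Root
    )
    # GetFullPath collapses '..' segments, so 'demo\..\..\Windows' cannot slip past.
    $full = [System.IO.Path]::GetFullPath($Path).TrimEnd([char[]]@('\', '/'))
    $base = [System.IO.Path]::GetFullPath($Root).TrimEnd([char[]]@('\', '/'))
    $sep  = [System.IO.Path]::DirectorySeparatorChar
    # Requiring a separator after the root rejects look-alikes such as 'demo-old'.
    # Both comparisons are case-insensitive, as Windows paths are.
    $inside = $full.StartsWith($base + $sep, [System.StringComparison]::OrdinalIgnoreCase)
    return (($full -eq $base) -or $inside)
}

# Usage near the top of tool-disk-usage.ps1 (hypothetical root):
# if (-not (Test-PathInRoot -Path $Path -Root 'C:\Projects\demo')) {
#     throw "Refusing to scan '$Path': outside the project folder."
# }
```

Failing with an error, rather than silently scanning somewhere else, gives the agent a clear signal that it asked for the wrong folder.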

Step 6 — Lock down the workspace

Before you let an agent run unattended:

  • Set turn budgets on subagents (8 turns is a safe default).
  • Require approval for write/send/delete tool categories.
  • Scope MCP server tokens to one folder, one repo, one database role.
  • Use a tighter tool subset for subagents than the main agent.
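
The first two rules above can be expressed as a small policy check. This is an illustrative sketch, not MultiAgentOS configuration: the category names, the $ApprovalRequired table, and both functions are hypothetical, and the $Step script block stands in for one model turn. Unknown categories fail closed, and the turn budget defaults to the 8 turns suggested above.

```powershell
# Hypothetical policy table: which tool categories need a human yes.
$ApprovalRequired = @{ read = $false; write = $true; send = $true; delete = $true }

function Test-NeedsApproval {
    param([Parameter(Mandatory = $true)][string]$Category)
    # Unknown categories require approval (fail closed).
    if ($ApprovalRequired.ContainsKey($Category)) { return $ApprovalRequired[$Category] }
    return $true
}

# Runs $Step once per turn until it returns $true or the budget is spent.
# $Step is a hypothetical stand-in for one subagent turn; it gets the turn number.
function Invoke-WithTurnBudget {
    param(
        [Parameter(Mandatory = $true)][scriptblock]$Step,
        [int]$MaxTurns = 8
    )
    for ($turn = 1; $turn -le $MaxTurns; $turn++) {
        if (& $Step $turn) { return [pscustomobject]@{ Done = $true; Turns = $turn } }
    }
    return [pscustomobject]@{ Done = $false; Turns = $MaxTurns }
}
```

Returning Done = $false, instead of looping on, is what keeps a confused subagent from burning time unattended; the main agent or you can then decide what to do next.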

What this setup gets you

  • Private. Prompts, files, credentials, and run history stay on your Windows machine.
  • Free at runtime. No per-token cost; you pay for local inference once, up front, in hardware.
  • Offline-capable. Once Ollama is installed and the model is pulled, no internet connection is needed.
  • Extendable. Add cloud routing later for the few prompts that need a frontier model.

What this setup does not do

  • Frontier-model reasoning. A 7B local model is good, but not GPT-4 class.
  • Heavy multimodal work on small machines. 16 GB is tight for image-heavy agent work.
  • Replace your team's shared chat infrastructure — this is a single-user desktop setup.

Related reading