WWDC26 · 14 min · AI & Machine Learning

Run local agentic AI on the Mac using MLX

Run AI agents locally with privacy, low latency, and offline access. Dive into how MLX advancements and Mac hardware make powerful agentic workflows possible entirely on-device. You’ll explore code agents such as OpenCode, see how they integrate into Xcode, learn techniques for multi-Mac scaling, and discover how to integrate tools seamlessly — without ever leaving your machine.

Watch at developer.apple.com ↗


Chapters

0:00 — Introduction
0:32 — The chat and agentic loop
2:42 — Local agentic AI stack
4:36 — Setting up your own agent
5:39 — Making agents fast
6:53 — Concurrency and distributed inference
9:20 — More examples
13:01 — Next steps

Code shown on screen · 3 snippets

Set up MLX-LM and start the local server · bash · at 4:40

# Step 1: Install MLX-LM
pip install mlx-lm

# Step 2: Start the server
mlx_lm.server --model mlx-community/Qwen-3.5-4B-8bit

# Step 3: Point your agent to the server
curl -X POST \
  http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"default_model","messages":[{"role":"user","content":"Hello!"}]}'
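The same request can be sent from a script instead of curl. The following is a minimal Python sketch using only the standard library. The function names (build_chat_request, extract_reply, chat) are hypothetical, and extract_reply assumes the server answers in the OpenAI chat-completions shape (choices[0].message.content), which the snippet above does not show.

```python
import json
import urllib.request

# Address taken from the curl command above; change it if the server runs elsewhere.
BASE_URL = "http://127.0.0.1:8080/v1"


def build_chat_request(prompt, model="default_model", base_url=BASE_URL):
    """Build the same POST request as the curl command, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text, assuming an OpenAI-style response body."""
    return response_body["choices"][0]["message"]["content"]


def chat(prompt):
    """Send one prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return extract_reply(json.load(resp))


# Usage (requires the server from Step 2 to be running):
#   print(chat("Hello!"))
```

Keeping request construction separate from sending makes it possible to inspect the payload without a running server.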

Configure an agent to use your local MLX server · json · at 5:18

{
  "$schema": "https://opencode.ai/config.json",
  "model": "mlx/default_model",
  "small_model": "mlx/default_model",
  "provider": {
    "mlx": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "MLX (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "default_model": {
          "name": "Default MLX Model"
        }
      }
    }
  }
}
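In the config above, the "model" and "small_model" values appear to combine a provider key ("mlx") and a model key ("default_model") with a slash. The Python sketch below generates the same structure and checks that both references point at a declared entry. The helper names are hypothetical, and the resolution rule is inferred from the shape of the config rather than taken from OpenCode documentation.

```python
import json


def build_opencode_config(base_url, provider_id="mlx", model_id="default_model"):
    """Return a config dict with the same shape as the JSON above."""
    ref = f"{provider_id}/{model_id}"
    return {
        "$schema": "https://opencode.ai/config.json",
        "model": ref,
        "small_model": ref,
        "provider": {
            provider_id: {
                "npm": "@ai-sdk/openai-compatible",
                "name": "MLX (local)",
                "options": {"baseURL": base_url},
                "models": {model_id: {"name": "Default MLX Model"}},
            }
        },
    }


def unresolved_model_refs(config):
    """List "model"/"small_model" values with no matching provider and model entry.

    Assumption: a reference is "<provider key>/<model key>", as the config suggests.
    """
    missing = []
    for key in ("model", "small_model"):
        ref = config.get(key, "")
        provider_id, _, model_id = ref.partition("/")
        models = config.get("provider", {}).get(provider_id, {}).get("models", {})
        if model_id not in models:
            missing.append(ref)
    return missing


config = build_opencode_config("http://127.0.0.1:8080/v1")
print(json.dumps(config, indent=2))
```

A check like this can catch a renamed model key before the agent is launched against the server.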

Launch distributed inference with MLX · bash · at 8:33

mlx.launch --hostfile hosts.json \
  --backend jaccl \
  /remote/path/to/mlx_lm.server \
  --model mlx-community/Qwen-3.5-122B-A3B-8bit
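When the launch is driven from a script, it can help to assemble the command as an argument list rather than a single shell string. The sketch below rebuilds the command above in Python. build_launch_command is a hypothetical helper, the server path is the placeholder from the snippet, and the contents of hosts.json are not shown in the session snippets, so they are left out here.

```python
import shlex


def build_launch_command(hostfile, server_path, model, backend="jaccl"):
    """Assemble the mlx.launch argument list shown above, one item per argument."""
    return [
        "mlx.launch",
        "--hostfile", hostfile,
        "--backend", backend,
        server_path,
        "--model", model,
    ]


cmd = build_launch_command(
    hostfile="hosts.json",
    server_path="/remote/path/to/mlx_lm.server",  # placeholder path from the snippet
    model="mlx-community/Qwen-3.5-122B-A3B-8bit",
)

# Print the command in shell form for review before running it.
print(shlex.join(cmd))

# To actually start the servers (not executed here):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems if the hostfile or server path contains spaces.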

Resources

Documentation: MLX Swift LM on GitHub
Sample code: MLX Swift Examples
Sample code: MLX Examples
Documentation: MLX Swift
Documentation: MLX LM - Python API
Documentation: MLX Explore - Python API
Documentation: MLX Framework
Documentation: MLX

Related sessions

Explore distributed inference and training with MLX · WWDC26 · 22 min · 9 snippets
Explore numerical computing in Swift with MLX · WWDC26 · 15 min · 6 snippets
Get started with MLX for Apple silicon · WWDC25 · 19 min · 14 snippets
Explore large language models on Apple silicon with MLX · WWDC25 · 20 min · 16 snippets
