Run the largest possible frontier AI model locally — without the complexity.

The goal is simple: run the biggest open-weight model possible at good tokens per second, entirely on local, consumer hardware. No cloud. No API keys. Just your machine.

Follow the build → GitHub

Progress · Qwen3.6 35B-A3B · RTX 5060 Ti

64 t/s

+ MTP · 131k ctx

Experiment 03 →

57 t/s

llama-server · LM cache

Experiment 02 →

36 t/s

llama-cli · -ncmoe 11

Experiment 01 →

18 t/s

Ollama — default settings

baseline
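To put the progress numbers in perspective, here is a minimal sketch that expresses each step as a multiple of the Ollama baseline. The throughput values are copied from the list above; the labels and the `speedups` helper are illustrative names, not part of any tool used in the experiments.

```python
# Throughput figures copied from the progress list (tokens per second).
# The dictionary keys are illustrative labels, not official names.
results = {
    "Ollama, default settings (baseline)": 18,
    "Experiment 01: llama-cli, -ncmoe 11": 36,
    "Experiment 02: llama-server, LM cache": 57,
    "Experiment 03: + MTP, 131k ctx": 64,
}


def speedups(results, baseline_key):
    """Return each configuration's throughput as a multiple of the baseline."""
    base = results[baseline_key]
    return {name: round(tps / base, 2) for name, tps in results.items()}


table = speedups(results, "Ollama, default settings (baseline)")
for name, factor in table.items():
    print(f"{name}: {results[name]} t/s ({factor}x baseline)")
```

By this arithmetic, the Experiment 03 configuration runs at roughly 3.6 times the throughput of the default Ollama setup on the same card.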


Latest Posts · 5 published

16 May 2026

Experiment 03 · LM Cache · MTP · llama.cpp

Reaching 64 t/s: LM Cache, KV Checkpointing, and MTP

The proxy showed 57 TPS, but llama-cli gave 36. Same hardware. This post answers why — then adds MTP to reach 64 TPS at 131k context.

10 min read


9 May 2026

Experiment 01 · MoE · llama.cpp · RTX 5060 Ti

Running Qwen3.6 35B at 40 TPS on Consumer Hardware

Ollama leaves 2.2× performance on the table for MoE models. A deep dive into memory bandwidth hierarchy and why GPU utilization % is a misleading metric.

8 min read


12 May 2026

Experiment 02 · Claude Code · Cline · Proxy

Using Claude Code and Cline with a Local LLM

A 600-line proxy bridges Anthropic's API and llama-server — with lifetime metrics tracking and a live dashboard.

Change Default Directory for Ollama

How to change the default directory for Ollama models on Windows using an environment variable.

What Are LLMs (Large Language Models)?

A clear breakdown of what large language models are, how they work, and why they matter for local inference.

5 min read


What this is

Compiled Thoughts is a public build log with one goal: run the biggest open-weight model possible at good tokens per second, entirely on local hardware.

Open-weight models are getting bigger and better fast. But actually running them — on your own machine, without cloud APIs or expensive subscriptions — is still harder than it should be. This blog is about closing that gap.

Every post is a step in the build: benchmarks, tooling, configuration, failures, and breakthroughs. All numbers are real and measured.

Browse all articles, or follow the build from the beginning.

View all posts →