SoftwareHeißer Tipp

My LLMs for Pi: GPT, Claude, GLM, Kimi and DeepSeek

Which LLMs I currently like using in Pi for coding agents, why GPT-5.5 is my default and where OpenRouter or OpenCode Go become useful.

Published · June 4, 2026

My LLMs for Pi: GPT, Claude, GLM, Kimi and DeepSeek

With coding agents, I like talking about harnesses.

Terminal. Tools. Sessions. Context. Permissions. Extensions.

All those things that turn a chat window into a useful tool.

But at some point, a pretty simple question remains:

Which brain are you actually putting into that harness?

I already covered the harness part in the article about Pi as a coding agent. This text is the less philosophical and slightly more expensive part.

The models.

And because LLM prices, limits and model names change faster than I can say “just a quick refactor”: this is not an eternal ranking.

It is my state of things in June 2026.

Why the model question feels calmer in Pi

Pi is pleasantly unromantic about models for me.

I do not have to change my entire workflow just because I want to try a different LLM. I open /model, choose a model and keep working.

That sounds banal.

But it is one of the reasons I like Pi so much.

A model is not the entire identity of my setup. It is a component. An important, expensive, sometimes astonishingly annoying component. But still a component.

And Pi treats it exactly like that.

OpenAI through Codex, Anthropic through API, DeepSeek, Z.AI, Kimi, OpenRouter, OpenCode Go and custom providers can all land in the same workflow. Not always with exactly the same strengths, not always with exactly the same limits. But without that absurd tool switch where you start from zero again every time.

My default: GPT-5.5

If I could choose only one model, my default right now would be GPT-5.5.

Not for everything.

But for a lot.

OpenAI positions GPT-5.5 pretty clearly for coding and professional work.

You can feel that too.

My personal opinion has been for months that 5.4/5.5 are the "best" models, and apparently there are now more and more benchmarks that support me there 🎉

It is strong at refactors, debugging, codebase navigation, planning, sober execution and those sessions where the model should please not forget after three tool calls why it is here in the first place.

On top of that, GPT-5.5 brings more than one million tokens of context, xhigh reasoning and very large outputs. For agent work, that is not a small side note. Long contexts are not automatically smart, but they often prevent a model from acting every five minutes as if it had just met the project.

The actual sweet spot for me is Codex.

The Codex subscriptions are surprisingly generous if you regularly work with coding agents. Pure API usage can get very real very quickly. A subscription often feels calmer there, at least if you actually use it.

And this is the important difference to Claude: you are also officially allowed to use the ChatGPT or Codex subscription for that.

Pi directly supports login for OpenAI Codex subscriptions instead of making you dance around with questionable token workarounds.

That makes a big difference to me.

The caveat: frontend.

For visual work, layout, courage, taste and the question of whether a page at the end again looks like a very polite SaaS landing page, GPT-5.5 is not always my first grab. It can do that. But I trust it less blindly there than with backend, tooling or logic.

Claude Sonnet and Opus: strong, but expensive

Of course Claude Sonnet and Claude Opus belong on this list.

Anything else would be nonsense.

Sonnet is often the sensible Claude default: fast enough, strong enough, good at code, good at explanation, pleasant on longer tasks.

Opus is the variant I want to take when a task is really long, branched and potentially expensive because mistakes there would be even more expensive.

Anthropic describes Opus exactly in that corner: complex reasoning, long-horizon agentic coding, a lot of autonomy. The newer Sonnet and Opus versions also move in the area of 1M context, which is obviously tempting for large codebases and longer agent sessions.

My problem is not the quality.

My problem is the price and the practical usability in a small subscription.

And important: a normal Claude subscription is not a Pi ticket for me.

With a Claude subscription, I would not use the Claude models in Pi. Not because there are no technical workarounds. There are. But that is not the clean allowed path, and if Anthropic evaluates that as subscription circumvention, the worst case is a suspended Claude account.

That would be far too stupid for me for a few cheaper agent runs.

If Claude in Pi, then for me only through clean API access or an explicitly allowed provider.

If I am only playing around a little, that is irrelevant. But if I seriously let agents loose on projects, tokens suddenly stop being abstract. Then they are no longer “a bit of usage”, but a bill with personality.

Claude therefore remains a model I respect a lot, but do not always burn through casually.

GLM-5.2, Kimi K2.7 and DeepSeek V4 Pro

This is the corner where it gets really interesting for me.

Not because these models are always better than GPT or Claude.

But because they are often good enough to very good while smelling much less like luxury panic.

GLM-5.2 is the underrated worker. Z.AI describes it as a model for agentic engineering and long-horizon tasks. 200K context, large outputs, tool use, coding focus. Especially in the old GLM Coding Plan, this was a small cheat code for me: a lot of model for surprisingly little money. Unfortunately, exactly that plan has recently become significantly more expensive. The model still remains strong.
Kimi K2.7 is the pleasant OSS lane. Moonshot has released the weights, the model is natively multimodal, has 262K context and is clearly aimed at coding, long-horizon execution and agent work. It is not my model for every delicate production rebuild. But for cheap, good agent runs, it is far too strong to ignore.
DeepSeek V4 Pro is the cost drawer that is surprisingly often worth opening. 1M context, Mixture-of-Experts with 1.6T total parameters and 49B active, strong coding and reasoning orientation. Depending on route and provider, it is extremely cheap. For small things, cleanup, searching, explaining and agentic work where I do not want to count every token internally, that is very pleasant.

DeepSeek is also amusingly direct: the official DeepSeek docs even have a Pi integration. That naturally earns bonus points on my very scientific sympathy scale.

Very scientific here means: not at all.

But still.

OpenRouter, OpenCode Go and the model zoo

The point is not that I need five models every day.

The point is that I can try them without dismantling my setup.

OpenRouter is almost dangerously convenient for that. One API key, hundreds of models, different providers, fallbacks, routing by price, throughput or availability. If I want to know whether a new model works in my real workflow, that is often the fastest way.

Not in a benchmark.

In my project.

With my files.

With my badly named variables.

OpenCode Go is the other practical lane. It is a cheap subscription for open coding models, currently with models like GLM, Kimi, MiniMax, Qwen and DeepSeek. Do not understand it as a magical everything-flat. Limits remain limits. But as a playground for strong open models, it is very decent.

And the best part: both are not exotic in Pi.

OpenRouter and OpenCode Go sit as providers in the Pi model cosmos. Add direct providers like DeepSeek, Z.AI and Kimi For Coding. If a model annoys me, I switch. If it surprises me, it stays in the rotation.

That is how model hopping should be.

Yes, but not as a fixed best-of list.

More as an attitude.

If I want a strong default, I take GPT-5.5.

If a task is really difficult and cost is not the first pain point, I look at Claude Sonnet or Opus, but only through clean access, not through Claude subscription workarounds.

If I want to try a lot, work agentically or be cheaper on the road, GLM-5.2, Kimi K2.7 and DeepSeek V4 Pro are much more interesting than the western model conversation sometimes gives them credit for.

And if I do not know what is good right now, I take OpenRouter or OpenCode Go and just try it.

Not theoretically.

In Pi.

With real code.

That is the real strength for me: I do not have to religiously decide on one model.

I can just work.

And sometimes exactly that is the best model strategy.

Heißer Tipp

✦Not every model has to be my favorite model. The strong part is that Pi makes switching between them almost boring.

Get the stuff here →

More good stuff

GADGET

Daily Driver

Quooker Cube: expensive, unnecessary, and worth it every single day

My experience living with the Quooker Cube: instant boiling water for tea, chilled sparkling water on tap and whether the steep price is actually worth it.

An absurdly expensive tap that never feels spectacular, and is worth it every single day for exactly that reason.

SOFTWARE