Greg Isenberg | June 23, 2026 | 22m

GLM 5.2: How to Set Up Local AI (With Cursor/Codex etc)

TL;DR

GLM 5.2 is cheap enough to change behavior: Amir estimates a 50,000-input, 85,000-output coding workload costs about $0.44 on GLM 5.2 versus $2.38 on Opus 4.8, which matters once teams start running these tools constantly.
The real tactic is model chaining, not model loyalty: Amir recommends using a stronger or more multimodal model like Opus 4.8 first for planning or image understanding, then handing the structured task to GLM 5.2 for execution.
Setup is simpler than the hype makes it sound: In Cursor, you paste a Z AI API key into the OpenAI field, override the endpoint, and add GLM 5.2 as a custom model. In Codex, you use OpenRouter and switch models from the CLI.
Benchmarks matter less than whether it actually ships UI work: GLM 5.2 scores 81 on Terminal Bench 2.1, about 4 points behind Opus 4.8, but Amir says he trusts live build tests more than leaderboard numbers.
GLM 5.2 still has gaps, especially vision: It cannot directly inspect screenshots the way some closed models can, so Amir works around that by having Opus describe the image and then asking GLM 5.2 to make the front-end changes.
Teams are shifting from token-maxing to token governance: Amir says companies are starting to question expensive default model use across engineering and non-engineering workflows, including people using high-end reasoning models for simple email formatting.
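The cost claim is easier to weigh as arithmetic. The sketch below uses only the two workload totals quoted above ($0.44 and $2.38 for the 50,000-input, 85,000-output workload). The `runs_per_day` figure and the 30-day month in `monthly_savings` are hypothetical inputs for illustration, not numbers from Amir.

```python
# Totals quoted in the summary for one 50,000-input, 85,000-output coding workload.
GLM_COST_USD = 0.44
OPUS_COST_USD = 2.38


def cost_ratio(cheap: float, expensive: float) -> float:
    """How many times more the expensive model costs for the same workload."""
    return expensive / cheap


def savings_fraction(cheap: float, expensive: float) -> float:
    """Fraction of spend saved by running the workload on the cheaper model."""
    return 1 - cheap / expensive


def monthly_savings(runs_per_day: int, days: int = 30) -> float:
    """Dollars saved per month. runs_per_day and days are hypothetical inputs."""
    return runs_per_day * days * (OPUS_COST_USD - GLM_COST_USD)


print(round(cost_ratio(GLM_COST_USD, OPUS_COST_USD), 2))              # 5.41
print(round(savings_fraction(GLM_COST_USD, OPUS_COST_USD) * 100, 1))  # 81.5
```

On the quoted totals, the same workload costs roughly 5.4 times more on Opus 4.8, which is a saving of about 82% per run on GLM 5.2. A small per-run gap like this only becomes a budget question at volume, which is the point about teams running these tools constantly.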

The Breakdown

GLM 5.2 gets surprisingly close to Opus-class coding output at about $0.44 per workload versus $2.38 on Opus 4.8, and the practical play is not to replace frontier models but to chain them. Amir shows the simplest setup paths, through Cursor or through Codex with OpenRouter, then explains how to pair a vision-capable cloud model with GLM 5.2 to cut token spend without giving up quality.
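The chaining idea can be sketched as a two-stage pipeline: a stronger or vision-capable model turns the task into a structured plan, and the cheaper model executes that plan. This is a minimal illustration, not Amir's actual setup. The `Chain` class, the prompt wording, and the stub models are all hypothetical, and each model is represented as a plain callable so the sketch runs without any API key or network access.

```python
from dataclasses import dataclass
from typing import Callable

# A "model" here is any callable that takes a prompt and returns text.
# In practice each would wrap a real API client (an assumption, not shown).
ModelFn = Callable[[str], str]


@dataclass
class Chain:
    planner: ModelFn   # stronger or vision-capable model, e.g. Opus 4.8
    executor: ModelFn  # cheaper model, e.g. GLM 5.2

    def run(self, task: str) -> str:
        # Stage 1: the planner turns the raw task into a structured plan.
        plan = self.planner(f"Write a step-by-step implementation plan for:\n{task}")
        # Stage 2: the executor only ever sees the plan, never the original input.
        return self.executor(f"Implement the following plan:\n{plan}")


# Stub models so the sketch is runnable offline (hypothetical behavior).
def stub_planner(prompt: str) -> str:
    return "1. Add a header\n2. Style the button"


def stub_executor(prompt: str) -> str:
    return f"[executor received]\n{prompt}"


result = Chain(planner=stub_planner, executor=stub_executor).run("Build a landing page")
print(result)
```

The same shape covers the vision workaround: the planner's job becomes describing a screenshot in text, and the executor makes the front-end changes from that description. Because the executor receives only text, its lack of image input stops being a blocker.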