AI Tinkerers Montreal (12m)

Constrained Decoding: LLM Pixel Art
by Carl Lapierre @osedea

TL;DR

  • Carl Lapierre tried to force an OSS 20B model to emit pixel art one color token at a time. Instead of using DALL·E-style image generation, he mapped tokens like R, G, B, Y, O, P, W, K, and . to palette colors so he could control every pixel directly.

  • Constrained decoding works cleanly for banning tokens like em dashes, but gets messy fast for art generation. Carl shows how setting unwanted logits to negative infinity removes outputs like the em dash, then explains why token-level pixel constraints break at scale: tokens vary in length and often bundle dots and newlines together.

  • The failed decoding experiment turned into an accidental benchmark of LLMs for pixel art — after testing multiple models, he found some odd behaviors like 'fur mini' making circular pixel grids, while most models simply weren’t reliable enough for usable sprite generation.

  • Gemini 3 was the breakthrough: it generated 'pixel perfect' sprites in about six seconds — Carl demos hearts, ducks, and a health potion, and says Gemini 3's low-reasoning mode suddenly made this feel like a different paradigm because it 'gets it right all the time.'

  • The real advantage over diffusion models is editability, not just generation — once the sprite exists as constrained text tokens, Carl can directly swap colors, like turning a health potion into poison, which he says normal diffusion pipelines don’t give him.

  • Prompting still matters even under constraints — to keep the model oriented in a 'sea of points,' he injects a schema-like prompt explaining which token maps to which color, uses few-shot examples, and notes weird tokenizer artifacts like O and K repeatedly appearing because the model wants to say 'okay.'

The Breakdown

The fun question: can an LLM make cute pixel art?

Carl opens with a very specific itch: he wants lots of pixel-art assets for simulation games, but tools like DALL·E or 'nano banana' only give him finished PNGs, not control over individual pixels. So he goes one layer deeper, to token generation itself, with the hope that each token could stand for a color in a sprite.

Constrained decoding, explained with the anti-em-dash hack

Before getting to art, he gives a clean demo of constrained decoding, with the em dash as the villain. Since an LLM produces a logit for every token in its vocabulary at each step, you can effectively ban a token by setting its logit to negative infinity, which gives it zero probability after the softmax. That is perfect, he jokes, if you write generic LinkedIn posts and don’t want people thinking 'you’re a robot.'
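
A minimal sketch of the ban in plain Python, assuming a toy five-token vocabulary with made-up logits (real vocabularies have tens of thousands of tokens, and the helper names here are hypothetical). Every token whose text contains an em dash (U+2014) gets a logit of negative infinity, so its softmax probability is exactly zero and greedy decoding falls through to the next-best token.

```python
import math

def ban_tokens(logits, vocab, banned_substrings):
    """Return a copy of `logits` with every token containing a banned substring set to -inf."""
    masked = list(logits)
    for token_id, text in enumerate(vocab):
        if any(s in text for s in banned_substrings):
            masked[token_id] = -math.inf
    return masked

def softmax(logits):
    # Subtract the largest finite logit for stability; exp(-inf) evaluates to 0.0.
    m = max(x for x in logits if x != -math.inf)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and next-token logits; the em dash token is the favorite.
vocab = ["Hello", ",", " \u2014", " world", "!"]
logits = [1.0, 0.5, 3.0, 2.0, 0.1]

masked = ban_tokens(logits, vocab, ["\u2014"])
probs = softmax(masked)
next_id = max(range(len(probs)), key=probs.__getitem__)  # greedy pick
print(repr(vocab[next_id]))  # ' world'
```

Matching on substrings rather than a single token id matters because the em dash can be fused into larger tokens such as " \u2014". Libraries like Hugging Face transformers expose hooks for this kind of masking (a custom `LogitsProcessor`, or the `bad_words_ids` generation option); the sketch only shows the arithmetic underneath.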

Turning the vocabulary into a pastel palette

Carl then remaps a tiny token set into colors: R, G, B, Y, O, P, W, K, plus . for transparent, which he says worked better than using T. On a small example like a heart, it looks magical: token by token, the model emits colored cells and starts resembling real pixel art.
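
The token-to-color mapping can be sketched as a lookup table. The letter assignments below (O as orange, P as purple, K as black) and the RGBA values are assumptions for illustration; the talk names the tokens but not the exact pastel shades. `decode_sprite` is a hypothetical helper that turns the model's text output into rows of pixels.

```python
# Hypothetical pastel palette: token letters from the talk, colors assumed.
PALETTE = {
    "R": (255, 154, 162, 255),  # red
    "G": (181, 234, 215, 255),  # green
    "B": (174, 198, 255, 255),  # blue
    "Y": (255, 239, 170, 255),  # yellow
    "O": (255, 200, 150, 255),  # orange (assumed meaning)
    "P": (199, 170, 255, 255),  # purple (assumed meaning)
    "W": (255, 255, 255, 255),  # white
    "K": (40, 40, 48, 255),     # black (assumed meaning)
    ".": (0, 0, 0, 0),          # fully transparent
}

def decode_sprite(text):
    """Turn newline-separated palette characters into rows of RGBA tuples."""
    rows = [line for line in text.strip().splitlines() if line]
    return [[PALETTE[ch] for ch in row] for row in rows]

HEART = """
.RR.RR.
RRRRRRR
RRRRRRR
.RRRRR.
..RRR..
...R...
"""

pixels = decode_sprite(HEART)
print(len(pixels), len(pixels[0]))  # 6 7
```

Using "." for transparency keeps the background visually quiet in the raw text, which is one plausible reason it read better than T; a lookup like this raises `KeyError` on any character outside the palette, which is exactly where constrained decoding is supposed to help.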

Why it breaks when the image gets bigger

Then the demo hits reality: at larger scales, the output turns into junk. His key point is that token constraints are not like regex over characters — tokens have uneven sizes, sometimes bundling multiple dots or newlines, so you can’t just limit the model to a tiny clean alphabet and expect stable structured images.
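
To make the token-size problem concrete, here is a hypothetical vocabulary fragment and one possible workaround, not necessarily anything Carl tried: instead of whitelisting single characters, simulate every character of each candidate token against the current column, so multi-character tokens like ".." or "." plus a newline are allowed exactly when they keep the row at the target width.

```python
ALPHABET = set("RGBYOPWK.")

def allowed_token_ids(vocab, column, width):
    """Return ids of tokens that keep the grid well-formed from the current column.

    Each character of a token is simulated in turn: palette characters advance
    the column, and a newline is only legal exactly at the end of a row.
    """
    allowed = []
    for token_id, text in enumerate(vocab):
        col = column
        ok = bool(text)
        for ch in text:
            if ch == "\n":
                if col != width:
                    ok = False
                    break
                col = 0
            elif ch in ALPHABET and col < width:
                col += 1
            else:
                ok = False
                break
        if ok:
            allowed.append(token_id)
    return allowed

# Hypothetical vocabulary fragment with tokens of uneven length.
vocab = [".", "..", "...", "\n", ".\n", "R", "RR", "OK", "okay", " R"]

# Naive whitelist: single palette characters plus newline only.
naive = [i for i, t in enumerate(vocab) if len(t) == 1 and (t in ALPHABET or t == "\n")]
print([vocab[i] for i in naive])    # ['.', '\n', 'R']

mid_row = allowed_token_ids(vocab, column=5, width=7)
print([vocab[i] for i in mid_row])  # ['.', '..', 'R', 'RR', 'OK']

end_row = allowed_token_ids(vocab, column=7, width=7)
print([vocab[i] for i in end_row])  # ['\n']
```

The naive whitelist blocks tokens the model may strongly prefer, such as ".." and "RR", which pushes it off its natural distribution. Even the character-level check only enforces shape: "OK" passes because O and K are both palette letters. A real implementation would also need the actual tokenizer's vocabulary and a limit on the number of rows.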

From failed experiment to weird model benchmark

That failure became a new project: comparing models on pixel generation. Carl says one model, 'fur mini,' hilariously kept making circles — it somehow understood pixel grids, but only in round form — and for a while there just wasn’t a model that could produce consistently good sprites.
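
A comparison like this needs some pass/fail signal per output. The scoring function below is hypothetical, not the metric used in the talk: it checks whether the grid has the requested shape and what fraction of characters are valid palette tokens. The second sample mimics a model that chats before drawing and truncates the last row.

```python
PALETTE_CHARS = set("RGBYOPWK.")

def score_sprite(text, width, height):
    """Score raw model output as a sprite (hypothetical benchmark metric)."""
    rows = text.strip("\n").split("\n")
    cells = "".join(rows)
    valid = sum(ch in PALETTE_CHARS for ch in cells)
    return {
        # True only if the output is exactly `height` rows of `width` cells.
        "shape_ok": len(rows) == height and all(len(r) == width for r in rows),
        # Share of characters that map to a palette color.
        "palette_ratio": valid / len(cells) if cells else 0.0,
    }

good = ".RR.RR.\nRRRRRRR\n.RRRRR.\n...R..."
bad = "OK here is a heart:\n.RR.RR.\nRRRRRR"

print(score_sprite(good, width=7, height=4))  # {'shape_ok': True, 'palette_ratio': 1.0}
print(score_sprite(bad, width=7, height=4))
```

Running each model several times per prompt and averaging these two numbers would surface the "not reliable enough" problem as a rate rather than an impression.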

Gemini 3 changes the game

Then came Gemini 3, 'this week,' and the room wakes up with him. Using its low-reasoning mode, he demos a heart, mentions a duck he made with a friend, and shows a health potion, all generated in around six seconds; the crowd claps, and he calls it 'pixel perfect' and his 'new toy.'

Why this is better than diffusion for games

The big payoff is control after generation. Because the image is represented as constrained tokens, he can tweak colors directly — like flipping a health potion into a poison potion — which he says is exactly the kind of granular editing normal diffusion models don’t really offer.
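
Because the sprite is plain text, a color swap is a character translation. This sketch assumes a made-up potion layout (K outline, W highlight, R liquid); `recolor` is a hypothetical helper built on Python's `str.maketrans` and `str.translate`.

```python
def recolor(sprite, mapping):
    """Swap palette tokens in a text sprite, e.g. {"R": "G"}; other tokens are unchanged."""
    table = str.maketrans(mapping)
    return sprite.translate(table)

# Hypothetical health potion: K outline, W highlight, R liquid.
HEALTH_POTION = """\
..KKK..
...K...
..KRK..
.KRWRK.
.KRRRK.
..KKK..
"""

# Red liquid becomes green: health potion -> poison potion.
POISON_POTION = recolor(HEALTH_POTION, {"R": "G"})
print(POISON_POTION)
```

`str.translate` applies the whole mapping in one pass, so a swap such as `{"R": "G", "G": "R"}` exchanges the two colors instead of collapsing both into one.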

The prompt scaffolding and the weird token gotchas

In Q&A, Carl explains that constraints alone aren’t enough; like structured JSON outputs, the model still needs a schema-like prompt telling it what each token means and what a valid drawing looks like, plus few-shot examples. He also shares a very tokenizer-specific bug: O and K often appeared at the start because the model kept trying to say 'okay,' and when he first banned em dashes, it simply switched to every other dash-looking character it could find.
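
A schema-like prompt of the kind described can be assembled from the color legend plus a few-shot example. The wording, the grid size, and the `build_prompt` helper are all hypothetical; the talk describes the ingredients, not an exact template.

```python
# Hypothetical legend mirroring the palette tokens described in the talk.
LEGEND = {
    "R": "red", "G": "green", "B": "blue", "Y": "yellow",
    "O": "orange", "P": "purple", "W": "white", "K": "black",
    ".": "transparent",
}

# One hypothetical few-shot example: (subject, grid).
FEW_SHOT = [
    ("a small heart", ".R.R.\nRRRRR\n.RRR.\n..R.."),
]

def build_prompt(subject, width, height):
    """Assemble a schema-like instruction block followed by few-shot examples."""
    lines = [
        f"Draw {subject} as pixel art on a {width}x{height} grid.",
        "Output exactly one character per pixel and one line per row.",
        "Use only these characters:",
    ]
    lines += [f"  {ch} = {name}" for ch, name in LEGEND.items()]
    # Asking for the bare grid targets chatty preambles like "OK", though it may not stop them.
    lines.append("Output only the grid, with no other text.")
    for example_subject, grid in FEW_SHOT:
        lines += ["", f"Example: {example_subject}", grid]
    lines += ["", f"Now draw: {subject}"]
    return "\n".join(lines)

print(build_prompt("a health potion", width=16, height=16))
```

The legend plays the role a JSON schema plays for structured outputs: the decoder can restrict which tokens appear, but only the prompt tells the model what they mean.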

Ending on loneliness, gray mountains, and constrained-decoding humility

Asked to draw something abstract like 'loneliness,' Carl jokes, 'Should I show a picture of us?' Then the model produces what he calls 'a gray mountain,' which feels like the perfect ending: half demo, half comedy, and a reminder that constrained decoding is both a real science and a playground for strange model behavior.
