Streaming replies without the keyboard jitter.

If you've used a chat app on iOS where replies stream in and watched the layout pulse every time a new chunk arrives, you know the feeling we were trying to avoid in Gale.

The naive approach and why it looks cheap

The naive approach is simple: on each token, append to the message string and ask UIKit to re-render the bubble. This works. It also makes everything below the streaming bubble jump on every token, because the bubble's height keeps changing (sometimes by one line, sometimes by two, sometimes because a single character forced a wrap) and UIKit, doing what you asked, reflows the world.

The user-perceived effect is a kind of keyboard jitter. The content is arriving correctly, but the room is shaking as it does. It makes the app feel un-native.

What we do instead

Three things, stacked:

Reserve the height. As soon as streaming starts, we estimate a plausible final height (from an expected-length hint the model gives us) and give the bubble that height up front. The bubble fills in top-down; the conversation below doesn't move.
Batch the render. Tokens don't arrive at 60 Hz. They arrive at ~50 per second, clustered. We queue them, and repaint at display-refresh frequency, merging whatever arrived during the vsync window.
Smooth the cadence. Human-like typing isn't "as fast as possible." We introduce micro-delays at punctuation and paragraph breaks — a rhythm your eye likes — and we cap the effective reveal speed at ~80 chars/second. It feels thoughtful without feeling slow.
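
To make the three steps concrete, here is a minimal sketch in plain Swift. It is not Gale's actual code: the names (reservedHeight, TokenBatcher, revealDelay), the characters-per-line estimate, and the pause lengths are all assumptions for illustration. Only the ~80 chars/second cap comes from the description above.

```swift
// Hypothetical names and constants throughout; a sketch, not Gale's implementation.

/// 1. Reserve the height: turn an expected-length hint into a bubble height.
func reservedHeight(expectedChars: Int, charsPerLine: Int, lineHeight: Double) -> Double {
    // Ceiling division, with at least one line so the bubble is never empty.
    let lines = max(1, (expectedChars + charsPerLine - 1) / charsPerLine)
    return Double(lines) * lineHeight
}

/// 2. Batch the render: tokens queue up and are merged once per display frame.
struct TokenBatcher {
    private var pending: [String] = []

    mutating func enqueue(_ token: String) {
        pending.append(token)
    }

    /// Call once per frame (for example from a CADisplayLink callback).
    /// Returns everything that arrived since the last frame, or nil if nothing did.
    mutating func drainForFrame() -> String? {
        guard !pending.isEmpty else { return nil }
        let merged = pending.joined()
        pending.removeAll(keepingCapacity: true)
        return merged
    }
}

/// 3. Smooth the cadence: seconds to wait before revealing the character after `previous`.
func revealDelay(after previous: Character, maxCharsPerSecond: Double = 80) -> Double {
    let base = 1.0 / maxCharsPerSecond  // 12.5 ms per character at 80 chars/second
    switch previous {
    case "\n":
        return base + 0.12              // assumed pause at paragraph breaks
    case ".", "!", "?", ",", ";", ":":
        return base + 0.06              // assumed pause at punctuation
    default:
        return base
    }
}
```

Because the base delay is the reciprocal of the cap, the reveal can never run faster than the cap; the pauses only ever slow it down. The batcher returns nil on quiet frames, so the caller can skip the repaint entirely when nothing new has arrived.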

The "thinking" block

A lot of open models emit a long reasoning preamble before the real answer. The first builds of Gale just showed this as body text and it was awful: you'd watch a model monologue about what to do for five seconds before it told you the answer.

We split it. A folded Thinking card appears immediately, collapsed, with a subtle shimmer to indicate work. The answer streams beneath it in its own bubble. The reasoning is there if you want it (tap to expand); it's out of your way if you don't.
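
Here is one way the split could work on a chunked stream, as a hedged sketch. It assumes the model wraps its reasoning in <think> ... </think> tags, which several open models do but which is not universal; the Segment and ThinkingSplitter names are hypothetical. The one subtle part is that a tag can arrive split across two chunks, so the splitter holds back any trailing text that might be the start of a tag.

```swift
import Foundation

/// Where a piece of streamed text should be displayed.
enum Segment: Equatable {
    case thinking(String)  // routed to the collapsed Thinking card
    case answer(String)    // routed to the answer bubble
}

/// Splits a chunked stream into thinking and answer text.
/// Assumption: reasoning is delimited by <think> ... </think>.
struct ThinkingSplitter {
    private var buffer = ""
    private var insideThinking = false

    /// The tag we are currently waiting for.
    private var currentTag: String { insideThinking ? "</think>" : "<think>" }

    mutating func feed(_ chunk: String) -> [Segment] {
        buffer += chunk
        var output: [Segment] = []
        // Consume every complete tag currently in the buffer.
        while let range = buffer.range(of: currentTag) {
            let before = String(buffer[..<range.lowerBound])
            if !before.isEmpty { output.append(segment(before)) }
            buffer = String(buffer[range.upperBound...])
            insideThinking.toggle()
        }
        // Emit the rest, holding back a suffix that could be a tag split across chunks.
        let keep = partialTagLength(in: buffer, tag: currentTag)
        let emitEnd = buffer.index(buffer.endIndex, offsetBy: -keep)
        let text = String(buffer[..<emitEnd])
        if !text.isEmpty { output.append(segment(text)) }
        buffer = String(buffer[emitEnd...])
        return output
    }

    /// Call when the stream ends to flush any held-back text.
    mutating func finish() -> [Segment] {
        defer { buffer = "" }
        return buffer.isEmpty ? [] : [segment(buffer)]
    }

    private func segment(_ text: String) -> Segment {
        insideThinking ? .thinking(text) : .answer(text)
    }

    /// Length of the longest suffix of `text` that is a proper prefix of `tag`.
    private func partialTagLength(in text: String, tag: String) -> Int {
        let maxLength = min(text.count, tag.count - 1)
        guard maxLength > 0 else { return 0 }
        for length in stride(from: maxLength, through: 1, by: -1) {
            if tag.hasPrefix(String(text.suffix(length))) { return length }
        }
        return 0
    }
}
```

Each call to feed returns the segments that are safe to display right away, so the UI layer only has to append thinking text to the card and answer text to the bubble.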

The little thing

Gale streams with a trailing cursor — a 1.5px vertical bar that sits at the end of the visible text and pulses. It's the single smallest design detail in the app and the one I get the most "Oh, nice" reactions from. The bar is drawn on the CAMetalLayer directly rather than as a character, so it doesn't affect layout and can't cause a reflow by existing. It's pure decoration, and it earns its place.
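
The reason the cursor can't cause a reflow is that its position and opacity are derived from the text layout, never fed back into it. A small sketch of that idea, with a hypothetical Rect type standing in for a real glyph rectangle, and a pulse period and minimum opacity that are assumptions rather than Gale's values:

```swift
import Foundation

/// Stand-in for a glyph rectangle reported by the text layout (hypothetical type).
struct Rect: Equatable {
    var x: Double
    var y: Double
    var width: Double
    var height: Double
}

/// Frame for a thin bar at the trailing edge of the last visible glyph.
/// The text layout is an input only; nothing here changes it.
func cursorFrame(lastGlyph: Rect, barWidth: Double = 1.5) -> Rect {
    Rect(x: lastGlyph.x + lastGlyph.width,
         y: lastGlyph.y,
         width: barWidth,
         height: lastGlyph.height)
}

/// Opacity for a pulse that swings between `minOpacity` and 1 once per `period` seconds.
func cursorOpacity(at time: Double, period: Double = 1.0, minOpacity: Double = 0.2) -> Double {
    // phase is 1 at the start of each period and 0 halfway through.
    let phase = (cos(2 * Double.pi * time / period) + 1) / 2
    return minOpacity + (1 - minOpacity) * phase
}
```

Both functions are pure, so the bar can be repositioned and repainted every frame on its own layer without the text view ever being asked to lay out again.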

The result

On a 120 Hz display the text lands smoothly. The conversation below the streaming bubble doesn't move. The cursor pulses where your eye already is. None of this is breakthrough research — it's just the effort most LLM chat apps don't bother to put in, because they assume the model is the product. The model is the engine. The text view is the product.

Tagged Gale · Published Mar 14, 2026