
Why induction heads emerge as a phase change

June 14, 2026 · 19 min read

Abstract

This memo looks at one narrow mechanistic question: why do induction heads appear suddenly during training, and why does that moment track the onset of copy-based in-context learning? The central claim from Olsson et al. (2022) is that induction heads are not just another descriptive attention pattern. They implement a concrete algorithm: if the model sees a repeated token, it can attend back to the previous occurrence and then copy the token that followed it earlier. That simple circuit solves the local problem behind many next-token prediction gains in long contexts. The striking result is not only that such heads exist, but that they emerge abruptly, at the same time as a noticeable drop in loss on later tokens. Read this way, induction heads are evidence that some in-context learning capabilities come from identifiable circuits rather than from a vague distributed "general intelligence" story. The deeper question is what this circuit explains, and what it does not.

Related Work

The primary source is "In-context Learning and Induction Heads," which studies both small attention-only transformers and larger pretrained models. The paper argues that induction heads may be the mechanistic source of a large fraction of the in-context loss reduction that appears later in a sequence. It offers six lines of evidence, including timing during training, direct visualization, ablations, and synthetic-sequence interventions.

Two adjacent papers help position the result. "A Mathematical Framework for Transformer Circuits" introduced the residual-stream and QK/OV decomposition tools that make induction heads legible as a circuit rather than just an attention map. "What learning algorithm is in-context learning?" offers a different explanation for some in-context learning regimes, especially linear regression-style tasks where a transformer can emulate gradient descent or ridge regression. Those views are not mutually exclusive. They suggest that "in-context learning" is not one mechanism. Some tasks may be handled by copy-and-continue circuitry, while others require a more optimizer-like computation. A more data-centric angle comes from "Understanding In-Context Learning via Supportive Pretraining Data," which argues that difficult long-range contexts in pretraining help create the conditions under which these behaviors later emerge.

Method/Mechanism

An induction head is easiest to understand on a repeated pattern. Suppose the context contains tokens like [A][B] ... [A]. At the second [A], the head searches the earlier context for a position whose immediately preceding token was also [A]. That position holds [B], the token that followed the first [A]. Once the head attends there, the OV path can promote [B] as the next-token prediction. Operationally, the head behaves like a learned pointer from the current token to the token after its previous occurrence, plus a learned copy of what it finds there.
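The rule described above can be written down as a tiny lookup procedure. The sketch below is only an idealized, hard-coded version of the behavior, not how a transformer computes it, and the function name induction_predict is made up for illustration. It assumes the head always picks the most recent earlier occurrence.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Idealized induction rule (illustrative helper, not from the paper).

    Find the most recent earlier occurrence of the current (last) token
    and return the token that followed it. Return None if the current
    token has not appeared before.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest.
    for k in range(len(tokens) - 2, -1, -1):
        if tokens[k] == current:
            # The token right after the earlier occurrence is the guess.
            return tokens[k + 1]
    return None
```

On the context ["A", "B", "C", "A"] this returns "B", which is the [A][B] ... [A] case from the text. On a context with no repeat it returns None, which is a reminder that the rule says nothing about tokens the model has not already seen in context.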

This requires at least a two-step story. An earlier head, often called a previous-token head, writes a signal about each position's preceding token into the residual stream, and a later head uses that signal to recognize "the token just before this position matches the token I am looking at now." That is why the induction-head phenomenon is especially clean in two-layer attention-only models: one layer can set up the offset-matching feature and the next can use it for copying. The mechanism is narrow but powerful because repeated substrings are everywhere in text: names recur, delimiters recur, function signatures recur, and local syntactic fragments recur.
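To make the two-step structure concrete, here is a minimal sketch that splits the computation into two stages, one per layer. It replaces learned attention with exact equality checks, and the function names previous_token_head and induction_head are hypothetical labels chosen for this example.

```python
from typing import List, Optional, Sequence


def previous_token_head(tokens: Sequence[str]) -> List[Optional[str]]:
    """Layer 1 (idealized): each position records the token before it.

    Position 0 has no predecessor, so it records None.
    """
    return [None] + list(tokens[:-1])


def induction_head(
    tokens: Sequence[str], prev_feature: Sequence[Optional[str]]
) -> Optional[str]:
    """Layer 2 (idealized): match the current token against the
    previous-token feature of every position, then copy the token
    found at the most recent matching position.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Positions whose recorded predecessor equals the current token.
    matches = [j for j in range(len(tokens)) if prev_feature[j] == current]
    if not matches:
        return None
    return tokens[matches[-1]]
```

Calling induction_head(tokens, previous_token_head(tokens)) on ["the", "cat", "sat", "the"] yields "cat". The point of the split is that the second stage cannot work on its own: without the feature written by the first stage, the current token has nothing to match against except token identity, which would point at the earlier [A] rather than at what followed it.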

The phase-change interpretation comes from training dynamics. Early in training, a model gets most of its gains from unigram and short-range statistics. Later, once it is worth paying the representational cost to coordinate heads across layers, an induction circuit becomes economical. Olsson et al. report a sharp emergence point that coincides with a bump in overall training loss and a sudden improvement in later-token performance. That pattern matters because it suggests the capability is not smoothly interpolated from weaker n-gram heuristics. A distinct circuit turns on.
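One crude way to check whether a logged metric changes abruptly rather than smoothly is to find the single logging interval with the largest drop. The sketch below does only that. The function name sharpest_drop is hypothetical, and it is not the analysis used by Olsson et al.; any curve fed to it here would be illustrative, not data from the paper.

```python
from typing import Sequence, Tuple


def sharpest_drop(
    steps: Sequence[int], values: Sequence[float]
) -> Tuple[int, int, float]:
    """Return (start_step, end_step, drop) for the consecutive pair of
    log points with the largest decrease in the metric.

    Illustrative helper: a large drop concentrated in one interval is
    consistent with a phase change but does not prove one.
    """
    if len(steps) != len(values) or len(values) < 2:
        raise ValueError("need two aligned sequences with at least 2 points")
    # Index i marks the interval from point i-1 to point i.
    best = max(range(1, len(values)), key=lambda i: values[i - 1] - values[i])
    return steps[best - 1], steps[best], values[best - 1] - values[best]
```

With a made-up curve such as steps = [0, 1000, 2000, 3000, 4000] and values = [5.0, 4.8, 4.7, 3.9, 3.8], the function points at the interval from step 2000 to step 3000. A smoothly improving curve would instead spread its decrease across many intervals of similar size.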

Key Findings

Two concrete case studies make the paper's argument unusually crisp:

Five crisp insights follow:

The broader consequence is methodological. If one can tie a benchmark gain to a narrow circuit, then "emergence" becomes a tractable object of study rather than an opaque scaling narrative.

Limitations

The strongest causal evidence comes from small attention-only models and synthetic repeated-sequence setups. That is enough to establish the existence of the mechanism, but not enough to show that most real-world few-shot reasoning reduces to induction. Natural-language prompting often requires semantic abstraction, label remapping, or latent task inference that pure token copying cannot solve.
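A synthetic repeated-sequence setup of the kind mentioned above is easy to sketch. The code below builds a random token block repeated twice and scores an attention pattern by how much mass each query puts on positions whose preceding token equals the query token. This is a simplified, prefix-matching-style score written for illustration; all function names are hypothetical, and the attention patterns are hand-built rather than taken from a model.

```python
import random
from typing import List, Sequence


def repeated_sequence(vocab_size: int, block_len: int, seed: int = 0) -> List[int]:
    """Random block of token ids followed by an exact copy of itself."""
    rng = random.Random(seed)
    block = [rng.randrange(vocab_size) for _ in range(block_len)]
    return block + block


def induction_targets(tokens: Sequence[int], i: int) -> List[int]:
    """Key positions j <= i whose preceding token equals tokens[i]."""
    return [j for j in range(1, i + 1) if tokens[j - 1] == tokens[i]]


def prefix_matching_score(
    tokens: Sequence[int], attention: Sequence[Sequence[float]]
) -> float:
    """Mean attention mass on induction targets, over queries that have one."""
    total, counted = 0.0, 0
    for i, row in enumerate(attention):
        targets = induction_targets(tokens, i)
        if targets:
            total += sum(row[j] for j in targets)
            counted += 1
    return total / counted if counted else 0.0


def ideal_induction_attention(tokens: Sequence[int]) -> List[List[float]]:
    """Hand-built pattern: all mass on the most recent induction target,
    or on position 0 when no target exists."""
    rows = []
    for i in range(len(tokens)):
        row = [0.0] * len(tokens)
        targets = induction_targets(tokens, i)
        row[targets[-1] if targets else 0] = 1.0
        rows.append(row)
    return rows


def uniform_causal_attention(n: int) -> List[List[float]]:
    """Baseline pattern: equal mass on every position up to the query."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

The hand-built induction pattern scores 1.0 by construction, while the uniform causal baseline scores strictly lower. In a real experiment the attention rows would come from a model's head, and a high score on random repeated tokens would suggest induction-like behavior because the tokens carry no semantic cues to exploit.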

There is also a scope limitation in the phrase "in-context learning." In the Olsson et al. paper, the measured behavior is largely decreasing loss at increasing token indices. That is related to few-shot prompting, but it is not identical to tasks like linear regression in context, chain-of-thought problem solving, or instruction following from demonstrations. Related work on implicit gradient descent is a useful reminder that different experimental definitions of ICL may call on different internal machinery.
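The "decreasing loss at increasing token indices" measurement can be reduced to a single number: the loss at a late token position minus the loss at an early one, averaged over sequences. The sketch below assumes per-token losses are already available as nested lists. The defaults of 50 and 500 follow my reading of the paper's in-context learning score and should be treated as adjustable assumptions; the function name is made up for this example.

```python
from typing import Sequence


def in_context_score(
    per_token_losses: Sequence[Sequence[float]],
    early: int = 50,
    late: int = 500,
) -> float:
    """Mean over sequences of loss[late] - loss[early].

    Positions are 1-indexed token positions. More negative values mean
    the model predicts late tokens better than early ones. Defaults are
    an assumption about the paper's setup, not a verified constant.
    """
    if not per_token_losses:
        raise ValueError("need at least one sequence of losses")
    if not 1 <= early < late:
        raise ValueError("require 1 <= early < late")
    diffs = []
    for losses in per_token_losses:
        if len(losses) < late:
            raise ValueError("sequence shorter than the late position")
        diffs.append(losses[late - 1] - losses[early - 1])
    return sum(diffs) / len(diffs)
```

Note what this number does and does not capture. It rewards any mechanism that makes later tokens easier to predict, including plain copying, and it never asks whether the model inferred a task from demonstrations.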

Future Directions

One obvious next step is to map the boundary between induction-style copying and more abstract in-context computation. When does a model stop relying on repeated surface forms and start constructing a task-level latent variable instead? Another is to connect circuit emergence to data design more explicitly: if supportive pretraining data are rich in difficult long-range dependencies, can we predict or shift the training step at which induction heads form? A third direction is safety-relevant interpretability. Copy-based circuits are operationally simple enough that they may be good candidates for targeted auditing, steering, or even regularization during pretraining.

Open question: in large modern language models, what fraction of successful few-shot prompting on natural tasks still routes through induction-style copying, versus through qualitatively different circuits that represent task structure more abstractly?

Summary

Induction heads remain one of the clearest examples of a transformer capability that is both useful and mechanistically legible. They show how a model can turn repeated context into a concrete copying algorithm, and why that ability can appear suddenly rather than gradually. The result does not settle the whole mystery of in-context learning, but it sharply narrows part of it. At least some in-context gains come from specific circuits with identifiable training dynamics, which is exactly the kind of explanation fundamental LLM research should try to accumulate.

References