A deep review of how layer norm placement controls gradient flow and residual amplification in deep Transformers.
A deep review of when Transformers are Turing complete, why hard attention matters, and what formal-language limits imply.
A deep review of how the softmax bottleneck limits expressivity in language models and how mixture-of-softmaxes raises the effective rank.
A focused review of why ALiBi extrapolates to longer contexts more reliably than RoPE, and what that implies about positional inductive bias in attention.
A deep review of evidence that transformer feed-forward layers behave like key-value memories and localize factual recall in mid-layer MLPs.
A focused review of the modern Hopfield-network view of attention, with emphasis on storage capacity and retrieval behavior.
A deep dive into gradient-descent-like mechanisms in in-context linear regression.
A deep dive into how weight decay triggers delayed generalization in transformer grokking.
A deep dive into how sparsity penalties trigger phase transitions between superposed and monosemantic feature representations.
A focused memo on how to evaluate mechanistic explanations in GPT-2 small.
A quick pilot for turning weekly reading notes into a public-facing update.