Tibo Vanleke, AI safety researcher

Research notes

Public thinking from the edge of the research process.

These memos stay intentionally compact. Each one isolates a mechanism, paper, or question so the work remains visible before it becomes polished.

  • Interpretability
  • Optimization
  • Transformer theory
  • Research process
27 published: public memos so far
Feb to Jul 2026: current public note window
One question each: a note format built for clarity over coverage

Format

How these notes are written.

  • 01 Keep it narrow: Each memo stays centered on one claim, one paper, or one mechanism.
  • 02 Prefer signal over summary: The aim is a strong takeaway, not exhaustive coverage.
  • 03 Leave open questions visible: The most useful part is often what still feels unresolved.

Recent focus

A few themes keep resurfacing.

Self-consistency, Preference optimization, Layer norm placement, Truthfulness, Length extrapolation, Feature geometry, Induction heads, Truth directions, Reversal curse, Attention sinks, Low-rank adaptation, Gated MLPs, muP scaling, Constitutional AI, Tail truncation, Compute-optimal scaling, Speculative decoding, Weight tying, RMSNorm

Latest

The newest memo goes first.

Everything else stays browseable below, but the freshest research question should be immediate.

July 1, 2026 / 19 min read

Why RMSNorm works without mean-centering

A longer review of why RMSNorm preserves most of LayerNorm's optimization benefits, what stable activation scale buys, and where mean subtraction still matters.
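
As a rough illustration of the contrast this memo draws, here is a minimal pure-Python sketch of the two normalizers. It is not taken from the memo: the function names, the `eps` default, and the toy vectors are assumptions, and the learned gain and bias are omitted to keep the comparison bare.

```python
import math

def layer_norm(x, eps=1e-6):
    # LayerNorm without learned gain/bias: subtract the mean, then divide by the std.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-6):
    # RMSNorm without learned gain: divide by the root mean square, no centering.
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]

def rms(x):
    # Root mean square of a vector, used to check the output scale.
    return math.sqrt(sum(v * v for v in x) / len(x))

# Hypothetical toy activations: one already zero-mean, one shifted by a constant.
centered = [-1.5, -0.5, 0.5, 1.5]
shifted = [v + 3.0 for v in centered]

same_when_centered = all(
    abs(a - b) < 1e-9 for a, b in zip(layer_norm(centered), rms_norm(centered))
)
scale_after_rms_norm = rms(rms_norm(shifted))
mean_after_rms_norm = sum(rms_norm(shifted)) / len(shifted)
```

On zero-mean input the two functions agree; on shifted input RMSNorm still fixes the output scale near 1 but leaves a nonzero mean, which is the case where mean subtraction can still matter.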

June 30, 2026 / 23 min read

Why weight tying helps language models

A longer review of why sharing input and output embeddings improves language models, what shared lexical geometry buys, and where the assumption breaks.

Open memo

June 29, 2026 / 24 min read

Why speculative decoding preserves the target distribution

A longer review of why draft-and-verify generation can accelerate LLM inference without changing the target model's sampling distribution.

Open memo
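
The distribution-preservation claim can be checked for a single token with a short sketch. It assumes the standard accept/reject rule (accept a drafted token x with probability min(1, p(x)/q(x)), otherwise resample from the normalized positive part of p - q). The function name and toy probabilities below are hypothetical, and the sketch covers one position only, not multi-token drafting.

```python
def speculative_output_distribution(p, q):
    """Exact output distribution for one speculative step.

    p: target-model probabilities, q: draft-model probabilities (same vocabulary).
    """
    # Probability that token x is drafted AND accepted:
    # q(x) * min(1, p(x) / q(x)) == min(p(x), q(x)).
    accepted = [min(pi, qi) for pi, qi in zip(p, q)]
    reject_mass = 1.0 - sum(accepted)

    # On rejection, resample from the normalized positive part of (p - q).
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    if total == 0.0:
        # Draft and target agree exactly, so nothing is ever rejected.
        return accepted
    return [a + reject_mass * r / total for a, r in zip(accepted, residual)]

# Hypothetical toy distributions over a three-token vocabulary.
target = [0.5, 0.3, 0.2]
draft = [0.2, 0.2, 0.6]
output = speculative_output_distribution(target, draft)
matches_target = all(abs(o - t) < 1e-12 for o, t in zip(output, target))
```

Because the rejected mass equals the total of the positive residual, the two terms sum back to p(x) for every token, so the draft model changes the speed but not the sampled distribution.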

June 28, 2026 / 22 min read

Why Chinchilla scaling prefers more tokens than bigger models

A longer review of why compute-optimal LLM training often favors smaller models trained on more tokens, and what that reveals about undertraining.

Open memo

June 26, 2026 / 20 min read

Why open-ended decoding needs tail truncation

A longer review of why likelihood-maximizing decoding degenerates in open-ended generation, why nucleus sampling works, and what later theory says about the tail.

Open memo

June 25, 2026 / 18 min read

Why Constitutional AI can replace many harmlessness labels

A longer review of why written principles plus AI-generated critiques can improve harmlessness, and where that substitution for human labels breaks.

Open memo

June 24, 2026 / 21 min read

Why muP makes hyperparameters transfer across scale

A longer review of why maximal update parameterization keeps learning-rate and scale-sensitive hyperparameters stable as transformers grow wider.

Open memo

Archive

The full memo archive.

Ordered newest first so the cadence and evolving interests are easy to scan.

July 1, 2026 / Optimization

Why RMSNorm works without mean-centering

Why stable activation scale often matters more than zero-mean activations in deep transformer optimization.

Read memo

June 30, 2026 / Language modeling

Why weight tying helps language models

Why sharing input and output embeddings regularizes lexical geometry and often improves perplexity.

Read memo

June 29, 2026 / Inference

Why speculative decoding preserves the target distribution

Why draft-and-verify generation can speed up autoregressive sampling without changing the large model's distribution.

Read memo

June 28, 2026 / Scaling

Why Chinchilla scaling prefers more tokens than bigger models

Why compute-optimal language-model training often rewards smaller models trained longer on more data.

Read memo

June 26, 2026 / Decoding

Why open-ended decoding needs tail truncation

Why likelihood maximization collapses into repetition in open-ended generation, and why truncating the unreliable tail works better.

Read memo

June 25, 2026 / Alignment

Why Constitutional AI can replace many harmlessness labels

Why written principles plus AI critiques can substitute for much direct harmlessness labeling when the base model already understands the norms.

Read memo

June 24, 2026 / Optimization

Why muP makes hyperparameters transfer across scale

Why width-aware parameter scaling keeps learning rates and related tuning choices stable across transformer scale.

Read memo

June 23, 2026 / Architecture

Why SwiGLU replaced standard Transformer MLPs

Why gated feed-forward blocks beat plain ReLU and GELU MLPs, even under compute-matched comparisons.

Read memo

June 22, 2026 / Adaptation

Why low-rank updates are enough for LLM adaptation

Why downstream specialization often occupies a small weight-space subspace and can be captured with low-rank updates.

Read memo

June 17, 2026 / Mechanism

Why attention sinks stabilize long-context decoding

Why a few early tokens absorb surplus attention mass and become necessary for stable streaming inference.

Read memo

June 16, 2026 / Reasoning

Why self-consistency decoding improves reasoning

Why sampling several chains of thought and aggregating answers approximates marginalization over noisy reasoning paths.

Read memo

June 15, 2026 / Theory

Why next-token prediction creates the reversal curse

Why LLMs often learn directional factual retrieval without learning stable reverse access to the same relation.

Read memo

June 14, 2026 / Alignment

Can hidden states reveal what an LLM thinks is true?

When latent truth directions survive misleading prompts, and where that evidence remains fragile.

Read memo

June 14, 2026 / Interpretability

Why induction heads emerge as a phase change

Why a specific copy circuit appears abruptly during training and tracks the onset of copy-based in-context learning.

Read memo

June 14, 2026 / Alignment

Why TruthfulQA shows inverse scaling

Why larger language models can become less truthful when scaling mostly improves imitation rather than evidence-sensitive abstention.

Read memo

March 2, 2026 / Optimization

DPO as implicit reward optimization under KL constraints

When DPO matches KL-regularized reward optimization, and where the approximation starts to leak.

Read memo

February 17, 2026 / Stability

Pre-LN vs Post-LN

How layer norm placement shapes gradient flow and training stability in deep Transformers.

Read memo

February 16, 2026 / Theory

Hard attention and Turing completeness

Architectural ingredients, formal expressivity, and what they say about real-world attention.

Read memo

February 15, 2026 / Language modeling

The softmax bottleneck in language modeling

Why a low-rank output layer limits expressivity and how mixture-of-softmaxes expands it.

Read memo

February 13, 2026 / Context length

ALiBi vs RoPE

Why ALiBi extrapolates to longer contexts more reliably than RoPE.

Read memo

February 12, 2026 / Memory

FFN layers as key-value memories

Evidence that mid-layer MLPs store and retrieve factual associations.

Read memo

February 11, 2026 / Attention

Attention as modern Hopfield memory

A compact review of the Hopfield-network view of attention, storage capacity, and retrieval.

Read memo

February 7, 2026 / In-context learning

In-context learning as implicit gradient descent

Evidence that Transformers can implement gradient-descent-like procedures in context.

Read memo

February 6, 2026 / Optimization

Grokking as a phase transition in transformer training

How weight decay can trigger a delayed shift from memorization to algorithmic generalization.

Read memo

February 5, 2026 / Interpretability

Sparsity as a control knob for superposition

How sparsity penalties can flip a model between dense superposition and cleaner feature geometry.

Read memo

February 4, 2026 / Mechanistic evaluation

Evaluating IOI circuits

Faithfulness, completeness, and minimality as standards for mechanistic explanations.

Read memo

February 4, 2026 / Research process

Bootstrapping a weekly research memo

A lightweight public writing habit for turning weekly notes into usable research output.

Read memo

Topic log

A running ledger of narrow questions.

Useful as a sanity check against repetition and as a map of what the memo series is actually circling.

  • July 1, 2026: why RMSNorm keeps most of LayerNorm's benefits by stabilizing activation scale even without explicit mean-centering.
  • June 30, 2026: why sharing input and output embeddings helps language models read and predict words in the same lexical space.
  • June 29, 2026: why speculative decoding can draft several tokens ahead yet still sample exactly from the target model after verification.
  • June 28, 2026: why compute-optimal scaling usually favors more training tokens over a much larger but undertrained model.
  • June 26, 2026: why open-ended generation needs dynamic tail truncation instead of likelihood-maximizing decoding.
  • June 25, 2026: why written principles plus AI-generated critiques can replace many direct harmlessness labels in alignment training.
  • June 24, 2026: why maximal update parameterization makes learning-rate and optimizer-scale choices transfer across transformer width.
  • June 23, 2026: why SwiGLU-style gated MLPs outperform standard transformer feed-forward blocks.
  • June 22, 2026: why many downstream LLM adaptations can be captured by low-rank updates instead of dense retraining.
  • June 17, 2026: why long-context decoding depends on early attention sinks that absorb surplus softmax mass.
  • June 16, 2026: why sampling several chains of thought and aggregating answers improves reasoning without changing model weights.
  • June 15, 2026: why next-token prediction learns directional fact retrieval but not stable reverse lookup.