Tibo Vanleke, AI safety researcher

Research notes

Public thinking from the edge of the research process.

These memos stay intentionally compact. Each one isolates a mechanism, paper, or question so the work remains visible before it becomes polished.

  • Interpretability
  • Optimization
  • Transformer theory
  • Research process
  • 12 public memos published so far
  • Current note window: February to March 2026
  • One question each: a note format built for clarity over coverage

Format

How these notes are written.

  • 01 Keep it narrow: each memo stays centered on one claim, one paper, or one mechanism.
  • 02 Prefer signal over summary: the aim is a strong takeaway, not exhaustive coverage.
  • 03 Leave open questions visible: the most useful part is often what still feels unresolved.

Recent focus

A few themes keep resurfacing.

Preference optimization · Layer norm placement · Hard attention · Length extrapolation · Feature geometry · KV memories

Latest

The newest memo goes first.

Everything else stays browseable below, but the freshest research question should be the first thing you see.

March 2, 2026 / 16 min read

DPO as implicit reward optimization under KL constraints

A deep review of when Direct Preference Optimization is equivalent to KL-regularized reward optimization and where that equivalence breaks.
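For reference, the equivalence this memo examines can be stated compactly (standard notation from the DPO paper: $\pi_\theta$ is the policy, $\pi_{\mathrm{ref}}$ the reference model, $\beta$ the KL weight, $\sigma$ the logistic function). The KL-regularized objective

```latex
\max_{\pi}\;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\!\left[ r(x, y) \right]
\;-\;
\beta\, \mathbb{D}_{\mathrm{KL}}\!\left[ \pi(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right]
```

has the closed-form optimum $\pi^*(y \mid x) = \tfrac{1}{Z(x)}\, \pi_{\mathrm{ref}}(y \mid x)\, \exp\!\left(r(x, y)/\beta\right)$. Inverting this gives the implicit reward

```latex
r(x, y) \;=\; \beta \log \frac{\pi^*(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} \;+\; \beta \log Z(x),
```

and substituting into the Bradley-Terry preference likelihood (where the intractable $\log Z(x)$ cancels) yields the DPO loss over preferred/dispreferred pairs $(y_w, y_l)$:

```latex
\mathcal{L}_{\mathrm{DPO}}
= -\,\mathbb{E}\!\left[
\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)
\right].
```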

February 17, 2026 / 13 min read

Pre-LN vs Post-LN

Why layer norm placement controls gradient flow and residual amplification in deep Transformers.

Open memo
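The placement difference the memo discusses fits in a few lines. This is a toy NumPy sketch, not code from the memo: `sublayer` stands in for attention or an MLP, and the point is only where the normalization sits relative to the residual add.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize the last axis to zero mean, unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def post_ln_block(x, sublayer):
    # Post-LN (original Transformer): normalize *after* the residual add,
    # so even the identity path passes through LayerNorm at every block.
    return layer_norm(x + sublayer(x))

def pre_ln_block(x, sublayer):
    # Pre-LN: normalize only the sublayer input; the residual path is an
    # untouched identity, which keeps gradient flow direct but lets the
    # residual stream's norm grow with depth.
    return x + sublayer(layer_norm(x))
```

Stacking `pre_ln_block` makes the residual stream norm grow monotonically (each block adds a vector positively correlated with the stream), while `post_ln_block` re-normalizes the stream at every depth, which is exactly the amplification-versus-gradient-flow trade-off the memo unpacks.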

February 16, 2026 / 11 min read

Hard attention and Turing completeness

When Transformers become Turing complete and what those formal results do and do not imply.

Open memo

Archive

The full memo archive.

Ordered newest first so the cadence and evolving interests are easy to scan.

March 2, 2026 / Optimization

DPO as implicit reward optimization under KL constraints

When DPO matches KL-regularized reward optimization, and where the approximation starts to leak.

Read memo

February 17, 2026 / Stability

Pre-LN vs Post-LN

How layer norm placement shapes gradient flow and training stability in deep Transformers.

Read memo

February 16, 2026 / Theory

Hard attention and Turing completeness

Architectural ingredients, formal expressivity, and what they say about real-world attention.

Read memo

February 15, 2026 / Language modeling

The softmax bottleneck in language modeling

Why a low-rank output layer limits expressivity and how mixture-of-softmaxes expands it.

Read memo
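The rank argument behind this memo can be checked numerically. A toy sketch (illustrative sizes, not from the memo): a single softmax head produces a log-probability matrix of rank at most d + 1, while mixing after the exponential breaks that bound.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, V, d = 12, 12, 3        # contexts, vocab size, hidden size (toy numbers)

H = rng.standard_normal((N, d))   # context vectors
W = rng.standard_normal((V, d))   # output embedding

# Single softmax: log-probs = logits - logsumexp(logits), i.e. a rank-<=d
# matrix plus a rank-1 correction, so rank(log_probs) <= d + 1.
logits = H @ W.T
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
rank_single = np.linalg.matrix_rank(log_probs)

# Mixture of softmaxes: mixing happens *after* the exponential, so the
# log of the mixture is generically not low-rank.
K = 3
priors = rng.dirichlet(np.ones(K), size=N)        # (N, K) mixture weights
Ws = rng.standard_normal((K, V, d))               # one output head per mixture
mix = sum(priors[:, [k]] * softmax(H @ Ws[k].T) for k in range(K))
rank_mix = np.linalg.matrix_rank(np.log(mix))
```

Here `rank_single` stays at d + 1 = 4 regardless of vocabulary size, while `rank_mix` escapes the bound, which is the expressivity gap the memo describes.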

February 13, 2026 / Context length

ALiBi vs RoPE

Why ALiBi extrapolates to longer contexts more reliably than RoPE.

Read memo
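The mechanism behind ALiBi's extrapolation claim is small enough to sketch. This is a toy NumPy version (not the memo's code): each head gets a fixed slope, and attention logits are penalized linearly in how far a key lies behind the query, with no learned positional embedding to run out of range.

```python
import numpy as np

def alibi_bias(n_heads, seq_len):
    # Head-specific slopes; for n_heads a power of two, the ALiBi paper
    # uses the geometric schedule 2 ** (-8 * (h + 1) / n_heads).
    slopes = np.array([2.0 ** (-8.0 * (h + 1) / n_heads)
                       for h in range(n_heads)])
    pos = np.arange(seq_len)
    # distance[i, j] = how many positions key j lies behind query i.
    distance = pos[:, None] - pos[None, :]
    # Linear penalty on past keys; future keys get 0 here and are assumed
    # to be removed by the usual causal mask elsewhere.
    bias = -slopes[:, None, None] * np.maximum(distance, 0)
    return bias  # shape (n_heads, seq_len, seq_len), added to logits
```

Because the bias is a fixed function of distance, it is defined for any sequence length, which is the structural reason ALiBi extrapolates where RoPE's rotations encounter frequencies never seen in training.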

February 12, 2026 / Memory

FFN layers as key-value memories

Evidence that mid-layer MLPs store and retrieve factual associations.

Read memo
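The key-value reading of FFN layers (in the spirit of Geva et al.) reduces to two matrix products. A minimal sketch with assumed toy shapes, not the memo's code: rows of `keys` act as stored patterns, ReLU similarities gate them, and the gated sum of `values` rows is the retrieved content.

```python
import numpy as np

def ffn_as_memory(x, keys, values):
    # x:      (n, d)  input activations
    # keys:   (m, d)  first FFN weight matrix, one "memory key" per row
    # values: (m, d)  second FFN weight matrix, one "memory value" per row
    # ReLU similarities act as non-negative memory coefficients ...
    coeffs = np.maximum(x @ keys.T, 0.0)   # (n, m)
    # ... which weight a sum over stored value vectors.
    return coeffs @ values                 # (n, d)
```

Under this reading, a mid-layer MLP "retrieves a fact" when an input strongly activates a few keys whose associated values push the residual stream toward the right continuation.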

February 11, 2026 / Attention

Attention as modern Hopfield memory

A compact review of the Hopfield-network view of attention, storage capacity, and retrieval.

Read memo

February 7, 2026 / In-context learning

In-context learning as implicit gradient descent

Evidence that Transformers can implement gradient-descent-like procedures in context.

Read memo

February 6, 2026 / Optimization

Grokking as a phase transition in transformer training

How weight decay can trigger a delayed shift from memorization to algorithmic generalization.

Read memo

February 5, 2026 / Interpretability

Sparsity as a control knob for superposition

How sparsity penalties can flip a model between dense superposition and cleaner feature geometry.

Read memo

February 4, 2026 / Mechanistic evaluation

Evaluating IOI circuits

Faithfulness, completeness, and minimality as standards for mechanistic explanations.

Read memo

February 4, 2026 / Research process

Bootstrapping a weekly research memo

A lightweight public writing habit for turning weekly notes into usable research output.

Read memo