DPO as implicit reward optimization under KL constraints
A deep review of when Direct Preference Optimization is equivalent to KL-regularized reward optimization and where that equivalence breaks.
Research notes
These memos stay intentionally compact. Each one isolates a mechanism, paper, or question so the work remains visible before it becomes polished.
Format
Recent focus
Latest
Everything else stays browsable below, but the freshest research question should be immediately visible.
DPO as implicit reward optimization under KL constraints: a deep review of when Direct Preference Optimization is equivalent to KL-regularized reward optimization and where that equivalence breaks.
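For orientation, here is a compact sketch of the standard derivation behind that equivalence (the textbook DPO mapping from Rafailov et al., 2023, not an excerpt from the memo); the notation ($\pi_\theta$ policy, $\pi_{\mathrm{ref}}$ reference policy, $r$ reward, $\beta$ KL weight, $(y_w, y_l)$ preferred and dispreferred responses) is introduced here only for illustration.
\[
\max_{\pi}\ \mathbb{E}_{x\sim\mathcal{D},\,y\sim\pi(\cdot\mid x)}\big[r(x,y)\big]-\beta\,\mathrm{KL}\big(\pi(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)
\;\;\Longrightarrow\;\;
\pi^{*}(y\mid x)=\frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y\mid x)\exp\!\Big(\tfrac{1}{\beta}\,r(x,y)\Big),
\]
\[
r(x,y)=\beta\log\frac{\pi^{*}(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)}+\beta\log Z(x),
\qquad
\mathcal{L}_{\mathrm{DPO}}=-\,\mathbb{E}_{(x,y_w,y_l)}\Big[\log\sigma\Big(\beta\log\tfrac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}-\beta\log\tfrac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\Big)\Big].
\]
Substituting the implicit reward into the Bradley-Terry preference model cancels $Z(x)$ and yields $\mathcal{L}_{\mathrm{DPO}}$; the identity is exact only at the optimum of the nonparametric KL-regularized objective, which is where the "where it breaks" question lives.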
Archive
Ordered newest first so the cadence and evolving interests are easy to scan.
When DPO matches KL-regularized reward optimization, and where the approximation starts to leak.
How layer norm placement shapes gradient flow and training stability in deep Transformers.
Architectural ingredients, formal expressivity, and what they say about real-world attention.
Why a low-rank output layer limits expressivity and how mixture-of-softmaxes expands it.
Why ALiBi extrapolates to longer contexts more reliably than RoPE.
Evidence that mid-layer MLPs store and retrieve factual associations.
A compact review of the Hopfield-network view of attention, storage capacity, and retrieval.
Evidence that Transformers can implement gradient-descent-like procedures in context.
How weight decay can trigger a delayed shift from memorization to algorithmic generalization.
How sparsity penalties can flip a model between dense superposition and cleaner feature geometry.
Faithfulness, completeness, and minimality as standards for mechanistic explanations.
A lightweight public writing habit for turning weekly notes into usable research output.