Tibo Vanleke: Research & PhD

PhD candidate / Technical AI safety

Making model behavior easier to trust before it matters.

I am a PhD candidate at Vrije Universiteit Brussel researching subliminal learning, mechanistic interpretability, and the practical question of how to make LLM systems more legible under real-world constraints.

  • Subliminal learning
  • Mechanistic interpretability
  • LLM evaluation

  • VUB: Vrije Universiteit Brussel
  • Belgium: Based in Brussels, collaborating broadly
  • 12 notes: Public memos spanning safety, theory, and interpretability

Portrait

Putting a face next to the work.

Research in technical AI safety, grounded in Brussels and shared here as openly as possible.

Current notebook

Three questions I keep returning to.

  • 01 What hidden signals teach a model more than we notice? I care most about cases where behavior shifts before obvious benchmark failures appear.
  • 02 How do we make mechanistic claims testable? Interpretability becomes more useful when it survives causal evaluation.
  • 03 Which safety checks stay useful in messy deployments? I am interested in methods that remain lightweight enough to use in practice.

Academic home

Built inside VUB's data and AI ecosystem.

My PhD sits at the intersection of technical safety, interpretability, and applied research culture.

Research agenda

Three threads that shape how I think about safer language models.

The common theme is simple: if we want trustworthy systems, we need better ways to surface hidden behavior, explain it, and evaluate it before the stakes rise.

Thread 1

Subliminal learning

Understanding how subtle cues in training data or prompts can quietly steer model behavior long before anyone notices in deployment.

  • Find cheaper proxy tests for hidden behavioral steering.
  • Separate meaningful signals from harmless correlation.

Thread 2

Mechanistic interpretability

Tracing circuits, features, and representational geometry so interpretability is something we can verify, not just narrate.

  • Focus on faithfulness, completeness, and minimality.
  • Use geometry as a clue to what a model has actually learned.

Thread 3

Safety under constraints

Building evaluation habits and guardrails that stay useful outside ideal lab conditions, where time, compute, and attention are always limited.

  • Prefer methods that help real teams make earlier decisions.
  • Keep the work close to deployment reality.

Selected writing

Short research memos instead of waiting for perfect drafts.

I use public notes to sharpen questions, retain ideas, and make my research process easier to follow for collaborators.

Latest note / March 2, 2026

DPO as implicit reward optimization under KL constraints

A memo on when Direct Preference Optimization really matches KL-regularized reward optimization, and where that story breaks.

Recent note

Pre-LN vs Post-LN

Why layer norm placement changes gradient flow, residual amplification, and training stability in deep Transformers.

Recent note

Hard attention and Turing completeness

A short guide to when Transformers become Turing complete and what those expressivity results actually buy us.

Network

The academic, practical, and personal threads around the work.

This page is part research log, part profile, and part place to show the habits that keep long projects sustainable.

Profiles

Find the work elsewhere

Academic identity, projects, and public profile links in one place.

Supervision

People shaping the PhD

Main promotor: Prof. Vincent Ginis. Co-promotors: Prof. Filip Van Droogenbroeck and Prof. Marie-Anne Guerry.

Outside the lab

Running keeps the horizon long.

Competitive distance running is where I practice patience, recovery, and the kind of consistency research also demands.

Contact

A direct form for research, collaboration, or adjacent ideas.

If you want to compare notes, ask about a project, or reach out more directly, the form below forwards to my inbox. Email links stay available as a fallback.

Best uses

Interpretability, evaluation, safety, and careful collaboration.

I am especially happy to hear from people working on adjacent technical questions, evaluation tooling, or research ideas that benefit from an early conversation.

Use email instead

The form forwards to tibo@vanleke.com. If forms are blocked in your browser, email still works directly.