Subliminal learning
Understanding how subtle cues in training data or prompts can quietly steer model behavior, often long before anyone notices the effect in deployment.
- Find cheaper proxy tests for hidden behavioral steering.
- Separate meaningful signals from harmless correlations.