- Recent Papers
- [New!] "Model Organisms Are Leaky: Perplexity Differencing Often Reveals Finetuning Objectives" [arXiv (preprint)]
- "Tokenized SAEs: Disentangling SAE Reconstructions" [arXiv] | [LessWrong]
- paper/poster @ ICML 2024 Mechanistic Interpretability Workshop
- Reverse-engineering writeups
- [New!] Max of List puzzle (Bao Lab challenge)
- Cumulative Sum Sign puzzle (ARENA challenge)
- LLM Foundations (in progress)
- In-progress mech-interp intros
- YouTube
- ezinterp [GitHub]: a minimalistic transformer interpretability library for interactive exploration.
- About Me
- [New!] I am an advisor for TARA, an APAC-region research accelerator based on the ARENA curriculum.
- This Site's GitHub