Blog
Notes and longer pieces, mostly on machine learning and interpretability.
What mechanistic interpretability says about whether an LLM commits to a secret in 20 Questions.