Pitfalls of Interpretability
3 times when interpretability wasn't interpretable.
I’m Head of Research at Apollo Research, where we work to mitigate the risks from scheming AI models, i.e. models that covertly pursue misaligned goals. Our research team conducts fundamental research on the emergence of scheming and evaluates frontier AI models for signs of scheming and deception.
These posts are from 2022–2023 and mostly reflect my PhD-era work (adversarial robustness, interpretability, and related topics).
3 times when interpretability wasn't interpretable.
Does your AI know when it doesn't know?
How I won an ML Security Evasion Competition.
How it works and how it doesn't.
Is AI as biased as the headlines suggest?
Should you prepare your ML model to be compliant?
Host your ML models for a few cents a month.
Finding adversarial samples in industrial-grade AI.