I am an assistant professor at Bar-Ilan University's Computer Science Department.
My research interests focus on the science of generative models (e.g., LLMs), developing holistic, causal, and data-centric approaches to study how, when, and why such models work (and when they don't).
I was previously a postdoc at AI2 and at the University of Washington. I completed my PhD in Computer Science in the NLP lab at Bar-Ilan University.
I'm happy to talk about research in general, and my own work in particular.
If you have any questions about one of my papers, or my overall research, feel free to reach out!
If you would like to work with my lab, send me an email; I'm recruiting!
For a full list of my publications, see this page, or my Google Scholar profile.
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
Liu et al.,
Best Demo Award 🏆 @ ACL 2025 System Demonstrations
On Linear Representations and Pretraining Data Frequency in Language Models
Merullo et al.,
ICLR 2025
How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold
Verma et al.,
preprint
Evaluating n-Gram Novelty of Language Models Using Rusty-DAWG
Merrill et al.,
EMNLP 2024
Detection and Measurement of Syntactic Templates in Generated Text
Shaib et al.,
EMNLP 2024
The Bias Amplification Paradox in Text-to-Image Generation
Seshadri et al.,
NAACL 2024
What's In My Big Data?
Elazar et al.,
ICLR 2024 spotlight
Estimating the Causal Effect of Early ArXiving on Paper Acceptance
*Elazar, *Zhang, *Wadden, et al.,
CLeaR 2024
Measuring and Improving Consistency in Pretrained Language Models
Elazar et al.,
TACL 2021
Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals
Elazar et al.,
TACL 2021
My experience from meta-reviewing for ARR, and some reviewers' fallacies
Behind the scenes of the interviewing process
My strategy for attending my first virtual conference.
How to set up your environment to work seamlessly with remote servers.