            I am an assistant professor in the Department of Computer Science at Bar-Ilan University.
            I work on Artificial Intelligence and Machine Learning, focusing on interpretability, data attribution, and evaluation of generative models (e.g., LLMs). My goal is to develop holistic, causal, and data-centric approaches to study how, when, and why such models work (and when they don't).
          
I was previously a postdoc at AI2 and at the University of Washington. I completed my PhD in Computer Science in the NLP lab at Bar-Ilan University.
I'm happy to talk about research in general, and my own work in particular. If you have any questions about one of my papers, or my overall research, feel free to reach out!
If you would like to work with my lab, send me an email; I'm recruiting!
For a full list of my publications, see this page, or my Google Scholar profile.
            OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
            Liu et al.,
            Best Demo Award 🏆 @ ACL 2025 System Demonstrations
        
            On Linear Representations and Pretraining Data Frequency in Language Models
            Merullo et al.,
            ICLR 2025
            
        
            How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold
            Verma et al.,
            preprint
          
            Evaluating \( n \)-Gram Novelty of Language Models Using Rusty-DAWG
            Merrill et al.,
            EMNLP 2024
         
            Detection and Measurement of Syntactic Templates in Generated Text
            Shaib et al.,
            EMNLP 2024
         
            The Bias Amplification Paradox in Text-to-Image Generation
            Seshadri et al.,
            NAACL 2024
         
            What's In My Big Data?
            Elazar et al.,
            ICLR 2024 spotlight
         
            Estimating the Causal Effect of Early ArXiving on Paper Acceptance
            *Elazar, *Zhang, and *Wadden et al.,
            CLeaR 2024
         
            Measuring and Improving Consistency in Pretrained Language Models
            Elazar et al.,
            TACL 2021
          
            Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals
            Elazar et al.,
            TACL 2021
          
My experience meta-reviewing for ARR, and some reviewers' fallacies
Behind the scenes of the interviewing process
My strategy for attending my first virtual conference.
How to set up your environment to work seamlessly with remote servers.