I'm interested in the Science of Generative Models (e.g., LLMs), where I ask how these models work, when they work, and why.
I develop holistic, causal, and data-centric approaches to study generative models.
These days, I focus on the data on which such models are trained and draw connections between the data and model behavior.
I'm happy to talk about research in general, and my own work in particular.
If you have any questions about one of my papers, or my overall research, feel free to reach out!
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar,
Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman,
Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
ACL 2024 🏆 Best Theme Paper [paper] [long] [code] [resource] [models]
Press: TechCrunch, Axios, Forbes, GeekWire, SD Times, VentureBeat, Fast Company
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson,
Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
ACL 2024 🏆 Best Resource Paper [paper] [long] [code] [resource]
The Bias Amplification Paradox in Text-to-Image Generation
First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT
Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah
EACL 2021 [paper] [short] [code]
*Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals
Yanai Elazar, Shauli Ravfogel, Alon Jacovi, Yoav Goldberg
TACL 2021
(*) A previous version on arXiv was titled "When Bert Forgets How To POS: Amnesic Probing of Linguistic Properties and MLM Predictions"; we changed it to the current title to better reflect our contributions.
[paper] [journal] [code] [slides] [video]
2020
At Your Fingertips: Extracting Piano Fingering Instructions from Videos
Amit Moryossef, Yanai Elazar, Yoav Goldberg
arXiv [paper] [code]
It’s not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT
Hila Gonen, Shauli Ravfogel, Yanai Elazar, Yoav Goldberg
Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, at EMNLP 2020 [paper] [long] [code] [poster]
The Extraordinary Failure of Complement Coercion Crowdsourcing
Yanai Elazar, Victoria Basmov, Shauli Ravfogel, Yoav Goldberg, Reut Tsarfaty
Workshop on Insights from Negative Results in NLP, EMNLP 2020 [paper] [short] [slides] [video]
Do Language Embeddings Capture Scales?
Xikun Zhang, Deepak Ramachandran, Ian Tenney, Yanai Elazar, Dan Roth
Findings of EMNLP 2020 [paper] [long] [code]
Unsupervised Distillation of Syntactic Information from Contextualized Word Representations
*Shauli Ravfogel, *Yanai Elazar, Jacob Goldberger, Yoav Goldberg
Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, at EMNLP 2020 [paper] [long] [code] [slides]
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
Evaluating Models' Local Decision Boundaries via Contrast Sets
Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou
Findings of EMNLP 2020 [paper] [long] [resource]
oLMpics -- On what Language Model Pre-training Captures
Alon Talmor, Yanai Elazar, Yoav Goldberg, Jonathan Berant
TACL 2020 (presented at EMNLP 2020) [paper] [journal] [code] [video]
2019
Adversarial Removal of Demographic Attributes Revisited
Maria Barrett, Yova Kementchedjhieva, Yanai Elazar, Desmond Elliott, Anders Søgaard
EMNLP 2019 [paper] [short]
How Large Are Lions? Inducing Distributions over Quantitative Attributes