Research
I think it's more interesting to ask how models _can_ and _should_ behave than how they _do_ behave.
2024 update: These days I'm mostly thinking about the "full-stack" post-training loop: going from qualitative observations of how we want models to behave, to building quantitative measurements, to collecting human feedback data related to the change we want, to finetuning models to improve on those quantitative benchmarks.
|
Publications
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus, Karina Nguyen, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli
arXiv
PDF / arXiv
|
The Capacity for Moral Self-Correction in Large Language Models
Deep Ganguli*, Amanda Askell*, Nicholas Schiefer, Thomas I. Liao, Kamilė Lukošiūtė, Anna Chen, Anna Goldie, Azalia Mirhoseini, Catherine Olsson, Danny Hernandez, Dawn Drain, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Mueller, Joshua Landau, Kamal Ndousse, Karina Nguyen, Liane Lovitt, Michael Sellitto, Nelson Elhage, Noemi Mercado, Nova DasSarma, Oliver Rausch, Robert Lasenby, Robin Larson, Sam Ringer, Sandipan Kundu, Saurav Kadavath, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Christopher Olah, Jack Clark, Samuel R. Bowman, Jared Kaplan
arXiv
PDF / arXiv
|
Ecosystem Graphs: The Social Footprint of Foundation Models
Rishi Bommasani, Dilara Soylu, Thomas I. Liao, Kathleen A. Creel, Percy Liang
arXiv
PDF / arXiv
|
Why External Validity Matters for Machine Learning Evaluation: Motivation and Open Problems
Thomas I. Liao, Rohan Taori, Ludwig Schmidt
ICLR ML Evaluation Standards Workshop, 2022.
PDF / arXiv TBA / Poster
|
Are We Learning Yet? A Meta Review of Evaluation Failures Across Machine Learning
Thomas I. Liao, Rohan Taori, Inioluwa Deborah Raji, Ludwig Schmidt
NeurIPS, 2021.
PDF / arXiv TBA / GitHub
|
In a forward direction: Analyzing distribution shifts in machine translation test sets over time
Thomas I. Liao, Ben Recht, Ludwig Schmidt
ICML Uncertainty in Deep Learning Workshop, 2020
PDF
|
Data-efficient Learning of Morphology and Controller for a Microrobot
Thomas I. Liao, Grant Wang, Brian Yang, Rene Lee, Kristofer Pister, Sergey Levine, Roberto Calandra
ICRA, 2019
arXiv
|
Writing
Forecasting GPT-4.
I make some predictions about GPT-4: 200-400B parameters; 16k-32k context window; tool use; 10x the human feedback; more data curation; not multimodal.
|
Projects
|