Thomas I. Liao

This website is not maintained. My Twitter is @distributionat.

Email / LinkedIn / Google Scholar / Twitter / Blog

Writing

Why eval startups fail. 2025-05-08. I explore why startups selling model evaluations face unique challenges in retaining talent, finding customers, and handling optimization pressure from model developers.
Forecasting GPT-4. 2022-09-15. I make some predictions about GPT-4: 200-400B parameters; 16k-32k context window; tool use; 10x the human feedback; more data curation; not multimodal.

Publications

Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus, Karina Nguyen, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli
arXiv
PDF / arXiv

The Capacity for Moral Self-Correction in Large Language Models
Deep Ganguli*, Amanda Askell*, Nicholas Schiefer, Thomas I. Liao, Kamilė Lukošiūtė, Anna Chen, Anna Goldie, Azalia Mirhoseini, Catherine Olsson, Danny Hernandez, Dawn Drain, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Mueller, Joshua Landau, Kamal Ndousse, Karina Nguyen, Liane Lovitt, Michael Sellitto, Nelson Elhage, Noemi Mercado, Nova DasSarma, Oliver Rausch, Robert Lasenby, Robin Larson, Sam Ringer, Sandipan Kundu, Saurav Kadavath, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Christopher Olah, Jack Clark, Samuel R. Bowman, Jared Kaplan
arXiv
PDF / arXiv

Ecosystem Graphs: The Social Footprint of Foundation Models
Rishi Bommasani, Dilara Soylu, Thomas I. Liao, Kathleen A. Creel, Percy Liang
arXiv
PDF / arXiv

Why External Validity Matters for Machine Learning Evaluation: Motivation and Open Problems
Thomas I. Liao, Rohan Taori, Ludwig Schmidt
ICLR ML Evaluation Standards Workshop, 2022
PDF / arXiv TBA / Poster

Are We Learning Yet? A Meta Review of Evaluation Failures Across Machine Learning
Thomas I. Liao, Rohan Taori, Inioluwa Deborah Raji, Ludwig Schmidt
NeurIPS, 2021
PDF / arXiv TBA / GitHub

In a forward direction: Analyzing distribution shifts in machine translation test sets over time
Thomas I. Liao, Ben Recht, Ludwig Schmidt
ICML Uncertainty in Deep Learning Workshop, 2020
PDF

Data-efficient Learning of Morphology and Controller for a Microrobot
Thomas I. Liao, Grant Wang, Brian Yang, Rene Lee, Kristofer Pister, Sergey Levine, Roberto Calandra
ICRA, 2019
arXiv

Website template adapted from Jon Barron's.