Thomas I. Liao

Hi! I'm a machine learning researcher, currently working on LLMs. I was most recently at Scale AI, where I worked on human-in-the-loop data annotation and internal ML APIs.

I received my B.A. in Computer Science at UC Berkeley, where I was fortunate to have been advised by Ben Recht and Ludwig Schmidt. Previously, I also worked with Roberto Calandra, Sergey Levine, and Kristofer Pister. Outside research, I co-organized the Build the Future speaker series (Fa17, Sp18, Fa18), hosting founders and investors from well-known startups and funds.

Email  /  LinkedIn  /  Google Scholar  /  Twitter

profile photo: fantastical toucan generated by Stable Diffusion

(This paragraph is out of date as of 2022-09-03.) My previous research projects focused on evaluating machine learning systems and on questions about training and test data. I am unusually familiar with how ML datasets are labelled thanks to my experience at Scale. (Legend: ⭐ = my favorite papers.)


Foundation Model Tracker. I track releases of LLMs, VLMs, and text-to-image models here.


Why External Validity Matters for Machine Learning Evaluation: Motivation and Open Problems
Thomas I. Liao, Rohan Taori, Ludwig Schmidt
ICLR ML Evaluation Standards Workshop, 2022.
PDF  /  arXiv TBA  /  Poster
⭐ Are We Learning Yet? A Meta-Review of Evaluation Failures Across Machine Learning
Thomas I. Liao, Rohan Taori, Inioluwa Deborah Raji, Ludwig Schmidt
NeurIPS, 2021.
PDF  /  arXiv TBA  /  GitHub
In a forward direction: Analyzing distribution shifts in machine translation test sets over time
Thomas I. Liao, Ben Recht, Ludwig Schmidt
ICML Uncertainty in Deep Learning Workshop, 2020.
Data-efficient Learning of Morphology and Controller for a Microrobot
Thomas I. Liao, Grant Wang, Brian Yang, Rene Lee, Kristofer Pister, Sergey Levine, Roberto Calandra
ICRA, 2019.


Forecasting GPT-4. I make some predictions about GPT-4: 200-400B parameters; 16k-32k context window; tool use; 10x the human feedback; more data curation; not multimodal.

Website template adapted from Jon Barron's.